CN110705642B - Classification model, classification method, classification device, electronic equipment and storage medium - Google Patents

Publication number
CN110705642B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910943039.XA
Other languages
Chinese (zh)
Other versions
CN110705642A (en
Inventor
刘少栋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Kingsoft Internet Security Software Co Ltd
Original Assignee
Beijing Kingsoft Internet Security Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Kingsoft Internet Security Software Co Ltd
Priority to CN201910943039.XA
Publication of CN110705642A
Application granted
Publication of CN110705642B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides a classification model, a classification method, a classification device, electronic equipment and a storage medium, and belongs to the technical field of computer applications. The model comprises n classification layers, n being an integer greater than 1. Each layer of nodes in the classification model performs binary classification on the objects to be classified corresponding to each node, according to the feature representations of those objects on different features, and every two adjacent nodes in the same layer share one child node. With this classification model, accurate portrait information of users can be obtained, user grouping is facilitated, and the accuracy of user classification is improved.

Description

Classification model, classification method, classification device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of computer application technologies, and in particular, to a classification model, a classification method, a classification device, an electronic device, and a storage medium.
Background
With the continuous development of internet technology, the internet is no longer merely a medium for communication between people; working, studying, entertainment, shopping and the like over the internet have become the norm in the information society. An internet service provider often wishes to obtain a reference portrait of a user, so as to provide the user with services that meet his or her needs, or to deliver advertisements precisely according to the user's points of interest.
However, in the related art, many internet service carriers such as web pages or apps can be used without registering or providing accurate personal information, so an internet service provider cannot directly obtain accurate portrait information of a user. This hinders user grouping and prevents the provider from offering accurate services according to the user's points of interest.
Disclosure of Invention
The classification model, method, device, electronic equipment and storage medium provided by the present application are intended to solve the following problem in the related art: many internet service carriers such as web pages or apps can be used without registering or providing accurate personal information, so an internet service provider cannot directly obtain accurate portrait information of a user, which hinders user grouping and prevents accurate services from being provided according to the user's points of interest.
In one aspect, an embodiment of the present application provides a classification model comprising n classification layers, n being an integer greater than 1. Each layer of nodes in the classification model performs binary classification on the objects to be classified corresponding to each node, according to the feature representations of those objects on different features, and every two adjacent nodes in the same layer share one child node.
Optionally, in a possible implementation form of the embodiment of the first aspect, each node is configured to divide its corresponding objects to be classified equally between its two child nodes.
In another aspect, an embodiment of the present application provides a classification method comprising the following steps: acquiring a set of objects to be classified and the feature representation of each object in the set on each feature; the i_x-th node of the i-th layer assigning, according to the feature representation of each object in the i_x-th object set on the i-th feature, each object in the i_x-th object set to the j_x-th node or the j_{x+1}-th node of the j-th layer; the i_{x+1}-th node of the i-th layer assigning, according to the feature representation of each object in the i_{x+1}-th object set on the i-th feature, each object in the i_{x+1}-th object set to the j_{x+1}-th node or the j_{x+2}-th node of the j-th layer; wherein i is an integer less than or equal to n, x is an integer greater than 0 and less than or equal to i+1, j = i+1, and the pre-generated classification model comprises n classification layers.
Optionally, in a possible implementation form of the embodiment of the second aspect, the i_x-th node of the i-th layer assigning, according to the feature representation of each object in the i_x-th object set on the i-th feature, each object in the i_x-th object set to the j_x-th node or the j_{x+1}-th node of the j-th layer comprises:
the i_x-th node of the i-th layer dividing, according to the feature representation of each object in the i_x-th object set on the i-th feature, the objects in the i_x-th object set equally between the j_x-th node and the j_{x+1}-th node of the j-th layer.
Optionally, in another possible implementation form of the embodiment of the second aspect, the method further includes:
acquiring a classification request, wherein the request comprises a target classification system identifier and a reference characteristic;
and acquiring the corresponding target classification model from the pre-trained classification models according to the target classification system identifier and the reference feature.
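The lookup step above could be sketched as follows. This is only an illustration: the patent does not say how pre-trained models are indexed, so the `ClassificationRequest` type, the `registry` dictionary, and the exact-match rule on the reference feature set are all assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClassificationRequest:
    """Hypothetical request shape: the patent only names the two fields."""
    system_id: str                 # target classification system identifier
    reference_features: frozenset  # features the caller can supply


def select_model(request, registry):
    """Return the pre-trained model registered for this request, or None.

    `registry` is an assumed mapping from
    (system_id, frozenset_of_feature_names) to a model object.
    """
    return registry.get((request.system_id, request.reference_features))
```

Keying on a `frozenset` makes the lookup independent of the order in which the reference features are listed in the request.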
Optionally, in a further possible implementation form of the embodiment of the second aspect, the method further includes:
acquiring a training sample set, wherein the training sample set comprises the feature representation of each sample object on each feature and the actual type identifier of each sample object in a target classification system;
determining a first priority order of the features in the target classification system according to the information gain of each feature in the target classification system;
performing multi-level binary classification on each sample object in sequence, in the first priority order of the features and according to the feature representation of each sample object on the corresponding feature, and determining a first predicted type identifier of each sample object in the target classification system;
judging whether the first predicted type identifier corresponding to each sample object matches the actual type identifier;
if the number of sample objects in the training sample set whose first predicted type identifier matches the actual type identifier is greater than or equal to a threshold value, determining the feature order corresponding to the layers of the classification model for the target classification system according to the first priority order of the features;
wherein the classification model comprises n classification layers, n is an integer greater than 1, and every two adjacent nodes in the same layer share one child node.
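As an illustration of ranking features by information gain, the following minimal sketch uses the standard entropy-based definition for categorical features. The patent does not give a formula, so this definition, the dictionary-based sample layout, and the function names are assumptions.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(samples, labels, feature):
    """Entropy reduction obtained by grouping samples on one categorical feature."""
    groups = {}
    for sample, label in zip(samples, labels):
        groups.setdefault(sample[feature], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def rank_features(samples, labels, features):
    """Return the features sorted from highest to lowest information gain."""
    return sorted(
        features,
        key=lambda f: information_gain(samples, labels, f),
        reverse=True,
    )
```

The first feature in the returned list would be used at layer 0, the second at layer 1, and so on.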
Optionally, in a further possible implementation form of the embodiment of the second aspect, after the determining whether the first prediction type identifier corresponding to each sample object matches the actual type identifier, the method further includes:
if the number of sample objects in the training sample set whose first predicted type identifier matches the actual type identifier is smaller than the threshold value, adjusting the first priority order of the features and determining a second, adjusted priority order of the features;
performing multi-level binary classification on each sample object in sequence, in the second priority order and according to the feature representation of each sample object on the corresponding feature, and redetermining a second predicted type identifier of each sample object;
if the number of sample objects in the training sample set whose second predicted type identifier matches the actual type identifier is smaller than the threshold value, continuing to adjust the second priority order of the features until, with the adjusted priority order as the order, the number of sample objects in the training sample set whose predicted type identifier matches the actual type identifier is determined to be larger than the threshold value, and then determining the feature order corresponding to the layers of the classification model for the target classification system according to the adjusted priority order of the features.
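The adjust-and-retry loop described above might look like the sketch below. The patent does not specify how the priority order is adjusted, so trying the remaining permutations in turn is purely an assumption; `count_matches` is a hypothetical callback that classifies the training samples with a given feature order and returns how many predictions match the actual type identifiers. The sketch accepts an order when the count is greater than or equal to the threshold, following the check applied to the first priority order.

```python
from itertools import permutations


def find_feature_order(initial_order, count_matches, threshold):
    """Return the first feature order whose match count reaches the threshold.

    The initial (information-gain) order is tried first; the other
    permutations follow. Returns None if no order reaches the threshold.
    """
    first = tuple(initial_order)
    candidates = [first] + [p for p in permutations(first) if p != first]
    for order in candidates:
        if count_matches(order) >= threshold:
            return list(order)
    return None
```

Exhaustive permutation is only practical for a handful of features; a real system would need a cheaper adjustment strategy.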
Optionally, in a further possible implementation form of the embodiment of the second aspect, the training sample set includes the feature representation of each sample object on L features, L being an integer greater than n;
after the first priority order of the features in the target classification system is determined, the method further comprises:
determining, according to the first priority order of the features in the target classification system, the n highest-priority features as the features corresponding to the layers of the classification model for the target classification system.
In still another aspect, an embodiment of the present application provides a classification device comprising: a first acquisition module, configured to acquire a set of objects to be classified and the feature representation of each object in the set on each feature; a first allocation module, configured to cause the i_x-th node of the i-th layer to assign, according to the feature representation of each object in the i_x-th object set on the i-th feature, each object in the i_x-th object set to the j_x-th node or the j_{x+1}-th node of the j-th layer; and a second allocation module, configured to cause the i_{x+1}-th node of the i-th layer to assign, according to the feature representation of each object in the i_{x+1}-th object set on the i-th feature, each object in the i_{x+1}-th object set to the j_{x+1}-th node or the j_{x+2}-th node of the j-th layer; wherein i is an integer less than or equal to n, x is an integer greater than 0 and less than or equal to i+1, j = i+1, and the pre-generated classification model comprises n classification layers.
Optionally, in a possible implementation form of the embodiment of the third aspect, the first allocation module is specifically configured to:
cause the i_x-th node of the i-th layer to divide, according to the feature representation of each object in the i_x-th object set on the i-th feature, the objects in the i_x-th object set equally between the j_x-th node and the j_{x+1}-th node of the j-th layer.
Optionally, in another possible implementation form of the embodiment of the third aspect, the apparatus further includes:
a second acquisition module, configured to acquire a classification request, wherein the request comprises a target classification system identifier and a reference feature;
and a third acquisition module, configured to acquire the corresponding target classification model from the pre-trained classification models according to the target classification system identifier and the reference feature.
Optionally, in a further possible implementation form of the embodiment of the third aspect, the apparatus further comprises:
a fourth acquisition module, configured to acquire a training sample set, wherein the training sample set comprises the feature representation of each sample object on each feature and the actual type identifier of each sample object in the target classification system;
a first determining module, configured to determine a first priority order of the features in the target classification system according to the information gain of each feature in the target classification system;
a second determining module, configured to perform multi-level binary classification on each sample object in sequence, in the first priority order of the features and according to the feature representation of each sample object on the corresponding feature, and to determine a first predicted type identifier of each sample object in the target classification system;
a judging module, configured to judge whether the first predicted type identifier corresponding to each sample object matches the actual type identifier;
a third determining module, configured to determine the feature order corresponding to the layers of the classification model for the target classification system according to the first priority order of the features, if the number of sample objects in the training sample set whose first predicted type identifier matches the actual type identifier is greater than or equal to a threshold value;
wherein the classification model comprises n classification layers, n is an integer greater than 1, and every two adjacent nodes in the same layer share one child node.
Optionally, in a further possible implementation form of the embodiment of the third aspect, the apparatus further comprises:
an adjusting module, configured to adjust the first priority order of the features and determine a second, adjusted priority order of the features, if the number of sample objects in the training sample set whose first predicted type identifier matches the actual type identifier is smaller than the threshold value;
a fourth determining module, configured to perform multi-level binary classification on each sample object in sequence, in the second priority order and according to the feature representation of each sample object on the corresponding feature, and to redetermine a second predicted type identifier of each sample object;
and an iteration module, configured to continue adjusting the second priority order of the features if the number of sample objects in the training sample set whose second predicted type identifier matches the actual type identifier is smaller than the threshold value, until, with the adjusted priority order as the order, the number of sample objects in the training sample set whose predicted type identifier matches the actual type identifier is determined to be larger than the threshold value, and then to determine the feature order corresponding to the layers of the classification model for the target classification system according to the adjusted priority order of the features.
Optionally, in a further possible implementation form of the embodiment of the third aspect, the training sample set includes the feature representation of each sample object on L features, L being an integer greater than n; the device further comprises:
a fifth determining module, configured to determine, according to the first priority order of the features in the target classification system, the n highest-priority features as the features corresponding to the layers of the classification model for the target classification system.
In another aspect, an electronic device provided in an embodiment of the present application includes: a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the classification method as described above when executing the program.
A further embodiment of the present application provides a computer readable storage medium having stored thereon a computer program, wherein the program when executed by a processor implements a classification method as described above.
In another aspect, the present application provides a computer program, which when executed by a processor, implements the classification method described in the embodiments of the present application.
According to the classification model, method, device, electronic equipment, computer-readable storage medium and computer program provided by the embodiments of the present application, each layer of nodes in the classification model performs binary classification on the objects to be classified corresponding to each node, according to the feature representations of those objects on different features, wherein the classification model comprises n classification layers, n is an integer greater than 1, and every two adjacent nodes in the same layer share one child node. Each layer of nodes thus classifies the objects layer by layer, using a different one of their features at each layer, to obtain the final classification result. In this way, accurate portrait information of users is obtained, user grouping is facilitated, and the accuracy of user classification is improved.
Additional aspects and advantages of the application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the application.
Drawings
The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings, in which:
FIG. 1 is a schematic structural diagram of a classification model according to an embodiment of the present application;
FIG. 2 is a flow chart of a classification method according to an embodiment of the present disclosure;
FIG. 3 is a flow chart of another classification method according to an embodiment of the present disclosure;
FIG. 4 is a schematic structural diagram of a classification device according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Embodiments of the present application are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the like or similar elements throughout. The embodiments described below by referring to the drawings are exemplary and intended for the purpose of explaining the present application and are not to be construed as limiting the present application.
The embodiments of the present application provide a classification model to address the following problem in the related art: many internet service carriers such as web pages or apps can be used without registering or providing accurate personal information, so an internet service provider cannot directly obtain accurate portrait information of a user, which hinders user grouping and prevents accurate services from being provided according to the user's points of interest.
According to the classification model provided by the embodiments of the present application, each layer of nodes in the classification model performs binary classification on the objects to be classified corresponding to each node, according to the feature representations of those objects on different features, wherein the classification model comprises n classification layers, n is an integer greater than 1, and every two adjacent nodes in the same layer share one child node. Each layer of nodes thus classifies the objects layer by layer, using a different one of their features at each layer, to obtain the final classification result. In this way, accurate portrait information of users is obtained, user grouping is facilitated, and the accuracy of user classification is improved.
The classification model, the method, the device, the electronic equipment, the storage medium and the computer program provided by the application are described in detail below with reference to the accompanying drawings.
Fig. 1 is a schematic structural diagram of a classification model according to an embodiment of the present application.
As shown in fig. 1, the classification model includes:
n classification layers, n being an integer greater than 1;
each layer of nodes in the classification model performs binary classification on the objects to be classified corresponding to each node, according to the feature representations of those objects on different features, and every two adjacent nodes in the same layer share one child node.
Where the m-th layer of the classification model includes m+1 nodes, e.g., layer 0 includes 1 node, layer 1 includes 2 nodes, and so on.
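The layer structure just described (m+1 nodes in layer m, adjacent nodes sharing a child) can be sketched as follows. The function names and the 1-based position index are choices made for this illustration, with the position mirroring the subscript x used elsewhere in the text.

```python
def layer_sizes(n_layers):
    """Layer m holds m + 1 nodes: layer 0 has 1 node, layer 1 has 2, and so on."""
    return [m + 1 for m in range(n_layers)]


def children(layer, pos):
    """Return the (layer, position) pairs of the two children of a node.

    Positions are 1-based: node x of layer i feeds nodes x and x + 1
    of layer i + 1.
    """
    return (layer + 1, pos), (layer + 1, pos + 1)
```

Because node x and node x+1 of the same layer both point at node x+1 of the next layer, the structure grows by one node per layer instead of doubling as an ordinary binary tree would.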
In the embodiments of the present application, the feature representations of the objects to be classified on different features can be collected through event-tracking points embedded in a service carrier such as a web page or an app, and the collected feature representations are input into the classification model so that the model classifies the objects. The features to be collected can be preset according to actual needs, and the features have a priority order, so that when the classification model performs binary classification on the objects to be classified, the feature used by each layer of nodes can be determined from that priority order.
Specifically, layer 0 of the classification model includes 1 node, which performs binary classification on all objects to be classified according to the feature representation of each object on the 0th feature; that is, all objects are divided into two classes, which are passed to the 2 nodes of layer 1 respectively. Each of the two nodes of layer 1 then performs binary classification on its own portion of the objects according to their feature representation on the 1st feature and passes the results to the 3 nodes of layer 2, where the right branch of the left node of layer 1 and the left branch of the right node are merged into a single node; and so on, until all layers of the classification model have been traversed. The priority order numbers of the features of the objects to be classified are 0, 1, 2, …, n, and correspond to the layer numbers of the nodes of the classification model.
It should be noted that the features of the objects to be classified may be preset. For example, when the objects are classified by gender, the features may include downloaded APP, user name, wallpaper type, online time, and the like, and the feature representation of each object on each feature is collected through event-tracking points. The priority order of the features can be determined according to the information gain of each feature, so that the final classification result is more accurate.
Further, when performing binary classification on its corresponding objects to be classified, each node of each layer of the classification model can divide those objects equally. That is, in one possible implementation form of the embodiments of the present application, each node of the classification model is specifically configured to divide its corresponding objects to be classified equally between its two child nodes.
Specifically, during classification each node can determine its feature from the layer in which it is located, sort its corresponding objects by their feature representation on that feature, and divide them into two classes at the median of that feature representation, so that the two child nodes of each node receive two approximately balanced branches.
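A minimal sketch of such a median split is shown below. It assumes the feature representation has already been mapped to a sortable score through a hypothetical `score` function; the patent does not say how non-numeric features such as a user name would be scored.

```python
def split_at_median(objects, score):
    """Sort objects by their score and cut the sorted list at its midpoint.

    Returns (lower_half, upper_half). With an odd number of objects the
    upper half receives the extra one, so the halves differ by at most one.
    """
    ordered = sorted(objects, key=score)
    mid = len(ordered) // 2
    return ordered[:mid], ordered[mid:]
```

Cutting the sorted list at its midpoint keeps the two branches balanced even when many objects share the same score, which a strict comparison against the median value would not guarantee.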
According to the classification model provided by the embodiments of the present application, each layer of nodes in the classification model performs binary classification on the objects to be classified corresponding to each node, according to the feature representations of those objects on different features, wherein the classification model comprises n classification layers, n is an integer greater than 1, and every two adjacent nodes in the same layer share one child node. Each layer of nodes thus classifies the objects layer by layer, using a different one of their features at each layer, to obtain the final classification result. In this way, accurate portrait information of users is obtained, user grouping is facilitated, and the accuracy of user classification is improved.
In order to implement the above embodiment, the present application also proposes a classification method.
Fig. 2 is a flow chart of a classification method according to an embodiment of the present application.
As shown in fig. 2, the classification method includes the steps of:
Step 101, acquiring the set of objects to be classified and the feature representation of each object in the set on each feature.
The object set to be classified comprises a plurality of objects to be classified and the feature representation of each object on each feature.
The features of an object may be preset according to actual needs. For example, when the objects are classified by gender, the features may include downloaded APP, user name, wallpaper type, online time, and the like.
The feature representation of an object on a feature refers to the specific value, or the specific content, that the object has for that feature. For example, the feature representation of object A on the feature "downloaded APP" is "WeChat, QQ", its feature representation on the feature "user name" is "123", and so on.
It should be noted that the object set to be classified, and the feature representation of each object in the set on each feature, may be collected through event-tracking points embedded in a service carrier such as a web page or an app.
Step 102, the i_x-th node of the i-th layer assigns, according to the feature representation of each object in the i_x-th object set on the i-th feature, each object in the i_x-th object set to the j_x-th node or the j_{x+1}-th node of the j-th layer, wherein i is an integer less than or equal to n, x is an integer greater than 0 and less than or equal to i+1, j = i+1, and the pre-generated classification model comprises n classification layers.
Wherein, the m-th layer of the pre-generated classification model includes m+1 nodes, for example, the 0-th layer includes 1 node, the 1-th layer includes 2 nodes, and so on.
In the embodiments of the present application, each object in the object set to be classified may be classified by a pre-generated classification model having n classification layers. Each node in the pre-generated classification model performs binary classification on the objects in its corresponding object set according to their feature representations on different features, and passes the two resulting classes to its two child nodes in the next layer.
Specifically, for the i_x-th node of the i-th layer, the corresponding set of objects to be classified is the i_x-th object set. When binary classification is performed on the i_x-th object set, the i_x-th node assigns, according to the feature representation of each object in the i_x-th object set on the i-th feature, each object in the i_x-th object set to the j_x-th node or the j_{x+1}-th node of the j-th layer, where j = i+1; that is, the objects in the i_x-th object set are distributed between two nodes of layer i+1. Once every layer of nodes in the classification model has been traversed, classification of the object set to be classified can be considered complete.
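A single layer of this assignment could be sketched as follows. The code uses 0-based list positions where the text uses 1-based subscripts, and `goes_left` is a hypothetical per-object decision function standing in for whatever rule a node applies to the i-th feature.

```python
def route_layer(node_sets, goes_left):
    """Route the object sets of layer i into the node sets of layer i + 1.

    node_sets[x] holds the objects currently at node x of layer i. Each
    object moves to node x (left) or node x + 1 (right) of the next layer,
    so adjacent nodes feed one shared child.
    """
    next_sets = [[] for _ in range(len(node_sets) + 1)]
    for x, objects in enumerate(node_sets):
        for obj in objects:
            target = x if goes_left(obj) else x + 1
            next_sets[target].append(obj)
    return next_sets
```

Calling `route_layer` once per layer, each time with the decision rule for that layer's feature, traverses the whole model.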
It should be noted that the features of the objects have a priority order that is preset in the pre-generated classification model. When the nodes of a given layer classify their corresponding object sets, the feature whose priority order equals the layer number of those nodes is selected, and the objects in each corresponding object set are classified according to their feature expression on that feature.
For example, suppose the classification target is the gender of the objects to be classified, the pre-generated classification model includes 4 classification layers, and the acquired features of the objects are the 4 features "downloaded APP", "user name", "wallpaper type", and "online time", with priority orders "0", "1", "2", and "3" respectively. When the object set to be classified is classified, node 0_1 (the only node of layer 0) performs binary classification on 100% of the objects according to the feature expression of each object on the feature "downloaded APP": an object determined to be male is assigned to node 1_1 (the left node of layer 1 of the classification model shown in fig. 1), and an object determined to be female is assigned to node 1_2 (the right node of layer 1 of the classification model shown in fig. 1).
After node 0_1 of layer 0 has performed binary classification on 100% of the objects, the two nodes of layer 1 further classify the results of layer 0. Node 1_1 of layer 1 performs binary classification on each object in object set 1_1 (the set of objects determined to be male by layer 0) according to the feature expression of each object on the feature "user name": an object determined to be male is assigned to node 2_1 (the left node of layer 2 of the classification model shown in fig. 1), and an object determined to be female is assigned to node 2_2 (the middle node of layer 2 of the classification model shown in fig. 1). Likewise, node 1_2 of layer 1 performs binary classification on each object in object set 1_2 (the set of objects determined to be female by layer 0) according to the feature expression of each object on the feature "user name": an object determined to be male is assigned to node 2_2 (the middle node of layer 2), and an object determined to be female is assigned to node 2_3 (the right node of layer 2 of the classification model shown in fig. 1).
Similarly, after the two nodes of layer 1 have classified their object sets, each node of layer 2 performs binary classification on the objects in its corresponding object set according to their feature expression on the feature "wallpaper type", and each node of layer 3 does so according to the feature "online time".
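The layer-by-layer routing described above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the function name `route_through_lattice`, the 0-based node indices, and the `goes_right` decision callback are all assumptions introduced here, since the text does not specify how a single node reaches its binary decision.

```python
from typing import Callable, Dict, List, Sequence

# An object is modelled as a mapping from feature name to a numeric value.
Obj = Dict[str, float]
# Hypothetical decision rule: True sends the object to the right child.
Splitter = Callable[[Obj, str], bool]


def route_through_lattice(
    objects: Sequence[Obj],
    features_by_layer: Sequence[str],
    goes_right: Splitter,
) -> List[List[Obj]]:
    """Route objects through a lattice whose layer m has m + 1 nodes.

    Node x of layer i (0-based here) sends each object to node x or
    node x + 1 of layer i + 1, so adjacent nodes share one child.
    Returns the object sets produced after the last layer.
    """
    layer: List[List[Obj]] = [list(objects)]  # layer 0 has a single node
    for feature in features_by_layer:
        next_layer: List[List[Obj]] = [[] for _ in range(len(layer) + 1)]
        for x, node_objects in enumerate(layer):
            for obj in node_objects:
                child = x + 1 if goes_right(obj, feature) else x
                next_layer[child].append(obj)
        layer = next_layer
    return layer
```

With the four features of the example (one per layer), the call returns five object sets, one for each node reached after layer 3.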
Further, each node of each layer of the classification model may divide its corresponding objects to be classified evenly when performing binary classification. That is, in one possible implementation manner of the embodiment of the present application, the step 102 may include:
node i_x of the i-th layer evenly distributes the objects in object set i_x between node j_x and node j_(x+1) of the j-th layer according to the feature expression of each object in object set i_x on the i-th feature.
As a possible implementation manner, each node may split its corresponding object set evenly between its two child nodes. Specifically, for node i_x of the i-th layer, the corresponding object set is object set i_x. When performing binary classification on object set i_x, node i_x sorts the objects in object set i_x by their feature expression on the i-th feature and, using the median of that feature expression as the dividing point, evenly distributes the objects between node j_x and node j_(x+1) of the j-th layer, where j=i+1, so that the two child nodes of each node form two approximately balanced branches. Once the nodes of every layer of the classification model have been traversed, classification of the object set to be classified can be considered complete.
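A median split of this kind might look like the sketch below. The function name `median_split` and the numeric feature values are assumptions for illustration; cutting the sorted list at its middle index is one way to divide at the median, and how ties at the median value are handled is not stated in the text.

```python
from typing import Dict, List, Sequence, Tuple

Obj = Dict[str, float]


def median_split(objects: Sequence[Obj], feature: str) -> Tuple[List[Obj], List[Obj]]:
    """Sort objects by one feature and cut the sorted list in half.

    Ties at the median value are broken by sort order, which is an
    assumption of this sketch. With an odd number of objects the right
    half receives the extra object.
    """
    ranked = sorted(objects, key=lambda obj: obj[feature])
    middle = len(ranked) // 2
    return ranked[:middle], ranked[middle:]
```

The left half would go to node j_x and the right half to node j_(x+1), which gives the two approximately balanced branches mentioned above.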
It should be noted that when each node splits its object set evenly between its two child nodes, the share of objects held by the middle node of each layer gradually decreases. For example, layer 0 of the classification model has only one node, which holds 100% of the objects; layer 1 has two nodes, each holding 50% of the objects; layer 2 has 3 nodes, with the middle node holding 50% of the objects and the left and right nodes each holding 25%; and the share of objects held by the middle node of layer 2n approaches 0 as n approaches infinity. That is, the classification model gradually moves the objects in the object set to be classified toward the leftmost and rightmost nodes, so that when the model has enough layers, the objects can be divided into two types according to the classification target. The proof is as follows:
The number of objects in the object set corresponding to the middle node of layer 2n of the classification model can be expressed by formula (1):
Figure GDA0004066277810000091
Formula (1) is proved as follows.
From Stirling's formula:
Figure GDA0004066277810000092
Therefore:
Figure GDA0004066277810000093
Figure GDA0004066277810000094
From formulas (1), (3), and (4):
Figure GDA0004066277810000101
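The formulas themselves appear only as figure references in this text. A derivation consistent with the 100%, 50%/50%, and 25%/50%/25% proportions stated above can be written as follows, under the assumption that every node splits its object set exactly in half, so that the share held by each node of a layer follows a binomial distribution. The symbol P_{2n} for the share held by the middle node of layer 2n is introduced here for illustration, and the numbering mirrors the references to formulas (1) to (5) only by assumption.

```latex
% P_{2n}: assumed notation for the share of objects at the middle node of layer 2n
\begin{align}
P_{2n} &= \binom{2n}{n}\frac{1}{2^{2n}} = \frac{(2n)!}{(n!)^{2}\,2^{2n}} \\
n! &\approx \sqrt{2\pi n}\left(\frac{n}{e}\right)^{n} \\
(2n)! &\approx \sqrt{4\pi n}\left(\frac{2n}{e}\right)^{2n} \\
(n!)^{2} &\approx 2\pi n\left(\frac{n}{e}\right)^{2n} \\
P_{2n} &\approx \frac{\sqrt{4\pi n}\,(2n/e)^{2n}}{2\pi n\,(n/e)^{2n}\,2^{2n}}
        = \frac{1}{\sqrt{\pi n}} \longrightarrow 0 \quad (n \to \infty)
\end{align}
```

For n = 1 the exact value is 2/4 = 50%, matching the middle node of layer 2, while the approximation gives about 56%; the approximation tightens as n grows.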
Step 103, node i_(x+1) of the i-th layer assigns each object in object set i_(x+1) to node j_(x+1) or node j_(x+2) of the j-th layer according to the feature expression of each object in object set i_(x+1) on the i-th feature.
In this embodiment of the present application, in a classification layer that includes a plurality of nodes, every node needs to perform binary classification on the objects in its corresponding object set. That is, the value of x is increased in steps of 1 until x equals i+1, at which point all nodes of the layer have been traversed. The specific classification process is the same as described in the above steps and is not repeated here.
Furthermore, a plurality of classification models can be generated by training in advance according to different classification targets and object features used in classification, so that when a classification request is acquired, a proper classification model can be selected according to specific classification targets and object features used in classification, and the classification accuracy is further improved. That is, in one possible implementation manner of the embodiment of the present application, the above classification method may further include:
Acquiring a classification request, wherein the request comprises a target classification system identifier and a reference characteristic;
and acquiring a corresponding target classification model from the classification model generated by pre-training according to the target classification system identification and the reference characteristic.
The target classification system identifier indicates the classification target corresponding to the classification request. For example, if the target classification system identifier is "gender", the classification target of the request is to classify the objects to be classified by gender; if the identifier is "age", the classification target is to classify the objects by age.
It should be noted that, the target classification system identifier may be numbered in advance, that is, a simple number, letter, or the like may be used to represent the target classification system identifier. For example, the target classification system corresponding to the preset "gender" is identified as "1", the target classification system corresponding to the "age" is identified as "2", and so on.
The reference features are the features on which the classification model relies when classifying the objects to be classified, such as downloaded APP, user name, wallpaper type, and online time.
In this embodiment of the present application, a plurality of classification models may be trained in advance for common classification targets and reference features. When a classification request is acquired, a classification model whose target classification system identifier or reference features match those included in the classification request is selected from the plurality of classification models and used to classify the objects to be classified.
For example, when the reference feature is "downloaded APP, download address, on-line time", the corresponding classification model is model X, and when the reference feature is "user name, wallpaper type, download address, downloaded APP", the corresponding classification model is model Y; the corresponding classification model is model S when the target classification system is identified as "gender", model Z when the target classification system is identified as "age", and so on.
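The model lookup described above could be sketched as below. The `RegisteredModel` record, the `select_model` function, and the matching policy (exact feature-set match first, then any model with the same identifier) are hypothetical choices made for illustration; the text only states that a model matching the identifier or the reference features is selected.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Optional


@dataclass(frozen=True)
class RegisteredModel:
    name: str
    system_id: str                      # e.g. "gender" or "age"
    reference_features: FrozenSet[str]  # features the model was trained on


def select_model(
    models: Iterable[RegisteredModel],
    system_id: str,
    reference_features: Iterable[str],
) -> Optional[RegisteredModel]:
    """Pick a pre-trained model for a classification request.

    Assumed policy: prefer a model matching both the system identifier
    and the exact feature set; otherwise fall back to the first model
    with the same system identifier. Returns None if nothing matches.
    """
    wanted = frozenset(reference_features)
    same_system: List[RegisteredModel] = [m for m in models if m.system_id == system_id]
    for model in same_system:
        if model.reference_features == wanted:
            return model
    return same_system[0] if same_system else None
```

A production registry would more likely be keyed by the identifier for constant-time lookup; the linear scan keeps the sketch short.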
According to the classification method provided by the embodiment of the application, the object set to be classified and the feature expression of each object in the object set on each feature are obtained; node i_x of the i-th layer assigns each object in object set i_x to node j_x or node j_(x+1) of the j-th layer according to the feature expression of each object in object set i_x on the i-th feature, wherein i is an integer less than or equal to n, x is an integer greater than 0 and less than or equal to i+1, j=i+1, and the pre-generated classification model comprises n classification layers; and node i_(x+1) of the i-th layer assigns each object in object set i_(x+1) to node j_(x+1) or node j_(x+2) of the j-th layer according to the feature expression of each object in object set i_(x+1) on the i-th feature. In this way, the nodes of each layer of the classification model classify the objects to be classified layer by layer according to different ones of their features, yielding a final classification result, so that accurate user portrait information is obtained, user classification is facilitated, and the accuracy of user classification is improved.
In one possible implementation form of the embodiment of the application, the classification model corresponding to the target classification system can also be generated by constructing a training sample set and training according to the feature performance of each object in each feature in the training sample set and the actual type identification in the target classification system.
The classification method provided in the embodiment of the present application is further described below with reference to fig. 3.
Fig. 3 is a flow chart of another classification method according to an embodiment of the present application.
As shown in fig. 3, the classification method includes the steps of:
step 201, a training sample set is obtained, wherein the training sample set comprises feature performance of each sample object on each feature and actual type identification in a target classification system.
In the embodiment of the application, when training the classification model corresponding to the target classification system, a training sample set corresponding to the target classification system needs to be constructed, wherein the training sample set comprises the feature expression of each sample object on each feature and the actual type identification in the target classification system.
For example, if the target classification system classifies the gender of the object to be classified, the training sample set needs to include the feature representation of each sample object on the features, and the gender of each sample object.
As a possible implementation manner, the training sample set may be constructed by acquiring actual identity information of the test user and usage data of the test user in an actual use process. For example, when the target classification system classifies the gender of the object to be classified, the test user may fill in the actual gender information as the actual type identifier of the sample object in the target classification system, and continuously acquire the usage data of the test user in the test stage, so as to determine the feature performance of each sample data on each feature.
Step 202, determining a first priority order of each feature in the target classification system according to the information gain of each feature in the target classification system.
The information gain refers to the difference between the entropy before and after dividing the data set by a certain feature.
The larger the information gain corresponding to the feature is, the faster the purity of the data set increases by dividing the data set by the feature. Therefore, in the embodiment of the application, when determining the first priority order of each feature in the target classification system according to the information gain of each feature in the target classification system, the larger the information gain of each feature in the target classification system is, the higher the corresponding priority order is.
For example, the target classification system is characterized by "downloaded APP, user name, wallpaper type, and online time", where the information gain of the user name is greater than the information gain of the downloaded APP, the information gain of the downloaded APP is greater than the information gain of the wallpaper type, and the information gain of the wallpaper type is greater than the information gain of the online time, it may be determined that the first priority order of the user name is higher than the first priority order of the downloaded APP, the first priority order of the downloaded APP is higher than the first priority order of the wallpaper type, and the first priority order of the wallpaper type is higher than the first priority order of the online time. In addition, the first priority order of each feature may be represented by a number, for example, in the above example, the first priority order of the user name may be "0", the first priority order of the downloaded APP may be "1", the first priority order of the wallpaper type may be "2", and the first priority order of the online time may be "3".
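The ranking by information gain can be illustrated with the sketch below. It assumes that each feature has already been reduced to a binary value per sample (True or False), which the text does not specify; the function names `entropy`, `information_gain`, and `rank_features` are likewise introduced here for illustration.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def entropy(labels: Sequence[str]) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(values: Sequence[bool], labels: Sequence[str]) -> float:
    """Entropy before a binary split minus the weighted entropy after it."""
    total = len(labels)
    if total == 0:
        return 0.0
    left = [lab for v, lab in zip(values, labels) if not v]
    right = [lab for v, lab in zip(values, labels) if v]
    after = (len(left) / total) * entropy(left) + (len(right) / total) * entropy(right)
    return entropy(labels) - after


def rank_features(columns: Dict[str, Sequence[bool]], labels: Sequence[str]) -> List[str]:
    """Return feature names ordered from highest to lowest information gain."""
    return sorted(
        columns,
        key=lambda name: information_gain(columns[name], labels),
        reverse=True,
    )
```

The position of a feature in the returned list plays the role of its first priority order: index 0 corresponds to priority "0", index 1 to priority "1", and so on.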
Further, to ensure that the final classification result is more accurate, the number of features according to which the classification model is trained may be very large, and the number of levels of the classification model may be empirically preset, resulting in a greater number of features than the number of levels of the classification model. That is, in one possible implementation manner of the embodiment of the present application, the training sample set includes a feature expression of each sample object on L features, where L is an integer greater than n;
accordingly, after the step 202, the method may further include:
and according to the first priority order of each feature in the target classification system, determining n features with the priority being the first n as the features corresponding to each layer of the classification model corresponding to the target classification system.
In this embodiment of the present application, if the number of feature dimensions where feature expressions of sample objects included in the training sample set are located is greater than the number of levels of the classification model, that is, L is greater than n, then according to the determined first priority order of each feature in the target classification system, n features with priorities located in the first n may be determined as features corresponding to each layer of the classification model corresponding to the target classification system.
It should be noted that, the number of levels n of the classification model may be pre-specified, for example, the number of levels with optimal classification performance may be determined as the number of levels n of the classification model according to practical experience, so as to ensure the classification accuracy of the trained classification model.
Step 203, with the first priority order of each feature as the order, sequentially performing multi-level binary classification on each sample object according to the feature expression of each sample object on the corresponding feature, and determining a first prediction type identifier of each sample object in the target classification system.
In this embodiment of the present application, after determining the first priority order of each feature, each sample object may be sequentially subjected to multi-level classification processing according to the feature performance of each sample object on each feature by using the first priority order of each feature as an order, so as to determine the first prediction type identifier of each sample object in the target classification system.
For example, suppose the features in the target classification system are "downloaded APP", "user name", "wallpaper type", and "online time", and the first priority order from highest to lowest is user name, downloaded APP, wallpaper type, online time. Binary classification is first performed on the sample objects according to their feature expression on the feature "user name", generating sample set 1 and sample set 2. Sample set 1 and sample set 2 are then each classified according to the feature "downloaded APP", generating sample sets 3, 4, and 5. Sample sets 3, 4, and 5 are then each classified according to the feature "wallpaper type", generating sample sets 6, 7, 8, and 9. Finally, sample sets 6, 7, 8, and 9 are each classified according to the feature "online time", generating sample sets 10, 11, 12, 13, and 14, and the first prediction type identifier of each sample object is determined from this final classification result.
Step 204, determining whether the first prediction type identifier corresponding to each sample object is matched with the actual type identifier.
Step 205, if the number of sample objects in the training sample set whose first prediction type identifier matches the actual type identifier is greater than or equal to a threshold, determining the feature order corresponding to each layer in the classification model corresponding to the target classification system according to the first priority order of each feature, wherein the classification model includes n classification layers, n is an integer greater than 1, and every two adjacent nodes in the same layer share one child node.
In the embodiment of the present application, after determining the first prediction type identifier corresponding to each sample object, it may be determined whether the first prediction type identifier corresponding to each sample object is matched with the actual type identifier.
Specifically, if the number of sample objects in the training sample set whose first prediction type identifier matches the actual type identifier is greater than or equal to the threshold, it may be determined that classifying the object set according to the first priority order of the features yields a sufficiently accurate classification result. The feature order corresponding to each layer in the classification model corresponding to the target classification system may therefore be determined according to the first priority order: layer 0 of the classification model classifies according to the feature whose first priority order is "0", layer 1 classifies according to the feature whose first priority order is "1", and so on.
Further, if the number of sample objects matching the first prediction type identifier and the actual type identifier in the training sample set is smaller than the threshold, it may be determined that the classification effect of classifying the object set is poor according to the first priority order corresponding to each feature, so that the priority order corresponding to each feature needs to be redetermined. That is, in one possible implementation manner of the embodiment of the present application, after the step 204, the method may further include:
if the number of sample objects matched with the first prediction type identifier and the actual type identifier in the training sample set is smaller than a threshold value, adjusting the first priority sequence of each feature, and determining the second priority sequence of the adjusted feature;
sequentially carrying out multistage classification processing on each sample object according to the characteristic expression of each sample object on the corresponding characteristic by taking the second priority order as the order, and re-determining a second prediction type identifier of each sample object;
if the number of sample objects in the training sample set whose second prediction type identifier matches the actual type identifier is smaller than the threshold, continuing to adjust the second priority order of each feature until, with the adjusted priority order as the order, the number of sample objects in the training sample set whose prediction type identifier matches the actual type identifier is determined to be greater than the threshold, and determining the feature order corresponding to each layer in the classification model corresponding to the target classification system according to the adjusted priority order of the features.
In this embodiment of the present application, if the number of sample objects in the training sample set whose first prediction type identifier matches the actual type identifier is smaller than the threshold, it may be determined that classifying the object set according to the first priority order of the features gives a poor classification result. The first priority order may therefore be adjusted to obtain a second priority order, and, with the second priority order as the order, the second prediction type identifier of each sample object is redetermined according to the feature expression of each sample object on the corresponding feature. It is then determined whether the number of sample objects whose second prediction type identifier matches the actual type identifier is greater than or equal to the threshold. If so, the feature order corresponding to each layer in the classification model corresponding to the target classification system may be determined according to the second priority order. If the number is still smaller than the threshold, the second priority order is adjusted again, multi-level classification is performed on the sample objects in the adjusted order, and the prediction type identifier of each sample object is redetermined. This is repeated until, with the adjusted priority order as the order, the number of sample objects in the training sample set whose prediction type identifier matches the actual type identifier is greater than the threshold, whereupon the feature order corresponding to each layer in the classification model may be determined according to the adjusted priority order.
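The adjust-and-retry loop can be sketched as follows. The text does not say how the priority order is adjusted or how a prediction type identifier is read off the final nodes, so both are assumptions here: the sketch simply walks through the remaining permutations of the features, and it takes the classifier as a hypothetical `predict` callback supplied by the caller. It also uses "greater than or equal to" for every attempt, as in step 205.

```python
from itertools import chain, permutations
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical callback: given a feature order, return one predicted
# type identifier per training sample.
Predictor = Callable[[Tuple[str, ...]], Sequence[str]]


def find_feature_order(
    first_order: Sequence[str],
    actual: Sequence[str],
    predict: Predictor,
    threshold: int,
) -> Optional[Tuple[str, ...]]:
    """Try feature orders until enough predictions match the labels.

    The first candidate is the information-gain order; later candidates
    are the remaining permutations, which is an assumed adjustment
    strategy. Returns None if no order reaches the threshold.
    """
    first = tuple(first_order)
    others = (p for p in permutations(first) if p != first)
    for order in chain([first], others):
        predicted = predict(order)
        matches = sum(1 for p, a in zip(predicted, actual) if p == a)
        if matches >= threshold:  # same comparison as step 205
            return order
    return None
```

Exhaustive permutation is only practical for a handful of features; with many features some heuristic adjustment, such as swapping adjacent priorities, would be needed instead.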
According to the classification method provided by the embodiment of the application, a training sample set is obtained, wherein the training sample set comprises the feature performance of each sample object on each feature and the actual type identification in a target classification system, a first priority order of each feature in the target classification system is determined according to the information gain of each feature in the target classification system, then the first priority order of each feature is used as an order, the multi-level two-classification processing is carried out on each sample object according to the feature performance of each sample object on the corresponding feature in sequence, the first prediction type identification of each sample object in the target classification system is determined, and then when the number of sample objects matched with the actual type identification in the training sample set is larger than or equal to a threshold value, the feature order corresponding to each layer in a classification model corresponding to the target classification system is determined according to the first priority order of each feature. Therefore, the classification model corresponding to the target classification system is generated through training, and the object set to be classified is classified by using the trained classification model, so that the accurate portrait information of the user is obtained, the user classification is facilitated, and the accuracy of the user classification is improved.
In order to implement the above embodiment, the present application further proposes a classification device.
Fig. 4 is a schematic structural diagram of a classification device according to an embodiment of the present application.
As shown in fig. 4, the classification device 30 includes:
a first obtaining module 31, configured to obtain an object set to be classified and a feature expression of each object in the object set on each feature;
a first allocation module 32, configured to cause node i_x of the i-th layer to assign each object in object set i_x to node j_x or node j_(x+1) of the j-th layer according to the feature expression of each object in object set i_x on the i-th feature;
a second allocation module 33, configured to cause node i_(x+1) of the i-th layer to assign each object in object set i_(x+1) to node j_(x+1) or node j_(x+2) of the j-th layer according to the feature expression of each object in object set i_(x+1) on the i-th feature;
wherein i is an integer less than or equal to n, x is an integer greater than 0 and less than or equal to i+1, j=i+1, and the pre-generated classification model comprises n classification layers.
In practical use, the classification device provided in the embodiments of the present application may be configured in any electronic device to perform the foregoing classification method.
According to the classification device provided by the embodiment of the application, the object set to be classified and the feature expression of each object in the object set on each feature are obtained; node i_x of the i-th layer assigns each object in object set i_x to node j_x or node j_(x+1) of the j-th layer according to the feature expression of each object in object set i_x on the i-th feature, wherein i is an integer less than or equal to n, x is an integer greater than 0 and less than or equal to i+1, j=i+1, and the pre-generated classification model comprises n classification layers; and node i_(x+1) of the i-th layer assigns each object in object set i_(x+1) to node j_(x+1) or node j_(x+2) of the j-th layer according to the feature expression of each object in object set i_(x+1) on the i-th feature. In this way, the nodes of each layer of the classification model classify the objects to be classified layer by layer according to different ones of their features, yielding a final classification result, so that accurate user portrait information is obtained, user classification is facilitated, and the accuracy of user classification is improved.
In one possible implementation form of the present application, the first allocation module 32 is specifically configured to:
cause node i_x of the i-th layer to evenly distribute the objects in object set i_x between node j_x and node j_(x+1) of the j-th layer according to the feature expression of each object in object set i_x on the i-th feature.
In one possible implementation form of the present application, the classification device 30 further includes:
the second acquisition module is used for acquiring a classification request, wherein the request comprises a target classification system identifier and a reference characteristic;
and the third acquisition module is used for acquiring a corresponding target classification model from the classification model generated by training in advance according to the target classification system identification and the reference characteristic.
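One way to picture the lookup performed by the third acquisition module is a registry keyed by the classification system identifier together with the set of reference features. The sketch below is illustrative only: `MODEL_REGISTRY`, `get_target_model`, the feature-name strings, and the representation of a trained model as its per-layer feature order are hypothetical and do not come from the application.

```python
from typing import Dict, FrozenSet, Iterable, List, Tuple

ModelKey = Tuple[str, FrozenSet[str]]

# Hypothetical registry of pre-trained models. Each model is represented
# here only by the feature order of its classification layers.
MODEL_REGISTRY: Dict[ModelKey, List[str]] = {
    ("gender", frozenset({"user_name", "downloaded_app"})): ["downloaded_app", "user_name"],
    ("age", frozenset({"wallpaper_type", "online_time"})): ["online_time", "wallpaper_type"],
}


def get_target_model(system_id: str, reference_features: Iterable[str]) -> List[str]:
    """Return the pre-trained model matching the request, or raise LookupError."""
    key = (system_id, frozenset(reference_features))
    if key not in MODEL_REGISTRY:
        raise LookupError(
            f"no pre-trained model for {system_id!r} with features {sorted(key[1])}"
        )
    return MODEL_REGISTRY[key]
```

Using a frozenset for the features makes the lookup independent of the order in which the request lists them.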
Further, in another possible implementation manner of the present application, the classification device 30 further includes:
the fourth acquisition module is used for acquiring a training sample set, wherein the training sample set comprises the characteristic expression of each sample object on each characteristic and the actual type identification in the target classification system;
a first determining module, configured to determine a first priority order of each feature in the target classification system according to an information gain of each feature in the target classification system;
the second determining module is used for sequentially performing multi-level binary classification on each sample object, in the first priority order of the features, according to the feature performance of each sample object on the corresponding feature, and determining a first prediction type identifier of each sample object in the target classification system;
the judging module is used for judging whether the first prediction type identifier corresponding to each sample object matches the actual type identifier;
the third determining module is used for determining the feature order corresponding to each layer in the classification model corresponding to the target classification system according to the first priority order of the features, if the number of sample objects in the training sample set whose first prediction type identifier matches the actual type identifier is greater than or equal to a threshold value;
the classification model comprises n classification layers, n is an integer greater than 1, and every two adjacent nodes in the same layer share one child node.
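The ranking of features by information gain performed by the first determining module can be sketched as follows. This is a minimal illustration that assumes discrete feature values (continuous features would need binning first); the function names `entropy`, `information_gain`, and `first_priority_order` are chosen for this example and are not taken from the application.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of a sequence of type identifiers."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(
    samples: Sequence[Dict[str, Hashable]],
    labels: Sequence[Hashable],
    feature: str,
) -> float:
    """Entropy of the labels minus their entropy conditioned on one feature."""
    groups: Dict[Hashable, List[Hashable]] = defaultdict(list)
    for sample, label in zip(samples, labels):
        groups[sample[feature]].append(label)
    total = len(labels)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def first_priority_order(
    samples: Sequence[Dict[str, Hashable]],
    labels: Sequence[Hashable],
    features: Sequence[str],
) -> List[str]:
    """Order features from highest to lowest information gain."""
    return sorted(features, key=lambda f: information_gain(samples, labels, f), reverse=True)
```

The resulting list gives the candidate feature for each layer, highest gain first.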
Further, in still another possible implementation form of the present application, the classification device 30 further includes:
the adjusting module is used for adjusting the first priority order of the features and determining a second priority order of the adjusted features, if the number of sample objects in the training sample set whose first prediction type identifier matches the actual type identifier is smaller than a threshold value;
a fourth determining module, configured to sequentially perform multi-level binary classification on each sample object, in the second priority order, according to the feature performance of each sample object on the corresponding feature, and redetermine a second prediction type identifier of each sample object;
and the iteration module is used for continuing to adjust the second priority order of the features if the number of sample objects in the training sample set whose second prediction type identifier matches the actual type identifier is smaller than the threshold value, until it is determined that, with the adjusted priority order as the order, the number of sample objects in the training sample set whose prediction type identifier matches the actual type identifier is larger than the threshold value, and then determining the feature order corresponding to each layer in the classification model corresponding to the target classification system according to the adjusted priority order of the features.
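The adjust-and-retry loop of the adjusting, fourth determining, and iteration modules can be sketched as a search over feature orders. The application does not say how the order is adjusted, so enumerating permutations is purely an assumption of this sketch, as are the names `search_feature_order` and `count_matches`. The callback `count_matches` stands in for running the multi-level binary classification and counting the samples whose predicted type matches the actual type. The sketch accepts a count equal to the threshold, as in the first check.

```python
from itertools import permutations
from typing import Callable, Optional, Sequence, Tuple


def search_feature_order(
    first_order: Sequence[str],
    count_matches: Callable[[Tuple[str, ...]], int],
    threshold: int,
) -> Optional[Tuple[str, ...]]:
    """Try feature orders, beginning with first_order itself, until the
    number of correctly predicted samples reaches the threshold."""
    # permutations() yields the input order first, then the adjusted orders.
    for candidate in permutations(first_order):
        if count_matches(candidate) >= threshold:
            return candidate
    # No order met the threshold; the caller decides how to proceed.
    return None
```

Exhaustive enumeration grows factorially with the number of features, so it is only practical for small feature sets.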
Further, in yet another possible implementation form of the present application, the training sample set includes the feature performance of each sample object on L features, L being an integer greater than n; correspondingly, the classification device 30 further includes:
and a fifth determining module, configured to determine, according to the first priority order of the features in the target classification system, the n highest-priority features as the features corresponding to the respective layers of the classification model corresponding to the target classification system.
It should be noted that the explanation of the embodiment of the classification model shown in fig. 1 and the explanation of the embodiment of the classification method shown in fig. 2 and 3 are also applicable to the classification device 30 of this embodiment, and are not repeated here.
The classification device provided in the embodiments of the present application acquires a training sample set that includes the feature performance of each sample object on each feature and its actual type identifier in the target classification system, and determines a first priority order of the features in the target classification system according to the information gain of each feature. Taking the first priority order as the order, it then performs multi-level binary classification on each sample object according to the sample object's feature performance on the corresponding feature, and determines a first prediction type identifier of each sample object in the target classification system. When the number of sample objects in the training sample set whose first prediction type identifier matches the actual type identifier is greater than or equal to a threshold value, the feature order corresponding to each layer of the classification model for the target classification system is determined according to the first priority order. The classification model corresponding to the target classification system is thus generated by training, and the object set to be classified is classified with the trained model, which yields accurate user profile information, facilitates user classification, and improves its accuracy.
In order to implement the above embodiment, the present application further proposes an electronic device.
Fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
As shown in fig. 5, the electronic device 200 includes:
a memory 210 and a processor 220, a bus 230 connecting the different components (including the memory 210 and the processor 220), the memory 210 storing a computer program which when executed by the processor 220 implements the classification method described in the embodiments of the present application.
Bus 230 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, and not limitation, such architectures include the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA (EISA) bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
Electronic device 200 typically includes a variety of electronic device readable media. Such media can be any available media that is accessible by electronic device 200 and includes both volatile and nonvolatile media, removable and non-removable media.
Memory 210 may also include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 240 and/or cache memory 250. The electronic device 200 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 260 may be used to read from or write to non-removable, nonvolatile magnetic media (not shown in FIG. 5, commonly referred to as a "hard disk drive"). Although not shown in fig. 5, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (e.g., a "floppy disk"), and an optical disk drive for reading from or writing to a removable non-volatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In such cases, each drive may be coupled to bus 230 via one or more data medium interfaces. Memory 210 may include at least one program product having a set (e.g., at least one) of program modules configured to carry out the functions of the embodiments of the present application.
Program/utility 280 having a set (at least one) of program modules 270 may be stored in, for example, memory 210, such program modules 270 including, but not limited to, an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment. Program modules 270 generally perform the functions and/or methods in the embodiments described herein.
The electronic device 200 may also communicate with one or more external devices 290 (e.g., keyboard, pointing device, display 291, etc.), one or more devices that enable a user to interact with the electronic device 200, and/or any device (e.g., network card, modem, etc.) that enables the electronic device 200 to communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 292. Also, electronic device 200 may communicate with one or more networks such as a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the Internet, through network adapter 293. As shown, network adapter 293 communicates with other modules of electronic device 200 over bus 230. It should be appreciated that although not shown, other hardware and/or software modules may be used in connection with electronic device 200, including, but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.
The processor 220 executes various functional applications and data processing by running programs stored in the memory 210.
It should be noted that, the implementation process and the technical principle of the electronic device in this embodiment refer to the foregoing explanation of the classification method in this embodiment, and are not repeated herein.
The electronic device provided in the embodiments of the present application may execute the classification method described above. It acquires an object set to be classified and the feature performance of each object in the object set on each feature. The i_x-th node of the i-th layer, according to the feature performance of each object in the i_x-th object set on the i-th feature, assigns each object in the i_x-th object set to the j_x-th node or the j_{x+1}-th node of the j-th layer, where i is an integer less than or equal to n, x is an integer greater than 0 and less than or equal to i+1, j = i+1, and the pre-generated classification model comprises n classification layers. Likewise, the i_{x+1}-th node of the i-th layer, according to the feature performance of each object in the i_{x+1}-th object set on the i-th feature, assigns each object in the i_{x+1}-th object set to the j_{x+1}-th node or the j_{x+2}-th node of the j-th layer. In this way, the nodes of each layer in the classification model perform binary classification on the objects to be classified layer by layer, each layer using a different one of the objects' features, to obtain the final classification result. This yields accurate user profile information, facilitates user classification, and improves its accuracy.
To achieve the above embodiments, the present application also proposes a computer-readable storage medium.
Wherein the computer readable storage medium has stored thereon a computer program which, when executed by a processor, implements the classification method according to the embodiments of the present application.
In order to implement the above embodiments, an embodiment of a further aspect of the present application provides a computer program, which when executed by a processor, implements the classification method described in the embodiments of the present application.
In alternative implementations, the present embodiments may employ any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" programming language or similar languages. The program code may execute entirely on the user's electronic device, partly on the user's electronic device, as a stand-alone software package, partly on the user's electronic device and partly on a remote electronic device, or entirely on the remote electronic device or server. Where a remote electronic device is involved, it may be connected to the user's electronic device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external electronic device (e.g., through the Internet using an Internet service provider).
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the application following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the application pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
It is to be understood that the present application is not limited to the precise arrangements and instrumentalities shown in the drawings, which have been described above, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims (15)

1. A classification method, applied to user classification, comprising:
acquiring a classification request, wherein the classification request comprises a target classification system identifier, reference features of the objects to be classified, and the feature performance of the objects to be classified on the reference features; the target classification system identifier comprises gender and age; the reference features comprise user name, downloaded APPs, wallpaper type, and online time; and the feature performance refers to the value or content of each object to be classified on the reference features;
acquiring, according to the target classification system identifier and the reference features, a corresponding target classification model from classification models generated by training in advance; if the target classification system identifier is gender, classifying the gender of the object to be classified through the target classification model, and if the target classification system identifier is age, classifying the age of the object to be classified through the target classification model; wherein the classification model comprises n classification layers, n is an integer greater than 1, the reference features of the object to be classified have a priority order that is preset in the classification model, the nodes of each layer in the classification model perform binary classification on the objects to be classified corresponding to each node according to the feature performance of the objects to be classified on different features, and every two adjacent nodes in the same layer share one child node.
2. The classification method of claim 1, wherein each node is configured to divide the corresponding object to be classified equally into two sub-nodes.
3. The classification method according to claim 1 or 2, wherein classifying the object to be classified by the target classification model comprises:
Acquiring a to-be-classified object set and the feature expression of each object in the object set on each feature;
the i_x-th node of the i-th layer, according to the feature performance of each object in the i_x-th object set on the i-th feature, assigns each object in the i_x-th object set to the j_x-th node or the j_{x+1}-th node of the j-th layer;
the i_{x+1}-th node of the i-th layer, according to the feature performance of each object in the i_{x+1}-th object set on the i-th feature, assigns each object in the i_{x+1}-th object set to the j_{x+1}-th node or the j_{x+2}-th node of the j-th layer;
wherein i is an integer less than or equal to n, x is an integer greater than 0 and less than or equal to i+1, j=i+1, and the target classification model comprises n classification layers.
4. The method of claim 3, wherein the i_x-th node of the i-th layer assigning, according to the feature performance of each object in the i_x-th object set on the i-th feature, each object in the i_x-th object set to the j_x-th node or the j_{x+1}-th node of the j-th layer comprises:
the i_x-th node of the i-th layer, according to the feature performance of each object in the i_x-th object set on the i-th feature, evenly distributing the objects in the i_x-th object set between the j_x-th node and the j_{x+1}-th node of the j-th layer.
5. The method as recited in claim 4, further comprising:
acquiring a training sample set, wherein the training sample set comprises the characteristic performance of each sample object on each characteristic and the actual type identification in a target classification system;
determining a first priority order of each feature in the target classification system according to the information gain of each feature in the target classification system;
sequentially carrying out multi-stage two-classification processing on each sample object according to the characteristic expression of each sample object on the corresponding characteristic by taking the first priority order of each characteristic as an order, and determining a first prediction type identifier of each sample object in a target classification system;
judging whether a first prediction type identifier corresponding to each sample object is matched with the actual type identifier or not;
if the number of sample objects matched with the first prediction type identification and the actual type identification in the training sample set is greater than or equal to a threshold value, determining a feature sequence corresponding to each layer in a classification model corresponding to the target classification system according to the first priority sequence of each feature;
the classification model comprises n classification layers, n is an integer greater than 1, and every two adjacent nodes in the same layer comprise the same child node.
6. The method of claim 5, wherein after determining whether the first prediction type identifier corresponding to each sample object matches the actual type identifier, further comprising:
if the number of sample objects matched with the first prediction type identifier and the actual type identifier in the training sample set is smaller than a threshold value, adjusting the first priority sequence of each feature, and determining the second priority sequence of the adjusted feature;
sequentially carrying out multistage classification processing on each sample object according to the characteristic expression of each sample object on the corresponding characteristic by taking the second priority order as the order, and redetermining a second prediction type identifier of each sample object;
if the number of sample objects matched with the second prediction type identifier and the actual type identifier in the training sample set is smaller than the threshold value, continuing to adjust the second priority sequence of each feature until the number of sample objects matched with the prediction type identifier and the actual type identifier in the training sample set is determined to be larger than the threshold value by taking the adjusted priority sequence as the sequence, and determining the feature sequence corresponding to each layer in the classification model corresponding to the target classification system according to the adjusted priority sequence of the feature.
7. The method of claim 5, wherein the training sample set comprises a feature representation of each sample object over L features, L being an integer greater than n;
after determining the first priority order of each feature in the target classification system, the method further comprises:
determining, according to the first priority order of the features in the target classification system, the n highest-priority features as the features corresponding to the respective layers of the classification model corresponding to the target classification system.
8. A classification device for classifying users, comprising:
the second acquisition module is used for acquiring a classification request, wherein the classification request comprises a target classification system identifier, reference characteristics of objects to be classified and characteristic performances of the objects to be classified on the reference characteristics, the target classification system identifier comprises gender and age, the reference characteristics comprise a user name, a downloaded APP, a wallpaper type and online time, and the characteristic performances refer to values or contents of each object to be classified on the reference characteristics;
the third obtaining module is configured to obtain a corresponding target classification model from a classification model generated by training in advance according to the target classification system identifier and the reference feature, if the target classification system identifier is a gender, classify the gender of the object to be classified through the target classification model, and if the target classification system identifier is an age, classify the age of the object to be classified through the target classification model, wherein the classification model includes n classification layers, n is an integer greater than 1, each reference feature of the object to be classified has a priority order and is preset in the classification model, each layer of nodes in the classification model performs two classification processing on the object to be classified corresponding to each node according to the feature expression of the object to be classified on different features, and each two adjacent nodes in the same layer include one same sub node.
9. The apparatus of claim 8, wherein the third acquisition module comprises:
the first acquisition module is used for acquiring an object set to be classified and the characteristic expression of each object in the object set on each characteristic;
a first allocation module, configured for the i_x-th node of the i-th layer to assign, according to the feature performance of each object in the i_x-th object set on the i-th feature, each object in the i_x-th object set to the j_x-th node or the j_{x+1}-th node of the j-th layer;
a second allocation module, configured for the i_{x+1}-th node of the i-th layer to assign, according to the feature performance of each object in the i_{x+1}-th object set on the i-th feature, each object in the i_{x+1}-th object set to the j_{x+1}-th node or the j_{x+2}-th node of the j-th layer;
wherein i is an integer less than or equal to n, x is an integer greater than 0 and less than or equal to i+1, j=i+1, and the target classification model comprises n classification layers.
10. The apparatus according to claim 9, wherein the first allocation module is specifically configured to:
the i_x-th node of the i-th layer, according to the feature performance of each object in the i_x-th object set on the i-th feature, evenly distributes the objects in the i_x-th object set between the j_x-th node and the j_{x+1}-th node of the j-th layer.
11. The apparatus as claimed in claim 9 or 10, further comprising:
the fourth acquisition module is used for acquiring a training sample set, wherein the training sample set comprises the characteristic expression of each sample object on each characteristic and the actual type identification in the target classification system;
a first determining module, configured to determine a first priority order of each feature in the target classification system according to an information gain of each feature in the target classification system;
the second determining module is used for sequentially carrying out multi-level two-classification processing on each sample object according to the characteristic expression of each sample object on the corresponding characteristic by taking the first priority order of each characteristic as an order, and determining a first prediction type identifier of each sample object in the target classification system;
the judging module is used for judging whether the first prediction type identifier corresponding to each sample object is matched with the actual type identifier or not;
the third determining module is used for determining the feature sequence corresponding to each layer in the classification model corresponding to the target classification system according to the first priority sequence of each feature if the number of sample objects matched with the first prediction type identifier and the actual type identifier in the training sample set is greater than or equal to a threshold value;
the classification model comprises n classification layers, n is an integer greater than 1, and every two adjacent nodes in the same layer share one child node.
12. The apparatus as recited in claim 11, further comprising:
the adjusting module is used for adjusting the first priority order of each feature and determining the second priority order of the adjusted features if the number of sample objects matched with the first prediction type identifier and the actual type identifier in the training sample set is smaller than a threshold value;
a fourth determining module, configured to sequentially perform multistage classification processing on each sample object according to the feature performance of each sample object on the corresponding feature in order of the second priority order, and redetermine a second prediction type identifier of each sample object;
and the iteration module is used for continuing to adjust the second priority order of the features if the number of sample objects in the training sample set whose second prediction type identifier matches the actual type identifier is smaller than the threshold value, until it is determined that, with the adjusted priority order as the order, the number of sample objects in the training sample set whose prediction type identifier matches the actual type identifier is larger than the threshold value, and then determining the feature order corresponding to each layer in the classification model corresponding to the target classification system according to the adjusted priority order of the features.
13. The apparatus of claim 11, wherein the training sample set comprises a feature representation of each sample object over L features, L being an integer greater than n; the device further comprises:
and a fifth determining module, configured to determine, according to the first priority order of the features in the target classification system, the n highest-priority features as the features corresponding to the respective layers of the classification model corresponding to the target classification system.
14. An electronic device, comprising: memory, a processor and a program stored on the memory and executable on the processor, characterized in that the processor implements the classification method according to any of claims 1-7 when executing the program.
15. A computer-readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the classification method according to any one of claims 1-7.
CN201910943039.XA 2019-09-30 2019-09-30 Classification model, classification method, classification device, electronic equipment and storage medium Active CN110705642B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910943039.XA CN110705642B (en) 2019-09-30 2019-09-30 Classification model, classification method, classification device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910943039.XA CN110705642B (en) 2019-09-30 2019-09-30 Classification model, classification method, classification device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN110705642A (en) 2020-01-17
CN110705642B (en) 2023-05-23

Family

ID=69197068

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910943039.XA Active CN110705642B (en) 2019-09-30 2019-09-30 Classification model, classification method, classification device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110705642B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103093216A (en) * 2013-02-04 2013-05-08 北京航空航天大学 Gender classification method and system thereof based on facial images
CN105791242A (en) * 2014-12-24 2016-07-20 阿里巴巴集团控股有限公司 Object type identification method and system, server and client
CN109635872A (en) * 2018-12-17 2019-04-16 上海观安信息技术股份有限公司 Personal identification method, electronic equipment and computer program product
CN109885597A (en) * 2019-01-07 2019-06-14 平安科技(深圳)有限公司 Tenant group processing method, device and electric terminal based on machine learning
CN109961077A (en) * 2017-12-22 2019-07-02 广东欧珀移动通信有限公司 Gender prediction's method, apparatus, storage medium and electronic equipment

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10970605B2 (en) * 2017-01-03 2021-04-06 Samsung Electronics Co., Ltd. Electronic apparatus and method of operating the same

Also Published As

Publication number Publication date
CN110705642A (en) 2020-01-17

Similar Documents

Publication Title
CN107704625B (en) Method and device for field matching
US11455473B2 (en) Vector representation based on context
US11901047B2 (en) Medical visual question answering
CN113254785B (en) Recommendation model training method, recommendation method and related equipment
US10885332B2 (en) Data labeling for deep-learning models
US10825071B2 (en) Adaptive multi-perceptual similarity detection and resolution
CN104361415B (en) A kind of choosing method and device for showing information
CN110708285B (en) Flow monitoring method, device, medium and electronic equipment
CN113592605B (en) Product recommendation method, device, equipment and storage medium based on similar products
US9881079B2 (en) Quantification based classifier
CN115130711A (en) Data processing method and device, computer and readable storage medium
WO2019001463A1 (en) Data processing method and apparatus
CN115222443A (en) Client group division method, device, equipment and storage medium
CN109034199B (en) Data processing method and device, storage medium and electronic equipment
CN107729944B (en) Identification method and device of popular pictures, server and storage medium
CN112269875B (en) Text classification method, device, electronic equipment and storage medium
CN111667018B (en) Object clustering method and device, computer readable medium and electronic equipment
CN117216393A (en) Information recommendation method, training method and device of information recommendation model and equipment
US11861459B2 (en) Automatic determination of suitable hyper-local data sources and features for modeling
CN110705642B (en) Classification model, classification method, classification device, electronic equipment and storage medium
CN116503608A (en) Data distillation method based on artificial intelligence and related equipment
CN113591881B (en) Intention recognition method and device based on model fusion, electronic equipment and medium
CN111325372A (en) Method for establishing prediction model, prediction method, device, medium and equipment
CN112417260B (en) Localized recommendation method, device and storage medium
CN110262906B (en) Interface label recommendation method and device, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant