CN111078742A - User classification model training method, user classification method and device - Google Patents


Info

Publication number
CN111078742A
Authority
CN
China
Prior art keywords
user
sample
data
sample user
classification model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911252400.0A
Other languages
Chinese (zh)
Other versions
CN111078742B (en)
Inventor
海梓晗
潘峰
高雅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Miaozhen Information Technology Co Ltd
Original Assignee
Miaozhen Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Miaozhen Information Technology Co Ltd
Priority to CN201911252400.0A
Publication of CN111078742A
Application granted
Publication of CN111078742B
Active (current legal status)
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24: Querying
    • G06F16/245: Query processing
    • G06F16/2457: Query processing with adaptation to user needs
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides a user classification model training method, a user classification method and a device. When a user classification model is trained, feature data of each sample user is generated from that sample user's user data within a preset time period, and the user classification model is then trained based on the feature data of each sample user and the gender label corresponding to each sample user. The sample feature data includes application installation information corresponding to the sample user's terminal device. Because the model is trained by learning how users of different genders use different applications, it achieves higher classification accuracy.

Description

User classification model training method, user classification method and device
Technical Field
The present application relates to the field of machine learning technologies, and in particular, to a user classification model training method, a user classification method, and a user classification device.
Background
In the prior art, a binary classification model is generally used to classify user groups by gender. Such a binary classification model is a type of linear regression model: given a large batch of input sample data, it outputs the probability that each sample belongs to a given class, and the samples are classified accordingly.
With the widespread use of mobile devices, behavior data that a mobile device generates in advertising activities can be used as features, each assigned a different weight, so that the sample data can be classified by user gender.
However, existing binary classification models classify poorly when some of the features are strongly correlated.
Disclosure of Invention
In view of this, the present application provides at least a user classification model training method, a user classification method and a device that classify users by gender, so that digital marketing can be targeted more precisely and the return on investment is improved.
In a first aspect, an embodiment of the present application provides a user classification model training method, including:
obtaining sample user data of each of a plurality of sample users within a first preset time period, and a gender label corresponding to each sample user;
generating sample feature data for each of the sample users based on the sample user data for each of the sample users; the sample feature data includes: the application program installation information corresponding to the terminal equipment of the sample user;
and training to obtain the user classification model based on the sample characteristic data of each sample user and the gender label corresponding to each sample user.
In an alternative embodiment, the application installation information includes one or more of:
the time when each application program is installed in the terminal equipment, the frequency of using each application program in the preset time period by a user, and the classification of each application program.
In an optional embodiment, the sample feature data further includes:
historical push information pushed to the sample user by at least one application on the terminal device, and/or operation information describing the sample user's operations on that historical push information;
the historical push information comprises one or more of the following: the content of the pushed information, the industry of the pushed information, the classification of the pushed information, the pushing media of the pushed information and the distribution platform of the pushed information;
the operation information comprises one or more of the following: clicking operation on the pushed historical push information, forwarding operation on the pushed historical push information, and timestamp information corresponding to the operation information.
In an optional embodiment, after generating the sample feature data of each sample user based on the sample user data of each sample user, the method further comprises:
performing data cleaning on the feature data of each sample user based on the user data of each sample user.
In an alternative embodiment, the data cleaning includes:
filtering out, based on the application installation information of the mobile device, installation information of any application whose interval between installation and uninstallation is below a preset time threshold, and/or invalid push messages;
the invalid push messages include: push messages on which the sample user performed no operation.
In an optional implementation manner, the training to obtain the user classification model based on the sample feature data of each sample user and the gender tag corresponding to each sample user includes:
randomly grouping the characteristic data of each sample user to obtain a model training set and a model verification set;
training a basic classification model based on the model training set and the gender labels corresponding to the characteristic data of each sample user in the model training set to obtain a user classification model which is preliminarily trained;
and verifying the preliminarily trained user classification model based on the model verification set and the gender labels corresponding to the characteristic data of each sample user in the model verification set, and obtaining the user classification model after the verification is passed.
In a second aspect, an embodiment of the present application provides a user classification method, including:
acquiring to-be-classified sample user data of a to-be-classified sample user within a second preset time period;
generating characteristic data of each sample user to be classified based on the sample user data to be classified; the characteristic data of the sample user to be classified comprises: the application program installation information corresponding to the terminal equipment of the sample user to be classified;
and inputting the characteristic data of each sample user to be classified into a user classification model obtained by any one of the user classification model training methods in the first aspect to obtain a user classification result.
In a third aspect, an embodiment of the present application further provides a user classification model training apparatus, which includes a first obtaining module, a first generation module and a training module, wherein:
the first obtaining module is used for obtaining sample user data of each sample user in a plurality of sample users within a first preset time period and a corresponding gender label of each sample user;
the first generation module is configured to generate sample feature data of each sample user based on the sample user data of each sample user; the sample feature data includes: the application program installation information corresponding to the terminal equipment of the sample user;
the training module is used for training to obtain the user classification model based on the sample characteristic data of each sample user and the gender label corresponding to each sample user.
In an alternative embodiment, the application installation information includes one or more of:
the time when each application program is installed in the terminal equipment, the frequency of using each application program in the preset time period by a user, and the classification of each application program.
In an optional embodiment, the sample feature data further includes:
historical push information pushed to the sample user by at least one application on the terminal device, and/or operation information describing the sample user's operations on that historical push information;
the historical push information comprises one or more of the following: the content of the pushed information, the industry of the pushed information, the classification of the pushed information, the pushing media of the pushed information and the distribution platform of the pushed information;
the operation information comprises one or more of the following: clicking operation on the pushed historical push information, forwarding operation on the pushed historical push information, and timestamp information corresponding to the operation information.
In an optional embodiment, the first generation module, after generating the sample feature data of each sample user based on the sample user data of each sample user, is further configured to:
perform data cleaning on the feature data of each sample user based on the user data of each sample user.
In an alternative embodiment, the data cleaning includes:
filtering out, based on the application installation information of the mobile device, installation information of any application whose interval between installation and uninstallation is below a preset time threshold, and/or invalid push messages;
the invalid push messages include: push messages on which the sample user performed no operation.
In an optional implementation manner, when training the user classification model based on the sample feature data of each sample user and the gender label corresponding to each sample user, the training module is configured to perform:
randomly grouping the characteristic data of each sample user to obtain a model training set and a model verification set;
training a basic classification model based on the model training set and the gender labels corresponding to the characteristic data of each sample user in the model training set to obtain a user classification model which is preliminarily trained;
and verifying the preliminarily trained user classification model based on the model verification set and the gender labels corresponding to the characteristic data of each sample user in the model verification set, and obtaining the user classification model after the verification is passed.
In a fourth aspect, an embodiment of the present application further provides a user classification device, where the user classification device includes: a second obtaining module, a second generating module and a determining module, wherein:
the second obtaining module is configured to obtain to-be-classified sample user data of a to-be-classified sample user within a second preset time period;
the second generation module is used for generating feature data of each sample user to be classified based on the sample user data to be classified; the characteristic data of the sample user to be classified comprises: the application program installation information corresponding to the terminal equipment of the sample user to be classified;
the determining module is configured to input the feature data of each sample user to be classified into a user classification model obtained by any one of the user classification model training methods in the first aspect, so as to obtain a user classification result.
In a fifth aspect, an embodiment of the present application further provides an electronic device, including: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory communicating via the bus when the electronic device is running, the machine-readable instructions when executed by the processor performing the steps of the first aspect described above, or any possible implementation of the first aspect;
or to perform the steps of the second aspect described above, or any one of the possible embodiments of the second aspect.
In a sixth aspect, this application provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, where the computer program is executed by a processor to perform the steps in the first aspect or any one of the possible implementation manners of the first aspect;
or to perform the steps of the second aspect described above, or any one of the possible embodiments of the second aspect.
According to the user classification model training method, the user classification method and the device provided by the embodiments of the present application, when a user classification model is trained, feature data of each of a plurality of sample users is generated from the acquired user data of that sample user within a preset time period, and the user classification model is then trained based on the feature data of each sample user and the gender label corresponding to each sample user. The sample feature data includes application installation information corresponding to the sample user's terminal device. Because the model is trained by learning how users of different genders use different applications, it achieves higher classification accuracy.
In order to make the aforementioned objects, features and advantages of the present application more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained from the drawings without inventive effort.
FIG. 1 is a flow chart illustrating a method for training a user classification model according to an embodiment of the present disclosure;
FIG. 2 is a flow chart illustrating a user classification method provided by an embodiment of the present application;
FIG. 3 is a schematic structural diagram illustrating a user classification model training apparatus according to an embodiment of the present application;
FIG. 4 is a schematic structural diagram illustrating a user classification device according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram illustrating an electronic device provided in an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all the embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present application without making any creative effort, shall fall within the protection scope of the present application.
As noted above, in the prior art a binary classification model is generally used to classify user groups by gender. Such a model is a type of linear regression model: given a large batch of input sample data, it outputs the probability that each sample belongs to a given class, and the samples are classified accordingly. However, existing binary classification models classify poorly when some features are strongly correlated, and cannot meet the accuracy required for user classification.
When the user classification model is trained in the present application, user data of each of a plurality of sample users within a preset time period and a gender label corresponding to each sample user are first obtained; feature data of each sample user is then generated based on that user's user data, the sample feature data including application installation information corresponding to the sample user's terminal device; finally, the user classification model is trained based on the feature data of each sample user. Because male and female users differ in how many of each application they download and how often they download them, users can be classified by gender on the basis of these differences, which improves the accuracy of user gender classification.
The drawbacks described above were identified by the inventors through practice and careful study. The discovery of these problems, and the solutions to them proposed below, should therefore both be regarded as contributions made by the inventors in the course of the present application.
The technical solutions in the present application will be described clearly and completely with reference to the drawings in the present application, and it should be understood that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. The components of the present application, as generally described and illustrated in the figures herein, may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present application without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
The execution subject of the user classification model training method and the user classification method provided by the embodiment of the present disclosure is generally a computer device with certain computing power, and the computer device includes: a terminal device, which may be a User Equipment (UE), a mobile device, a User terminal, a cellular phone, a cordless phone, a Personal Digital Assistant (PDA), a handheld device, a computing device, a vehicle mounted device, a wearable device, or a server or other processing device. In some possible implementations, the user classification model training method and the user classification method may be implemented by a processor calling computer-readable instructions stored in a memory.
The following describes a user classification model training method and a user classification method provided in the embodiments of the present disclosure, taking an execution subject as a computer device as an example.
Example one
Referring to fig. 1, a flowchart of a user classification model training method provided in an embodiment of the present application is shown, where the method includes steps S101 to S103, where:
S101: obtaining sample user data of each of a plurality of sample users within a first preset time period, and a gender label corresponding to each sample user.
S102: generating sample feature data of each sample user based on the sample user data of that sample user; the sample feature data includes application installation information corresponding to the sample user's terminal device.
S103: training the user classification model based on the sample feature data of each sample user and the gender label corresponding to each sample user.
The following describes each of the above-mentioned steps S101 to S103 in detail.
First: in S101 above, a plurality of sample users is obtained, together with the user data of each sample user within the first preset time period and the gender label corresponding to each sample user.
For example, a subset of users who used a mobile device within the preset time period is selected as sample users, and these sample users are grouped by their gender labels into male sample users and female sample users.
Illustratively, among the grouped sample users, male sample users are treated as positive samples and female sample users as negative samples for subsequent processing.
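The positive/negative labeling described above can be sketched as follows. This is a minimal illustration only: the record layout, the user identifiers and the label strings "male" and "female" are assumptions, not details given in the application.

```python
# Hypothetical record layout: (user_id, gender_label). Illustrative only.
sample_users = [
    ("u1", "male"),
    ("u2", "female"),
    ("u3", "male"),
]


def to_binary_label(gender_label: str) -> int:
    """Map a gender label to 1 (positive sample, male) or 0 (negative sample, female)."""
    mapping = {"male": 1, "female": 0}
    if gender_label not in mapping:
        raise ValueError(f"unexpected gender label: {gender_label!r}")
    return mapping[gender_label]


# Label every sample user, then group them into positive and negative samples.
labels = {user_id: to_binary_label(gender) for user_id, gender in sample_users}
positives = [user_id for user_id, y in labels.items() if y == 1]
negatives = [user_id for user_id, y in labels.items() if y == 0]
```

Rejecting unknown labels explicitly is one possible design choice; it keeps unlabeled users from silently entering either group.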
Second: in S102 above, feature data of each sample user is generated based on the user data of each sample user acquired in step S101.
The sample feature data includes application installation information corresponding to the sample user's terminal device.
Illustratively, the application installation information includes one or more of:
the time when each application program is installed in the terminal equipment, the frequency of using each application program in the preset time period by a user, and the classification of each application program.
For example, the numbers of male and female users who download a given mobile application may differ. For some cosmetics applications, the target users are mainly female, so downloads by female users may be much higher than downloads by male users. For some fighting game applications, the target users are mainly male, so downloads by male users may be much higher than downloads by female users. For some everyday-life applications, such as taxi-hailing and food-delivery software, downloads by male and female users may differ little. These differences allow the corresponding applications to be promoted in a more targeted way.
Illustratively, male and female users may also differ in how often they use a particular mobile application. For some shopping applications, downloads by male and female users may differ little, but female users may use them more often than male users. For some racing game applications, downloads may likewise differ little, but male users may use them more often than female users. These differences also allow the corresponding applications to be promoted in a more targeted way.
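As a rough sketch of how the three kinds of application installation information (installation time, usage frequency within the preset period, and application category) might be turned into per-user features, consider the following. The `AppRecord` type, the feature names and the use of Unix timestamps are all hypothetical choices made for illustration; the application does not prescribe an encoding.

```python
from collections import Counter
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400


@dataclass
class AppRecord:
    app_name: str
    category: str
    installed_at: int  # Unix timestamp in seconds (assumed representation)
    use_count: int     # number of uses within the preset time period


def build_features(records, period_end, period_days):
    """Turn one user's app records into a flat {feature_name: value} mapping."""
    features = {}
    category_counts = Counter()
    for r in records:
        # How long the app had been installed at the end of the period.
        features[f"days_since_install:{r.app_name}"] = (
            (period_end - r.installed_at) / SECONDS_PER_DAY
        )
        # Usage frequency within the preset time period.
        features[f"uses_per_day:{r.app_name}"] = r.use_count / period_days
        category_counts[r.category] += 1
    # Number of installed apps per category.
    for category, n in category_counts.items():
        features[f"category_count:{category}"] = float(n)
    return features


period_end = 1_577_592_000  # hypothetical end of a 30-day period
records = [
    AppRecord("app_a", "shopping", period_end - 30 * SECONDS_PER_DAY, 30),
    AppRecord("app_b", "game", period_end - 10 * SECONDS_PER_DAY, 6),
    AppRecord("app_c", "shopping", period_end - 5 * SECONDS_PER_DAY, 0),
]
features = build_features(records, period_end=period_end, period_days=30)
```

Keeping features in a name-to-value mapping defers the choice of a fixed column order until all users have been processed.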
In addition, the sample feature data further includes:
historical push information pushed to the sample user by at least one application on the terminal device, and/or operation information describing the sample user's operations on that historical push information;
the historical push information comprises one or more of the following: the content of the pushed information, the industry of the pushed information, the classification of the pushed information, the pushing media of the pushed information and the distribution platform of the pushed information;
the operation information comprises one or more of the following: clicking operation on the pushed historical push information, forwarding operation on the pushed historical push information, and timestamp information corresponding to the operation information.
For example, push messages about beauty and makeup may hold little appeal for male users, while female users are more likely to click and view them; to save resources, such messages can be pushed to female users. Likewise, push messages for some large fighting games can be pushed to male users.
After generating the sample feature data of each sample user based on the sample user data of each sample user, the method further includes:
and performing data cleaning on the characteristic data of each sample user based on the user data of each sample user.
Wherein the data cleansing comprises:
filtering out, based on the application installation information of the mobile device, installation information of any application whose interval between installation and uninstallation is below a preset time threshold, and/or invalid push messages;
the invalid push messages include: push messages on which the sample user performed no operation.
For example, if the installation information of an application shows that the interval between its installation and uninstallation is below the preset time threshold, that installation information may be considered invalid and is not processed further.
Similarly, if the sample user performed no operation on a piece of historical push information pushed by an application, that historical push information may be considered invalid and is not processed further.
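The two cleaning rules above can be sketched as simple filters. The event types, field names and the one-day threshold below are assumptions chosen for illustration; the application only states that a preset time threshold exists.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class InstallEvent:
    app_name: str
    installed_at: int               # Unix timestamp in seconds (assumed)
    uninstalled_at: Optional[int]   # None if the app is still installed


@dataclass
class PushEvent:
    push_id: str
    operations: List[str] = field(default_factory=list)  # e.g. ["click", "forward"]


MIN_INSTALL_SECONDS = 24 * 3600  # hypothetical preset time threshold (one day)


def clean_installs(events, min_seconds=MIN_INSTALL_SECONDS):
    """Drop installs that were uninstalled sooner than the threshold."""
    kept = []
    for e in events:
        short_lived = (
            e.uninstalled_at is not None
            and e.uninstalled_at - e.installed_at < min_seconds
        )
        if not short_lived:
            kept.append(e)
    return kept


def clean_pushes(events):
    """Drop push messages on which the user performed no operation."""
    return [e for e in events if e.operations]


installs = [
    InstallEvent("app_a", 0, None),      # still installed: kept
    InstallEvent("app_b", 0, 600),       # removed after 10 minutes: filtered
    InstallEvent("app_c", 0, 172_800),   # removed after 2 days: kept
]
pushes = [PushEvent("p1", ["click"]), PushEvent("p2", [])]

kept_installs = clean_installs(installs)
kept_pushes = clean_pushes(pushes)
```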
For example, after data cleaning, the feature data of each sample user may be stored in a standard storage format, for example { application 1, application 2, application 3, …, application n }, which facilitates the subsequent model training process.
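One way to read the standard storage format is as a fixed-order vector with one slot per application, so that every user's features line up column by column. The sketch below assumes that reading; the helper names and the choice of 0.0 for absent applications are illustrative and not taken from the application.

```python
def build_vocabulary(all_user_features):
    """Collect every feature name seen across users, in a stable sorted order."""
    return sorted({name for feats in all_user_features for name in feats})


def to_vector(features, vocabulary):
    """Lay one user's features out in vocabulary order, using 0.0 when absent."""
    return [features.get(name, 0.0) for name in vocabulary]


# Hypothetical cleaned feature data for two sample users.
user_features = [
    {"application 1": 1.0, "application 3": 2.0},
    {"application 2": 1.0},
]
vocabulary = build_vocabulary(user_features)
vectors = [to_vector(feats, vocabulary) for feats in user_features]
```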
Third: in S103 above, the user classification model is trained based on the feature data of each sample user obtained in step S102 and the gender label corresponding to each sample user. Specifically, the training includes:
randomly grouping the characteristic data of each sample user to obtain a model training set and a model verification set;
training a basic classification model based on the model training set and the gender labels corresponding to the characteristic data of each sample user in the model training set to obtain a user classification model which is preliminarily trained;
and verifying the preliminarily trained user classification model based on the model verification set and the gender labels corresponding to the characteristic data of each sample user in the model verification set, and obtaining the user classification model after the verification is passed.
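The split, train and verify sequence above can be sketched in plain Python. The application does not name the basic classification model or the pass criterion, so this sketch swaps in a small logistic regression trained by stochastic gradient descent and an accuracy threshold of 0.8; both, along with the toy data and hyperparameters, are assumptions.

```python
import math
import random


def split(samples, validation_fraction=0.25, seed=0):
    """Randomly group samples into a training set and a verification set."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * validation_fraction))
    return shuffled[n_val:], shuffled[:n_val]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(model, x):
    """Probability that x is a positive (male) sample."""
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train(train_set, epochs=200, lr=0.5):
    """Fit logistic regression weights with per-sample gradient steps."""
    w = [0.0] * len(train_set[0][0])
    b = 0.0
    for _ in range(epochs):
        for x, y in train_set:
            err = predict_proba((w, b), x) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


def accuracy(model, data):
    """Fraction of samples whose thresholded prediction matches the label."""
    correct = sum((predict_proba(model, x) >= 0.5) == (y == 1) for x, y in data)
    return correct / len(data)


# Toy data: feature 0 = fighting game installed, feature 1 = cosmetics app installed.
samples = [([1.0, 0.0], 1), ([0.0, 1.0], 0)] * 10
train_set, validation_set = split(samples)
model = train(train_set)
passed = accuracy(model, validation_set) >= 0.8  # assumed pass criterion
```

If verification fails, one option is to retrain with different hyperparameters or more sample data before accepting the model; the application leaves this choice open.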
In summary, the embodiment of the present application provides a user classification model training method. When a user classification model is trained, feature data of each sample user is generated from that sample user's user data within a preset time period, and the user classification model is then trained based on the feature data of each sample user and the gender label corresponding to each sample user. The sample feature data includes application installation information corresponding to the sample user's terminal device. Because the model is trained by learning how users of different genders use different applications, it achieves higher classification accuracy.
Example two
Referring to Fig. 2, which is a flowchart of the user classification method provided in the second embodiment of the present application, the method includes steps S201 to S203:
S201: acquiring to-be-classified sample user data of each sample user to be classified within a second preset time period.
S202: generating feature data of each sample user to be classified based on the to-be-classified sample user data; the feature data of the sample user to be classified includes the application installation information corresponding to the terminal device of the sample user to be classified.
S203: inputting the feature data of each sample user to be classified into the user classification model obtained by any one of the user classification model training methods described above, to obtain a user classification result.
The following describes each of the above-mentioned steps S201 to S203 in detail.
The specific implementation manner of S201 to S202 is similar to that of S101 to S102, and is not described herein again.
In step S203, the feature data of each sample user to be classified obtained in steps S201 to S202 is input into the user classification model obtained by any one of the user classification model training methods of the first embodiment, so as to obtain a user classification result.
After the user classification result is obtained, a final classification result of the sample user to be classified may be determined based on the user classification result.
Determining the final classification result of the sample user to be classified based on the user classification result includes:
obtaining the confidence of the user classification result based on the user classification result;
screening the user classification results based on the confidence of the user classification results and/or the magnitude of the user data of the sample users to be classified within the preset time period, and determining the final classification results of the sample users to be classified;
the screening method includes one or more of the following: selecting user classification results whose confidence reaches a preset confidence threshold; selecting user classification results for which the magnitude of the user data of the sample user to be classified within the preset time period reaches a preset magnitude threshold; and randomly selecting user classification results.
For example, a user classification result whose confidence reaches the preset confidence threshold may be selected as a final classification result.
For example, a user classification result for which the magnitude of the user data of the sample user to be classified within the preset time period reaches the preset magnitude threshold may be selected as a final classification result.
For example, a user classification result may also be determined as a final classification result by random selection.
The screening method can be chosen according to actual needs.
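The screening options above can be sketched as one filter function. The `ClassificationResult` record, its field names, and the choice to apply the enabled criteria one after another (a logical AND) are assumptions of this example; the application only states that one or more of the criteria may be used.

```python
import random
from dataclasses import dataclass


@dataclass
class ClassificationResult:
    user_id: str
    label: str
    confidence: float   # assumed: model probability of the predicted label
    data_volume: int    # assumed: number of user-data records in the time period


def screen_results(results, min_confidence=None, min_data_volume=None,
                   random_sample_size=None, seed=0):
    """Keep results that pass every enabled criterion; disabled criteria are None."""
    kept = list(results)
    if min_confidence is not None:
        kept = [r for r in kept if r.confidence >= min_confidence]
    if min_data_volume is not None:
        kept = [r for r in kept if r.data_volume >= min_data_volume]
    if random_sample_size is not None and len(kept) > random_sample_size:
        # Random selection among the results that survived the other criteria.
        kept = random.Random(seed).sample(kept, random_sample_size)
    return kept


# Hypothetical results, for illustration only.
results = [
    ClassificationResult("u1", "F", 0.95, 120),
    ClassificationResult("u2", "M", 0.55, 300),
    ClassificationResult("u3", "F", 0.90, 3),
]
confident = screen_results(results, min_confidence=0.8)
confident_with_data = screen_results(results, min_confidence=0.8, min_data_volume=10)
```

Here `confident` keeps `u1` and `u3`, while `confident_with_data` keeps only `u1`, because `u3` has too little user data in the period.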
Based on the above, the embodiment of the present application provides a user classification method: acquiring to-be-classified sample user data of each sample user to be classified within a second preset time period; generating feature data of each sample user to be classified based on the to-be-classified sample user data, where the feature data of the sample user to be classified includes the application installation information corresponding to the terminal device of the sample user to be classified; and inputting the feature data of each sample user to be classified into the user classification model obtained by any one of the user classification model training methods, to obtain a user classification result. By exploiting how users of different genders use different applications and classifying users with the trained user classification model, the accuracy of user classification is improved.
Example three
Referring to Fig. 3, which is a schematic diagram of a user classification model training apparatus provided in a third embodiment of the present application, the user classification model training apparatus includes: a first obtaining module 31, a first generating module 32 and a training module 33, wherein:
the first obtaining module 31 is configured to obtain sample user data of each sample user in a plurality of sample users within a first preset time period, and a gender label corresponding to each sample user;
the first generating module 32 is configured to generate sample feature data of each sample user based on the sample user data of each sample user; the sample feature data includes: the application installation information corresponding to the terminal device of the sample user;
the training module 33 is configured to train to obtain the user classification model based on the sample feature data of each sample user and the gender label corresponding to each sample user.
Based on the above, the embodiment of the present application provides a user classification model training apparatus. When the user classification model is trained, feature data of each sample user is generated from the user data of that sample user within a preset time period, and the user classification model is then trained based on the feature data of each sample user and the gender label corresponding to each sample user. The sample feature data includes the application installation information corresponding to the terminal device of the sample user. Because the model is trained by learning how users of different genders use different applications, the resulting user classification model has higher classification accuracy.
In one possible embodiment, the application installation information includes one or more of the following:
the time when each application program is installed in the terminal equipment, the frequency of using each application program in the preset time period by a user, and the classification of each application program.
In a possible embodiment, the sample feature data further includes:
historical push information pushed to the sample user by at least one application program in the terminal device, and/or operation information describing the sample user's operations on the historical push information;
the historical push information comprises one or more of the following: the content of the pushed information, the industry of the pushed information, the classification of the pushed information, the pushing media of the pushed information and the distribution platform of the pushed information;
the operation information comprises one or more of the following: clicking operation on the pushed historical push information, forwarding operation on the pushed historical push information, and timestamp information corresponding to the operation information.
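To make the listed fields concrete, the following sketch models the application installation information, the historical push information, and the operation information as simple records and flattens them into a sparse feature dictionary. All class names, field names, and the feature-key scheme are assumptions of this example; the application lists the kinds of information but does not prescribe an encoding.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class AppInstallInfo:
    app: str
    installed_at: datetime   # time the app was installed on the terminal device
    use_count: int           # uses within the preset time period
    category: str            # classification of the app, e.g. "game"


@dataclass
class PushOperation:
    kind: str                # "click" or "forward"
    timestamp: datetime      # timestamp corresponding to the operation


@dataclass
class PushInfo:
    content: str
    industry: str
    category: str
    media: str               # push medium
    platform: str            # distribution platform
    operations: list = field(default_factory=list)


def build_feature_vector(installs, pushes):
    """Flatten install and push records into a sparse {feature_name: value} dict."""
    features = Counter()
    for info in installs:
        features[f"app={info.app}"] = 1
        features[f"app_category={info.category}"] += 1
        features[f"use_count={info.app}"] = info.use_count
    for push in pushes:
        for op in push.operations:
            # Count user operations per industry of the pushed information.
            features[f"{op.kind}_industry={push.industry}"] += 1
    return dict(features)
```

A sparse dictionary keeps the example independent of any fixed vocabulary; a production system would more likely map these keys to vector indices.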
In a possible implementation, the first generating module 32 is further configured to, after generating the sample feature data of each sample user based on the sample user data of each sample user:
perform data cleansing on the feature data of each sample user based on the user data of each sample user.
In one possible embodiment, the data cleansing includes:
filtering out, from the application installation information of the mobile device, the installation information of applications whose interval between installation and uninstallation is below a preset time threshold, and/or filtering out invalid push messages;
the invalid push messages include: push messages on which the sample user has performed no operation.
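The two cleansing rules can be sketched as simple filters. The record types, the one-hour default threshold, and the decision to keep applications that were never uninstalled are assumptions of this example; the application only requires a preset time threshold.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class InstallRecord:
    app: str
    installed_at: datetime
    uninstalled_at: Optional[datetime] = None  # None: still installed (assumed kept)


@dataclass
class PushRecord:
    message_id: str
    operations: tuple = ()  # e.g. ("click", "forward"); empty if the user never acted


def clean_installs(records, min_lifetime=timedelta(hours=1)):
    """Drop apps uninstalled sooner than `min_lifetime` after installation."""
    return [
        r for r in records
        if r.uninstalled_at is None
        or r.uninstalled_at - r.installed_at >= min_lifetime
    ]


def clean_pushes(pushes):
    """Drop invalid push messages, i.e. those the user performed no operation on."""
    return [p for p in pushes if p.operations]
```

Short-lived installs are plausibly accidental or trial installs, which is presumably why they are removed before the features are used for training.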
In a possible implementation, the training module 33, when training to obtain the user classification model based on the sample feature data of each sample user and the gender label corresponding to each sample user, is configured for:
randomly grouping the feature data of each sample user to obtain a model training set and a model verification set;
training a basic classification model based on the model training set and the gender labels corresponding to the feature data of each sample user in the model training set, to obtain a preliminarily trained user classification model;
and verifying the preliminarily trained user classification model based on the model verification set and the gender labels corresponding to the feature data of each sample user in the model verification set, and obtaining the user classification model after the verification is passed.
Example four
Referring to Fig. 4, which is a schematic diagram of a user classification apparatus provided in a fourth embodiment of the present application, the user classification apparatus includes: a second obtaining module 41, a second generating module 42, and an input module 43, wherein:
the second obtaining module 41 is configured to obtain to-be-classified sample user data of each sample user to be classified within a second preset time period;
the second generating module 42 is configured to generate feature data of each sample user to be classified based on the to-be-classified sample user data; the feature data of the sample user to be classified includes: the application installation information corresponding to the terminal device of the sample user to be classified;
the input module 43 is configured to input the feature data of each sample user to be classified into a user classification model obtained by any one of the above user classification model training methods, so as to obtain a user classification result.
Based on the above, the embodiment of the present application provides a user classification apparatus, which: acquires to-be-classified sample user data of each sample user to be classified within a second preset time period; generates feature data of each sample user to be classified based on the to-be-classified sample user data, where the feature data of the sample user to be classified includes the application installation information corresponding to the terminal device of the sample user to be classified; and inputs the feature data of each sample user to be classified into the user classification model obtained by any one of the user classification model training methods, to obtain a user classification result. By exploiting how users of different genders use different applications and classifying users with the trained user classification model, the accuracy of user classification is improved.
Example five
An embodiment of the present application further provides a computer device 500. Fig. 5 is a schematic structural diagram of the computer device 500 provided in the embodiment of the present application, which includes:
a processor 51, a memory 52, and a bus 53; the memory 52 is used for storing execution instructions and comprises an internal memory 521 and an external memory 522; the internal memory 521 temporarily stores operation data of the processor 51 and data exchanged with the external memory 522, such as a hard disk; the processor 51 exchanges data with the external memory 522 through the internal memory 521; and when the computer device 500 operates, the processor 51 communicates with the memory 52 through the bus 53, so that the processor 51 executes the following instructions in a user mode:
obtaining sample user data of each sample user in a plurality of sample users within a first preset time period, and a gender label corresponding to each sample user;
generating sample feature data for each of the sample users based on the sample user data for each of the sample users; the sample feature data includes: the application installation information corresponding to the terminal device of the sample user;
and training to obtain the user classification model based on the sample feature data of each sample user and the gender label corresponding to each sample user.
In one possible embodiment, the application installation information includes, in the instructions executed by the processor 51, one or more of the following:
the time when each application program is installed in the terminal equipment, the frequency of using each application program in the preset time period by a user, and the classification of each application program.
In a possible embodiment, in the instructions executed by the processor 51, the sample feature data further includes:
historical push information pushed to the sample user by at least one application program in the terminal device, and/or operation information describing the sample user's operations on the historical push information;
the historical push information comprises one or more of the following: the content of the pushed information, the industry of the pushed information, the classification of the pushed information, the pushing media of the pushed information and the distribution platform of the pushed information;
the operation information comprises one or more of the following: clicking operation on the pushed historical push information, forwarding operation on the pushed historical push information, and timestamp information corresponding to the operation information.
In a possible embodiment, the instructions executed by the processor 51 further include, after generating the sample feature data of each sample user based on the sample user data of each sample user:
performing data cleansing on the feature data of each sample user based on the user data of each sample user.
In a possible embodiment, in the instructions executed by the processor 51, the data cleansing includes:
filtering out, from the application installation information of the mobile device, the installation information of applications whose interval between installation and uninstallation is below a preset time threshold, and/or filtering out invalid push messages;
the invalid push messages include: push messages on which the sample user has performed no operation.
In a possible implementation, in the instructions executed by the processor 51, training to obtain the user classification model based on the sample feature data of each sample user and the gender label corresponding to each sample user includes:
randomly grouping the feature data of each sample user to obtain a model training set and a model verification set;
training a basic classification model based on the model training set and the gender labels corresponding to the feature data of each sample user in the model training set, to obtain a preliminarily trained user classification model;
and verifying the preliminarily trained user classification model based on the model verification set and the gender labels corresponding to the feature data of each sample user in the model verification set, and obtaining the user classification model after the verification is passed.
The processor 51 also executes the following instructions:
acquiring to-be-classified sample user data of each sample user to be classified within a second preset time period;
generating user feature data of each sample user to be classified based on the to-be-classified sample user data; the user feature data includes: the application installation information corresponding to the terminal device of the sample user to be classified;
and inputting the user feature data of each sample user to be classified into the user classification model obtained by any one of the user classification model training methods, to obtain a user classification result.
The present application further provides a computer-readable storage medium storing a computer program which, when executed by a processor, performs the steps of the user classification model training method and the user classification method in the foregoing method embodiments.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the system and the apparatus described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again. In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on such understanding, the technical solution of the present application, in essence, or the part of it that contributes over the prior art, or a part of the technical solution, may be embodied in the form of a software product that is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
Finally, it should be noted that the above embodiments are only specific embodiments of the present application, used to illustrate its technical solutions rather than to limit them, and the protection scope of the present application is not limited thereto. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that, within the technical scope disclosed in the present application, anyone skilled in the art may still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of their technical features; such modifications, changes or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application, and shall be covered by the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (11)

1. A user classification model training method, comprising:
obtaining sample user data of each sample user in a plurality of sample users within a first preset time period, and a gender label corresponding to each sample user;
generating sample feature data for each of the sample users based on the sample user data for each of the sample users; the sample feature data includes: the application installation information corresponding to the terminal device of the sample user;
and training to obtain the user classification model based on the sample feature data of each sample user and the gender label corresponding to each sample user.
2. The method of claim 1, wherein the application installation information comprises one or more of:
the time when each application program is installed in the terminal equipment, the frequency of using each application program in the preset time period by a user, and the classification of each application program.
3. The method of claim 2, wherein the sample feature data further comprises:
historical push information pushed to the sample user by at least one application program in the terminal device, and/or operation information describing the sample user's operations on the historical push information;
the historical push information comprises one or more of the following: the content of the pushed information, the industry of the pushed information, the classification of the pushed information, the pushing media of the pushed information and the distribution platform of the pushed information;
the operation information comprises one or more of the following: clicking operation on the pushed historical push information, forwarding operation on the pushed historical push information, and timestamp information corresponding to the operation information.
4. The method of claim 1, further comprising, after the generating sample feature data for each of the sample users based on the sample user data for each of the sample users:
performing data cleansing on the feature data of each sample user based on the user data of each sample user.
5. The method of claim 4, wherein the data cleansing comprises:
filtering out, from the application installation information of the mobile device, the installation information of applications whose interval between installation and uninstallation is below a preset time threshold, and/or filtering out invalid push messages;
the invalid push messages include: push messages on which the sample user has performed no operation.
6. The method of claim 1, wherein the training to obtain the user classification model based on the sample feature data of each sample user and the gender label corresponding to each sample user comprises:
randomly grouping the feature data of each sample user to obtain a model training set and a model verification set;
training a basic classification model based on the model training set and the gender labels corresponding to the feature data of each sample user in the model training set, to obtain a preliminarily trained user classification model;
and verifying the preliminarily trained user classification model based on the model verification set and the gender labels corresponding to the feature data of each sample user in the model verification set, and obtaining the user classification model after the verification is passed.
7. A method for classifying a user, the method comprising:
acquiring to-be-classified sample user data of each sample user to be classified within a second preset time period;
generating characteristic data of each sample user to be classified based on the sample user data to be classified; the characteristic data of the sample user to be classified comprises: the application program installation information corresponding to the terminal equipment of the sample user to be classified;
inputting the feature data of each sample user to be classified into the user classification model obtained by the user classification model training method of any one of claims 1 to 6 to obtain a user classification result.
8. A user classification model training apparatus, comprising:
a first obtaining module, configured to obtain sample user data of each sample user in a plurality of sample users within a first preset time period, and a gender label corresponding to each sample user;
a first generating module, configured to generate sample feature data of each sample user based on the sample user data of each sample user; the sample feature data includes: the application installation information corresponding to the terminal device of the sample user;
and a training module, configured to train to obtain the user classification model based on the sample feature data of each sample user and the gender label corresponding to each sample user.
9. A user classification apparatus, the apparatus comprising:
a second obtaining module, configured to obtain to-be-classified sample user data of each sample user to be classified within a second preset time period;
a second generating module, configured to generate feature data of each sample user to be classified based on the to-be-classified sample user data; the feature data of the sample user to be classified includes: the application installation information corresponding to the terminal device of the sample user to be classified;
an input module, configured to input the feature data of each sample user to be classified into the user classification model obtained by the user classification model training method according to any one of claims 1 to 6, so as to obtain a user classification result.
10. An electronic device, comprising: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory communicating over the bus when the electronic device is operating, the machine-readable instructions when executed by the processor performing the steps of the method of any of claims 1 to 7.
11. A computer-readable storage medium, having stored thereon a computer program which, when being executed by a processor, is adapted to carry out the steps of the method according to any one of claims 1 to 7.
CN201911252400.0A 2019-12-09 2019-12-09 User classification model training method, user classification method and device Active CN111078742B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911252400.0A CN111078742B (en) 2019-12-09 2019-12-09 User classification model training method, user classification method and device

Publications (2)

Publication Number Publication Date
CN111078742A (en) 2020-04-28
CN111078742B (en) 2023-09-05

Family

ID=70313432

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911252400.0A Active CN111078742B (en) 2019-12-09 2019-12-09 User classification model training method, user classification method and device

Country Status (1)

Country Link
CN (1) CN111078742B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112434136A (en) * 2020-12-08 2021-03-02 深圳市欢太科技有限公司 Gender classification method, gender classification device, electronic equipment and computer storage medium
CN113095589A (en) * 2021-04-23 2021-07-09 北京明略昭辉科技有限公司 Population attribute determination method, device, equipment and storage medium
CN113850632A (en) * 2021-11-29 2021-12-28 平安科技(深圳)有限公司 User category determination method, device, equipment and storage medium
CN115689626A (en) * 2022-10-31 2023-02-03 荣耀终端有限公司 User attribute determination method of terminal equipment and electronic equipment

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2005038109A (en) * 2003-07-18 2005-02-10 Hitachi Ltd Service providing system, and method and device for communicating service request of customer
CN105095401A (en) * 2015-07-07 2015-11-25 北京嘀嘀无限科技发展有限公司 Method and apparatus for identifying gender
CN105654131A (en) * 2015-12-30 2016-06-08 小米科技有限责任公司 Classification model training method and device
CN106126597A (en) * 2016-06-20 2016-11-16 乐视控股(北京)有限公司 User property Forecasting Methodology and device
CN106453055A (en) * 2016-10-28 2017-02-22 努比亚技术有限公司 Method and apparatus for pushing information through user behaviors, and terminal
CN106897727A (en) * 2015-12-21 2017-06-27 百度在线网络技术(北京)有限公司 A kind of user's gender identification method and device
CN107886366A (en) * 2017-11-22 2018-04-06 深圳市金立通信设备有限公司 Generation method, sex fill method, terminal and the storage medium of Gender Classification model
CN108399418A (en) * 2018-01-23 2018-08-14 北京奇艺世纪科技有限公司 A kind of user classification method and device
CN109409949A (en) * 2018-10-17 2019-03-01 北京字节跳动网络技术有限公司 Determination method, apparatus, electronic equipment and the storage medium of user group's classification
CN109408723A (en) * 2018-11-06 2019-03-01 北京奇艺世纪科技有限公司 A kind of method for pushing and device
CN110096526A (en) * 2019-04-30 2019-08-06 秒针信息技术有限公司 A kind of prediction technique and prediction meanss of user property label
CN110191151A (en) * 2019-04-17 2019-08-30 广州精选速购网络科技有限公司 Information-pushing method, device, equipment and medium based on smart machine

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2005038109A (en) * 2003-07-18 2005-02-10 Hitachi Ltd Service providing system, and method and device for communicating service request of customer
CN105095401A (en) * 2015-07-07 2015-11-25 北京嘀嘀无限科技发展有限公司 Method and apparatus for identifying gender
CN106897727A (en) * 2015-12-21 2017-06-27 百度在线网络技术(北京)有限公司 A kind of user's gender identification method and device
CN105654131A (en) * 2015-12-30 2016-06-08 小米科技有限责任公司 Classification model training method and device
CN106126597A (en) * 2016-06-20 2016-11-16 乐视控股(北京)有限公司 User property Forecasting Methodology and device
CN106453055A (en) * 2016-10-28 2017-02-22 努比亚技术有限公司 Method and apparatus for pushing information through user behaviors, and terminal
CN107886366A (en) * 2017-11-22 2018-04-06 深圳市金立通信设备有限公司 Generation method, sex fill method, terminal and the storage medium of Gender Classification model
CN108399418A (en) * 2018-01-23 2018-08-14 北京奇艺世纪科技有限公司 A kind of user classification method and device
CN109409949A (en) * 2018-10-17 2019-03-01 北京字节跳动网络技术有限公司 Determination method, apparatus, electronic equipment and the storage medium of user group's classification
CN109408723A (en) * 2018-11-06 2019-03-01 北京奇艺世纪科技有限公司 A kind of method for pushing and device
CN110191151A (en) * 2019-04-17 2019-08-30 广州精选速购网络科技有限公司 Information-pushing method, device, equipment and medium based on smart machine
CN110096526A (en) * 2019-04-30 2019-08-06 秒针信息技术有限公司 A kind of prediction technique and prediction meanss of user property label

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
戴斌;李寿山;贡正仙;周国栋;: "基于多类型文本的半监督性别分类方法研究", 山西大学学报(自然科学版), no. 01 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112434136A (en) * 2020-12-08 2021-03-02 深圳市欢太科技有限公司 Gender classification method, gender classification device, electronic equipment and computer storage medium
CN112434136B (en) * 2020-12-08 2024-04-23 深圳市欢太科技有限公司 Gender classification method, apparatus, electronic device and computer storage medium
CN113095589A (en) * 2021-04-23 2021-07-09 北京明略昭辉科技有限公司 Population attribute determination method, device, equipment and storage medium
CN113850632A (en) * 2021-11-29 2021-12-28 平安科技(深圳)有限公司 User category determination method, device, equipment and storage medium
CN115689626A (en) * 2022-10-31 2023-02-03 荣耀终端有限公司 User attribute determination method of terminal equipment and electronic equipment
CN115689626B (en) * 2022-10-31 2024-03-01 荣耀终端有限公司 User attribute determining method of terminal equipment and electronic equipment

Also Published As

Publication number Publication date
CN111078742B (en) 2023-09-05

Similar Documents

Publication Publication Date Title
CN111078742A (en) User classification model training method, user classification method and device
CN105427129B (en) Information delivery method and system
CN110472154B (en) Resource pushing method and device, electronic equipment and readable storage medium
CN105045916A (en) Mobile game recommendation system and recommendation method thereof
CN111836063B (en) Live broadcast content identification method and device
CN108829769B (en) Suspicious group discovery method and device
CN111260220B (en) Group control equipment identification method and device, electronic equipment and storage medium
CN111242709A (en) Message pushing method and device, equipment and storage medium thereof
CN105183295A (en) Classification method for application icons and terminal
CN110782291A (en) Advertisement delivery user determination method and device, storage medium and electronic device
CN106910135A (en) User recommendation method and device
CN114223012A (en) Push object determination method and device, terminal equipment and storage medium
CN112785069A (en) Method and device for predicting terminal device replacement, storage medium and electronic equipment
CN111047332B (en) Model training and risk identification method, device and equipment
CN117611272A (en) Commodity recommendation method and device and electronic equipment
CN113486238A (en) Information pushing method, device and equipment based on user portrait and storage medium
CN113313615A (en) Method and device for quantitatively grading and grading enterprise judicial risks
CN113807436A (en) User mining method and device, computer equipment and readable storage medium
CN112907282A (en) Architecture application method based on global e-commerce industry advertisement DMP
CN113051126A (en) Image construction method, device and equipment and storage medium
CN110598211A (en) Article identification method and device, storage medium and electronic device
CN111967518B (en) Application labeling method, application labeling device and terminal equipment
CN112581161B (en) Object selection method and device, storage medium and electronic equipment
CN113742571B (en) Message pushing method and device based on big data and storage medium
CN116167829B (en) Multidimensional and multi-granularity user behavior analysis method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant