CN111078742B - User classification model training method, user classification method and device - Google Patents


Info

Publication number
CN111078742B
Authority
CN
China
Prior art keywords
user
sample
data
sample user
classified
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911252400.0A
Other languages
Chinese (zh)
Other versions
CN111078742A (en)
Inventor
海梓晗
潘峰
高雅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Miaozhen Information Technology Co Ltd
Original Assignee
Miaozhen Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Miaozhen Information Technology Co Ltd
Priority to CN201911252400.0A
Publication of CN111078742A
Application granted
Publication of CN111078742B
Legal status: Active (current)
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2457 Query processing with adaptation to user needs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The application provides a user classification model training method, a user classification method and a device. To train the user classification model, feature data is generated for each sample user from that user's data within a preset time period. The model is then trained on each sample user's feature data and gender label. The sample feature data includes the application installation information of the sample user's terminal device, so the model learns how users of different genders use different applications. This gives the user classification model higher classification accuracy.

Description

User classification model training method, user classification method and device
Technical Field
The present application relates to the field of machine learning technologies, and in particular, to a user classification model training method, a user classification method, and a device.
Background
In the prior art, a classification model is typically used to classify different user populations by gender. This classification model is a linear regression model. Given a large amount of input sample data, it outputs the probability that each sample belongs to a given class, and in this way classifies the samples.
With the widespread use of mobile devices, samples can be classified by user gender. This is done by taking the behavior data of mobile devices in advertising campaigns as features and assigning those features different weights.
However, existing classification models classify poorly when some of the features are strongly correlated.
Disclosure of Invention
In view of this, the application provides a user classification model training method, a user classification method and corresponding devices. They can classify users by gender, which makes digital marketing more precisely targeted and improves the return on investment.
In a first aspect, an embodiment of the present application provides a method for training a user classification model, including:
acquiring sample user data of each of a plurality of sample users within a first preset time period, and the gender label corresponding to each sample user;
generating sample feature data of each sample user based on that user's sample user data, the sample feature data including application installation information corresponding to the sample user's terminal device;
and training to obtain the user classification model based on the sample characteristic data of each sample user and the gender label corresponding to each sample user.
In an alternative embodiment, the application installation information includes one or more of the following:
the time when each application is installed in the terminal device, the frequency with which the user uses each application within the preset time period, and the classification of each application.
In an alternative embodiment, the sample feature data further comprises:
historical push information pushed to the sample user by at least one application on the terminal device, and/or operation information describing the sample user's operations on that historical push information;
the historical push information includes one or more of the following: the content of the pushed information, the industry it belongs to, its category, the medium it was pushed through, and the platform that distributed it;
the operation information includes one or more of the following: click operations on the pushed historical information, forwarding operations on it, and the timestamps of these operations.
In an alternative embodiment, after the sample feature data of each sample user is generated from that user's sample user data, the method further comprises:
performing data cleaning on the feature data of each sample user based on the user data of each sample user.
In an alternative embodiment, the data cleansing includes:
filtering out invalid push messages, and/or filtering out, based on the mobile-device application installation information, the installation records of applications uninstalled within less than a preset time threshold after installation;
an invalid push message is a push message on which the sample user performed no operation.
In an optional implementation, training the user classification model based on the sample feature data and the gender label of each sample user includes:
randomly grouping the feature data of the sample users into a model training set and a model validation set;
training a basic classification model on the model training set and on the gender labels of the sample users in that set, to obtain a preliminarily trained user classification model;
validating the preliminarily trained user classification model on the model validation set and on the gender labels of the sample users in that set. Once validation passes, the result is the user classification model.
In a second aspect, an embodiment of the present application provides a user classification method, including:
acquiring the data of each user to be classified within a second preset time period;
generating feature data of each user to be classified based on that data, the feature data including application installation information corresponding to the terminal device of the user to be classified;
inputting the feature data of each user to be classified into a user classification model to obtain a user classification result. The model is obtained by the user classification model training method of any embodiment of the first aspect.
In a third aspect, an embodiment of the present application further provides a user classification model training apparatus. It comprises a first acquisition module, a first generation module and a training module, wherein:
the first acquisition module is configured to acquire sample user data of each of a plurality of sample users within a first preset time period, and the gender label corresponding to each sample user;
the first generation module is configured to generate sample feature data of each sample user based on that user's sample user data, the sample feature data including application installation information corresponding to the sample user's terminal device;
the training module is configured to train the user classification model based on the sample feature data and the gender label of each sample user.
In an alternative embodiment, the application installation information includes one or more of the following:
the time when each application is installed in the terminal device, the frequency with which the user uses each application within the preset time period, and the classification of each application.
In an alternative embodiment, the sample feature data further comprises:
historical push information pushed to the sample user by at least one application on the terminal device, and/or operation information describing the sample user's operations on that historical push information;
the historical push information includes one or more of the following: the content of the pushed information, the industry it belongs to, its category, the medium it was pushed through, and the platform that distributed it;
the operation information includes one or more of the following: click operations on the pushed historical information, forwarding operations on it, and the timestamps of these operations.
In an alternative embodiment, after the sample feature data of each sample user is generated from that user's sample user data, the first generation module is further configured to:
perform data cleaning on the feature data of each sample user based on the user data of each sample user.
In an alternative embodiment, the data cleansing includes:
filtering out invalid push messages, and/or filtering out, based on the mobile-device application installation information, the installation records of applications uninstalled within less than a preset time threshold after installation;
an invalid push message is a push message on which the sample user performed no operation.
In an optional implementation, when training the user classification model based on the sample feature data and the gender label of each sample user, the training module is configured to:
randomly group the feature data of the sample users into a model training set and a model validation set;
train a basic classification model on the model training set and on the gender labels of the sample users in that set, to obtain a preliminarily trained user classification model;
validate the preliminarily trained user classification model on the model validation set and on the gender labels of the sample users in that set. Once validation passes, the result is the user classification model.
In a fourth aspect, an embodiment of the present application further provides a user classification apparatus. It comprises a second acquisition module, a second generation module and a determination module, wherein:
the second acquisition module is configured to acquire the data of each user to be classified within a second preset time period;
the second generation module is configured to generate feature data of each user to be classified based on that data, the feature data including application installation information corresponding to the terminal device of the user to be classified;
the determination module is configured to input the feature data of each user to be classified into a user classification model to obtain a user classification result. The model is obtained by the user classification model training method of any embodiment of the first aspect.
In a fifth aspect, an embodiment of the present application further provides an electronic device, including: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory in communication over the bus when the electronic device is running, the machine-readable instructions when executed by the processor performing the steps of the first aspect, or any of the possible implementations of the first aspect;
Or performing the steps of the second aspect described above, or any of the possible embodiments of the second aspect.
In a sixth aspect, embodiments of the present application further provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the first aspect, or any of the possible implementations of the first aspect;
or performing the steps of the second aspect described above, or any of the possible embodiments of the second aspect.
The user classification model training method, the user classification method and the devices of the present application work as follows. Feature data is generated for each sample user from that user's data within a preset time period. The user classification model is then trained on each sample user's feature data and gender label. The sample feature data includes the application installation information of the sample user's terminal device, so the model learns how users of different genders use different applications. This gives the user classification model higher classification accuracy.
In order to make the above objects, features and advantages of the present application more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments will be briefly described below, it being understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and other related drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 shows a flowchart of a user classification model training method according to an embodiment of the present application;
FIG. 2 shows a flowchart of a user classification method according to an embodiment of the present application;
FIG. 3 shows a schematic structural diagram of a user classification model training apparatus according to an embodiment of the present application;
FIG. 4 shows a schematic structural diagram of a user classification apparatus according to an embodiment of the present application;
FIG. 5 shows a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. The components of the embodiments of the present application generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the application, as presented in the figures, is not intended to limit the scope of the application, as claimed, but is merely representative of selected embodiments of the application. All other embodiments, which can be made by a person skilled in the art without making any inventive effort, are intended to be within the scope of the present application.
In the prior art, a classification model is generally used to classify different user populations by gender. This classification model is a linear regression model. Given a large amount of input sample data, it outputs the probability that each sample belongs to a given class, and in this way classifies the samples. However, existing classification models classify poorly when some features are strongly correlated, and cannot meet the accuracy requirements of user classification.
The embodiments of the present application provide a user classification model training method, a user classification method and corresponding devices, which train the user classification model as follows. First, the user data of each of a plurality of sample users within a preset time period, and the gender label of each sample user, are acquired. Next, feature data of each sample user is generated from that user's data; the sample feature data includes the application installation information corresponding to the sample user's terminal device. Finally, the user classification model is trained on the feature data and gender label of each sample user. Male and female users differ in how many times and how often they download different applications. Users can therefore be classified by gender on the basis of these differences, which improves the accuracy of gender classification.
It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures.
The user classification model training method and the user classification method provided by the embodiments of the present disclosure are generally executed by a computer device with a certain computing capability, such as a terminal device, a server or another processing device. The terminal device may be a User Equipment (UE), a mobile device, a user terminal, a cellular phone, a cordless phone, a Personal Digital Assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, etc. In some possible implementations, the two methods may be implemented by a processor invoking computer-readable instructions stored in a memory.
The two methods are described below, taking a computer device as the executing entity.
Example 1
Referring to FIG. 1, a flowchart of a user classification model training method according to the first embodiment of the present application is shown. The method includes steps S101 to S103:
S101: Acquire sample user data of each of a plurality of sample users within a first preset time period, and the gender label corresponding to each sample user.
S102: Generate sample feature data of each sample user based on that user's sample user data; the sample feature data includes the application installation information corresponding to the sample user's terminal device.
S103: Train the user classification model based on the sample feature data and the gender label of each sample user.
Hereinafter, each of the above-mentioned S101 to S103 will be described in detail.
I. In S101, a plurality of sample users are obtained, together with the user data of each sample user within the first preset time period and the gender label of each sample user.
Illustratively, a subset of mobile-device users who were active within the preset time period is selected as sample users. They are grouped by gender label into male sample users and female sample users.
For example, after grouping, male sample users are treated as positive samples and female sample users as negative samples for subsequent processing.
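The grouping and positive/negative labelling above can be sketched as follows. This is a minimal illustration only. The record layout (`user_id`, `gender` fields) and the function name `group_and_label` are hypothetical. The choice to skip users without a recognised label is an assumption, not something the patent specifies.

```python
from typing import Dict, List, Tuple


def group_and_label(samples: List[Dict[str, str]]) -> Tuple[List[dict], List[dict]]:
    """Split sample users by gender label and attach a binary training target:
    male -> 1 (positive sample), female -> 0 (negative sample)."""
    positives: List[dict] = []
    negatives: List[dict] = []
    for user in samples:
        gender = user.get("gender")
        if gender == "male":
            positives.append({**user, "label": 1})
        elif gender == "female":
            negatives.append({**user, "label": 0})
        # Assumption: users with any other or missing gender label are skipped.
    return positives, negatives


pos, neg = group_and_label([
    {"user_id": "u1", "gender": "male"},
    {"user_id": "u2", "gender": "female"},
    {"user_id": "u3", "gender": "female"},
])
print(len(pos), len(neg))  # 1 2
```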
II. In S102, the feature data of each sample user is generated from the user data of each sample user acquired in S101.
The sample feature data includes the application installation information corresponding to the sample user's terminal device.
Illustratively, the application installation information includes one or more of the following:
the time when each application is installed in the terminal device, the frequency with which the user uses each application within the preset time period, and the classification of each application.
For example, male and female users may download a given mobile application in different numbers. Some beauty applications target mainly female users, so female users may download them far more often than male users. Some combat game applications target mainly male users, so male users may download them far more often than female users. For lifestyle applications such as ride-hailing and food-delivery apps, the difference between male and female downloads may be small. These differences make it possible to promote each application to a better-targeted audience.
Usage frequency may also differ for a particular mobile application. Some shopping applications may be downloaded in similar numbers by male and female users, but female users may use them more frequently. Some racing game applications may likewise be downloaded in similar numbers, but male users may use them more frequently than female users. Here too, the corresponding applications can be promoted in a more targeted way.
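The three kinds of application installation information (installation time, usage frequency within the preset period, and application category) could be turned into a sparse feature dictionary per user as sketched below. This is a hypothetical encoding: the `InstallRecord` fields, the `app=` / `freq=` / `age_days=` / `cat=` key scheme, and measuring installation time as days before the end of the period are all illustrative assumptions, not the patent's format.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List


@dataclass
class InstallRecord:
    app: str              # hypothetical application identifier
    category: str         # application category, e.g. "beauty", "game"
    installed_at: datetime
    uses_in_period: int   # number of uses within the preset time period


def install_features(records: List[InstallRecord], period_end: datetime) -> Dict[str, float]:
    """Build a sparse feature dict from one user's installation records."""
    feats: Dict[str, float] = {}
    for r in records:
        feats[f"app={r.app}"] = 1.0                               # app is installed
        feats[f"freq={r.app}"] = float(r.uses_in_period)          # usage frequency
        feats[f"age_days={r.app}"] = float((period_end - r.installed_at).days)  # install time
    for cat, n in Counter(r.category for r in records).items():
        feats[f"cat={cat}"] = float(n)                            # installed apps per category
    return feats


f = install_features(
    [
        InstallRecord("makeup_cam", "beauty", datetime(2019, 11, 1), 30),
        InstallRecord("taxi", "life", datetime(2019, 10, 2), 4),
    ],
    period_end=datetime(2019, 12, 1),
)
```

Keying features by name rather than by fixed position keeps the encoding open to applications that appear later. The dictionaries can be vectorised against a shared vocabulary before training.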
Furthermore, the sample characteristic data further includes:
historical push information pushed to the sample user by at least one application on the terminal device, and/or operation information describing the sample user's operations on that historical push information;
the historical push information includes one or more of the following: the content of the pushed information, the industry it belongs to, its category, the medium it was pushed through, and the platform that distributed it;
the operation information includes one or more of the following: click operations on the pushed historical information, forwarding operations on it, and the timestamps of these operations.
For example, historical push information from different applications targets different groups. Push messages about cosmetics may hold little appeal for male users, while female users may click to view them. To save resources, such messages can therefore be directed at female users. Conversely, push messages for large combat games can be targeted at male users.
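One way the operation information could become features is to count each click or forward per attribute of the pushed message, as in the sketch below. This is an assumption-laden illustration. The `PushRecord` fields and the `op|attribute=value` key format are hypothetical. Message content and operation timestamps, which the patent also lists, are omitted here for brevity.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PushRecord:
    industry: str
    category: str
    medium: str
    platform: str
    operations: List[str] = field(default_factory=list)  # e.g. ["click", "forward"]


def push_features(pushes: List[PushRecord]) -> Dict[str, float]:
    """Count click/forward operations per industry, category, medium and platform."""
    feats: Dict[str, float] = defaultdict(float)
    for p in pushes:
        for op in p.operations:
            if op not in ("click", "forward"):
                continue  # only the two operation types named in the text are counted
            feats[f"{op}|industry={p.industry}"] += 1.0
            feats[f"{op}|category={p.category}"] += 1.0
            feats[f"{op}|medium={p.medium}"] += 1.0
            feats[f"{op}|platform={p.platform}"] += 1.0
    return dict(feats)


feats = push_features([
    PushRecord("cosmetics", "ad", "app", "platformA", ["click"]),
    PushRecord("cosmetics", "ad", "app", "platformA", ["click", "forward"]),
    PushRecord("games", "ad", "app", "platformB"),  # no operation performed
])
```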
After the sample feature data of each sample user has been generated from that user's sample user data, the method further includes:
performing data cleaning on the feature data of each sample user based on the user data of each sample user.
The data cleaning includes:
filtering out invalid push messages, and/or filtering out, based on the mobile-device application installation information, the installation records of applications uninstalled within less than a preset time threshold after installation;
an invalid push message is a push message on which the sample user performed no operation.
For example, if an application was uninstalled less than the preset time threshold after installation, its installation information is considered invalid and is excluded from further processing.
Likewise, if the sample user performed no operation on a piece of historical push information pushed by an application, that push information is considered invalid and is excluded from further processing.
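The two cleaning rules can be sketched as simple filters. This assumes install and uninstall times are available as `datetime` values and that the preset time threshold is a `timedelta`. The class and function names and the one-hour threshold in the usage lines are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class AppInstall:
    app: str
    installed_at: datetime
    uninstalled_at: Optional[datetime] = None  # None: still installed when data was collected


@dataclass
class PushMessage:
    message_id: str
    operations: List[str] = field(default_factory=list)  # empty if the user ignored it


def clean_installs(installs: List[AppInstall], min_keep: timedelta) -> List[AppInstall]:
    """Drop installation records uninstalled less than `min_keep` after installation."""
    return [
        rec for rec in installs
        if rec.uninstalled_at is None or rec.uninstalled_at - rec.installed_at >= min_keep
    ]


def clean_pushes(pushes: List[PushMessage]) -> List[PushMessage]:
    """Drop invalid push messages, i.e. those the user performed no operation on."""
    return [msg for msg in pushes if msg.operations]


t0 = datetime(2019, 12, 1, 8, 0)
installs = [
    AppInstall("trial_app", t0, t0 + timedelta(minutes=5)),  # removed almost immediately
    AppInstall("shop_app", t0, t0 + timedelta(days=3)),
    AppInstall("taxi_app", t0),                              # never uninstalled
]
kept_apps = [r.app for r in clean_installs(installs, min_keep=timedelta(hours=1))]
kept_msgs = [m.message_id for m in clean_pushes([PushMessage("m1"), PushMessage("m2", ["click"])])]
```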
After data cleaning, the feature data of each sample user may be stored in a standard storage format, for example {application 1, application 2, application 3, …, application n}. This simplifies the subsequent model training process.
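A list such as {application 1, …, application n} is commonly turned into a fixed-length 0/1 ("multi-hot") vector over a shared application vocabulary before training. The sketch below shows that conversion. Multi-hot encoding is a common technique assumed here for illustration; the patent does not say how the stored lists are vectorised.

```python
from typing import Dict, List, Sequence


def build_vocabulary(app_lists: Sequence[Sequence[str]]) -> Dict[str, int]:
    """Assign every distinct application a stable column index (sorted order)."""
    apps = sorted({app for apps in app_lists for app in apps})
    return {app: idx for idx, app in enumerate(apps)}


def to_multi_hot(apps: Sequence[str], vocab: Dict[str, int]) -> List[int]:
    """Encode one user's application list as a 0/1 vector; unknown apps are ignored."""
    vec = [0] * len(vocab)
    for app in apps:
        idx = vocab.get(app)
        if idx is not None:
            vec[idx] = 1
    return vec


users = [["app1", "app3"], ["app2"], ["app1", "app2", "app3"]]
vocab = build_vocabulary(users)
vectors = [to_multi_hot(u, vocab) for u in users]
print(vectors)  # [[1, 0, 1], [0, 1, 0], [1, 1, 1]]
```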
III. In S103, the user classification model is trained based on the feature data of each sample user obtained in S102 and the gender label corresponding to each sample user:
the feature data of the sample users is randomly grouped into a model training set and a model validation set;
a basic classification model is trained on the model training set and on the gender labels of the sample users in that set, yielding a preliminarily trained user classification model;
the preliminarily trained user classification model is validated on the model validation set and on the gender labels of the sample users in that set. Once validation passes, the result is the user classification model.
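The split/train/validate procedure can be sketched in plain Python. The patent does not name the basic classification model. Logistic regression fitted by batch gradient descent is used here as an assumed stand-in. The validation ratio, learning rate, epoch count and the 0.8 accuracy pass threshold are hypothetical values.

```python
import math
import random
from typing import List, Sequence, Tuple

Sample = Tuple[List[float], int]  # (feature vector, label: 1 = male, 0 = female)


def split(data: Sequence[Sample], val_ratio: float, seed: int = 0):
    """Randomly split samples into a training set and a validation set."""
    items = list(data)
    random.Random(seed).shuffle(items)
    n_val = max(1, int(len(items) * val_ratio))
    return items[n_val:], items[:n_val]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict(model, x: List[float]) -> float:
    """Probability of the positive (male) class under a linear model."""
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train_logistic(train: List[Sample], epochs: int = 200, lr: float = 0.5):
    """Fit logistic-regression weights with plain batch gradient descent."""
    dim = len(train[0][0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * dim, 0.0
        for x, y in train:
            err = predict((w, b), x) - y  # gradient of log-loss w.r.t. the logit
            for j in range(dim):
                gw[j] += err * x[j]
            gb += err
        n = len(train)
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def accuracy(model, data: Sequence[Sample]) -> float:
    hits = sum((predict(model, x) >= 0.5) == bool(y) for x, y in data)
    return hits / len(data)


# Toy data: feature 0 = has a combat game installed, feature 1 = has a beauty app.
data = [([1.0, 0.0], 1)] * 10 + [([0.0, 1.0], 0)] * 10
train, val = split(data, val_ratio=0.25)
model = train_logistic(train)
val_acc = accuracy(model, val)
passed = val_acc >= 0.8  # hypothetical validation criterion
```

Any classifier that outputs class probabilities could be substituted for the logistic model. The split and the validation gate stay the same.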
Based on the above, the embodiment of the present application provides a user classification model training method. Feature data is generated for each sample user from that user's data within a preset time period. The user classification model is then trained on each sample user's feature data and gender label. The sample feature data includes the application installation information of the sample user's terminal device, so the model learns how users of different genders use different applications. This gives the user classification model higher classification accuracy.
Example two
Referring to FIG. 2, a flowchart of a user classification method according to the second embodiment of the present application is shown. The method includes steps S201 to S203:
S201: Acquire the data of each user to be classified within a second preset time period.
S202: Generate feature data of each user to be classified based on that data; the feature data includes the application installation information corresponding to the terminal device of the user to be classified.
S203: Input the feature data of each user to be classified into a user classification model to obtain a user classification result. The model is obtained by the user classification model training method of the first embodiment.
Each of S201 to S203 is described in detail below.
S201 and S202 are implemented in the same way as S101 and S102 and are not described again here.
In S203, the feature data of each sample user to be classified, obtained in S201 and S202, is input into the user classification model obtained by the training method of the first embodiment, yielding a user classification result.
After the user classification result is obtained, the final classification result of each sample user to be classified can be determined from it.
Determining the final classification result of the sample user to be classified based on the user classification result includes:
obtaining the confidence of the user classification result;
screening the user classification results based on their confidence and/or the volume of user data of the sample user to be classified within the preset time period, to determine the final classification result of that sample user.
The screening may use one or more of the following criteria: the confidence of the user classification result reaches a preset confidence threshold; the volume of the sample user's data within the preset time period reaches a preset volume threshold; or the user classification result is selected at random.
For example, user classification results whose confidence reaches the preset confidence threshold may be selected as final classification results.
As another example, the user classification results of sample users whose data volume within the preset time period reaches the preset volume threshold may be selected as final classification results.
As a further example, final classification results may be chosen by random selection.
The screening method can be chosen according to actual needs.
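The three screening criteria can be sketched as follows. This is only an illustration under stated assumptions: the confidence of a result is taken to be its highest class probability, "reaches a threshold" is read as greater than or equal to, enabled filters are combined with AND, and the parameter names `conf_threshold`, `volume_threshold`, and `sample_size` are hypothetical.

```python
import random
from typing import Dict, Optional


def screen_results(
    results: Dict[str, Dict[str, float]],   # user_id -> {label: probability}
    data_volume: Dict[str, int],            # user_id -> number of records in the preset period
    conf_threshold: Optional[float] = None, # hypothetical confidence threshold
    volume_threshold: Optional[int] = None, # hypothetical data-volume threshold
    sample_size: Optional[int] = None,      # hypothetical size of a random selection
    seed: int = 0,
) -> Dict[str, str]:
    """Return user_id -> final label for the results that pass every enabled criterion."""
    final: Dict[str, str] = {}
    for uid, probs in results.items():
        # Assumed: confidence is the probability of the most likely label.
        label, confidence = max(probs.items(), key=lambda kv: kv[1])
        if conf_threshold is not None and confidence < conf_threshold:
            continue
        if volume_threshold is not None and data_volume.get(uid, 0) < volume_threshold:
            continue
        final[uid] = label
    if sample_size is not None and sample_size < len(final):
        # Random selection among the remaining results, seeded for reproducibility.
        rng = random.Random(seed)
        kept = rng.sample(sorted(final), sample_size)
        final = {uid: final[uid] for uid in kept}
    return final
```

Leaving a parameter as `None` disables that criterion, so the same function covers using one, two, or all three screening methods.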
Based on the above, this embodiment of the application provides a user classification method: to-be-classified user data of each sample user to be classified is acquired within a second preset time period; feature data, including the application installation information of that user's terminal device, is generated from it; and the feature data is input into a user classification model obtained by the training method of the first embodiment to obtain a user classification result. Because the trained model captures how users of different genders use applications, classification accuracy is improved.
Example Three
Fig. 3 is a schematic diagram of a training device for a user classification model according to the third embodiment of the present application. The training device includes a first acquisition module 31, a first generation module 32, and a training module 33, where:
the first acquisition module 31 is configured to acquire sample user data of each sample user within a first preset time period and the gender label corresponding to each sample user;
the first generation module 32 is configured to generate sample feature data of each sample user based on that user's sample user data, the sample feature data including the application installation information of the sample user's terminal device; and
the training module 33 is configured to train the user classification model based on the sample feature data and the gender label of each sample user.
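One way to mirror the three modules in code is to treat each module as a pluggable callable and let the device wire them together. The class name `TrainingDevice` and the callable signatures below are hypothetical; the patent does not prescribe an interface.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

Features = Dict[str, float]
Sample = Tuple[Features, int]  # (sample feature data, gender label)


@dataclass
class TrainingDevice:
    # First acquisition module 31: returns raw data per user and a gender label per user.
    acquire: Callable[[], Tuple[Dict[str, Any], Dict[str, int]]]
    # First generation module 32: turns one user's raw data into sample feature data.
    generate: Callable[[Any], Features]
    # Training module 33: fits a user classification model on labelled samples.
    train: Callable[[List[Sample]], Any]

    def run(self) -> Any:
        raw_data, labels = self.acquire()
        # Sort user IDs so the sample order is deterministic.
        samples = [(self.generate(raw_data[uid]), labels[uid]) for uid in sorted(raw_data)]
        return self.train(samples)
```

Keeping the modules as separate callables lets each one be replaced independently, which matches the note later in the description that the division into units is a logical one.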
Based on the above, this embodiment of the application provides a training device for a user classification model. It generates feature data for each sample user from that user's data in a preset time period and trains the user classification model on this feature data and each sample user's gender label. Because the sample feature data includes the application installation information of each sample user's terminal device, the model learns how users of different genders use different applications, which gives it higher classification accuracy.
In a possible implementation, the application installation information includes one or more of the following:
the time at which each application was installed on the terminal device, the frequency with which the user used each application within the preset time period, and the category of each application.
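The three kinds of application installation information can be turned into numeric features in many ways; the sketch below shows one assumed encoding. The `InstalledApp` record, the feature-name prefixes, measuring install time in days relative to the start of the preset period, and expressing usage frequency as launches per day are all illustrative choices, not taken from the patent.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List


@dataclass
class InstalledApp:
    """Hypothetical per-app record for one user's terminal device."""
    name: str
    category: str          # e.g. "shopping", "games" (hypothetical taxonomy)
    installed_at: datetime
    launches: int          # times the user opened the app in the preset period


def app_install_features(apps: List[InstalledApp], period_start: datetime, period_days: int) -> Dict[str, float]:
    feats: Dict[str, float] = {}
    for app in apps:
        # Install time as days after the period start (negative if installed earlier).
        feats[f"install_day:{app.name}"] = float((app.installed_at - period_start).days)
        # Usage frequency as average launches per day over the preset period.
        feats[f"freq:{app.name}"] = app.launches / period_days
    # Category: how many installed apps fall into each application category.
    for category, count in Counter(a.category for a in apps).items():
        feats[f"category:{category}"] = float(count)
    return feats
```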
In a possible implementation, the sample feature data further includes:
history push information pushed to the sample user by at least one application on the terminal device, and/or operation information describing the sample user's operations on that history push information;
the history push information includes one or more of the following: the content of the pushed information, the industry to which it belongs, its category, its push medium, and its distribution platform;
the operation information includes one or more of the following: click operations on the pushed history push information, forwarding operations on it, and the timestamp corresponding to each operation.
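A possible representation of history push information and the associated operation information is sketched below. The record types, the aggregation by industry, and the use of the mean hour of day as a summary of the operation timestamps are hypothetical choices; other fields, such as push medium or distribution platform, could be aggregated in the same way.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List


@dataclass
class PushMessage:
    """Hypothetical record of one history push message."""
    content: str
    industry: str
    category: str
    medium: str
    platform: str


@dataclass
class Operation:
    """Hypothetical record of one user operation on a push message."""
    kind: str  # "click" or "forward"
    timestamp: datetime


@dataclass
class PushRecord:
    message: PushMessage
    operations: List[Operation] = field(default_factory=list)


def push_features(records: List[PushRecord]) -> Dict[str, float]:
    """Count pushes and operations per industry and summarise operation times."""
    feats: Dict[str, float] = {}
    hours: List[int] = []
    for record in records:
        industry = record.message.industry
        pushed_key = f"pushed:{industry}"
        feats[pushed_key] = feats.get(pushed_key, 0.0) + 1
        for op in record.operations:
            op_key = f"{op.kind}:{industry}"  # e.g. "click:cosmetics"
            feats[op_key] = feats.get(op_key, 0.0) + 1
            hours.append(op.timestamp.hour)
    if hours:
        # Timestamp information: mean hour of day at which the user acted on pushes.
        feats["mean_op_hour"] = sum(hours) / len(hours)
    return feats
```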
In a possible implementation, after generating the sample feature data of each sample user based on that user's sample user data, the first generation module 32 is further configured to:
perform data cleaning on the feature data of each sample user based on that user's user data.
In a possible implementation, the data cleaning includes:
filtering out, based on the mobile-device application installation information, the installation records of applications whose interval between installation and uninstallation is below a preset time threshold, and/or filtering out invalid push messages;
an invalid push message is a push message on which the sample user performed no operation.
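The two cleaning rules can be sketched as simple filters. The record types `AppInstall` and `Push` and the threshold value are hypothetical; this sketch applies both rules, although the text allows either one alone.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Optional, Tuple


@dataclass
class AppInstall:
    """Hypothetical installation record; uninstalled_at is None if the app is still installed."""
    name: str
    installed_at: datetime
    uninstalled_at: Optional[datetime] = None


@dataclass
class Push:
    """Hypothetical push record listing the user's operations on it (e.g. "click")."""
    message_id: str
    operations: List[str] = field(default_factory=list)


def clean(apps: List[AppInstall], pushes: List[Push], min_lifetime: timedelta) -> Tuple[List[AppInstall], List[Push]]:
    # Drop apps uninstalled sooner than min_lifetime after installation.
    kept_apps = [
        a for a in apps
        if a.uninstalled_at is None or a.uninstalled_at - a.installed_at >= min_lifetime
    ]
    # Drop invalid push messages, i.e. those the user never operated on.
    kept_pushes = [p for p in pushes if p.operations]
    return kept_apps, kept_pushes
```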
In a possible implementation, when training the user classification model based on the sample feature data and the gender label of each sample user, the training module 33 is configured to perform:
randomly grouping the feature data of the sample users into a model training set and a model verification set;
training a basic classification model on the model training set and the gender labels corresponding to the feature data of the sample users in that set, to obtain a preliminarily trained user classification model;
and verifying the preliminarily trained user classification model on the model verification set and the gender labels corresponding to the feature data of the sample users in that set, the user classification model being obtained once verification passes.
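The patent does not name the basic classification model or the pass criterion for verification. The sketch below assumes a logistic-regression classifier over sparse feature dictionaries, trained by stochastic gradient descent in pure Python, an encoding of 1 for female and 0 for male, a 20% verification split, and an accuracy threshold of 0.8 for passing verification; all of these are illustrative assumptions.

```python
import math
import random
from typing import Dict, List, Tuple

Features = Dict[str, float]
Sample = Tuple[Features, int]  # label: 1 = female, 0 = male (assumed encoding)


def split(data: List[Sample], val_ratio: float = 0.2, seed: int = 42) -> Tuple[List[Sample], List[Sample]]:
    """Randomly group samples into a model training set and a model verification set."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * val_ratio))
    return shuffled[n_val:], shuffled[:n_val]


def predict(weights: Features, bias: float, x: Features) -> float:
    """Probability of label 1 under a logistic-regression model."""
    z = bias + sum(weights.get(k, 0.0) * v for k, v in x.items())
    z = max(-30.0, min(30.0, z))  # clip to keep math.exp from overflowing
    return 1.0 / (1.0 + math.exp(-z))


def train(train_set: List[Sample], epochs: int = 200, lr: float = 0.1) -> Tuple[Features, float]:
    """Fit the basic classifier with stochastic gradient descent on log-loss."""
    weights: Features = {}
    bias = 0.0
    for _ in range(epochs):
        for x, y in train_set:
            error = predict(weights, bias, x) - y  # d(log-loss)/dz
            bias -= lr * error
            for k, v in x.items():
                weights[k] = weights.get(k, 0.0) - lr * error * v
    return weights, bias


def verify(weights: Features, bias: float, val_set: List[Sample], min_accuracy: float = 0.8) -> bool:
    """Pass verification if accuracy on the verification set reaches min_accuracy."""
    correct = sum((predict(weights, bias, x) >= 0.5) == (y == 1) for x, y in val_set)
    return correct / len(val_set) >= min_accuracy
```

Any other classifier could take the place of `train` and `predict`; only the split, train, and verify structure comes from the text.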
Example Four
As shown in Fig. 4, a user classification device according to the fourth embodiment of the present application includes a second acquisition module 41, a second generation module 42, and an input module 43, where:
the second acquisition module 41 is configured to acquire to-be-classified user data of each sample user to be classified within a second preset time period;
the second generation module 42 is configured to generate feature data of each sample user to be classified based on the to-be-classified user data, the feature data including the application installation information of the terminal device of that sample user; and
the input module 43 is configured to input the feature data of each sample user to be classified into the user classification model obtained by the user classification model training method of the first embodiment, to obtain a user classification result.
Based on the above, this embodiment of the application provides a user classification device. It acquires to-be-classified user data of each sample user to be classified within a second preset time period, generates from it feature data that includes the application installation information of that user's terminal device, and inputs the feature data into a user classification model obtained by the training method of the first embodiment to obtain a user classification result. Because the trained model captures how users of different genders use applications, classification accuracy is improved.
Example Five
An embodiment of the application further provides a computer device 500. Fig. 5 is a schematic structural diagram of the computer device 500, which includes:
a processor 51, a memory 52, and a bus 53. The memory 52 stores execution instructions and includes an internal memory 521 and an external memory 522. The internal memory 521, also called main memory, temporarily stores operation data of the processor 51 and data exchanged with the external memory 522, such as a hard disk; the processor 51 exchanges data with the external memory 522 through the internal memory 521. When the computer device 500 runs, the processor 51 and the memory 52 communicate over the bus 53, so that the processor 51 executes the following instructions in user mode:
acquiring sample user data of each of a plurality of sample users within a first preset time period and the gender label corresponding to each sample user;
generating sample feature data of each sample user based on that user's sample user data, the sample feature data including the application installation information of the sample user's terminal device;
and training the user classification model based on the sample feature data and the gender label of each sample user.
In a possible implementation, in the instructions executed by the processor 51, the application installation information includes one or more of the following:
the time at which each application was installed on the terminal device, the frequency with which the user used each application within the preset time period, and the category of each application.
In a possible implementation, in the instructions executed by the processor 51, the sample feature data further includes:
history push information pushed to the sample user by at least one application on the terminal device, and/or operation information describing the sample user's operations on that history push information;
the history push information includes one or more of the following: the content of the pushed information, the industry to which it belongs, its category, its push medium, and its distribution platform;
the operation information includes one or more of the following: click operations on the pushed history push information, forwarding operations on it, and the timestamp corresponding to each operation.
In a possible implementation, the instructions executed by the processor 51 further include, after generating the sample feature data of each sample user based on that user's sample user data:
performing data cleaning on the feature data of each sample user based on that user's user data.
In a possible implementation, in the instructions executed by the processor 51, the data cleaning includes:
filtering out, based on the mobile-device application installation information, the installation records of applications whose interval between installation and uninstallation is below a preset time threshold, and/or filtering out invalid push messages;
an invalid push message is a push message on which the sample user performed no operation.
In a possible implementation, in the instructions executed by the processor 51, training the user classification model based on the sample feature data and the gender label of each sample user includes:
randomly grouping the feature data of the sample users into a model training set and a model verification set;
training a basic classification model on the model training set and the gender labels corresponding to the feature data of the sample users in that set, to obtain a preliminarily trained user classification model;
and verifying the preliminarily trained user classification model on the model verification set and the gender labels corresponding to the feature data of the sample users in that set, the user classification model being obtained once verification passes.
The processor 51 further executes the following instructions:
acquiring to-be-classified user data of each sample user to be classified within a second preset time period;
generating user feature data of each sample user to be classified based on the to-be-classified user data, the user feature data including the application installation information of the terminal device of that sample user;
and inputting the user feature data of each sample user to be classified into the user classification model obtained by the user classification model training method described above, to obtain a user classification result.
An embodiment of the application further provides a computer-readable storage medium storing a computer program which, when executed by a processor, performs the steps of the user classification model training method and the user classification method in the above method embodiments.
It will be clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the systems and devices described above may refer to the corresponding processes in the foregoing method embodiments and are not repeated here. In the several embodiments provided by the present application, it should be understood that the disclosed systems, devices, and methods may be implemented in other ways. The device embodiments described above are merely illustrative; for example, the division into units is only a division by logical function, and other divisions are possible in an actual implementation. For example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some communication interfaces, devices, or units, and may be electrical, mechanical, or in another form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer readable storage medium executable by a processor. Based on this understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art or in a part of the technical solution, in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
Finally, it should be noted that the above examples are only specific embodiments of the present application and are not intended to limit its scope. Although the present application has been described in detail with reference to the foregoing examples, those skilled in the art should understand that anyone familiar with the technical field may, within the technical scope disclosed in the present application, still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of their technical features; such modifications, changes, or substitutions do not cause the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application, and shall all fall within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A method for training a user classification model, the method comprising:
acquiring sample user data of each of a plurality of sample users within a first preset time period and the gender label corresponding to each sample user;
generating sample feature data of each sample user based on that user's sample user data, the sample feature data including the application installation information of the sample user's terminal device, where the application installation information includes one or more of the following: the time at which each application was installed on the terminal device, the frequency with which the user used each application within the preset time period, and the category of each application; the sample feature data further including history push information pushed to the sample user by at least one application on the terminal device and/or operation information describing the sample user's operations on the history push information;
training the user classification model based on the sample feature data and the gender label of each sample user, where the user classification model is used to receive to-be-classified user data of a sample user to be classified within a second preset time period and to output a user classification result; obtaining the confidence of the user classification result; and screening the user classification result based on its confidence and/or the volume of user data of the sample user to be classified within a preset time period, to determine the final classification result of the sample user to be classified.
2. The method of claim 1, wherein, in the sample feature data:
the history push information includes one or more of the following: the content of the pushed information, the industry to which it belongs, its category, its push medium, and its distribution platform;
the operation information includes one or more of the following: click operations on the pushed history push information, forwarding operations on it, and the timestamp corresponding to each operation.
3. The method of claim 1, further comprising, after generating the sample feature data of each sample user based on that user's sample user data:
performing data cleaning on the feature data of each sample user based on that user's user data.
4. The method of claim 3, wherein the data cleaning comprises:
filtering out, based on the mobile-device application installation information, the installation records of applications whose interval between installation and uninstallation is below a preset time threshold, and/or filtering out invalid push messages;
an invalid push message being a push message on which the sample user performed no operation.
5. The method of claim 1, wherein training the user classification model based on the sample feature data and the gender label of each sample user comprises:
randomly grouping the feature data of the sample users into a model training set and a model verification set;
training a basic classification model on the model training set and the gender labels corresponding to the feature data of the sample users in that set, to obtain a preliminarily trained user classification model;
and verifying the preliminarily trained user classification model on the model verification set and the gender labels corresponding to the feature data of the sample users in that set, the user classification model being obtained once verification passes.
6. A user classification method, the method comprising:
acquiring to-be-classified user data of each sample user to be classified within a second preset time period;
generating feature data of each sample user to be classified based on the to-be-classified user data, the feature data including the application installation information of the terminal device of that sample user;
inputting the feature data of each sample user to be classified into a user classification model obtained by the user classification model training method of any one of claims 1 to 5, to obtain a user classification result.
7. A user classification model training apparatus, comprising:
a first acquisition module, configured to acquire sample user data of each sample user within a first preset time period and the gender label corresponding to each sample user;
a first generation module, configured to generate sample feature data of each sample user based on that user's sample user data, the sample feature data including the application installation information of the sample user's terminal device, where the application installation information includes one or more of the following: the time at which each application was installed on the terminal device, the frequency with which the user used each application within the preset time period, and the category of each application; the sample feature data further including history push information pushed to the sample user by at least one application on the terminal device and/or operation information describing the sample user's operations on the history push information;
a training module, configured to train the user classification model based on the sample feature data and the gender label of each sample user, where the user classification model is used to receive to-be-classified user data of a sample user to be classified within a second preset time period and to output a user classification result; the confidence of the user classification result is obtained; and the user classification result is screened based on its confidence and/or the volume of user data of the sample user to be classified within a preset time period, to determine the final classification result of the sample user to be classified.
8. A user classification apparatus, comprising:
a second acquisition module, configured to acquire to-be-classified user data of each sample user to be classified within a second preset time period;
a second generation module, configured to generate feature data of each sample user to be classified based on the to-be-classified user data, the feature data including the application installation information of the terminal device of that sample user;
an input module, configured to input the feature data of each sample user to be classified into the user classification model obtained by the user classification model training method of any one of claims 1 to 5, to obtain a user classification result.
9. An electronic device, comprising: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory in communication over the bus when the electronic device is running, the machine-readable instructions when executed by the processor performing the steps of the method of any one of claims 1 to 6.
10. A computer-readable storage medium, characterized in that it has stored thereon a computer program which, when executed by a processor, performs the steps of the method according to any of claims 1 to 6.
CN201911252400.0A 2019-12-09 2019-12-09 User classification model training method, user classification method and device Active CN111078742B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911252400.0A CN111078742B (en) 2019-12-09 2019-12-09 User classification model training method, user classification method and device

Publications (2)

Publication Number Publication Date
CN111078742A CN111078742A (en) 2020-04-28
CN111078742B true CN111078742B (en) 2023-09-05

Family

ID=70313432

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911252400.0A Active CN111078742B (en) 2019-12-09 2019-12-09 User classification model training method, user classification method and device

Country Status (1)

Country Link
CN (1) CN111078742B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112434136B (en) * 2020-12-08 2024-04-23 深圳市欢太科技有限公司 Sex classification method, apparatus, electronic device and computer storage medium
CN113095589A (en) * 2021-04-23 2021-07-09 北京明略昭辉科技有限公司 Population attribute determination method, device, equipment and storage medium
CN113850632B (en) * 2021-11-29 2022-03-01 平安科技(深圳)有限公司 User category determination method, device, equipment and storage medium
CN115689626B (en) * 2022-10-31 2024-03-01 荣耀终端有限公司 User attribute determining method of terminal equipment and electronic equipment

Citations (8)

Publication number Priority date Publication date Assignee Title
JP2005038109A (en) * 2003-07-18 2005-02-10 Hitachi Ltd Service providing system, and method and device for communicating service request of customer
CN105095401A (en) * 2015-07-07 2015-11-25 北京嘀嘀无限科技发展有限公司 Method and apparatus for identifying gender
CN105654131A (en) * 2015-12-30 2016-06-08 小米科技有限责任公司 Classification model training method and device
CN106453055A (en) * 2016-10-28 2017-02-22 努比亚技术有限公司 Method and apparatus for pushing information through user behaviors, and terminal
CN107886366A (en) * 2017-11-22 2018-04-06 深圳市金立通信设备有限公司 Generation method, sex fill method, terminal and the storage medium of Gender Classification model
CN108399418A (en) * 2018-01-23 2018-08-14 北京奇艺世纪科技有限公司 A kind of user classification method and device
CN110096526A (en) * 2019-04-30 2019-08-06 秒针信息技术有限公司 A kind of prediction technique and prediction meanss of user property label
CN110191151A (en) * 2019-04-17 2019-08-30 广州精选速购网络科技有限公司 Information-pushing method, device, equipment and medium based on smart machine

Family Cites Families (4)

Publication number Priority date Publication date Assignee Title
CN106897727A (en) * 2015-12-21 2017-06-27 百度在线网络技术(北京)有限公司 A kind of user's gender identification method and device
CN106126597A (en) * 2016-06-20 2016-11-16 乐视控股(北京)有限公司 User property Forecasting Methodology and device
CN109409949A (en) * 2018-10-17 2019-03-01 北京字节跳动网络技术有限公司 Determination method, apparatus, electronic equipment and the storage medium of user group's classification
CN109408723A (en) * 2018-11-06 2019-03-01 北京奇艺世纪科技有限公司 A kind of method for pushing and device

Non-Patent Citations (1)

Title
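Research on a semi-supervised gender classification method based on multi-type text; Dai Bin; Li Shoushan; Gong Zhengxian; Zhou Guodong; Journal of Shanxi University (Natural Science Edition), Issue 01; full text *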

Similar Documents

Publication Publication Date Title
CN111078742B (en) User classification model training method, user classification method and device
CN108280115B (en) Method and device for identifying user relationship
CN106886518B (en) Microblog account number classification method
CN110472154B (en) Resource pushing method and device, electronic equipment and readable storage medium
CN106709318B (en) A kind of recognition methods of user equipment uniqueness, device and calculate equipment
CN110210899B (en) Advertisement pushing method, device and equipment based on advertisement similarity
CN103748579A (en) Processing data in a mapreduce framework
CN109242537A (en) Advertisement placement method, device, computer equipment and storage medium
CN110689084B (en) Abnormal user identification method and device
CN108304432B (en) Information push processing method, information push processing device and storage medium
CN107644106B (en) Method, terminal device and storage medium for automatically mining service middleman
CN111160624B (en) User intention prediction method, user intention prediction device and terminal equipment
CN109885834B (en) Method and device for predicting age and gender of user
CN111163072A (en) Method and device for determining characteristic value in machine learning model and electronic equipment
CN106910135A (en) User recommends method and device
CN108090193B (en) Abnormal text recognition method and device
CN111612085A (en) Method and device for detecting abnormal point in peer-to-peer group
CN108710656B (en) Content pushing method and device
CN111836064B (en) Live broadcast content identification method and device
CN110827080A (en) Directional pushing method and device
CN113762423A (en) Data processing and model training method and device, electronic equipment and storage medium
CN110737693A (en) Data mining processing method, device, equipment and computer readable storage medium
CN110598211A (en) Article identification method and device, storage medium and electronic device
CN113051126A (en) Image construction method, device and equipment and storage medium
CN111581485B (en) Information distribution method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant