CN109885834B - Method and device for predicting age and gender of user - Google Patents

Method and device for predicting age and gender of user

Info

Publication number
CN109885834B
Authority
CN
China
Prior art keywords
app
feature
information
feature set
age
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910120476.1A
Other languages
Chinese (zh)
Other versions
CN109885834A (en
Inventor
高洁
关键
张涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China United Network Communications Group Co Ltd
Original Assignee
China United Network Communications Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China United Network Communications Group Co Ltd
Priority to CN201910120476.1A
Publication of CN109885834A
Application granted
Publication of CN109885834B
Legal status: Active
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a method for predicting the age and gender of a user, relates to the field of computers, and is used for accurately predicting the age and gender of a terminal user. The method comprises the following steps: acquiring a first set and a second set, wherein the first set comprises first terminal information, first application program (APP) information and corresponding user age and gender information of a plurality of users, and the second set comprises second terminal information and second APP information of the plurality of users; performing feature extraction on the first set to obtain a first feature set, and performing feature extraction on the second set to obtain a second feature set; training a machine learning algorithm according to the first feature set and the corresponding user age and gender information to determine parameters of the machine learning algorithm and obtain a third feature set; and substituting a fourth feature set, namely the features of the second feature set that are the same as those of the third feature set, into the machine learning algorithm adopting the parameters to obtain the age and gender information of the target user. The embodiment of the application is applied to the prediction of the age and gender of the end user.

Description

Method and device for predicting age and gender of user
Technical Field
The present application relates to the field of computers, and in particular, to a method and an apparatus for predicting age and gender of a user.
Background
In today's society, people use mobile phones every day to surf the internet, shop, socialize, work and so on, so that a mobile phone carries almost all of the behaviors and preferences of its user. By predicting the age of the user of a mobile phone terminal, an operator can help an application (APP) enterprise understand the behavior characteristics of its terminal users and thus develop its APP better; the prediction can also help operators, e-commerce companies and the like to develop more accurate internet advertisement delivery services, thereby effectively saving advertisement delivery costs.
The prior art predicts the age of an end user from the user's APP installation list information. Although the age of a user can be predicted in this way, the installation list information is static and cannot capture the behavior characteristics of the user when actually using an APP. For example, two users may both have installed a certain game, yet use it in completely different ways; the two users are then likely to belong to different age groups, which is why their degree of attraction to the game and their usage habits differ. The ages predicted from the installation list information alone would nevertheless be the same, so the existing prediction method suffers from inaccurate prediction.
Disclosure of Invention
The embodiment of the application provides a method and a device for predicting the age and sex of a user, which are used for solving the problem of inaccurate prediction in the conventional prediction method.
In order to achieve the above purpose, the embodiment of the present application adopts the following technical solutions:
In a first aspect, an embodiment of the present application provides a method for predicting a user's age and gender, where the method includes:
acquiring a first set and a second set, wherein the first set comprises first terminal information, first application program APP information and corresponding user age and gender information of a plurality of users, the second set comprises second terminal information and second APP information of the plurality of users, the first APP information and the second APP information comprise APP usage information, and the APP usage information is used for indicating the time of using the corresponding APP by the user;
performing feature extraction on the first set to obtain a first feature set, and performing feature extraction on the second set to obtain a second feature set, where the first feature set includes features of the first terminal information and features of the first APP information, and the second feature set includes features of the second terminal information and features of the second APP information;
training a machine learning algorithm according to the first feature set and the corresponding user age and gender information to determine parameters of the machine learning algorithm and obtain a third feature set, wherein the third feature set is a set of features of which corresponding loss function values in the first feature set are smaller than a first preset value;
and substituting a fourth feature set into a machine learning algorithm adopting the parameters to obtain age and gender information of the target user, wherein the fourth feature set is a set of features which are the same as the third feature set in the second feature set.
In a second aspect, an embodiment of the present application provides an apparatus for predicting a user's age and gender, including:
an acquisition unit, configured to acquire a first set and a second set, where the first set includes first terminal information, first application program (APP) information, and corresponding user age and gender information of multiple users, the second set includes second terminal information and second APP information of the multiple users, the first APP information and the second APP information include APP usage information, and the APP usage information is used to indicate the time when a user uses the corresponding APP;
an extraction unit, configured to perform feature extraction on the first set acquired by the acquisition unit to obtain a first feature set, and perform feature extraction on the second set acquired by the acquisition unit to obtain a second feature set, where the first feature set includes features of the first terminal information and features of the first APP information, and the second feature set includes features of the second terminal information and features of the second APP information;
a training unit, configured to train a machine learning algorithm according to the first feature set extracted by the extraction unit and the corresponding user age and gender information acquired by the acquisition unit, so as to determine parameters of the machine learning algorithm and obtain a third feature set, where the third feature set is a set of features in the first feature set whose corresponding loss function values are smaller than a first preset value;
and a prediction unit, configured to substitute a fourth feature set into the machine learning algorithm adopting the parameters obtained by the training unit to obtain the age and gender information of the target user, where the fourth feature set is a set of features in the second feature set that are the same as those in the third feature set.
In a third aspect, there is provided a computer readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by a computer, cause the computer to perform the method for predicting the age and sex of a user according to the first aspect.
In a fourth aspect, there is provided a computer program product containing instructions which, when run on a computer, cause the computer to perform the method for predicting the age and gender of a user according to the first aspect.
In a fifth aspect, an apparatus for predicting the age and gender of a user is provided, including: a processor and a memory, wherein the memory is used for storing programs, and the processor calls the programs stored in the memory to execute the method for predicting the age and gender of a user according to the first aspect.
According to the method and the device for predicting the age and gender of a user provided by the embodiments of the present application, a first set and a second set are acquired, wherein the first set comprises first terminal information, first application program (APP) information and corresponding age and gender information of a plurality of users, and the second set comprises second terminal information and second APP information of the plurality of users; feature extraction is performed on the first set to obtain a first feature set and on the second set to obtain a second feature set; a machine learning algorithm is trained according to the first feature set and the corresponding user age and gender information to determine parameters of the machine learning algorithm and obtain a third feature set; and a fourth feature set is substituted into the machine learning algorithm adopting the parameters to obtain the age and gender information of the target user. Compared with the prior art, which predicts the age of a terminal user from installation list information alone, the present application introduces terminal information and APP information, and the APP information includes APP usage information, so that the age and gender of different users of the same APP can be distinguished according to the APP usage information, which improves the accuracy of user age and gender prediction.
Drawings
Fig. 1 is a first flowchart of a method for predicting the age and gender of a user according to an embodiment of the present application;
Fig. 2 is a second flowchart of a method for predicting the age and gender of a user according to an embodiment of the present application;
Fig. 3 is a third flowchart of a method for predicting the age and gender of a user according to an embodiment of the present application;
Fig. 4 is a schematic structural diagram of an apparatus for predicting the age and gender of a user according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Accurately predicting the age of end users can help an APP enterprise understand the behavior characteristics of its own users and thus better develop its existing APP products; it can also help operators, e-commerce companies and the like to develop more accurate internet advertisement delivery services, thereby effectively saving advertisement delivery costs.
The main idea of the present application is to introduce terminal information and APP information, where the APP information includes APP usage information, so that the ages of different users of the same APP can be distinguished according to the APP usage information, which improves the accuracy of user age and gender prediction.
Example 1
The embodiment of the application provides a method for predicting the age and sex of a user, and as shown in fig. 1, the method for predicting the age and sex of the user comprises the following steps:
S101, acquiring a first set and a second set.
Illustratively, the first set includes first terminal information, first application program APP information and corresponding user age and gender information of a plurality of users, the second set includes second terminal information and second APP information of the plurality of users, the first APP information and the second APP information include APP usage information, and the APP usage information is used for indicating a time when the user uses the corresponding APP.
Illustratively, the number of users included in the first set is twice the number of users included in the second set. For example, assuming 75000 users in total, the first set may include the information of 50000 users and the second set the information of 25000 users.
Illustratively, the terminal information of the user includes an identification number (ID) of the user, a terminal brand, a terminal model, and a terminal price. The APP information includes APP usage information indicating the time when the user uses the corresponding APP, for example, the start time and the end time of each use of the corresponding APP.
Illustratively, the APP information further includes APP installation list information, a user ID, an APP name, a first-level APP category, and a second-level APP category. The APP installation list information includes the names of all APPs installed by the user. For example, the first-level APP category may be finance and the second-level APP category may be investment management. The first-level and second-level APP categories may be empty without affecting the overall prediction method.
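The records described above might be represented as follows. This is only a sketch under stated assumptions: the class and field names (`TerminalInfo`, `AppUsageRecord`, `AppInfo`, `category_level1`, and so on) are hypothetical and are not specified by the application.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class TerminalInfo:
    """Terminal information of one user (field names are illustrative)."""
    user_id: str
    brand: str
    model: str
    price: float


@dataclass
class AppUsageRecord:
    """One use of an APP by a user, bounded by a start time and an end time."""
    user_id: str
    app_name: str
    start: datetime
    end: datetime
    # The two category levels may be empty, as the description allows.
    category_level1: Optional[str] = None
    category_level2: Optional[str] = None

    def duration_seconds(self) -> float:
        """Length of this use in seconds."""
        return (self.end - self.start).total_seconds()


@dataclass
class AppInfo:
    """APP information of one user: installation list plus usage records."""
    user_id: str
    installed_apps: List[str] = field(default_factory=list)
    usage: List[AppUsageRecord] = field(default_factory=list)
```

Keeping the usage records separate from the installation list mirrors the distinction the description draws between static installation information and dynamic usage information.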
S102, performing feature extraction on the first set to obtain a first feature set, and performing feature extraction on the second set to obtain a second feature set.
Illustratively, the first feature set includes features of the first terminal information and features of the first APP information, and the second feature set includes features of the second terminal information and features of the second APP information.
Illustratively, the characteristics of the terminal information include the terminal brand, the terminal model and the terminal price of the user, the characteristics of the APP information include the APP installation number, the frequency of use of the APP key vocabulary in each period of the day and the usage of the APP key vocabulary in each period of the day, the APP installation number can be obtained according to the APP installation list information, and the APP key vocabulary is obtained according to the APP installation list information.
Specifically, as shown in fig. 2, the step S102 includes steps S1021 to S1024:
S1021, obtaining the weight value of the APP vocabulary according to the APP installation list information.
For example, the APP vocabulary may be the names of APPs, and the weight value of the APP vocabulary may be obtained according to the APP installation list information and the term frequency-inverse document frequency (TF-IDF) algorithm. The main idea of TF-IDF is that if a word or phrase occurs frequently in one article and rarely in other articles, the word or phrase is considered to have good class discrimination ability and is suitable for classification.
The formula of the TF-IDF algorithm is
$$\mathrm{TF\text{-}IDF}_{i,j} = \mathrm{TF}_{i,j} \times \mathrm{IDF}_{i}$$
where $\mathrm{TF\text{-}IDF}_{i,j}$ is the weight value of word i, $\mathrm{TF}_{i,j}$ indicates the frequency of occurrence of word i in article j, and $\mathrm{IDF}_{i}$ represents the inverse document frequency of word i.
$\mathrm{TF}_{i,j}$ is calculated as follows:
$$\mathrm{TF}_{i,j} = \frac{n_{i,j}}{\sum_{k} n_{k,j}}$$
where $n_{i,j}$ represents the number of times word i appears in article j, and $\sum_{k} n_{k,j}$ represents the total number of occurrences of all words in article j.
$\mathrm{IDF}_{i}$ measures the importance of word i. It is obtained by dividing the total number of documents by the number of documents containing the word and taking the base-10 logarithm of the quotient:
$$\mathrm{IDF}_{i} = \log_{10} \frac{|D|}{|\{j : t_{i} \in d_{j}\}|}$$
where $|D|$ represents the total number of documents and $|\{j : t_{i} \in d_{j}\}|$ denotes the number of documents containing word i.
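As a rough illustration of the TF-IDF weighting described above, the following Python sketch computes the weight of every word in every article with a base-10 logarithm. It assumes that each user's APP installation list plays the role of an article and each APP name the role of a word; that mapping, the function name `tf_idf`, and the sample APP names are assumptions made for illustration only.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return {(word, doc_index): weight} using TF-IDF with a base-10 logarithm.

    documents: list of lists of words. Each "article" is assumed here to be
    one user's APP installation list, and each "word" an APP name.
    """
    num_docs = len(documents)
    # Number of documents containing each word.
    doc_freq = Counter(word for doc in documents for word in set(doc))

    weights = {}
    for j, doc in enumerate(documents):
        counts = Counter(doc)          # occurrences of each word in article j
        total = sum(counts.values())   # occurrences of all words in article j
        for word, n_ij in counts.items():
            tf = n_ij / total
            idf = math.log10(num_docs / doc_freq[word])
            weights[(word, j)] = tf * idf
    return weights


if __name__ == "__main__":
    install_lists = [["chat", "game"], ["chat", "bank"], ["chat", "game", "video"]]
    for key, value in sorted(tf_idf(install_lists).items()):
        print(key, round(value, 4))
```

A word that appears in every article receives an inverse document frequency of zero, so a universally installed APP contributes no weight under this scheme.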
S1022, determining the APP vocabulary with a weight value higher than a second preset value as the APP key vocabulary.
For example, 5 to 10 APP words with the highest weight values may be selected as the APP key words, that is, APP words with weight values higher than the second preset value are selected as the APP key words.
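The selection in step S1022 can be sketched as follows. The function assumes that one weight per APP word is already available as a dictionary; how per-article weights are combined into one weight per word is not stated in the text, so that step is left out. The name `select_key_vocabulary` and both parameters are hypothetical.

```python
def select_key_vocabulary(weights, threshold=None, top_k=None):
    """Pick APP key words from a {word: weight} mapping.

    Exactly one selection rule must be given:
      threshold - keep words whose weight is higher than this value
                  (the "second preset value" of the description);
      top_k     - keep the top_k highest-weighted words (e.g. 5 to 10).
    Ties are broken alphabetically so the result is deterministic.
    """
    if (threshold is None) == (top_k is None):
        raise ValueError("specify exactly one of threshold or top_k")
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    if threshold is not None:
        return [word for word, value in ranked if value > threshold]
    return [word for word, _ in ranked[:top_k]]
```

Choosing the top k words is equivalent to choosing a threshold that lies just below the k-th highest weight, which is how the two formulations in the description relate to each other.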
S1023, obtaining the use frequency of each time interval of the APP key words and the use amount of each time interval of the APP key words corresponding to the first set according to the APP use information corresponding to the APP key words in the first set; and determining that the first characteristic set comprises the usage frequency of each time interval of the APP key words and the usage amount of each time interval of the APP key words corresponding to the first characteristic set.
Illustratively, the usage frequency of each period of the APP key words comprises the usage frequency of the APP key words from 0:00 to 6:00, from 6:00 to 12:00, from 12:00 to 18:00 and from 18:00 to 24:00, and the usage amount of each period of the APP key words comprises the usage amount from 0:00 to 6:00, from 6:00 to 12:00, from 12:00 to 18:00 and from 18:00 to 24:00.
S1024, obtaining the use frequency of each time interval of the APP key words and the use amount of each time interval of the APP key words corresponding to the second set according to the APP use information corresponding to the APP key words in the second set; and determining that the second characteristic set comprises the usage frequency of each time interval of the APP key words and the usage amount of each time interval of the APP key words corresponding to the second characteristic set.
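Steps S1023 and S1024 apply the same computation to the first set and the second set. A minimal sketch is given below. It makes three assumptions that the text does not state: the usage frequency is the number of uses, the usage amount is the total duration of use in seconds, and each use is attributed to the period in which it starts. The names `PERIODS`, `period_of` and `period_features` are hypothetical.

```python
from datetime import datetime

# Four six-hour periods of the day.
PERIODS = ("00-06", "06-12", "12-18", "18-24")


def period_of(ts):
    """Map a timestamp to one of the four six-hour periods of the day."""
    return PERIODS[ts.hour // 6]


def period_features(usage, key_words):
    """Build per-period usage frequency and usage amount for one user.

    usage: iterable of (app_name, start, end) tuples with datetime values.
    key_words: the APP key vocabulary.
    Returns {feature_name: value}; every key word/period pair is present,
    so users with no use in a period still get a zero-valued feature.
    """
    feats = {}
    for word in key_words:
        for p in PERIODS:
            feats[f"{word}_freq_{p}"] = 0
            feats[f"{word}_amount_{p}"] = 0.0
    for app, start, end in usage:
        if app not in key_words:
            continue
        p = period_of(start)  # assumption: attribute a use to its start period
        feats[f"{app}_freq_{p}"] += 1
        feats[f"{app}_amount_{p}"] += (end - start).total_seconds()
    return feats
```

Running the same function over the first set and the second set yields feature records with identical names, which is what later allows the features selected during training to be looked up in the second feature set.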
S103, training the machine learning algorithm according to the first feature set and the corresponding age and gender information of the user to determine parameters of the machine learning algorithm and obtain a third feature set.
Illustratively, the third feature set is a set of features in the first feature set, for which the corresponding loss function values are smaller than the first preset value.
Illustratively, the machine learning algorithm may be LightGBM, a gradient boosting algorithm based on decision trees, which has the advantages of fast training, low memory usage, high accuracy, support for parallel learning, and the capability of processing large-scale data.
Illustratively, the third feature set is initially empty.
Specifically, as shown in fig. 3, the step S103 includes steps S1031 to S1033:
S1031, substituting the first feature, the third feature set and corresponding user age and gender information into a machine learning algorithm for training, and adjusting parameters of the machine learning algorithm to obtain first parameters of the machine learning algorithm.
Illustratively, the first feature is one feature in a training set, and the training set is a set of a preset number of features in the first feature set.
Illustratively, the first feature set is divided into a training set and a verification set, the features in the training set are used for training the machine learning algorithm, and the features in the verification set are used for cross-verifying the trained machine learning algorithm and verifying whether the trained machine learning algorithm meets the requirements. The ratio of the data size of the training set to the data size of the validation set was 4:1, i.e., 80% of the data was used for training and 20% of the data was used for cross validation.
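The 4:1 division can be sketched as below. The text does not say how the data are assigned to the two parts, so this sketch assumes a seeded random shuffle of per-user feature records; the function name `split_train_validation` and its parameters are hypothetical.

```python
import random


def split_train_validation(records, train_fraction=0.8, seed=0):
    """Shuffle records and split them into (training, validation) lists.

    With the default train_fraction of 0.8 the two parts are in a 4:1 ratio.
    A fixed seed makes the split reproducible.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]
```

For the 50000 users of the first set in the earlier example, this would give 40000 records for training and 10000 for verification.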
S1032, substituting the second characteristics into a machine learning algorithm adopting the first parameters to obtain first age gender information, calculating a loss function value according to the first age gender information, and if the loss function value is smaller than a first preset value, determining that the third characteristic set comprises the first characteristics.
Illustratively, the second feature is the features in the verification set that are the same as the first feature and the features of the third feature set, and the verification set is the set of features in the first feature set other than the training set.
Illustratively, the formula for the loss function is:
$$\mathrm{Loss} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{j} y_{i,j}\,\log p_{i,j}$$
where Loss represents the value of the loss function, N represents the number of users, i denotes the i-th user, j denotes the category number in Table 1 (the inner sum runs over all categories), $y_{i,j}$ indicates whether user i belongs to category j and takes the value 0 or 1 (for example, if user i belongs to category j, $y_{i,j}$ is 1), and $p_{i,j}$ represents the probability, predicted by the machine learning algorithm, that user i belongs to category j and takes a value between 0 and 1; that is, $p_{i,j}$ is the age and gender information obtained by substituting the second feature into the LightGBM algorithm.
Table 1. Age-gender correspondence table of users
[Table 1 appears as an image in the original publication; its contents are not recoverable from this text.]
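The definitions of N, y and p given for the loss function match the standard multi-class logarithmic loss. Assuming that this is the loss intended, it can be computed as in the sketch below. The function name `multiclass_log_loss` and the clipping constant `eps` are assumptions added for illustration; clipping only prevents taking the logarithm of zero.

```python
import math


def multiclass_log_loss(y_true, y_prob, eps=1e-15):
    """Mean multi-class logarithmic loss over N users.

    y_true: list of one-hot lists, y_true[i][j] is 1 if user i belongs to
            category j and 0 otherwise.
    y_prob: list of lists, y_prob[i][j] is the predicted probability that
            user i belongs to category j.
    eps:    probabilities are clipped to [eps, 1] so that log() stays finite.
    """
    n = len(y_true)
    total = 0.0
    for y_row, p_row in zip(y_true, y_prob):
        for y, p in zip(y_row, p_row):
            if y:  # only the true category contributes to the sum
                total += math.log(min(max(p, eps), 1.0))
    return -total / n
```

A lower value means that the predicted probabilities put more mass on the true age-gender category of each user, which is why a feature is kept only when the loss falls below the preset value.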
S1033, repeatedly executing steps S1031 to S1032 until all the features of the training set have been substituted into the machine learning algorithm, so as to obtain the third feature set and the parameters of the machine learning algorithm.
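Steps S1031 to S1033 amount to a greedy, one-feature-at-a-time selection loop. The sketch below shows one way to read them. The training and loss computations are passed in as callables so that any learner (LightGBM in the description) could be plugged in. The names `forward_select`, `train_fn` and `loss_fn` are hypothetical, and returning the model from the last accepted trial is an assumption about which parameters are kept.

```python
def forward_select(candidate_features, train_fn, loss_fn, max_loss):
    """Greedy feature selection following steps S1031 to S1033.

    candidate_features: features of the training set, tried one at a time.
    train_fn(features) -> model trained on the given feature names.
    loss_fn(model, features) -> loss of that model on the verification set.
    max_loss: the "first preset value"; a feature is kept only if the
              verification loss is smaller than this value.
    Returns (selected_features, model_for_selected_features).
    """
    selected = []   # the third feature set, initially empty
    model = None
    for feature in candidate_features:
        trial = selected + [feature]
        trial_model = train_fn(trial)                 # S1031: train
        if loss_fn(trial_model, trial) < max_loss:    # S1032: verify
            selected, model = trial, trial_model      # keep the feature
    return selected, model                            # S1033: all tried
```

Each candidate is evaluated together with the features already accepted, so the order in which candidates are tried can change the result.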
S104, substituting the fourth feature set into the machine learning algorithm adopting the parameters determined in step S103 to obtain the age and gender information of the target user.
Illustratively, the fourth feature set is a set of features in the second feature set that are the same as the third feature set.
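Building the fourth feature set is a projection of each record of the second feature set onto the feature names kept in the third feature set. A minimal sketch, assuming each record is a dictionary keyed by feature name (the function name `restrict_features` is hypothetical):

```python
def restrict_features(record, selected_features):
    """Keep only the selected feature names from one user's feature record.

    record: {feature_name: value} taken from the second feature set.
    selected_features: feature names of the third feature set.
    Raises KeyError if a selected feature is absent from the record, since
    the trained model would then receive incomplete input.
    """
    missing = [f for f in selected_features if f not in record]
    if missing:
        raise KeyError(f"features missing from record: {missing}")
    return {f: record[f] for f in selected_features}
```

The output preserves the order of `selected_features`, so the values can be passed to the trained model in the same column order that was used during training.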
According to the method for predicting the age and gender of a user provided by the embodiment of the present application, a first set and a second set are acquired, wherein the first set comprises first terminal information, first application program (APP) information and corresponding age and gender information of a plurality of users, and the second set comprises second terminal information and second APP information of the plurality of users; feature extraction is performed on the first set to obtain a first feature set and on the second set to obtain a second feature set; a machine learning algorithm is trained according to the first feature set and the corresponding user age and gender information to determine parameters of the machine learning algorithm and obtain a third feature set; and a fourth feature set is substituted into the machine learning algorithm adopting the parameters to obtain the age and gender information of the target user. Compared with the prior art, which predicts the age of a terminal user from installation list information alone, the present application introduces terminal information and APP information, and the APP information includes APP usage information, so that the age and gender of different users of the same APP can be distinguished according to the APP usage information, which improves the accuracy of user age and gender prediction.
Example 2
An embodiment of the present application provides a device for predicting age and gender of a user, which is applied to the method for predicting age and gender of a user, as shown in fig. 4, the device 40 includes: an acquisition unit 41, an extraction unit 42, a training unit 43, and a prediction unit 44.
The acquisition unit 41 is configured to acquire a first set and a second set, where the first set includes first terminal information, first application program (APP) information, and corresponding user age and gender information of multiple users, the second set includes second terminal information and second APP information of the multiple users, the first APP information and the second APP information include APP usage information, and the APP usage information is used to indicate the time when a user uses the corresponding APP.
The extraction unit 42 is configured to perform feature extraction on the first set acquired by the acquisition unit 41 to obtain a first feature set, and perform feature extraction on the second set acquired by the acquisition unit 41 to obtain a second feature set, where the first feature set includes features of the first terminal information and features of the first APP information, and the second feature set includes features of the second terminal information and features of the second APP information.
The training unit 43 is configured to train the machine learning algorithm according to the first feature set extracted by the extraction unit 42 and the corresponding user age and gender information acquired by the acquisition unit 41, so as to determine parameters of the machine learning algorithm and obtain a third feature set, where the third feature set is a set of features in the first feature set whose corresponding loss function values are smaller than a first preset value.
The prediction unit 44 is configured to substitute a fourth feature set into the machine learning algorithm adopting the parameters obtained by the training unit 43 to obtain the age and gender information of the target user, where the fourth feature set is a set of features in the second feature set that are the same as those in the third feature set.
The extraction unit 42 is specifically configured to:
Obtaining the weight value of the APP vocabulary according to the APP installation list information.
And determining the APP vocabulary with the weight value higher than the second preset value as the APP key vocabulary.
Obtaining the use frequency of each time interval of the APP key words corresponding to the first set and the use amount of each time interval of the APP key words according to the APP use information corresponding to the APP key words in the first set; and determining that the first characteristic set comprises the usage frequency of each time interval of the APP key words and the usage amount of each time interval of the APP key words corresponding to the first characteristic set.
Obtaining the use frequency of each time interval of the APP key words and the use amount of each time interval of the APP key words corresponding to the second set according to the APP use information corresponding to the APP key words in the second set; and determining that the second characteristic set comprises the usage frequency of each time period of the APP key words and the usage amount of each time period of the APP key words corresponding to the second characteristic set.
The third feature set is initially empty, and the training unit 43 is specifically configured to perform the following steps:
Step a: substituting the first feature, the third feature set and the corresponding user age and gender information into the machine learning algorithm for training, and adjusting parameters of the machine learning algorithm to obtain a first parameter of the machine learning algorithm, wherein the first feature is one feature in a training set, and the training set is a set of a preset number of features in the first feature set.
Step b: substituting the second feature into the machine learning algorithm adopting the first parameter to obtain first age and gender information, calculating a loss function value according to the first age and gender information, and, if the loss function value is smaller than the first preset value, determining that the third feature set includes the first feature, wherein the second feature is the features in the verification set that are the same as the first feature and the features of the third feature set, and the verification set is the set of features in the first feature set other than the training set.
Step c: repeatedly executing steps a and b until all the features of the training set have been substituted into the machine learning algorithm, so as to obtain the third feature set and the parameters of the machine learning algorithm.
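The cooperation of the four units of the device can be pictured as a small pipeline. The sketch below is only an illustration of that division of labor: each unit is supplied as a plain callable, and the class name `PredictionDevice`, its field names and the callable signatures are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Sequence, Tuple


@dataclass
class PredictionDevice:
    """Four cooperating units, each supplied as a plain callable."""
    # Acquisition unit: returns (first set, age-gender labels, second set).
    acquire: Callable[[], Tuple[Any, Any, Any]]
    # Extraction unit: turns a raw set into a feature set.
    extract: Callable[[Any], Any]
    # Training unit: returns (trained model, third feature set).
    train: Callable[[Any, Any], Tuple[Any, Sequence[str]]]
    # Prediction unit: applies the model to the selected features.
    predict: Callable[[Any, Any, Sequence[str]], Any]

    def run(self) -> Any:
        """Run acquisition, extraction, training and prediction in order."""
        first_set, labels, second_set = self.acquire()
        first_features = self.extract(first_set)
        second_features = self.extract(second_set)
        model, third_feature_set = self.train(first_features, labels)
        return self.predict(model, second_features, third_feature_set)
```

Passing the units in as callables reflects the later remark in the description that the division into units is only a logical one and that units may be combined or implemented separately.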
Embodiments of the present application provide a computer readable storage medium storing one or more programs, the one or more programs comprising instructions which, when executed by a computer, cause the computer to perform the method for predicting the age and gender of a user described with reference to Figs. 1 to 3.
Embodiments of the present application provide a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method for predicting the age and gender of a user described with reference to Figs. 1 to 3.
An embodiment of the present application provides a device for predicting a user's age and gender, including: a processor and a memory, the memory being used for storing a program, and the processor calling the program stored in the memory to perform the method for predicting the age and gender of a user described with reference to Figs. 1 to 3.
Since the device for predicting the age and sex of the user, the computer-readable storage medium, and the computer program product in the embodiments of the present application can be applied to the method for predicting the age and sex of the user, the technical effects obtained by the method can also refer to the embodiments of the method, and the embodiments of the present application are not described herein again.
The above units may be individually configured processors, or may be implemented by being integrated into one of the processors of the controller, or may be stored in a memory of the controller in the form of program codes, and the functions of the above units may be called and executed by one of the processors of the controller. The processor described herein may be a Central Processing Unit (CPU), or an Application Specific Integrated Circuit (ASIC), or one or more Integrated circuits configured to implement embodiments of the present Application.
It should be understood that, in the various embodiments of the present application, the sequence numbers of the above-mentioned processes do not mean the execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It can be clearly understood by those skilled in the art that, for convenience and simplicity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. For example, the device embodiments described above are merely illustrative: the division into units is only a division by logical function, and other divisions are possible in an actual implementation. For instance, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or in another form.
The units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units; they may be located in one place or distributed across a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.

Claims (9)

1. A method for predicting the age and gender of a user, comprising:
acquiring a first set and a second set, wherein the first set comprises first terminal information, first application (APP) information, and corresponding user age and gender information of a plurality of users, the second set comprises second terminal information and second APP information of the plurality of users, the first APP information and the second APP information each comprise APP usage information, and the APP usage information indicates the time at which a user uses the corresponding APP;
performing feature extraction on the first set to obtain a first feature set, and performing feature extraction on the second set to obtain a second feature set, wherein the first feature set comprises features of the first terminal information and features of the first APP information, and the second feature set comprises features of the second terminal information and features of the second APP information;
training a machine learning algorithm according to the first feature set and the corresponding user age and gender information to determine parameters of the machine learning algorithm and obtain a third feature set, wherein the third feature set is the set of features in the first feature set whose corresponding loss function values are smaller than a first preset value;
and substituting a fourth feature set into the machine learning algorithm using the parameters to obtain age and gender information of a target user, wherein the fourth feature set is the set of features in the second feature set that are the same as the features in the third feature set.
2. The method for predicting the age and gender of a user according to claim 1, wherein performing feature extraction on the first set to obtain the first feature set and performing feature extraction on the second set to obtain the second feature set comprises:
obtaining weight values of APP words according to APP installation list information;
determining each APP word whose weight value is higher than a second preset value as an APP keyword;
obtaining, according to the APP usage information corresponding to the APP keywords in the first set, the usage frequency of the APP keywords in each time period and the usage amount of the APP keywords in each time period for the first set; and determining that the first feature set comprises the usage frequency of the APP keywords in each time period and the usage amount of the APP keywords in each time period for the first set;
obtaining, according to the APP usage information corresponding to the APP keywords in the second set, the usage frequency of the APP keywords in each time period and the usage amount of the APP keywords in each time period for the second set; and determining that the second feature set comprises the usage frequency of the APP keywords in each time period and the usage amount of the APP keywords in each time period for the second set.
3. The method for predicting the age and gender of a user according to claim 1, wherein the third feature set is initially empty, and training the machine learning algorithm according to the first feature set and the corresponding user age and gender information to determine the parameters of the machine learning algorithm and obtain the third feature set comprises the following steps:
step a, substituting a first feature, the third feature set, and the corresponding user age and gender information into the machine learning algorithm for training, and adjusting the parameters of the machine learning algorithm to obtain a first parameter of the machine learning algorithm, wherein the first feature is one feature in a training set, and the training set is a set of a preset number of features in the first feature set;
step b, substituting a second feature into the machine learning algorithm using the first parameter to obtain first age and gender information, calculating a loss function value according to the first age and gender information, and, if the loss function value is smaller than the first preset value, determining that the third feature set comprises the first feature, wherein the second feature is the features in a verification set that are the same as the first feature and the third feature set, and the verification set is the set of features in the first feature set other than the training set;
and repeating steps a and b until all features of the training set have been substituted into the machine learning algorithm, to obtain the third feature set and the parameters of the machine learning algorithm.
4. An apparatus for predicting the age and gender of a user, comprising:
an acquiring unit, configured to acquire a first set and a second set, wherein the first set comprises first terminal information, first application (APP) information, and corresponding user age and gender information of a plurality of users, the second set comprises second terminal information and second APP information of the plurality of users, the first APP information and the second APP information each comprise APP usage information, and the APP usage information indicates the time at which a user uses the corresponding APP;
an extracting unit, configured to perform feature extraction on the first set acquired by the acquiring unit to obtain a first feature set, and perform feature extraction on the second set acquired by the acquiring unit to obtain a second feature set, wherein the first feature set comprises features of the first terminal information and features of the first APP information, and the second feature set comprises features of the second terminal information and features of the second APP information;
a training unit, configured to train a machine learning algorithm according to the first feature set extracted by the extracting unit and the corresponding user age and gender information acquired by the acquiring unit, so as to determine parameters of the machine learning algorithm and obtain a third feature set, wherein the third feature set is the set of features in the first feature set whose corresponding loss function values are smaller than a first preset value;
and a predicting unit, configured to substitute a fourth feature set into the machine learning algorithm using the parameters obtained by the training unit to obtain age and gender information of a target user, wherein the fourth feature set is the set of features in the second feature set that are the same as the features in the third feature set.
5. The apparatus according to claim 4, wherein the extracting unit is specifically configured to:
obtain weight values of APP words according to APP installation list information;
determine each APP word whose weight value is higher than a second preset value as an APP keyword;
obtain, according to the APP usage information corresponding to the APP keywords in the first set, the usage frequency of the APP keywords in each time period and the usage amount of the APP keywords in each time period for the first set; and determine that the first feature set comprises the usage frequency of the APP keywords in each time period and the usage amount of the APP keywords in each time period for the first set;
obtain, according to the APP usage information corresponding to the APP keywords in the second set, the usage frequency of the APP keywords in each time period and the usage amount of the APP keywords in each time period for the second set; and determine that the second feature set comprises the usage frequency of the APP keywords in each time period and the usage amount of the APP keywords in each time period for the second set.
6. The apparatus according to claim 4, wherein the third feature set is initially empty, and the training unit is specifically configured to perform the following steps:
step a, substituting a first feature, the third feature set, and the corresponding user age and gender information into the machine learning algorithm for training, and adjusting the parameters of the machine learning algorithm to obtain a first parameter of the machine learning algorithm, wherein the first feature is one feature in a training set, and the training set is a set of a preset number of features in the first feature set;
step b, substituting a second feature into the machine learning algorithm using the first parameter to obtain first age and gender information, calculating a loss function value according to the first age and gender information, and, if the loss function value is smaller than the first preset value, determining that the third feature set comprises the first feature, wherein the second feature is the features in a verification set that are the same as the first feature and the third feature set, and the verification set is the set of features in the first feature set other than the training set;
and repeating steps a and b until all features of the training set have been substituted into the machine learning algorithm, to obtain the third feature set and the parameters of the machine learning algorithm.
7. A computer-readable storage medium storing one or more programs, the one or more programs comprising instructions which, when executed by a computer, cause the computer to perform the method for predicting the age and gender of a user according to any one of claims 1 to 3.
8. A computer program product comprising instructions which, when run on a computer, cause the computer to perform the method for predicting the age and gender of a user according to any one of claims 1 to 3.
9. An apparatus for predicting the age and gender of a user, comprising a processor and a memory, wherein the memory is configured to store a program, and the processor calls the program stored in the memory to perform the method for predicting the age and gender of a user according to any one of claims 1 to 3.
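The feature-selection loop recited in claims 3 and 6 (steps a and b), followed by the prediction step of claim 1, can be illustrated with a minimal Python sketch. Several things here are assumptions that do not come from the claims: the users of the first feature set are split into a training part and a verification part, with the candidate features treated as columns; a nearest-centroid classifier stands in for the unspecified machine learning algorithm; the 0-1 error rate stands in for the unspecified loss function; and every function name, feature name, and label is hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Row = Dict[str, float]            # one user's feature values, keyed by feature name
Params = Dict[str, List[float]]   # label -> centroid over the chosen features


def fit_centroids(rows: Sequence[Row], labels: Sequence[str],
                  feats: Sequence[str]) -> Params:
    """Stand-in learner: the 'parameters' are per-label mean feature vectors."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for row, label in zip(rows, labels):
        acc = sums.setdefault(label, [0.0] * len(feats))
        for i, name in enumerate(feats):
            acc[i] += row[name]
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(params: Params, row: Row, feats: Sequence[str]) -> str:
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    def dist(centroid: List[float]) -> float:
        return sum((row[name] - centroid[i]) ** 2 for i, name in enumerate(feats))
    return min(params, key=lambda label: dist(params[label]))


def zero_one_loss(params: Params, rows: Sequence[Row], labels: Sequence[str],
                  feats: Sequence[str]) -> float:
    """Stand-in loss: fraction of verification users that are misclassified."""
    wrong = sum(predict(params, r, feats) != y for r, y in zip(rows, labels))
    return wrong / len(rows)


def select_features(train_rows: Sequence[Row], train_labels: Sequence[str],
                    val_rows: Sequence[Row], val_labels: Sequence[str],
                    candidates: Sequence[str],
                    max_loss: float) -> Tuple[List[str], Optional[Params]]:
    """Try each candidate once; keep it if the verification loss is below max_loss."""
    selected: List[str] = []            # third feature set, initially empty
    params: Optional[Params] = None     # parameters of the last accepted model
    for feat in candidates:
        trial = selected + [feat]       # first feature plus current third feature set
        trial_params = fit_centroids(train_rows, train_labels, trial)    # step a
        loss = zero_one_loss(trial_params, val_rows, val_labels, trial)  # step b
        if loss < max_loss:             # first preset value
            selected, params = trial, trial_params
    return selected, params


# Hypothetical toy data: one informative feature and one uninformative feature.
train_rows = [{"game_night_freq": 9.0, "noise": 1.0}, {"game_night_freq": 8.0, "noise": 5.0},
              {"game_night_freq": 1.0, "noise": 5.0}, {"game_night_freq": 2.0, "noise": 1.0}]
train_labels = ["M/18-24", "M/18-24", "F/25-34", "F/25-34"]
val_rows = [{"game_night_freq": 7.0, "noise": 1.0}, {"game_night_freq": 3.0, "noise": 5.0}]
val_labels = ["M/18-24", "F/25-34"]

selected, params = select_features(train_rows, train_labels, val_rows, val_labels,
                                   ["noise", "game_night_freq"], max_loss=0.25)
# Prediction for a target user uses only the selected features (fourth feature set).
target_user = {"game_night_freq": 8.0, "noise": 3.0}
predicted = predict(params, target_user, selected) if params else None
```

In this sketch a feature is accepted whenever the verification loss stays below the threshold, so the result depends on the order in which candidates are tried; the claims do not state how that order is chosen.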
CN201910120476.1A 2019-02-18 2019-02-18 Method and device for predicting age and gender of user Active CN109885834B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910120476.1A CN109885834B (en) 2019-02-18 2019-02-18 Method and device for predicting age and gender of user

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910120476.1A CN109885834B (en) 2019-02-18 2019-02-18 Method and device for predicting age and gender of user

Publications (2)

Publication Number Publication Date
CN109885834A CN109885834A (en) 2019-06-14
CN109885834B CN109885834B (en) 2022-09-16

Family

ID=66928365

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910120476.1A Active CN109885834B (en) 2019-02-18 2019-02-18 Method and device for predicting age and gender of user

Country Status (1)

Country Link
CN (1) CN109885834B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112488742A (en) * 2019-09-12 2021-03-12 北京三星通信技术研究有限公司 User attribute information prediction method and device, electronic equipment and storage medium
CN111291798B (en) * 2020-01-21 2021-04-20 北京工商大学 User basic attribute prediction method based on ensemble learning
CN113726900A (en) * 2021-09-02 2021-11-30 四川启睿克科技有限公司 System for judging age bracket of user child
CN115689626B (en) * 2022-10-31 2024-03-01 荣耀终端有限公司 User attribute determining method of terminal equipment and electronic equipment

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
US11288584B2 (en) * 2016-06-23 2022-03-29 Tata Consultancy Services Limited Systems and methods for predicting gender and age of users based on social media data
CN108256537A (en) * 2016-12-28 2018-07-06 北京酷我科技有限公司 A kind of user gender prediction method and system

Also Published As

Publication number Publication date
CN109885834A (en) 2019-06-14

Similar Documents

Publication Publication Date Title
CN109885834B (en) Method and device for predicting age and gender of user
CN106651057B (en) Mobile terminal user age prediction method based on installation package sequence list
CN109634698B (en) Menu display method and device, computer equipment and storage medium
CN110362601B (en) Metadata standard mapping method, device, equipment and storage medium
CN104081392A (en) Influence scores for social media profiles
CN108921587B (en) Data processing method and device and server
CN111078742B (en) User classification model training method, user classification method and device
CN107316156B (en) Data processing method, device, server and storage medium
CN111090807A (en) Knowledge graph-based user identification method and device
CN111325614A (en) Recommendation method and device of electronic object and electronic equipment
CN111652471A (en) List distribution control method and device, electronic equipment and storage medium
CN110781410A (en) Community detection method and device
CN110647537A (en) Data searching method, device and storage medium
CN112650940A (en) Recommendation method and device of application program, computer equipment and storage medium
CN116089616A (en) Theme text acquisition method, device, equipment and storage medium
CN111784069B (en) User preference prediction method, device, equipment and storage medium
CN115375453A (en) System resource allocation method and device
CN113626340A (en) Test requirement identification method and device, electronic equipment and storage medium
CN107368597B (en) Information output method and device
CN112131468A (en) Data processing method and device in recommendation system
CN111274474A (en) Object recommendation method, electronic device and computer-readable storage medium
CN111339432A (en) Recommendation method and device of electronic object and electronic equipment
CN111382244B (en) Deep retrieval matching classification method and device and terminal equipment
CN108984556B (en) Method, apparatus and computer-readable storage medium for data processing
CN115563276A (en) Data analysis method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant