CN109885834B - Method and device for predicting age and gender of user - Google Patents
- Publication number
- CN109885834B CN201910120476.1A
- Authority
- CN
- China
- Prior art keywords
- app
- feature
- information
- feature set
- age
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The application discloses a method for predicting the age and gender of a user, relates to the field of computers, and is used for accurately predicting the age and gender of a terminal user. The method comprises the following steps: acquiring a first set and a second set, wherein the first set comprises first terminal information, first application (APP) information and corresponding user age and gender information of a plurality of users, and the second set comprises second terminal information and second APP information of the plurality of users; performing feature extraction on the first set to obtain a first feature set, and performing feature extraction on the second set to obtain a second feature set; training a machine learning algorithm according to the first feature set and the corresponding user age and gender information to determine parameters of the machine learning algorithm and obtain a third feature set; and substituting a fourth feature set into the machine learning algorithm adopting the parameters to obtain the age and gender information of a target user. The embodiments of the application are applied to predicting the age and gender of terminal users.
Description
Technical Field
The present application relates to the field of computers, and in particular, to a method and an apparatus for predicting age and gender of a user.
Background
Nowadays people use mobile phones every day to surf the Internet, shop, socialize, work and so on, so a mobile phone reflects almost all of a user's behaviors and preferences. By predicting the age of the user of a mobile phone terminal, an operator can help application (APP) enterprises understand the behavior characteristics of their terminal users and thus develop their APPs better; it can also help operators, e-commerce companies and the like to provide more precisely targeted Internet advertising services, effectively saving advertising costs.
The prior art predicts the age of a terminal user from the user's installation list information. Although a user's age can be predicted in this way, the installation list information is static and cannot capture the behavior characteristics of the user when actually using an APP. For example, two users may both install the same game software yet use it in completely different ways; they are then likely to belong to different age groups, which is why the game appeals to them differently and their usage habits differ. The ages predicted from the installation list information alone, however, are exactly the same for both users, so the existing prediction method suffers from inaccurate prediction.
Disclosure of Invention
The embodiment of the application provides a method and a device for predicting the age and sex of a user, which are used for solving the problem of inaccurate prediction in the conventional prediction method.
In order to achieve the above purpose, the embodiment of the present application adopts the following technical solutions:
In a first aspect, an embodiment of the present application provides a method for predicting a user's age and gender, where the method includes:
acquiring a first set and a second set, wherein the first set comprises first terminal information, first application program APP information and corresponding user age and gender information of a plurality of users, the second set comprises second terminal information and second APP information of the plurality of users, the first APP information and the second APP information comprise APP usage information, and the APP usage information is used for indicating the time of using the corresponding APP by the user;
performing feature extraction on the first set to obtain a first feature set, and performing feature extraction on the second set to obtain a second feature set, where the first feature set includes features of the first terminal information and features of the first APP information, and the second feature set includes features of the second terminal information and features of the second APP information;
training a machine learning algorithm according to the first feature set and the corresponding user age and gender information to determine parameters of the machine learning algorithm and obtain a third feature set, wherein the third feature set is a set of features of which corresponding loss function values in the first feature set are smaller than a first preset value;
and substituting a fourth feature set into a machine learning algorithm adopting the parameters to obtain age and gender information of the target user, wherein the fourth feature set is a set of features which are the same as the third feature set in the second feature set.
In a second aspect, an embodiment of the present application provides an apparatus for predicting a user's age and gender, including:
an obtaining unit, configured to obtain a first set and a second set, where the first set includes first terminal information, first application APP information, and corresponding user age and gender information of multiple users, the second set includes second terminal information and second APP information of the multiple users, the first APP information and the second APP information include APP usage information, and the APP usage information is used to indicate a time when the user uses a corresponding APP;
an extracting unit, configured to perform feature extraction on the first set acquired by the acquiring unit to obtain a first feature set, and perform feature extraction on the second set acquired by the acquiring unit to obtain a second feature set, where the first feature set includes a feature of the first terminal information and a feature of the first APP information, and the second feature set includes a feature of the second terminal information and a feature of the second APP information;
a training unit, configured to train a machine learning algorithm according to the first feature set extracted by the extracting unit and the corresponding user age and gender information obtained by the obtaining unit, so as to determine parameters of the machine learning algorithm and obtain a third feature set, where the third feature set is a set of features in the first feature set whose corresponding loss function values are smaller than a first preset value;
and a predicting unit, configured to substitute a fourth feature set into the machine learning algorithm adopting the parameters obtained by the training unit to obtain age and gender information of a target user, where the fourth feature set is a set of features in the second feature set that are the same as those in the third feature set.
In a third aspect, there is provided a computer readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by a computer, cause the computer to perform the method for predicting the age and sex of a user according to the first aspect.
In a fourth aspect, there is provided a computer program product containing instructions which, when run on a computer, cause the computer to perform the method for predicting the age and gender of a user according to the first aspect.
In a fifth aspect, an apparatus for predicting a user's age and gender is provided, including: a processor and a memory, wherein the memory is used for storing programs, and the processor calls the programs stored in the memory to execute the method for predicting the age and gender of a user according to the first aspect.
According to the method and the device for predicting the age and gender of a user provided by the embodiments of the application, a first set and a second set are acquired, wherein the first set comprises first terminal information, first application (APP) information and corresponding user age and gender information of a plurality of users, and the second set comprises second terminal information and second APP information of the plurality of users; feature extraction is performed on the first set to obtain a first feature set, and on the second set to obtain a second feature set; a machine learning algorithm is trained according to the first feature set and the corresponding user age and gender information to determine parameters of the machine learning algorithm and obtain a third feature set; and a fourth feature set is substituted into the machine learning algorithm adopting the parameters to obtain the age and gender information of a target user. Compared with the prior art, which predicts the age of a terminal user from installation list information, the present application introduces terminal information and APP information, and the APP information includes APP usage information, so that the ages and genders of different users of the same APP can be distinguished according to the APP usage information, which improves the accuracy of user age and gender prediction.
Drawings
Fig. 1 is a first flowchart illustrating a method for predicting the age and gender of a user according to an embodiment of the present application;
Fig. 2 is a second flowchart illustrating a method for predicting the age and gender of a user according to an embodiment of the present application;
Fig. 3 is a third flowchart illustrating a method for predicting the age and gender of a user according to an embodiment of the present application;
Fig. 4 is a schematic structural diagram of an apparatus for predicting the age and gender of a user according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Accurately predicting the age of terminal users can help APP enterprises understand the behavior characteristics of their own users and thus better develop their existing APP products; it can also help operators, e-commerce companies and the like to provide more precisely targeted Internet advertising services, effectively saving advertising costs.
The main idea of this application is to introduce terminal information and APP information, where the APP information includes APP usage information, so that the ages of different users of the same APP can be distinguished according to the APP usage information, which improves the accuracy of user age and gender prediction.
Example 1
The embodiment of the application provides a method for predicting the age and sex of a user, and as shown in fig. 1, the method for predicting the age and sex of the user comprises the following steps:
S101, acquiring a first set and a second set.
Illustratively, the first set includes first terminal information, first application program APP information and corresponding user age and gender information of a plurality of users, the second set includes second terminal information and second APP information of the plurality of users, the first APP information and the second APP information include APP usage information, and the APP usage information is used for indicating a time when the user uses the corresponding APP.
Illustratively, the number of users included in the first set is twice the number of users included in the second set. For example, assuming 75000 users in total, the first set may include the information of 50000 users and the second set the information of 25000 users.
Illustratively, the terminal information of the user includes an identification number (ID) of the user, a terminal brand, a terminal model, and a terminal price. The APP information includes APP usage information indicating a time when the user uses the corresponding APP, e.g., a start time and an end time when the user uses the corresponding APP.
Illustratively, the APP information further includes APP installation list information, a user ID, APP names, a first-level category of the APP, and a second-level category of the APP. The APP installation list information includes the names of all APPs installed by the user. For example, the first-level category of an APP may be finance and its second-level category may be investment management. The first-level and second-level categories may be empty; this does not affect the overall prediction method.
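As a minimal sketch of how the terminal information and APP information described above might be held in memory, the following Python data classes can be used. All class and field names are hypothetical and chosen only for illustration; the application does not prescribe a storage format.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class TerminalInfo:
    """Terminal information of one user (hypothetical field names)."""
    user_id: str
    brand: str
    model: str
    price: float


@dataclass
class AppUsage:
    """One usage session of one APP: start time and end time."""
    user_id: str
    app_name: str
    start: datetime
    end: datetime
    first_level_category: Optional[str] = None   # e.g. "finance"; may be empty
    second_level_category: Optional[str] = None  # e.g. "investment management"; may be empty


@dataclass
class UserRecord:
    """All information about one user; the label exists only in the first set."""
    terminal: TerminalInfo
    installed_apps: List[str] = field(default_factory=list)
    usage: List[AppUsage] = field(default_factory=list)
    age_gender_label: Optional[int] = None


record = UserRecord(
    terminal=TerminalInfo("u1", "BrandA", "ModelX", 1999.0),
    installed_apps=["AppOne", "AppTwo"],
    usage=[AppUsage("u1", "AppOne",
                    datetime(2019, 1, 1, 8, 0), datetime(2019, 1, 1, 8, 30))],
    age_gender_label=3,
)
```

A record from the second set would simply leave `age_gender_label` as `None`.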
S102, performing feature extraction on the first set to obtain a first feature set, and performing feature extraction on the second set to obtain a second feature set.
Illustratively, the first feature set includes features of the first terminal information and features of the first APP information, and the second feature set includes features of the second terminal information and features of the second APP information.
Illustratively, the features of the terminal information include the user's terminal brand, terminal model and terminal price. The features of the APP information include the number of installed APPs, the usage frequency of the APP key vocabulary in each time period of the day, and the usage amount of the APP key vocabulary in each time period of the day. Both the number of installed APPs and the APP key vocabulary are obtained from the APP installation list information.
Specifically, as shown in fig. 2, the step S102 includes steps S1021 to S1024:
S1021, obtaining the weight value of the APP vocabulary according to the APP installation list information.
For example, an APP word may be the name of an APP, and the weight value of the APP word may be obtained from the APP installation list information using the term frequency-inverse document frequency (TF-IDF) algorithm. The main idea of TF-IDF is that if a word or phrase occurs frequently in one article and rarely in other articles, the word or phrase is considered to have good class discrimination ability and is suitable for classification.
The formula of the TF-IDF algorithm is
$$\text{TF-IDF}_{i,j} = \text{TF}_{i,j} \times \text{IDF}_{i}$$
where $\text{TF-IDF}_{i,j}$ is the weight value of word $i$, $\text{TF}_{i,j}$ is the frequency of occurrence of word $i$ in article $j$, and $\text{IDF}_{i}$ is the inverse document frequency of word $i$.
$\text{TF}_{i,j}$ is calculated as
$$\text{TF}_{i,j} = \frac{n_{i,j}}{\sum_{k} n_{k,j}}$$
where $n_{i,j}$ is the number of times word $i$ appears in article $j$, and $\sum_{k} n_{k,j}$ is the total number of occurrences of all words in article $j$.
$\text{IDF}_{i}$ measures the importance of word $i$. It is obtained by dividing the total number of documents by the number of documents containing the word and taking the base-10 logarithm of the quotient:
$$\text{IDF}_{i} = \log_{10} \frac{|D|}{|\{j : t_{i} \in d_{j}\}|}$$
where $|D|$ is the total number of files and $|\{j : t_{i} \in d_{j}\}|$ is the number of files containing word $i$.
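The TF-IDF weighting described above can be sketched in plain Python as follows. The sketch assumes that each user's APP installation list plays the role of one "article" and that each APP name is one "word"; the function name `tf_idf` and the sample data are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(install_lists: List[List[str]]) -> List[Dict[str, float]]:
    """Return one {app_name: weight} mapping per installation list."""
    total_docs = len(install_lists)
    # Number of installation lists that contain each APP name at least once.
    doc_freq = Counter(name for apps in install_lists for name in set(apps))
    weights = []
    for apps in install_lists:
        counts = Counter(apps)
        total_terms = sum(counts.values())
        weights.append({
            # TF = n_ij / sum_k n_kj ; IDF = log10(|D| / document frequency)
            name: (n / total_terms) * math.log10(total_docs / doc_freq[name])
            for name, n in counts.items()
        })
    return weights


install_lists = [
    ["chat", "game", "bank"],
    ["chat", "game"],
    ["chat", "video"],
]
weights = tf_idf(install_lists)
```

In this toy data "chat" is installed by every user, so its inverse document frequency, and hence its weight, is zero, while the rarer "bank" receives a higher weight than "game".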
S1022, determining the APP words whose weight values are higher than a second preset value as the APP key vocabulary.
For example, the 5 to 10 APP words with the highest weight values may be selected as the APP key vocabulary; that is, the APP words whose weight values are higher than the second preset value are selected as the APP key vocabulary.
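The selection of the APP key vocabulary can be sketched as below. The sketch assumes that a single weight per APP word is already available (how per-user weights are combined into one value is not specified here and is left to the caller); `select_key_vocabulary`, `threshold` and `top_k` are hypothetical names.

```python
from typing import Dict, List, Optional


def select_key_vocabulary(weights: Dict[str, float],
                          threshold: Optional[float] = None,
                          top_k: Optional[int] = None) -> List[str]:
    """Keep words whose weight exceeds `threshold`, then at most `top_k` of them."""
    # Sort by descending weight; ties are broken alphabetically for determinism.
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    if threshold is not None:
        ranked = [(word, w) for word, w in ranked if w > threshold]
    if top_k is not None:
        ranked = ranked[:top_k]
    return [word for word, _ in ranked]


app_weights = {"bank": 0.16, "game": 0.06, "video": 0.24, "chat": 0.0}
by_threshold = select_key_vocabulary(app_weights, threshold=0.05)
by_rank = select_key_vocabulary(app_weights, top_k=2)
```

Choosing the top 5 to 10 words and choosing a threshold are two views of the same rule: the threshold is whatever weight separates the retained words from the rest.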
S1023, obtaining, according to the APP usage information corresponding to the APP key vocabulary in the first set, the usage frequency in each time period and the usage amount in each time period of the APP key vocabulary corresponding to the first set; and determining that the first feature set includes the usage frequency in each time period and the usage amount in each time period of the APP key vocabulary corresponding to the first set.
Illustratively, the usage frequency of the APP key vocabulary in each time period includes its usage frequency from 0:00 to 6:00, from 6:00 to 12:00, from 12:00 to 18:00 and from 18:00 to 24:00, and the usage amount of the APP key vocabulary in each time period includes its usage amount from 0:00 to 6:00, from 6:00 to 12:00, from 12:00 to 18:00 and from 18:00 to 24:00.
S1024, obtaining, according to the APP usage information corresponding to the APP key vocabulary in the second set, the usage frequency in each time period and the usage amount in each time period of the APP key vocabulary corresponding to the second set; and determining that the second feature set includes the usage frequency in each time period and the usage amount in each time period of the APP key vocabulary corresponding to the second set.
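The per-period features of steps S1023 and S1024 can be sketched as follows for one user and one APP key word. Two points are assumptions made only for this illustration: "usage frequency" is taken as the number of sessions starting in a period, and "usage amount" is taken as the minutes of use falling inside the period. The name `period_features` and the feature keys are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

PERIODS = ["00-06", "06-12", "12-18", "18-24"]


def period_features(sessions: List[Tuple[datetime, datetime]]) -> Dict[str, float]:
    """Count sessions and sum minutes of use in each six-hour period of the day."""
    freq = [0, 0, 0, 0]
    minutes = [0.0, 0.0, 0.0, 0.0]
    for start, end in sessions:
        freq[start.hour // 6] += 1  # a session counts once, in its starting period
        cursor = start
        while cursor < end:
            idx = cursor.hour // 6
            bucket_start = cursor.replace(hour=idx * 6, minute=0,
                                          second=0, microsecond=0)
            bucket_end = bucket_start + timedelta(hours=6)
            chunk_end = min(end, bucket_end)
            # Attribute only the part of the session inside this period.
            minutes[idx] += (chunk_end - cursor).total_seconds() / 60.0
            cursor = chunk_end
    features: Dict[str, float] = {}
    for name, f, m in zip(PERIODS, freq, minutes):
        features[f"freq_{name}"] = f
        features[f"minutes_{name}"] = m
    return features


sessions = [
    (datetime(2019, 1, 1, 5, 30), datetime(2019, 1, 1, 6, 30)),
    (datetime(2019, 1, 1, 20, 0), datetime(2019, 1, 1, 20, 45)),
]
features = period_features(sessions)
```

The first sample session straddles 6:00, so its 60 minutes are split evenly between the first two periods while it is counted once in the 0:00 to 6:00 frequency.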
S103, training the machine learning algorithm according to the first feature set and the corresponding age and gender information of the user to determine parameters of the machine learning algorithm and obtain a third feature set.
Illustratively, the third feature set is a set of features in the first feature set, for which the corresponding loss function values are smaller than the first preset value.
Illustratively, the machine learning algorithm may be LightGBM, a gradient boosting algorithm based on decision trees, which offers fast training, low memory usage, high accuracy, support for parallel learning, and the ability to process large-scale data.
Illustratively, the third feature set is initially empty.
Specifically, as shown in fig. 3, the step S103 includes steps S1031 to S1033:
S1031, substituting the first feature, the third feature set and the corresponding user age and gender information into the machine learning algorithm for training, and adjusting the parameters of the machine learning algorithm to obtain first parameters of the machine learning algorithm.
Illustratively, the first feature is one feature in a training set, and the training set is a set of a preset number of features in the first feature set.
Illustratively, the first feature set is divided into a training set and a verification set. The features in the training set are used for training the machine learning algorithm, and the features in the verification set are used for cross-validating the trained machine learning algorithm, that is, for checking whether the trained algorithm meets the requirements. The ratio of the data size of the training set to that of the verification set is 4:1, i.e., 80% of the data is used for training and 20% of the data is used for cross-validation.
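A 4:1 division can be sketched as below. The sketch assumes the division is made over user rows after a seeded shuffle; the application does not state how the division is drawn, so the shuffle, the seed and the name `split_train_verification` are illustrative choices only.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_train_verification(items: Sequence[T],
                             train_fraction: float = 0.8,
                             seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle reproducibly, then cut into training and verification parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# 50000 labeled users, as in the example size of the first set.
train_rows, verification_rows = split_train_verification(range(50000))
```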
S1032, substituting a second feature into the machine learning algorithm adopting the first parameters to obtain first age and gender information, calculating a loss function value according to the first age and gender information, and, if the loss function value is smaller than the first preset value, determining that the third feature set includes the first feature.
Illustratively, the second feature consists of the features in the verification set that are the same as the first feature and the features of the third feature set, and the verification set is the set of features in the first feature set other than the training set.
Illustratively, the formula for the loss function is:
$$\text{Loss} = -\frac{1}{N} \sum_{i=1}^{N} \sum_{j} y_{i,j} \log\left(p_{i,j}\right)$$
where Loss is the value of the loss function, $N$ is the number of users, $i$ denotes the $i$-th user, $j$ denotes the age number in Table 1, and $y_{i,j}$ indicates whether user $i$ belongs to category $j$ and takes the value 0 or 1 (for example, $y_{i,j} = 1$ if user $i$ belongs to category $j$). $p_{i,j}$ is the probability, predicted by the machine learning algorithm, that user $i$ belongs to category $j$, and takes a value between 0 and 1; that is, $p_{i,j}$ is the age and gender information of the user obtained by substituting the second feature into the LightGBM algorithm.
TABLE 1 age-gender correspondence table of users
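Assuming the loss is the standard multi-class logarithmic loss, which matches the symbol definitions above (N users, a 0/1 indicator y and a predicted probability p per category), it can be computed as in the following sketch. The natural logarithm and the clipping constant `eps` are assumptions; category numbers are represented here as zero-based indices.

```python
import math
from typing import Sequence


def multiclass_log_loss(true_categories: Sequence[int],
                        probabilities: Sequence[Sequence[float]],
                        eps: float = 1e-15) -> float:
    """Mean negative log-probability assigned to each user's true category."""
    total = 0.0
    for category, row in zip(true_categories, probabilities):
        # y_ij is 1 only for the true category, so the inner sum over j
        # reduces to a single term; clip to avoid log(0).
        p = min(max(row[category], eps), 1.0 - eps)
        total += math.log(p)
    return -total / len(true_categories)


loss = multiclass_log_loss(
    true_categories=[0, 1],
    probabilities=[[0.5, 0.5], [0.25, 0.75]],
)
```

A lower value means the predicted probabilities concentrate on the correct categories, which is why a feature is kept only when the resulting loss falls below the first preset value.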
S1033, repeatedly executing steps S1031 to S1032 until all the features of the training set have been substituted into the machine learning algorithm, so as to obtain the third feature set and the parameters of the machine learning algorithm.
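The loop of steps S1031 to S1033 amounts to a forward feature selection, which can be sketched as follows. Training and evaluation are hidden behind a caller-supplied function `verification_loss`, which is hypothetical: in practice it would train the model (for example LightGBM) on the trial features of the training set and return the loss on the matching features of the verification set. The toy loss below is invented purely so that the sketch runs.

```python
from typing import Callable, Dict, List, Sequence


def forward_select(candidates: Sequence[str],
                   verification_loss: Callable[[List[str]], float],
                   max_loss: float) -> List[str]:
    """Keep a candidate feature if adding it yields a loss below `max_loss`."""
    selected: List[str] = []              # third feature set, initially empty
    for feature in candidates:            # S1033: go through every training feature
        trial = selected + [feature]      # S1031: first feature + third feature set
        loss = verification_loss(trial)   # S1032: loss on the verification set
        if loss < max_loss:               # compare with the first preset value
            selected.append(feature)
    return selected


# Toy stand-in for "train and evaluate": each feature changes the loss by a fixed amount.
toy_gain: Dict[str, float] = {"brand": 0.3, "price": 0.2, "noise": -0.5, "freq_18-24": 0.4}


def toy_loss(features: List[str]) -> float:
    return 1.0 - sum(toy_gain[name] for name in features)


third_feature_set = forward_select(list(toy_gain), toy_loss, max_loss=0.75)
```

In the toy run the "noise" feature pushes the loss back above the preset value and is therefore left out of the third feature set.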
S104, substituting the fourth feature set into the machine learning algorithm adopting the parameters determined in step S103 to obtain the age and gender information of the target user.
Illustratively, the fourth feature set is a set of features in the second feature set that are the same as the third feature set.
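Building the fourth feature set is then a matter of keeping, for each user of the second set, only the features that also appear in the third feature set. A minimal sketch, assuming each user's features are stored as a name-to-value mapping (the function name and sample values are hypothetical):

```python
from typing import Dict, List


def fourth_feature_set(second_feature_set: List[Dict[str, float]],
                       third_feature_names: List[str]) -> List[Dict[str, float]]:
    """Restrict each user's features to those selected during training."""
    return [
        {name: row[name] for name in third_feature_names if name in row}
        for row in second_feature_set
    ]


second = [
    {"brand": 2.0, "price": 1999.0, "noise": 0.7, "freq_18-24": 5.0},
    {"brand": 1.0, "price": 899.0, "noise": 0.1, "freq_18-24": 0.0},
]
fourth = fourth_feature_set(second, ["brand", "price", "freq_18-24"])
```

The resulting rows have the same feature layout the model saw during training and can be passed to it for prediction.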
According to the method for predicting the age and gender of a user provided by this embodiment, a first set and a second set are acquired, wherein the first set comprises first terminal information, first application (APP) information and corresponding user age and gender information of a plurality of users, and the second set comprises second terminal information and second APP information of the plurality of users; feature extraction is performed on the first set to obtain a first feature set, and on the second set to obtain a second feature set; a machine learning algorithm is trained according to the first feature set and the corresponding user age and gender information to determine parameters of the machine learning algorithm and obtain a third feature set; and a fourth feature set is substituted into the machine learning algorithm adopting the parameters to obtain the age and gender information of a target user. Compared with the prior art, which predicts the age of a terminal user from installation list information, the present application introduces terminal information and APP information, and the APP information includes APP usage information, so that the ages and genders of different users of the same APP can be distinguished according to the APP usage information, which improves the accuracy of user age and gender prediction.
Example 2
An embodiment of the present application provides a device for predicting age and gender of a user, which is applied to the method for predicting age and gender of a user, as shown in fig. 4, the device 40 includes: an acquisition unit 41, an extraction unit 42, a training unit 43, and a prediction unit 44.
An obtaining unit 41, configured to obtain a first set and a second set, where the first set includes first terminal information, first application APP information, and corresponding user age and gender information of multiple users, the second set includes second terminal information and second APP information of the multiple users, and the first APP information and the second APP information include APP usage information, and the APP usage information is used to indicate a time when the user uses a corresponding APP.
An extracting unit 42, configured to perform feature extraction on the first set acquired by the acquiring unit 41 to obtain a first feature set, and perform feature extraction on the second set acquired by the acquiring unit 41 to obtain a second feature set, where the first feature set includes features of the first terminal information and features of the first APP information, and the second feature set includes features of the second terminal information and features of the second APP information.
The training unit 43 is configured to train the machine learning algorithm according to the first feature set extracted by the extracting unit 42 and the corresponding user age and gender information acquired by the acquiring unit 41, so as to determine parameters of the machine learning algorithm and obtain a third feature set, where the third feature set is a set of features in the first feature set, where a corresponding loss function value is smaller than a first preset value.
And the predicting unit 44 is configured to substitute a fourth feature set into the machine learning algorithm that uses the parameters obtained by the training unit 43 to obtain the age and gender information of the target user, where the fourth feature set is a set of features in the second feature set that are the same as the third feature set.
The extraction unit 42 is specifically configured to:
obtain the weight value of the APP vocabulary according to the APP installation list information;
determine the APP words whose weight values are higher than the second preset value as the APP key vocabulary;
obtain, according to the APP usage information corresponding to the APP key vocabulary in the first set, the usage frequency in each time period and the usage amount in each time period of the APP key vocabulary corresponding to the first set, and determine that the first feature set includes the usage frequency in each time period and the usage amount in each time period of the APP key vocabulary corresponding to the first set;
obtain, according to the APP usage information corresponding to the APP key vocabulary in the second set, the usage frequency in each time period and the usage amount in each time period of the APP key vocabulary corresponding to the second set, and determine that the second feature set includes the usage frequency in each time period and the usage amount in each time period of the APP key vocabulary corresponding to the second set.
The third feature set is initially empty, and the training unit 43 is specifically configured to perform the following steps:
a. substituting the first feature, the third feature set and the corresponding user age and gender information into the machine learning algorithm for training, and adjusting the parameters of the machine learning algorithm to obtain first parameters of the machine learning algorithm, wherein the first feature is one feature in a training set, and the training set is a set of a preset number of features in the first feature set;
b. substituting a second feature into the machine learning algorithm adopting the first parameters to obtain first age and gender information, calculating a loss function value according to the first age and gender information, and, if the loss function value is smaller than the first preset value, determining that the third feature set includes the first feature, wherein the second feature consists of the features in a verification set that are the same as the first feature and the features of the third feature set, and the verification set is the set of features in the first feature set other than the training set;
c. repeatedly executing steps a to b until all the features of the training set have been substituted into the machine learning algorithm, so as to obtain the third feature set and the parameters of the machine learning algorithm.
Embodiments of the present application provide a computer readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by a computer, cause the computer to perform a method of predicting age and gender of a user as described in fig. 1-3.
Embodiments of the present application provide a computer program product comprising instructions which, when run on a computer, cause the computer to perform a method of predicting the age and gender of a user as described in fig. 1-3.
An embodiment of the present application provides a device for predicting a user's age and gender, including: a processor and a memory, the memory for storing a program, the processor calling the program stored in the memory to perform the method for predicting the age and sex of a user as described in fig. 1-3.
Since the device for predicting the age and sex of the user, the computer-readable storage medium, and the computer program product in the embodiments of the present application can be applied to the method for predicting the age and sex of the user, the technical effects obtained by the method can also refer to the embodiments of the method, and the embodiments of the present application are not described herein again.
The above units may be individually configured processors, or may be implemented by being integrated into one of the processors of the controller, or may be stored in a memory of the controller in the form of program codes, and the functions of the above units may be called and executed by one of the processors of the controller. The processor described herein may be a Central Processing Unit (CPU), or an Application Specific Integrated Circuit (ASIC), or one or more Integrated circuits configured to implement embodiments of the present Application.
It should be understood that, in the various embodiments of the present application, the sequence numbers of the above-mentioned processes do not mean the execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It can be clearly understood by those skilled in the art that, for convenience and simplicity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. For example, the above-described device embodiments are merely illustrative, and for example, the division of the units is only one logical functional division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
Claims (9)
1. A method for predicting the age and gender of a user, comprising:
acquiring a first set and a second set, wherein the first set comprises first terminal information, first application program APP information and corresponding user age and gender information of a plurality of users, the second set comprises second terminal information and second APP information of the plurality of users, the first APP information and the second APP information comprise APP usage information, and the APP usage information is used for indicating the time of using the corresponding APP by the user;
performing feature extraction on the first set to obtain a first feature set, and performing feature extraction on the second set to obtain a second feature set, where the first feature set includes features of the first terminal information and features of the first APP information, and the second feature set includes features of the second terminal information and features of the second APP information;
training a machine learning algorithm according to the first feature set and the corresponding user age and gender information to determine parameters of the machine learning algorithm and obtain a third feature set, wherein the third feature set is the set of features in the first feature set whose corresponding loss function values are smaller than a first preset value;
and substituting a fourth feature set into the machine learning algorithm adopting the parameters to obtain age and gender information of a target user, wherein the fourth feature set is the set of features in the second feature set that are the same as the features in the third feature set.
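As an illustration only, the relationship between the second, third and fourth feature sets in claim 1 can be sketched in Python. The representation of each user's features as a dictionary keyed by feature name, the function name `fourth_feature_set`, and the sample feature names are assumptions made for this sketch; the claim does not prescribe any data structure.

```python
def fourth_feature_set(second_feature_rows, third_feature_names):
    """Keep, for each unlabeled user, only the features that were retained
    in the third feature set during training.

    second_feature_rows: list of dicts (feature name -> value), one per user
    third_feature_names: names of the features kept by the training step
    """
    kept_rows = []
    for row in second_feature_rows:
        # Features missing from a row are skipped rather than invented.
        kept_rows.append(
            {name: row[name] for name in third_feature_names if name in row}
        )
    return kept_rows


# Hypothetical example: three features extracted, two retained by training.
second_rows = [{"brand": 1, "game_evening": 3, "news_morning": 0}]
third_names = ["game_evening", "brand"]
fourth_rows = fourth_feature_set(second_rows, third_names)
```

The resulting rows are what would be passed to the trained model to obtain the age and gender of the target users.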
2. The method of claim 1, wherein the extracting features of the first set to obtain a first feature set and extracting features of the second set to obtain a second feature set comprises:
obtaining a weight value of an APP vocabulary according to the APP installation list information;
determining an APP vocabulary with the weight value higher than a second preset value as an APP key vocabulary;
obtaining, according to the APP usage information corresponding to the APP key vocabulary in the first set, the usage frequency and the usage amount of the APP key vocabulary in each time period for the first set; and determining that the first feature set comprises the usage frequency and the usage amount of the APP key vocabulary in each time period for the first set;
obtaining, according to the APP usage information corresponding to the APP key vocabulary in the second set, the usage frequency and the usage amount of the APP key vocabulary in each time period for the second set; and determining that the second feature set comprises the usage frequency and the usage amount of the APP key vocabulary in each time period for the second set.
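The feature extraction of claim 2 can be sketched as follows, under several stated assumptions. The claim does not say how the weight value is computed, so an IDF-style weight is used here purely as a placeholder. Each APP name is treated as one vocabulary item, a usage record is assumed to be a tuple `(app, start_hour, minutes)`, "usage frequency" is read as the number of sessions and "usage amount" as the total minutes, and the four time periods are hypothetical.

```python
import math
from collections import defaultdict


def app_weights(install_lists):
    """Placeholder weight: log(number of users / users who installed the APP)."""
    n_users = len(install_lists)
    installs = defaultdict(int)
    for apps in install_lists:
        for app in set(apps):
            installs[app] += 1
    return {app: math.log(n_users / count) for app, count in installs.items()}


def key_vocabulary(weights, second_preset_value):
    """APP vocabulary items whose weight exceeds the second preset value."""
    return {app for app, weight in weights.items() if weight > second_preset_value}


def period_of(hour):
    """Map an hour (0-23) to one of four hypothetical time periods."""
    if 6 <= hour < 12:
        return "morning"
    if 12 <= hour < 18:
        return "afternoon"
    if 18 <= hour < 24:
        return "evening"
    return "night"


def usage_features(usage_records, keywords):
    """Per-period usage frequency and usage amount for key vocabulary only."""
    features = defaultdict(float)
    for app, start_hour, minutes in usage_records:
        if app not in keywords:
            continue
        period = period_of(start_hour)
        features[f"{app}:{period}:frequency"] += 1
        features[f"{app}:{period}:amount"] += minutes
    return dict(features)


# Hypothetical data: three users' installation lists and one user's usage log.
weights = app_weights([["chat", "game"], ["chat"], ["chat", "news"]])
keywords = key_vocabulary(weights, second_preset_value=0.5)
features = usage_features(
    [("game", 20, 30), ("game", 21, 15), ("chat", 9, 5), ("news", 7, 10)],
    keywords,
)
```

The same functions would be applied to both the first set and the second set, so that the two feature sets share feature names.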
3. The method for predicting age and gender of a user as claimed in claim 1, wherein the third feature set is initially empty, and the training of the machine learning algorithm according to the first feature set and the corresponding age and gender information of the user to determine the parameters of the machine learning algorithm and obtain the third feature set comprises the following steps:
step a, substituting a first feature, a third feature set and the corresponding user age and gender information into the machine learning algorithm for training, and adjusting parameters of the machine learning algorithm to obtain a first parameter of the machine learning algorithm, wherein the first feature is one feature in a training set, and the training set is a set of a preset number of features in the first feature set;
step b, substituting a second feature into the machine learning algorithm adopting the first parameter to obtain first age and gender information, calculating a loss function value according to the first age and gender information, and, if the loss function value is smaller than the first preset value, determining that the third feature set comprises the first feature, wherein the second feature is the feature in a verification set that is the same as the first feature and the third feature set, and the verification set is the set of features in the first feature set other than the training set;
and repeating the steps a to b until the features of the training set are all substituted into the machine learning algorithm to obtain the third feature set and the parameters of the machine learning algorithm.
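Steps a and b of claim 3 resemble a greedy forward feature selection, which can be sketched as below. This is one possible reading, not the patented implementation: the training set and verification set are interpreted here as two groups of labeled users, the machine learning algorithm is replaced by a simple nearest-centroid classifier, the loss function is the misclassification rate, and all function names, labels and data are hypothetical.

```python
def train(rows, labels, features):
    """Stand-in learner: the 'parameters' are per-label means of each feature."""
    sums, counts = {}, {}
    for row, label in zip(rows, labels):
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, name in enumerate(features):
            acc[i] += row[name]
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(params, row, features):
    """Return the label whose centroid is closest to the row."""
    def squared_distance(centroid):
        return sum((row[name] - c) ** 2 for name, c in zip(features, centroid))
    return min(params, key=lambda label: squared_distance(params[label]))


def loss(params, rows, labels, features):
    """Misclassification rate on the given rows."""
    wrong = sum(predict(params, r, features) != y for r, y in zip(rows, labels))
    return wrong / len(rows)


def select_features(candidates, train_rows, train_labels,
                    val_rows, val_labels, first_preset_value):
    third_set, params = [], None  # the third feature set starts empty
    for first_feature in candidates:
        trial = third_set + [first_feature]
        # Step a: train on the candidate plus the features kept so far.
        trial_params = train(train_rows, train_labels, trial)
        # Step b: evaluate the same features on the verification rows.
        if loss(trial_params, val_rows, val_labels, trial) < first_preset_value:
            third_set, params = trial, trial_params
    return third_set, params


# Hypothetical labeled data with one informative and one uninformative feature.
train_rows = [
    {"game_evening": 5, "noise": 1}, {"game_evening": 6, "noise": 8},
    {"game_evening": 0, "noise": 2}, {"game_evening": 1, "noise": 9},
]
train_labels = ["young", "young", "older", "older"]
val_rows = [{"game_evening": 4, "noise": 9}, {"game_evening": 0.5, "noise": 1}]
val_labels = ["young", "older"]

third_set, params = select_features(
    ["noise", "game_evening"], train_rows, train_labels,
    val_rows, val_labels, first_preset_value=0.25,
)
```

In this sketch the returned parameters always correspond to the last accepted feature combination, so a rejected candidate does not affect the final model.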
4. An apparatus for predicting the age and gender of a user, comprising:
an acquiring unit, configured to acquire a first set and a second set, wherein the first set comprises first terminal information, first application program APP information and corresponding user age and gender information of a plurality of users, the second set comprises second terminal information and second APP information of the plurality of users, the first APP information and the second APP information comprise APP usage information, and the APP usage information is used for indicating the time when the corresponding APP is used by the users;
an extracting unit, configured to perform feature extraction on the first set acquired by the acquiring unit to obtain a first feature set, and perform feature extraction on the second set acquired by the acquiring unit to obtain a second feature set, where the first feature set includes a feature of the first terminal information and a feature of the first APP information, and the second feature set includes a feature of the second terminal information and a feature of the second APP information;
a training unit, configured to train a machine learning algorithm according to the first feature set extracted by the extracting unit and the corresponding user age and gender information acquired by the acquiring unit, so as to determine parameters of the machine learning algorithm and obtain a third feature set, wherein the third feature set is the set of features in the first feature set whose corresponding loss function values are smaller than a first preset value;
and a predicting unit, configured to substitute a fourth feature set into the machine learning algorithm adopting the parameters obtained by the training unit to obtain the age and gender information of a target user, wherein the fourth feature set is the set of features in the second feature set that are the same as the features in the third feature set.
5. The apparatus according to claim 4, wherein the extracting unit is specifically configured to:
obtaining a weight value of an APP vocabulary according to the APP installation list information;
determining an APP vocabulary with the weight value higher than a second preset value as an APP key vocabulary;
obtaining, according to the APP usage information corresponding to the APP key vocabulary in the first set, the usage frequency and the usage amount of the APP key vocabulary in each time period for the first set; and determining that the first feature set comprises the usage frequency and the usage amount of the APP key vocabulary in each time period for the first set;
obtaining, according to the APP usage information corresponding to the APP key vocabulary in the second set, the usage frequency and the usage amount of the APP key vocabulary in each time period for the second set; and determining that the second feature set comprises the usage frequency and the usage amount of the APP key vocabulary in each time period for the second set.
6. The apparatus as claimed in claim 4, wherein the third feature set is initially empty, and the training unit is specifically configured to perform the following steps:
step a, substituting a first feature, a third feature set and the corresponding user age and gender information into the machine learning algorithm for training, and adjusting parameters of the machine learning algorithm to obtain a first parameter of the machine learning algorithm, wherein the first feature is one feature in a training set, and the training set is a set of a preset number of features in the first feature set;
step b, substituting a second feature into the machine learning algorithm adopting the first parameter to obtain first age and gender information, calculating a loss function value according to the first age and gender information, and, if the loss function value is smaller than the first preset value, determining that the third feature set comprises the first feature, wherein the second feature is the feature in a verification set that is the same as the first feature and the third feature set, and the verification set is the set of features in the first feature set other than the training set;
and repeating the steps a to b until the features of the training set are all substituted into the machine learning algorithm to obtain the third feature set and the parameters of the machine learning algorithm.
7. A computer-readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by a computer, cause the computer to perform the method of predicting the age and sex of a user according to any one of claims 1 to 3.
8. A computer program product comprising instructions which, when run on a computer, cause the computer to carry out a method of prediction of the age and gender of a user as claimed in any one of claims 1 to 3.
9. An apparatus for predicting age and gender of a user, comprising: a processor and a memory for storing a program, the processor calling the program stored in the memory to perform the method for predicting the age and sex of a user according to any one of claims 1 to 3.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910120476.1A CN109885834B (en) | 2019-02-18 | 2019-02-18 | Method and device for predicting age and gender of user |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109885834A (en) | 2019-06-14 |
CN109885834B (en) | 2022-09-16 |
Family
ID=66928365
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910120476.1A (Active; published as CN109885834B) | Method and device for predicting age and gender of user | 2019-02-18 | 2019-02-18 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109885834B (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112488742A (en) * | 2019-09-12 | 2021-03-12 | 北京三星通信技术研究有限公司 | User attribute information prediction method and device, electronic equipment and storage medium |
CN111291798B (en) * | 2020-01-21 | 2021-04-20 | 北京工商大学 | User basic attribute prediction method based on ensemble learning |
CN113726900A (en) * | 2021-09-02 | 2021-11-30 | 四川启睿克科技有限公司 | System for judging age bracket of user child |
CN115689626B (en) * | 2022-10-31 | 2024-03-01 | 荣耀终端有限公司 | User attribute determining method of terminal equipment and electronic equipment |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11288584B2 (en) * | 2016-06-23 | 2022-03-29 | Tata Consultancy Services Limited | Systems and methods for predicting gender and age of users based on social media data |
CN108256537A (en) * | 2016-12-28 | 2018-07-06 | 北京酷我科技有限公司 | A kind of user gender prediction method and system |
Also Published As
Publication number | Publication date |
---|---|
CN109885834A (en) | 2019-06-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109885834B (en) | Method and device for predicting age and gender of user | |
CN106651057B (en) | Mobile terminal user age prediction method based on installation package sequence list | |
CN109634698B (en) | Menu display method and device, computer equipment and storage medium | |
CN110362601B (en) | Metadata standard mapping method, device, equipment and storage medium | |
CN104081392A (en) | Influence scores for social media profiles | |
CN108921587B (en) | Data processing method and device and server | |
CN111078742B (en) | User classification model training method, user classification method and device | |
CN107316156B (en) | Data processing method, device, server and storage medium | |
CN111090807A (en) | Knowledge graph-based user identification method and device | |
CN111325614A (en) | Recommendation method and device of electronic object and electronic equipment | |
CN111652471A (en) | List distribution control method and device, electronic equipment and storage medium | |
CN110781410A (en) | Community detection method and device | |
CN110647537A (en) | Data searching method, device and storage medium | |
CN112650940A (en) | Recommendation method and device of application program, computer equipment and storage medium | |
CN116089616A (en) | Theme text acquisition method, device, equipment and storage medium | |
CN111784069B (en) | User preference prediction method, device, equipment and storage medium | |
CN115375453A (en) | System resource allocation method and device | |
CN113626340A (en) | Test requirement identification method and device, electronic equipment and storage medium | |
CN107368597B (en) | Information output method and device | |
CN112131468A (en) | Data processing method and device in recommendation system | |
CN111274474A (en) | Object recommendation method, electronic device and computer-readable storage medium | |
CN111339432A (en) | Recommendation method and device of electronic object and electronic equipment | |
CN111382244B (en) | Deep retrieval matching classification method and device and terminal equipment | |
CN108984556B (en) | Method, apparatus and computer-readable storage medium for data processing | |
CN115563276A (en) | Data analysis method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||