CN111209925A - Gender prediction method, device and computer-readable storage medium - Google Patents

Info

Publication number
CN111209925A
Authority
CN
China
Prior art keywords
tested
user
data
gender
gender prediction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811388916.3A
Other languages
Chinese (zh)
Inventor
王帅强
成艺
胡恒魁
赵佳枢
丁卓冶
殷大伟
赵一鸿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd, Beijing Jingdong Shangke Information Technology Co Ltd
Priority to CN201811388916.3A
Publication of CN111209925A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking


Abstract

The invention discloses a gender prediction method, a gender prediction device, and a computer-readable storage medium, and relates to the field of data processing. The gender prediction method comprises the following steps: generating data to be tested according to historical data corresponding to a device to be tested; and inputting the data to be tested into a pre-trained gender prediction model to obtain a gender prediction result for the user corresponding to the device to be tested, wherein the gender prediction model is trained on historical data of device-level users, and the user account corresponding to a device-level user is used on one and the same device. Embodiments of the invention train the model on historical data of device-level users, so that the gender labels of the training data are more accurate, which improves the accuracy of the gender prediction model. At prediction time, the gender of the user can likewise be predicted accurately from the historical data corresponding to the device to be tested. The accuracy of gender prediction is thereby improved.

Description

Gender prediction method, device and computer-readable storage medium
Technical Field
The present invention relates to the field of data processing, and in particular, to a method and an apparatus for predicting gender, and a computer-readable storage medium.
Background
In research on user gender prediction, rule-based methods are simple and fast, but their accuracy is limited. Users often do not fill in complete personal information when registering an account, so directly taking the registered gender as the user's real gender usually gives low coverage, and its truthfulness is also highly uncertain. Gender obtained by parsing an identity card number is highly reliable, but because identity card numbers are sensitive data, coverage is usually even lower.
At present, research on user gender prediction mostly focuses on social media, where gender is predicted from the content users post and interact with. In e-commerce, only a small number of users leave comments or review products. User gender prediction methods designed for social platforms therefore cannot be carried over directly to e-commerce applications.
Disclosure of Invention
Through analysis, the inventors found that most user gender profiles on e-commerce platforms refer to the gender of a registered account. In practice, however, one registered account may be logged in on several different devices at the same time; that is, several people share one registered account. For example, in a family, a husband and wife may both log in to the same account on their own mobile phones to browse and purchase goods. Data analysis shows that more than one third of registered accounts are logged in on multiple devices. Account-level gender prediction is therefore currently of low accuracy.
The technical problem to be solved by embodiments of the invention is how to improve the accuracy of gender prediction.
According to a first aspect of some embodiments of the present invention, there is provided a gender prediction method, comprising: generating data to be tested according to historical data corresponding to the equipment to be tested; inputting the data to be tested into a pre-trained gender prediction model to obtain a gender prediction result of a user corresponding to the device to be tested, wherein the gender prediction model is trained according to historical data of the device-level user, and a user account corresponding to the device-level user is used in the same device.
In some embodiments, when only one user account logs in on the device to be tested, the data to be tested is generated according to the history data generated on the device to be tested while no user is logged in and the history data generated by the user account corresponding to the device to be tested.
In some embodiments, when a user account for logging in to a device to be tested includes multiple user accounts, obtaining to-be-tested data corresponding to the same user account according to historical data generated by the same user account corresponding to the device to be tested; and inputting the data to be tested corresponding to the same user account into a pre-trained gender prediction model, and generating a gender prediction result of the same user account corresponding to the equipment to be tested.
In some embodiments, different types of data to be tested are generated according to a comparison result between preset operation times in historical data corresponding to the equipment to be tested and a preset threshold; and inputting the data to be tested into a pre-trained gender prediction model corresponding to the comparison result to obtain a gender prediction result of the user corresponding to the equipment to be tested.
In some embodiments, when the preset operation times in the history data corresponding to the device to be tested are greater than a preset value, the data to be tested includes a first commodity content characteristic and a user behavior characteristic; the user behavior characteristics include at least one of: information on the operated commodity in each category, information on the operated commodity under each brand, and information on the operated commodity with each type attribute; the first article content feature includes the number of operations on a word segmentation in the operated article title.
In some embodiments, when the preset operation times in the history data corresponding to the device to be tested are not greater than the preset value, the data to be tested includes a second commodity content feature, and the second commodity content feature is determined according to a word vector of a word segmentation of a title of the history operation commodity.
In some embodiments, the gender prediction method further comprises: generating historical data of equipment level users according to the historical data of the user accounts and the equipment information; generating training data according to historical data of the equipment-level user, wherein the marking value of the training data is the gender information of the equipment-level user; and training the model by using the training data and the marking value to obtain a gender prediction model so as to predict the gender of the user of the equipment by using the gender prediction model.
In some embodiments, only one user account logs in on the device corresponding to the device-level user, so that the user account and the device correspond one-to-one.
According to a second aspect of some embodiments of the present invention, there is provided a gender prediction device, comprising: the to-be-tested data generation module is configured to generate to-be-tested data according to historical data corresponding to the to-be-tested equipment; the gender prediction module is configured to input data to be tested into a pre-trained gender prediction model to obtain a gender prediction result of a user corresponding to the device to be tested, wherein the gender prediction model is trained according to historical data of the device-level user, and a user account corresponding to the device-level user is used in the same device.
In some embodiments, the to-be-tested data generation module is further configured to, when only one user account logs in on the to-be-tested device, generate the to-be-tested data according to the historical data generated on the to-be-tested device while no user is logged in and the historical data generated by the user account corresponding to the to-be-tested device.
In some embodiments, the to-be-tested data generation module is further configured to, under the condition that the user account logged in the to-be-tested device includes multiple user accounts, obtain to-be-tested data corresponding to the same user account according to historical data generated by the same user account corresponding to the to-be-tested device; the gender prediction module is further configured to input the data to be tested corresponding to the same user account into a pre-trained gender prediction model, and generate a gender prediction result of the same user account corresponding to the device to be tested.
In some embodiments, the to-be-tested data generation module is further configured to generate different types of to-be-tested data according to a comparison result between preset operation times in the history data corresponding to the to-be-tested device and a preset threshold; the gender prediction module is further configured to input the data to be tested into a gender prediction model which is trained in advance and corresponds to the comparison result, and a gender prediction result of a user corresponding to the device to be tested is obtained.
In some embodiments, when the preset operation times in the history data corresponding to the device to be tested are greater than a preset value, the data to be tested includes a first commodity content characteristic and a user behavior characteristic; the user behavior characteristics include at least one of: information on the operated commodity in each category, information on the operated commodity under each brand, and information on the operated commodity with each type attribute; the first article content feature includes the number of operations on a word segmentation in the operated article title.
In some embodiments, when the preset operation times in the history data corresponding to the device to be tested are not greater than the preset value, the data to be tested includes a second commodity content feature, and the second commodity content feature is determined according to a word vector of a word segmentation of a title of the history operation commodity.
In some embodiments, the gender prediction device further comprises: the model training module is configured to generate historical data of the equipment-level user according to the historical data of the user account and the equipment information; generating training data according to historical data of the equipment-level user, wherein the marking value of the training data is the gender information of the equipment-level user; and training the model by using the training data and the marking value to obtain a gender prediction model so as to predict the gender of the user of the equipment by using the gender prediction model.
In some embodiments, only one user account logs in on the device corresponding to the device-level user, so that the user account and the device correspond one-to-one.
According to a third aspect of some embodiments of the present invention, there is provided a gender prediction device, comprising: a memory; and a processor coupled to the memory, the processor configured to perform any of the aforementioned methods of predicting gender based on instructions stored in the memory.
According to a fourth aspect of some embodiments of the present invention, there is provided a computer readable storage medium having a computer program stored thereon, wherein the program when executed by a processor implements any one of the above described methods of gender prediction.
Some embodiments of the above invention have the following advantages or benefits: the embodiment of the invention can carry out model training based on the historical data of the equipment-level user, thereby enabling the gender marked by the training data to be more accurate and improving the accuracy of the gender prediction model. Meanwhile, during prediction, the gender of the user can be accurately predicted according to the historical data corresponding to the device to be tested. Thus, the accuracy of gender prediction is improved.
Other features of the present invention and advantages thereof will become apparent from the following detailed description of exemplary embodiments thereof, which proceeds with reference to the accompanying drawings.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a flow diagram illustrating a gender prediction methodology, according to some embodiments of the present invention.
Fig. 2A and 2B are schematic flow charts of methods of gender prediction according to further embodiments of the present invention.
FIG. 3 is a flow diagram illustrating a method for training gender prediction models, in accordance with some embodiments of the present invention.
FIG. 4 is a flowchart illustrating a method for training a gender prediction model according to further embodiments of the present invention.
FIG. 5 is a flowchart of a gender prediction method according to yet other embodiments of the present invention.
Fig. 6 is a schematic diagram of a gender prediction device, according to some embodiments of the present invention.
Fig. 7 is a schematic structural diagram of a gender prediction device according to another embodiment of the present invention.
Fig. 8 is a schematic structural diagram of a gender prediction device according to still other embodiments of the present invention.
Fig. 9 is a schematic structural diagram of a gender prediction device according to still other embodiments of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the invention, its application, or uses. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The relative arrangement of the components and steps, the numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present invention unless specifically stated otherwise.
Meanwhile, it should be understood that the sizes of the respective portions shown in the drawings are not drawn in an actual proportional relationship for the convenience of description.
Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate.
In all examples shown and discussed herein, any particular value should be construed as merely illustrative, and not limiting. Thus, other examples of the exemplary embodiments may have different values.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, further discussion thereof is not required in subsequent figures.
FIG. 1 is a flow diagram illustrating a gender prediction methodology, according to some embodiments of the present invention. As shown in fig. 1, the gender prediction method of this embodiment includes steps S102 to S104.
In step S102, data to be tested is generated according to the historical data corresponding to the device to be tested.
The historical data corresponding to the device to be tested is the data generated when a user uses a target website or application on the device to be tested, for example, data generated when the user places orders, adds items to favorites, or browses on an e-commerce website. Unless otherwise stated, the historical data and the data generated by a device in the following description refer to data for the target website or application. The historical data may include the user's operation data, for example browsing data, purchase data, and favorites data, and data on the operated commodities, for example the names, categories, attributes, and descriptions of the commodities the user operated on. From this historical data, the data to be tested may be generated according to the features that the gender prediction model takes as input. In some embodiments, the historical data may be organized into multidimensional data according to a preset data type for each feature dimension.
In step S104, the data to be tested is input into a pre-trained gender prediction model, and a gender prediction result of the user corresponding to the device to be tested is obtained, wherein the gender prediction model is trained according to historical data of the device-level user, and a user account corresponding to the device-level user is used in the same device.
A device-level user is a user whose account logs in on only one device. This excludes the case where several people log in to the same account from their own devices, so the gender associated with the user is more accurate. In some embodiments, the user's device may attach its device identifier when sending data to the server, so that the server can determine which device each user account is logged in on. Because training on device-level users takes the device into account, all information in one training sample can come from the same user, and the gender label for that sample is more accurate.
After the gender of the user is predicted, personalized information recommendation can be performed on the user based on the prediction result, for example, data such as commodity information and activity information corresponding to the gender are sent to the user.
By the method of the embodiment, model training can be performed based on historical data of the equipment-level user, so that the gender marked by the training data can be more accurate, and the accuracy of the gender prediction model is improved. Meanwhile, during prediction, the gender of the user can be accurately predicted according to the historical data corresponding to the device to be tested. Thus, the accuracy of gender prediction is improved.
The invention can accurately predict the gender of the user in several situations: when an account and a device correspond one-to-one, when several people share one account, and when several accounts log in on the same device. Embodiments of the gender prediction method of the present invention are described below with reference to fig. 2A and 2B.
FIG. 2A is a flowchart of a gender prediction method according to further embodiments of the present invention. As shown in fig. 2A, the gender prediction method of this embodiment includes steps S202 to S204. In this embodiment, only one user account logs in on the device to be tested; that is, either the user account and the device to be tested correspond one-to-one, or several users share the same user account, each on their own device.
In step S202, data to be tested is generated according to history data generated by a user who is not logged in on the device to be tested and history data generated by a user account corresponding to the device to be tested.
In step S204, the data to be tested is input into the pre-trained gender prediction model, and a gender prediction result of the user corresponding to the device to be tested is obtained.
If only one user account has logged in on a device, the probability that the device is shared by several people is very small. In that case, even when the user browses the e-commerce website without logging in, the history data generated on the device to be tested while not logged in was most likely produced by the same user who logs in with that account.
With this method, the gender of the user of the device to be tested can be predicted jointly from the data generated while not logged in and the data of the logged-in user account. This enlarges the amount of data to be tested and improves the accuracy of gender prediction.
FIG. 2B is a flowchart of a gender prediction method according to further embodiments of the present invention. As shown in fig. 2B, the gender prediction method of this embodiment includes steps S212 to S214. In this embodiment, multiple user accounts log in on the device to be tested.
In step S212, the to-be-tested data corresponding to the same user account is acquired according to the history data generated by the same user account corresponding to the to-be-tested device.
In step S214, the data to be tested corresponding to the same user account is input into the pre-trained gender prediction model, and a gender prediction result for the same user account corresponding to the device to be tested is generated.
If several user accounts are logged in on one device, the gender for each (device, user account) pair needs to be predicted according to the historical data of that user account on the device.
By the method of the embodiment, the condition that multiple persons share one device can be covered, so that gender prediction can be carried out on more scenes.
The methods of the embodiments of fig. 2A and 2B may be used alone or in combination. Table 1 shows the user account login situation on different devices in an application scenario.
TABLE 1
Device       Accounts logged in on the device
Device P1    User account ID1
Device P2    User account ID2
Device P3    User account ID1, user account ID3
Thus, in some embodiments, the gender of the user of device P1 may be predicted from all historical data on device P1, and the gender of the user of device P2 from all historical data on device P2. On device P3, the gender of the user who logs in to user account ID1 through device P3 may be predicted from the historical data of user account ID1 on device P3, and the gender of the user who logs in to user account ID3 through device P3 from the historical data of user account ID3 on device P3.
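The grouping in Table 1 can be sketched in Python as follows. This is only an illustration under stated assumptions: the record layout `(device_id, account_id, event)`, the use of `None` for events generated while not logged in, and the function name `build_prediction_units` are all hypothetical and do not come from the patent text.

```python
from collections import defaultdict

ANONYMOUS = None  # hypothetical marker: event generated while no account was logged in


def build_prediction_units(records):
    """Group history records into the units a gender prediction is made for.

    records: iterable of (device_id, account_id_or_None, event) tuples.
    Returns a dict mapping a unit key to a list of events:
      (device_id,)            one account on the device; not-logged-in events
                              on that device are merged in (Fig. 2A case)
      (device_id, account_id) one unit per account when several accounts
                              log in on the device (Fig. 2B case)
    """
    by_device = defaultdict(lambda: defaultdict(list))
    for device_id, account_id, event in records:
        by_device[device_id][account_id].append(event)

    units = {}
    for device_id, by_account in by_device.items():
        accounts = [a for a in by_account if a is not ANONYMOUS]
        if len(accounts) == 1:
            # Single account: logged-in and not-logged-in history are combined.
            units[(device_id,)] = by_account[accounts[0]] + by_account.get(ANONYMOUS, [])
        else:
            # Several accounts: only each account's own history is used, since
            # not-logged-in events cannot be attributed to one of them.
            # Devices with no logged-in account at all yield no unit in this sketch.
            for account_id in accounts:
                units[(device_id, account_id)] = list(by_account[account_id])
    return units


# Login situation of Table 1, with made-up events.
records = [
    ("P1", "ID1", "browse:razor"),
    ("P1", None, "browse:shaving foam"),
    ("P2", "ID2", "buy:lipstick"),
    ("P3", "ID1", "browse:drill"),
    ("P3", "ID3", "buy:perfume"),
    ("P3", None, "browse:tv"),
]
units = build_prediction_units(records)
```

Each resulting unit would then be turned into data to be tested and passed to the pre-trained model.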
The invention also provides a training method of the gender prediction model. An embodiment of the training method of the gender prediction model of the present invention is described below with reference to fig. 3.
FIG. 3 is a flow diagram illustrating a method for training gender prediction models, in accordance with some embodiments of the present invention. As shown in fig. 3, the training method of this embodiment includes steps S302 to S306.
In step S302, historical data of a device-level user is generated according to the historical data of the user account and the device information, and the user account corresponding to the device-level user is used in the same device.
When a user logs in an account of an e-commerce website on equipment and data interaction is carried out between the equipment and an e-commerce website background, the background can record the operation of the user and the corresponding user account and equipment information.
In some embodiments, the user account corresponding to a device-level user is used on only one device, but several user accounts may log in on that device. For example, the user account of user U1 is ID1, and the user account of user U2 is ID2. If user accounts ID1 and ID2 are both logged in only on device P1, the gender for user account ID1 on device P1 and the gender for user account ID2 on device P1 can each be determined.
In some embodiments, the user account corresponding to a device-level user is used on only one device, and only that user account logs in on the device. That is, training uses the historical data of user accounts that correspond one-to-one with devices. For example, when user account ID1 is logged in only on device P1 and device P1 has only user account ID1 logged in, the gender corresponding to device P1 can be determined.
If user account ID1 is logged in on both device P1 and device P2, users U1 and U2 may be sharing account ID1, and the genders of the users of devices P1 and P2 cannot be determined. Historical data associated with such a user account cannot be used as training data. The training data of the present invention therefore excludes data related to shared accounts.
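A minimal sketch of this selection of device-level users is given below. The input format (pairs of account and device identifiers taken from login records) and the name `device_level_accounts` are assumptions made for illustration; the `one_to_one` flag switches between the two embodiments described above.

```python
from collections import defaultdict


def device_level_accounts(logins, one_to_one=False):
    """Select the accounts usable as device-level users.

    logins: iterable of (account_id, device_id) pairs.
    Returns {account_id: device_id} for accounts seen on exactly one device.
    With one_to_one=True, the device must also have no other account.
    """
    devices_of = defaultdict(set)   # account -> devices it logged in on
    accounts_on = defaultdict(set)  # device  -> accounts that logged in on it
    for account_id, device_id in logins:
        devices_of[account_id].add(device_id)
        accounts_on[device_id].add(account_id)

    selected = {}
    for account_id, devices in devices_of.items():
        if len(devices) != 1:
            continue  # possibly a shared account: excluded from training data
        (device_id,) = devices
        if one_to_one and len(accounts_on[device_id]) != 1:
            continue  # device has several accounts: excluded in the stricter variant
        selected[account_id] = device_id
    return selected


logins = [
    ("ID1", "P1"), ("ID1", "P2"),  # ID1 on two devices: excluded
    ("ID2", "P3"), ("ID3", "P3"),  # two accounts, both only on P3
    ("ID4", "P4"),                 # one-to-one with P4
]
loose = device_level_accounts(logins)
strict = device_level_accounts(logins, one_to_one=True)
```

Here `loose` keeps ID2, ID3, and ID4, while `strict` keeps only ID4.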
In step S304, training data is generated based on the historical data of the device-level user, and the label value of the training data is the gender information of the device-level user.
In step S306, the model is trained using the training data and the label value to obtain a gender prediction model, so as to predict the gender of the user of the device using the gender prediction model. The model may be, for example, a neural network model.
In some embodiments, the model may be composed of an input layer, a hidden layer, and an output layer. The probability that the user is judged to be male or female is calculated by the Softmax activation function. For example, for training data
Figure BDA0001873596100000091
The maximum log-likelihood function of the model may be expressed as
Figure BDA0001873596100000092
where x_n is the feature vector of the n-th device-level user, y_n is the label value of that device-level user, M is the feature weight matrix, and f(·) is the softmax function. During training, stochastic gradient descent with a linearly decaying learning rate may be used, and the model predicts the gender of each device-level user together with a corresponding probability value. The closer the probability value is to 1, the higher the confidence in the predicted gender label of the user of the device.
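The two formula images are not reproduced in this text. One form that is consistent with the symbol definitions given here is sketched below; it is a reconstruction and not the patent's own formula. It assumes N training samples, one-hot label vectors y_n, and a 1/N normalisation, none of which is stated in the text.

```latex
% Assumed reconstruction: N, the one-hot encoding of y_n, and the 1/N factor
% are not given in the source text.
\[
  D = \{(x_n, y_n)\}_{n=1}^{N},
  \qquad
  \mathcal{L}(M) = \frac{1}{N} \sum_{n=1}^{N} y_n^{\top} \log f(M x_n)
\]
```

Under these assumptions, training chooses M to maximise the log-likelihood, and f(M x_n) gives the predicted probabilities of the two gender classes for the n-th device-level user.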
The user account corresponding to the device-level user is only used in the same device, that is, the user account is not shared by multiple users, so the gender of the user account is the gender of the device-level user. Therefore, the marking value of the training data can be accurately determined, and the prediction accuracy of the gender prediction model is improved.
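The training procedure described above (softmax output, stochastic gradient descent, linearly decaying learning rate) can be illustrated with the following sketch. It is deliberately simplified: it omits the hidden layer, so it is plain softmax regression rather than the three-layer model, and the function names, hyperparameters, toy data, and the class indices 0 and 1 are all hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def train(samples, num_features, num_classes=2, epochs=50, lr0=0.5, seed=0):
    """Fit a weight matrix M by SGD on the log-likelihood.

    samples: list of (feature_vector, label_index) pairs.
    The learning rate decays linearly from lr0 toward 0 over all steps.
    """
    rng = random.Random(seed)
    M = [[0.0] * num_features for _ in range(num_classes)]
    total_steps = epochs * len(samples)
    step = 0
    for _ in range(epochs):
        order = list(range(len(samples)))
        rng.shuffle(order)  # visit samples in random order each epoch
        for i in order:
            x, y = samples[i]
            lr = lr0 * (1.0 - step / total_steps)
            probs = softmax([sum(w * v for w, v in zip(row, x)) for row in M])
            for k in range(num_classes):
                # Gradient of the log-likelihood w.r.t. row k is (1[y == k] - p_k) * x.
                g = (1.0 if k == y else 0.0) - probs[k]
                for j in range(num_features):
                    M[k][j] += lr * g * x[j]
            step += 1
    return M


def predict(M, x):
    """Return (predicted class index, probability of that class)."""
    probs = softmax([sum(w * v for w, v in zip(row, x)) for row in M])
    label = max(range(len(probs)), key=probs.__getitem__)
    return label, probs[label]


# Toy data: class 0 is driven by feature 0, class 1 by feature 1.
samples = [([1.0, 0.0], 0), ([0.0, 1.0], 1), ([1.0, 0.2], 0), ([0.1, 1.0], 1)]
M = train(samples, num_features=2)
label_a, prob_a = predict(M, [1.0, 0.0])
label_b, prob_b = predict(M, [0.0, 1.0])
```

The probability returned by `predict` plays the role of the confidence value: the closer it is to 1, the more confident the predicted label.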
Different users log in, browse, and purchase different numbers of times, so some users have a large amount of data and others very little. To further improve prediction accuracy, the invention constructs training data with different feature scales to train different models. An embodiment of the gender prediction model training method of the present invention is described below with reference to fig. 4.
FIG. 4 is a flowchart illustrating a method for training a gender prediction model according to further embodiments of the present invention. As shown in fig. 4, the training method of this embodiment includes steps S402 to S406.
In step S402, history data of the device-level user is generated based on the history data of the user account and the device information.
In step S404, different types of training data are generated according to a comparison result between the historical operation times of the device-level user on the preset operation and a preset threshold, and a tag value of the training data is gender information of the device-level user.
The preset operation may be a preset operation for measuring whether the user behavior is sparse. For example, the number of orders placed, the number of views, the number of logins, the number of favorites, etc. by the user over the year may be compared to preset thresholds.
In some embodiments, if the historical number of times of operations performed by the device-level user on a preset operation is greater than a preset threshold, for example, the number of times of orders made within a year exceeds 3, the device-level user is a user with rich operations, and otherwise, the device-level user is a user with sparse operations. For users with rich operations, for example, user behavior characteristics and commodity content characteristics can be adopted for training; for users with sparse operations, for example, training can be performed by using commodity content features. Thereby improving the accuracy of the prediction.
The different types of training data may be data having different feature compositions. Therefore, the training data can be formed by selecting the characteristics which can represent the characteristics of the user according to the richness of the user operation.
In step S406, the same type of training data and the label value are used to train the model corresponding to the comparison result, and the gender prediction model corresponding to the comparison result is obtained. For example, the training data corresponding to the users with rich operations may be used to train the gender prediction model with rich operations, and the training data corresponding to the users with sparse operations may be used to train the gender prediction model with sparse operations. And when prediction is carried out, a corresponding model can be selected for prediction according to the condition of the historical data corresponding to the equipment to be tested.
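The selection of a model according to the comparison result can be sketched as below. The threshold of 3 orders follows the example in the text; the dictionary-based history, the `order_count` key, and the stub feature builders and models are hypothetical placeholders rather than the patent's implementation.

```python
RICH, SPARSE = "rich", "sparse"


def behavior_bucket(operation_count, threshold=3):
    """Return RICH if the count exceeds the threshold, otherwise SPARSE."""
    return RICH if operation_count > threshold else SPARSE


def predict_gender(history, models, feature_builders, threshold=3):
    """Build the features for the matching bucket and run that bucket's model."""
    bucket = behavior_bucket(history["order_count"], threshold)
    features = feature_builders[bucket](history)
    return bucket, models[bucket](features)


# Stub feature builders and models, standing in for the real ones.
feature_builders = {
    RICH: lambda h: {"behavior": h["events"], "title_token_counts": h["titles"]},
    SPARSE: lambda h: {"title_word_vectors": h["titles"]},
}
models = {
    RICH: lambda features: "prediction from rich-operation model",
    SPARSE: lambda features: "prediction from sparse-operation model",
}

history = {"order_count": 5, "events": ["click"], "titles": ["men's razor"]}
bucket, result = predict_gender(history, models, feature_builders)
```

The same `behavior_bucket` decision would be applied at training time to split device-level users into the two training sets.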
By the method of the embodiment, different types of gender prediction models can be sparsely trained according to user operation. Thereby, the accuracy of gender prediction can be further improved.
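The partitioning of training data in steps S404 and S406 can be illustrated with a minimal Python sketch. The field names (order_count, behavior, content, gender), the group names, and the sample users are assumptions made for illustration; only the example threshold of 3 orders comes from the description.

```python
from collections import defaultdict

ORDER_THRESHOLD = 3  # example from the description: orders placed within a year


def split_by_richness(users, threshold=ORDER_THRESHOLD):
    """Group (features, label) pairs by whether a user's operations are rich or sparse."""
    groups = defaultdict(list)
    for user in users:
        if user["order_count"] > threshold:
            # Rich users: behavior features plus commodity content features.
            features = {**user["behavior"], **user["content"]}
            groups["rich"].append((features, user["gender"]))
        else:
            # Sparse users: commodity content features.
            groups["sparse"].append((dict(user["content"]), user["gender"]))
    return dict(groups)


# Hypothetical device-level users.
users = [
    {"order_count": 5, "behavior": {"clicks_tools": 4}, "content": {"drill": 2}, "gender": "M"},
    {"order_count": 1, "behavior": {"clicks_beauty": 1}, "content": {"lace": 1}, "gender": "F"},
]
training_sets = split_by_richness(users)
```

Each group would then be passed to its own model, so that one model is trained per comparison result.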
In some embodiments, in the case that the historical number of operations of the device-level user on the preset operation is greater than the preset value, the training data includes the first commodity content characteristic and the user behavior characteristic.
The user behavior characteristics may include, for example, at least one of: information on the device-level user's operations on the commodities in each category, on the commodities under each brand, and on the commodities with each gender attribute. Two user behavior characteristics are presented below as examples.
A first exemplary user behavior feature is the total and average number of items clicked and purchased by the user under each category and brand. According to their categories and attributes, commodities can be divided into primary categories, which are further divided into secondary and tertiary categories. Male and female users may have different interests in the categories and brands of goods they purchase. For example, male users tend to be more active in categories such as automobile supplies and hardware tools, or in brands that specialize in men's products, whereas female users may be more active in categories such as perfume, beauty products, and jewelry, or in brands that specialize in women's products. Counting a user's clicks and purchases under different categories and brands within preset time periods therefore portrays the user's shopping preferences.
These features are statistical features built along the category and brand dimensions at a relatively coarse granularity; they have high coverage and high feature dimensionality, and they are sparse.
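As an illustration of the first user behavior feature, the following Python sketch counts clicks and purchases per category and per brand. The event layout, the feature-name format, and the brand names are hypothetical; only totals are computed here, and averages over time periods would be derived analogously.

```python
from collections import Counter


def category_brand_features(events):
    """Count clicks and purchases per category and per brand."""
    features = Counter()
    for event in events:
        action = event["action"]  # "click" or "purchase"
        features[f"{action}:category:{event['category']}"] += 1
        features[f"{action}:brand:{event['brand']}"] += 1
    return dict(features)


# Hypothetical operation log of one device-level user.
events = [
    {"action": "click", "category": "hardware tools", "brand": "BrandA"},
    {"action": "click", "category": "hardware tools", "brand": "BrandB"},
    {"action": "purchase", "category": "hardware tools", "brand": "BrandA"},
]
features = category_brand_features(events)
```

Because every category and brand contributes its own feature name, the resulting vector is high-dimensional and mostly zero for any single user, which matches the sparsity noted above.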
A second exemplary user behavior feature is the number and proportion of the user's clicks on and purchases of male, female, and neutral goods. Generally, a male user interacts more with male goods, and a female user interacts more with female goods. The number and proportion of a user's clicks on and purchases of male and female goods are therefore a direct signal for determining the user's gender, whereas neutral goods such as refrigerators and washing machines contribute little to that determination.
Such features have a low feature dimension and limited coverage, but they can be used to capture a user's gender preference across all goods.
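The second user behavior feature can be sketched in Python as follows. The item_gender field and the feature names are assumptions; the description only states that counts and proportions over male, female, and neutral goods are used.

```python
def gender_attribute_features(events):
    """Return counts and proportions of operations on male/female/neutral goods."""
    counts = {"male": 0, "female": 0, "neutral": 0}
    for event in events:
        counts[event["item_gender"]] += 1
    total = sum(counts.values())
    # Guard against users with no recorded operations.
    ratios = {f"{k}_ratio": (v / total if total else 0.0) for k, v in counts.items()}
    return {**{f"{k}_count": v for k, v in counts.items()}, **ratios}


# Hypothetical clicks of one device-level user.
clicks = [
    {"item_gender": "male"},
    {"item_gender": "male"},
    {"item_gender": "female"},
    {"item_gender": "neutral"},
]
result = gender_attribute_features(clicks)
```

The same function could be applied separately to clicks and to purchases, giving a feature vector of only a handful of dimensions.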
The first commodity content characteristic may include, for example, the number of operations of the device-level user on each participle in the titles of the operated commodities. For example, the titles of the commodities clicked or purchased by the device-level user may be segmented into words, the number of occurrences of each participle counted, and each count used as a feature value in the training data. Before the participles of the commodity titles are used, the segmentation results may be filtered, for example by removing punctuation marks and stop words, to reduce interference from irrelevant information. For example, if a device-level user clicks two commodities titled "lady's long style coat" and "lady's short style socks", the non-normalized title segmentation features are as shown in Table 2.
TABLE 2
Lady's    Long style    Coat    Short style    Socks
2         1             1       1              1
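The counts in Table 2 can be reproduced with a short Python sketch. The titles are assumed to be already segmented (a real system would run a word segmenter first), and the stop-word list is hypothetical.

```python
from collections import Counter

STOP_WORDS = {"the", "a", "of"}  # hypothetical stop-word list


def title_token_counts(segmented_titles):
    """Count how often each participle appears in the operated commodity titles."""
    counts = Counter()
    for tokens in segmented_titles:
        counts.update(t for t in tokens if t not in STOP_WORDS)
    return counts


# The two example titles, already segmented.
clicked_titles = [
    ["lady's", "long style", "coat"],
    ["lady's", "short style", "socks"],
]
counts = title_token_counts(clicked_titles)
```

Here "lady's" is counted twice and every other participle once, which matches the non-normalized feature values of the example.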
The commodity title is a key piece of commodity information seen by the user. Commodity title data has full coverage and high quality; it contains the commodity name and related modifiers, and its participles reflect gender tendencies to different degrees. The title segmentation results include participles directly related to gender, such as men's shoes, women's bags, and mother's clothing; participles that are not explicit gender words but have an obvious gender orientation, such as shavers, one-piece dresses, and high-heeled shoes; and participles with implicit gender connotations, such as electric drill, lace, and pink. In some embodiments, each user's numbers of clicks and purchases for the different title participles may be counted separately and normalized to serve as the feature values of each device-level user.
The characteristics are mainly based on commodity dimensions, and are rich in information, high in coverage rate, high in characteristic dimension and sparse.
In some embodiments, in a case that the number of times of the historical operations of the device-level user on the preset operations is not greater than the preset value, the training data includes a second commodity content feature, and the second commodity content feature is determined according to a word vector of the title segmentation of the device-level user historical operation commodity. For example, it may be determined from the average of the word vectors of the title participles for which the user has clicked.
For example, the titles of the commodities clicked by the device-level user include the participles "man", "pure cotton", "letter", "round collar", and "short sleeve", with the following word vectors: man = [0.8, 0, 0, 0, 0], pure cotton = [0.2, 0, 0, 0, 0], letter = [0, 0.5, 0, 0, 0], round collar = [0, 0, 0.5, 0, 0], short sleeve = [0, 0, 0, 0.5, 0.5]. The second commodity content characteristic of this device-level user is the average of these five vectors, namely [0.2, 0.1, 0.1, 0.1, 0.1].
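The averaging in this example can be checked with a few lines of Python. The last vector is read as [0, 0, 0, 0.5, 0.5], which is the reading required by the stated average; the function and variable names are illustrative only.

```python
def average_vectors(vectors):
    """Element-wise mean of equally sized vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


# Word vectors of the five example participles.
word_vectors = {
    "man": [0.8, 0, 0, 0, 0],
    "pure cotton": [0.2, 0, 0, 0, 0],
    "letter": [0, 0.5, 0, 0, 0],
    "round collar": [0, 0, 0.5, 0, 0],
    "short sleeve": [0, 0, 0, 0.5, 0.5],
}
title_tokens = ["man", "pure cotton", "letter", "round collar", "short sleeve"]
feature = average_vectors([word_vectors[t] for t in title_tokens])
```

The resulting feature has the same dimension as a single word vector regardless of how many participles the user clicked, which suits users with few recorded operations.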
The statistical information of users with sparse behavior hardly reflects the characteristics of those users. Therefore, for example, the titles of the commodities a user clicked within a preset time may be used to construct a text set of related commodity titles. In addition, statistics on user feature data show that most of the high-frequency participles clicked by users have low correlation with gender. For these reasons, the second commodity content feature of the present invention does not include the number of clicks on each participle.
In some embodiments, a bag-of-words model and a supervised learning method can be used to determine the word vector of each segmented word, so that the effect of the gender-related words can be more highlighted, and the accuracy of the final gender prediction result is improved. A method of determining a word vector is exemplarily described below.
Assume that there are a total of K commodity participles in the training sample set used for training the word vector. The characteristics of the nth user, i.e. the training data, after normalization can be expressed as
Figure BDA0001873596100000131
Figure BDA0001873596100000132
where x_i = 1 if the user has clicked or purchased participle i, and x_i = 0 otherwise. The gender label of the n-th user is y_n, i.e., the label value of the user's features. k denotes the index of a commodity participle.
In a training set of N users, the maximum likelihood function is expressed as
Figure BDA0001873596100000133
where A = [V_0, V_1, ..., V_{K-1}]^T, V_k is the word vector of the k-th participle, f(·) is an activation function, and B is the gender prediction model parameter. By minimizing
Figure BDA0001873596100000134
solutions for the parameters A and B can be obtained, yielding the word vectors and the gender prediction model parameters.
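Since the expressions above are referenced only as figures, the following Python sketch is a loose illustration under stated assumptions and not the formulation of the invention. It assumes that each user's multi-hot participle vector is normalized by the number of active participles, that f is the sigmoid function, that the objective is the negative log-likelihood of a binary label, and that A and B are fitted jointly by plain stochastic gradient descent. All names and the toy data are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, A, B):
    """Hidden vector h = sum_k x_k * V_k, then p = sigmoid(B . h)."""
    dim = len(B)
    h = [sum(x[k] * A[k][d] for k in range(len(x))) for d in range(dim)]
    p = sigmoid(sum(B[d] * h[d] for d in range(dim)))
    return h, p


def nll(X, y, A, B):
    """Negative log-likelihood of the labels over all users."""
    eps = 1e-12
    total = 0.0
    for x, label in zip(X, y):
        _, p = forward(x, A, B)
        total -= label * math.log(p + eps) + (1 - label) * math.log(1 - p + eps)
    return total


def train(X, y, dim=2, lr=0.2, epochs=300, seed=0):
    """Jointly fit word vectors A (K x dim) and model parameter B (dim)."""
    rng = random.Random(seed)
    K = len(X[0])
    A = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(K)]
    B = [rng.uniform(-0.1, 0.1) for _ in range(dim)]
    for _ in range(epochs):
        for x, label in zip(X, y):
            h, p = forward(x, A, B)
            err = p - label  # gradient of the loss w.r.t. the pre-activation
            for k in range(K):
                if x[k]:
                    for d in range(dim):
                        A[k][d] -= lr * err * x[k] * B[d]  # uses B before its update
            for d in range(dim):
                B[d] -= lr * err * h[d]
    return A, B


# Toy data; participle order: razor, dress, fridge, lace. Label 1 = male, 0 = female.
X = [[0.5, 0, 0.5, 0], [0, 0.5, 0, 0.5], [1.0, 0, 0, 0], [0, 0.5, 0.5, 0]]
y = [1, 0, 1, 0]

A0, B0 = train(X, y, epochs=0)  # untrained parameters, same seed
A, B = train(X, y)
initial_loss = nll(X, y, A0, B0)
final_loss = nll(X, y, A, B)
```

In this sketch the rows of A play the role of the word vectors V_k, so a participle that separates the labels well ends up with a vector that moves the prediction strongly, while a gender-neutral participle is pushed much less.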
In the gender prediction stage, different models can be used for prediction according to the number of preset operations in the historical data corresponding to the device to be tested. An embodiment of the gender prediction method of the present invention is described below with reference to fig. 5.
FIG. 5 is a flowchart of a gender prediction method according to yet other embodiments of the present invention. As shown in fig. 5, the gender prediction method of this embodiment includes steps S502 to S504.
In step S502, different types of data to be tested are generated according to the result of comparing the number of preset operations in the historical data corresponding to the device to be tested with a preset threshold.
In some embodiments, when the number of preset operations in the historical data corresponding to the device to be tested is greater than a preset value, the data to be tested includes the first commodity content characteristic and the user behavior characteristic. The user behavior characteristics include at least one of: information on the operated commodity in each category, information on the operated commodity under each brand, and information on the operated commodity with each type attribute; the first commodity content feature includes the number of operations on each participle in the operated commodity titles.
In some embodiments, when the number of preset operations in the historical data corresponding to the device to be tested is not greater than the preset value, the data to be tested includes a second commodity content feature, and the second commodity content feature is determined according to the word vectors of the participles in the titles of the historically operated commodities.
In step S504, the data to be tested is input into the pre-trained gender prediction model corresponding to the comparison result, so as to obtain a gender prediction result of the user corresponding to the device to be tested.
By the method of the embodiment, different gender prediction models can be adopted for prediction according to whether the user operation is sparse. Thereby, the accuracy of gender prediction can be further improved.
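Steps S502 and S504 amount to choosing both the feature set and the model from the same comparison result. A minimal Python sketch follows; the field names, the threshold of 3, and the stand-in models (simple rules in place of trained classifiers) are assumptions for illustration.

```python
def predict_gender(history, models, threshold=3):
    """Pick the data type and the model matching the comparison result, then predict."""
    if history["order_count"] > threshold:
        kind = "rich"
        # First commodity content feature plus user behavior feature.
        data = {**history["behavior"], **history["content_counts"]}
    else:
        kind = "sparse"
        # Second commodity content feature (averaged title word vector).
        data = history["title_vector"]
    return kind, models[kind](data)


# Stand-in models (hypothetical): real ones would be trained gender prediction models.
models = {
    "rich": lambda feats: "M" if feats.get("male_ratio", 0) >= 0.5 else "F",
    "sparse": lambda vec: "M" if vec[0] >= 0.5 else "F",
}

rich_history = {"order_count": 5, "behavior": {"male_ratio": 0.7}, "content_counts": {"drill": 2}}
sparse_history = {"order_count": 1, "title_vector": [0.2, 0.1, 0.1, 0.1, 0.1]}

rich_result = predict_gender(rich_history, models)
sparse_result = predict_gender(sparse_history, models)
```

Keeping the routing in one place ensures that a device is never scored by a model trained on a different feature composition.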
A gender prediction device in accordance with some embodiments of the present invention is described below with reference to fig. 6.
Fig. 6 is a schematic diagram of a gender prediction device according to some embodiments of the present invention. As shown in fig. 6, the gender prediction apparatus 60 of this embodiment includes: a to-be-tested data generation module 610 configured to generate to-be-tested data according to historical data corresponding to the to-be-tested device; and a gender prediction module 620 configured to input the data to be tested into a pre-trained gender prediction model and obtain a gender prediction result of the user corresponding to the device to be tested, where the gender prediction model is trained according to historical data of device-level users, and the user account corresponding to a device-level user is used on the same device.
In some embodiments, the to-be-tested data generation module 610 is further configured to generate the to-be-tested data according to historical data generated by a user who does not log in the to-be-tested device and historical data generated by a user account corresponding to the to-be-tested device, when the user accounts logging in the to-be-tested device are the same user account.
In some embodiments, the to-be-tested data generation module 610 is further configured to, in a case that the user account logged in to the to-be-tested device includes multiple user accounts, obtain to-be-tested data corresponding to the same user account according to historical data generated by the same user account corresponding to the to-be-tested device; the gender prediction module 620 is further configured to input the data to be tested corresponding to the same user account into a pre-trained gender prediction model, and generate a gender prediction result for the same user account corresponding to the device to be tested.
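The two cases handled by the to-be-tested data generation module 610 can be sketched in Python as follows. The record layout (an account field that is None for operations made while not logged in, and an item field) is a hypothetical simplification.

```python
def build_test_data(device_records):
    """Return {account: items} for one device, following the two cases above."""
    accounts = {r["account"] for r in device_records if r["account"] is not None}
    if len(accounts) == 1:
        # One account on the device: merge not-logged-in and logged-in history.
        return {next(iter(accounts)): [r["item"] for r in device_records]}
    # Several accounts: one data set per account, built from that account's own records.
    per_account = {a: [] for a in accounts}
    for r in device_records:
        if r["account"] is not None:
            per_account[r["account"]].append(r["item"])
    return per_account


single = build_test_data([
    {"account": None, "item": "razor"},
    {"account": "u1", "item": "electric drill"},
])
shared = build_test_data([
    {"account": "u1", "item": "electric drill"},
    {"account": "u2", "item": "lace dress"},
    {"account": None, "item": "fridge"},
])
```

In the shared-device case the not-logged-in records are left out, because they cannot be attributed to a particular account; each resulting data set would then be scored separately by the gender prediction module.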
In some embodiments, the to-be-tested data generation module 610 is further configured to generate different types of to-be-tested data according to the result of comparing the number of preset operations in the historical data corresponding to the to-be-tested device with a preset threshold; the gender prediction module 620 is further configured to input the data to be tested into a pre-trained gender prediction model corresponding to the comparison result, and obtain a gender prediction result of the user corresponding to the device to be tested.
In some embodiments, when the number of preset operations in the historical data corresponding to the device to be tested is greater than a preset value, the data to be tested includes a first commodity content characteristic and a user behavior characteristic; the user behavior characteristics include at least one of: information on the operated commodity in each category, information on the operated commodity under each brand, and information on the operated commodity with each type attribute; the first commodity content feature includes the number of operations on each participle in the operated commodity titles.
In some embodiments, when the number of preset operations in the historical data corresponding to the device to be tested is not greater than the preset value, the data to be tested includes a second commodity content feature, and the second commodity content feature is determined according to the word vectors of the participles in the titles of the historically operated commodities.
A gender prediction device in accordance with some embodiments of the present invention is described below with reference to fig. 7.
Fig. 7 is a schematic structural diagram of a gender prediction device according to another embodiment of the present invention. As shown in fig. 7, the gender prediction apparatus 70 of this embodiment includes a to-be-tested data generation module 710 and a gender prediction module 720, and specific implementations thereof may refer to the to-be-tested data generation module 610 and the gender prediction module 620 in the embodiment of fig. 6, respectively. In addition, the gender prediction apparatus 70 further includes a model training module 730 configured to generate historical data of the device-level user according to the historical data of the user account and the device information; generating training data according to historical data of the equipment-level user, wherein the marking value of the training data is the gender information of the equipment-level user; and training the model by using the training data and the marking value to obtain a gender prediction model so as to predict the gender of the user of the equipment by using the gender prediction model.
In some embodiments, the same user account is logged into a device corresponding to the device level user.
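One possible reading of how the model training module 730 derives device-level users is sketched below in Python: only devices on which a single user account has logged in are kept, and the history on such a device is labeled with that account's gender. The record layout and the account_gender mapping are hypothetical, and this is an interpretation rather than a confirmed implementation.

```python
from collections import defaultdict


def device_level_users(records, account_gender):
    """Return {device: (items, gender)} for devices used by exactly one account."""
    accounts_by_device = defaultdict(set)
    for r in records:
        accounts_by_device[r["device"]].add(r["account"])
    result = {}
    for r in records:
        if len(accounts_by_device[r["device"]]) != 1:
            continue  # skip devices shared by several accounts
        items, _gender = result.setdefault(
            r["device"], ([], account_gender[r["account"]])
        )
        items.append(r["item"])
    return result


# Hypothetical account histories with device information.
records = [
    {"device": "d1", "account": "u1", "item": "razor"},
    {"device": "d1", "account": "u1", "item": "electric drill"},
    {"device": "d2", "account": "u2", "item": "lace dress"},
    {"device": "d2", "account": "u3", "item": "fridge"},
]
account_gender = {"u1": "M", "u2": "F", "u3": "M"}
training_users = device_level_users(records, account_gender)
```

Restricting training to single-account devices keeps the label reliable, since the recorded gender of the account can then be assumed to describe the person using the device.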
Fig. 8 is a schematic structural diagram of a gender prediction device according to still other embodiments of the present invention. As shown in fig. 8, the gender prediction apparatus 80 of this embodiment includes: a memory 810 and a processor 820 coupled to the memory 810, the processor 820 being configured to perform a gender prediction method in any of the embodiments described above based on instructions stored in the memory 810.
Memory 810 may include, for example, system memory, fixed non-volatile storage media, and the like. The system memory stores, for example, an operating system, application programs, a boot loader, and other programs.
Fig. 9 is a schematic structural diagram of a gender prediction device according to still other embodiments of the present invention. As shown in fig. 9, the gender prediction apparatus 90 of this embodiment includes a memory 910 and a processor 920, and may further include an input/output interface 930, a network interface 940, a storage interface 950, and the like. These interfaces 930, 940, 950, the memory 910, and the processor 920 may be connected, for example, by a bus 960. The input/output interface 930 provides a connection interface for input/output devices such as a display, a mouse, a keyboard, and a touch screen. The network interface 940 provides a connection interface for various networking devices. The storage interface 950 provides a connection interface for external storage devices such as an SD card and a USB flash drive.
An embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, wherein the computer program is configured to implement any one of the above methods for predicting a gender of a user.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable non-transitory storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (11)

1. A method of gender prediction, comprising:
generating data to be tested according to historical data corresponding to the equipment to be tested;
and inputting the data to be tested into a pre-trained gender prediction model to obtain a gender prediction result of a user corresponding to the equipment to be tested, wherein the gender prediction model is trained according to historical data of equipment-level users, and user accounts corresponding to the equipment-level users are used in the same equipment.
2. A gender prediction method as claimed in claim 1 wherein,
in a case where the user accounts logged in on the device to be tested are the same user account, generating the data to be tested according to historical data generated on the device to be tested by a user who is not logged in and historical data generated by the user account corresponding to the device to be tested.
3. A gender prediction method as claimed in claim 1 wherein,
under the condition that the user account for logging in the equipment to be tested comprises a plurality of user accounts, obtaining data to be tested corresponding to the same user account according to historical data generated by the same user account corresponding to the equipment to be tested;
and inputting the data to be tested corresponding to the same user account into a pre-trained gender prediction model, and generating a gender prediction result of the same user account corresponding to the equipment to be tested.
4. A gender prediction method as claimed in claim 1 wherein,
generating different types of data to be tested according to a comparison result of preset operation times in historical data corresponding to the equipment to be tested and a preset threshold;
and inputting the data to be tested into a pre-trained gender prediction model corresponding to the comparison result to obtain a gender prediction result of the user corresponding to the equipment to be tested.
5. The gender prediction method of claim 4, wherein in case that the preset operation times in the history data corresponding to the device under test are greater than a preset value, the data under test comprises a first commodity content characteristic and a user behavior characteristic;
the user behavior characteristics include at least one of: information on the operated commodity in each category, information on the operated commodity under each brand, and information on the operated commodity with each type attribute;
the first commodity content feature includes the number of operations on a word segmentation in an operated commodity title.
6. The gender prediction method of claim 4, wherein, in case that the preset operation times in the history data corresponding to the device under test are not greater than a preset value, the data under test comprises a second commodity content feature, and the second commodity content feature is determined according to a word vector of a word segmentation of a title of the history operation commodity.
7. A gender prediction method as claimed in any of claims 1-6 further comprising:
generating historical data of equipment level users according to the historical data of the user accounts and the equipment information;
generating training data according to historical data of the equipment-level user, wherein the marking value of the training data is the gender information of the equipment-level user;
and training a model by using the training data and the marking value to obtain a gender prediction model so as to predict the gender of the user of the equipment by using the gender prediction model.
8. The gender prediction method of claim 7 wherein the same user account is logged on to the device corresponding to the device level user.
9. A gender prediction device, comprising:
the to-be-tested data generation module is configured to generate to-be-tested data according to historical data corresponding to the to-be-tested equipment;
and the gender prediction module is configured to input the data to be tested into a pre-trained gender prediction model to obtain a gender prediction result of the user corresponding to the device to be tested, wherein the gender prediction model is trained according to historical data of the device-level user, and the user account corresponding to the device-level user is used in the same device.
10. A gender prediction device, comprising:
a memory; and
a processor coupled to the memory, the processor configured to perform the gender prediction method of any of claims 1-8 based on instructions stored in the memory.
11. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out a gender prediction method as claimed in any one of claims 1 to 8.
CN201811388916.3A 2018-11-21 2018-11-21 Gender prediction method, device and computer-readable storage medium Pending CN111209925A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811388916.3A CN111209925A (en) 2018-11-21 2018-11-21 Gender prediction method, device and computer-readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811388916.3A CN111209925A (en) 2018-11-21 2018-11-21 Gender prediction method, device and computer-readable storage medium

Publications (1)

Publication Number Publication Date
CN111209925A true CN111209925A (en) 2020-05-29

Family

ID=70787839

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811388916.3A Pending CN111209925A (en) 2018-11-21 2018-11-21 Gender prediction method, device and computer-readable storage medium

Country Status (1)

Country Link
CN (1) CN111209925A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111652432A (en) * 2020-06-01 2020-09-11 北京达佳互联信息技术有限公司 Method and device for determining user attribute information, electronic equipment and storage medium
CN113822691A (en) * 2020-10-28 2021-12-21 北京沃东天骏信息技术有限公司 User account identification method, device, system and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination