CN111639966A - User age information prediction method, device, electronic equipment and medium - Google Patents
- Publication number
- CN111639966A (application number CN202010425817.9A)
- Authority
- CN
- China
- Prior art keywords
- age
- user
- information
- participles
- probability distribution
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0201—Market modelling; Market analysis; Collecting market data
- G06Q30/0202—Market predictions or forecasting for commercial activities
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/01—Social networking
Abstract
The application provides a user age information prediction method and device, an electronic device and a computer-readable storage medium. The method comprises the following steps: obtaining first labeling information of a first user; classifying the first labeling information of the first user to obtain second labeling information of the first user; determining at least one age characteristic label corresponding to the first user according to the second labeling information; querying, in a preset age probability distribution information set, the age probability distribution information corresponding to each age characteristic label; and performing an age deviation weighted calculation based on the at least one age characteristic label corresponding to the first user and the corresponding age probability distribution information, so as to predict the age group of the first user. With this scheme, the age information of the user can be accurately predicted without the user having to provide it.
Description
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a method and an apparatus for predicting user age information, an electronic device, and a computer-readable storage medium.
Background
With the rapid development and popularization of internet technology, personalized recommendation has gradually become a mainstream technology by which information service providers recommend information and services to users. Because users of different ages differ greatly in their information preferences, age information can serve as one of the reference factors for personalized recommendation and allows more accurate personalized content to be recommended to users.
In the prior art, because user privacy is protected, most information providers find it difficult to acquire a user's age information unless the user actively provides it, and therefore cannot recommend information to the user according to age.
Therefore, it is necessary to provide a technical solution that accurately predicts a user's age information without requiring the user to provide it.
Disclosure of Invention
The application aims to provide a user age information prediction method and device, an electronic device and a computer readable storage medium.
A first aspect of the present application provides a method for predicting user age information, including:
obtaining first annotation information of a first user;
classifying the first labeling information of the first user to obtain second labeling information of the first user;
determining at least one age characteristic label corresponding to the first user according to the second labeling information;
inquiring age probability distribution information corresponding to each age characteristic label in a preset age probability distribution information set, wherein the age probability distribution information comprises the distribution probability of the age characteristic labels corresponding to each age group;
and carrying out age deviation weighting calculation based on at least one age characteristic label corresponding to the first user and corresponding age probability distribution information, and predicting the age bracket where the first user is located.
A second aspect of the present application provides a user age information prediction apparatus, including:
the first annotation information acquisition module is used for acquiring first annotation information of a first user;
the second labeling information obtaining module is used for classifying the first labeling information of the first user to obtain second labeling information of the first user;
the age characteristic label determining module is used for determining at least one age characteristic label corresponding to the first user according to the second labeling information;
a probability distribution information query module, configured to query age probability distribution information corresponding to each age feature tag in a preset age probability distribution information set, where the age probability distribution information includes distribution probabilities of the age feature tags corresponding to each age group;
and the age group prediction module is used for carrying out age deviation weighting calculation based on at least one age characteristic label corresponding to the first user and corresponding age probability distribution information and predicting the age group of the first user.
A third aspect of the present application provides an electronic device comprising: a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, performs the method of the first aspect of the application.
A fourth aspect of the present application provides a computer readable storage medium having computer readable instructions stored thereon which are executable by a processor to implement the method of the first aspect of the present application.
The technical solutions provided by the application can achieve at least the following beneficial effects: first labeling information of a first user is obtained and classified to obtain second labeling information of the first user; at least one age characteristic label corresponding to the first user is determined according to the second labeling information; the age probability distribution information corresponding to each age characteristic label is queried in a preset age probability distribution information set, wherein the age probability distribution information comprises the distribution probability of the age characteristic label in each age group; and an age deviation weighted calculation is performed based on the at least one age characteristic label and the corresponding age probability distribution information, so that the age group of the first user can be predicted. Because the first labeling information of the first user, such as dad or grandpa, reflects the user's age, classifying the first labeling information makes it possible to determine the user's age characteristic labels, and the age probability distribution information corresponding to those labels can then be used to predict the user's age group. The user's age information can thus be accurately predicted without the user having to provide it.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the application. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
FIG. 1 illustrates a flow chart of a user age information prediction method provided by some embodiments of the present application;
FIG. 2 illustrates a schematic diagram of a user age information prediction apparatus provided by some embodiments of the present application;
FIG. 3 illustrates a schematic diagram of an electronic device provided by some embodiments of the present application;
FIG. 4 illustrates a schematic diagram of a computer-readable storage medium provided by some embodiments of the present application.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
It is to be noted that, unless otherwise specified, technical or scientific terms used herein shall have the ordinary meaning as understood by those skilled in the art to which this application belongs.
In addition, the terms "first" and "second", etc. are used to distinguish different objects, rather than to describe a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
The present application provides a method and an apparatus for predicting age information of a user, an electronic device, and a computer-readable storage medium, which are described below with reference to embodiments.
Referring to fig. 1, which illustrates a flowchart of a user age information prediction method according to some embodiments of the present application, as shown in the figure, the user age information prediction method may include the following steps:
Step S101: obtain first annotation information of a first user.
In some modifications of the embodiments of the present application, step S101 may include:
acquiring at least one piece of self-defined labeling information of a social account of a first user in a second user address list;
and performing a deduplication operation for accurate labeling on the acquired custom labeling information to obtain the first labeling information of the first user.
The first user and the second user are users who have a social relationship with each other; for example, both are registered in the same application software and are friends of each other. Correspondingly, the address book may be an address book in the application software or a mobile phone address book, which is not limited in the embodiments of the present application. In an address book, users can give each other custom labels; the custom label that one user gives to another user in his or her own address book is the custom labeling information, for example: dad, grandpa, Manager Wang - XX Company, Captain Wang - XX Bureau of XX City, etc.
There may be one or more second users. When there are multiple second users, multiple pieces of custom labeling information are obtained. Because different users label the same user differently, and part of the custom labeling information is unrelated to age, the obtained custom labeling information needs to undergo operations such as deduplication. This yields first labeling information that is better suited to determining the user's age characteristic labels and keeps invalid and redundant data from interfering with the final prediction result.
On the basis of the foregoing embodiment, in some modified embodiments, performing the deduplication operation for accurate labeling on the obtained custom labeling information to obtain the first labeling information of the first user includes:
performing word segmentation processing on the obtained custom annotation information to obtain a plurality of labeled words;
according to a preset data cleaning rule, performing data cleaning on the plurality of labeled participles to obtain standard labeled participles corresponding to the first user;
and carrying out duplication removing operation on the standard annotation participle to obtain first annotation information of the first user.
Word segmentation refers to splitting a Chinese character sequence into individual words. Because part of the custom labeling information is a phrase or short sentence made up of several words, word segmentation needs to be performed on the custom labeling information first.
Word segmentation yields a plurality of labeled participles, on which data cleaning can then be performed. Data cleaning may include, but is not limited to, deleting participles that are irrelevant to age and converting non-standard words into unified standard words, which improves the efficiency of subsequent data processing.
Specifically, in some embodiments, performing data cleaning on the plurality of labeled participles according to a preset data cleaning rule may include at least one of:
for labeled participles that are family appellation participles, replacing them with standard family appellation participles;
for labeled participles that are professional title participles, replacing them with standard professional title participles;
for labeled participles that are neither family appellation participles nor professional title participles, calculating the importance of each labeled participle with the term frequency-inverse document frequency algorithm, and selecting a specified number of top-ranked labeled participles by importance as custom labeled participles.
Standard words such as the standard family titles and standard professional titles can be set flexibly according to actual requirements; for example, the different variants of "dad" can be uniformly replaced by the standard family title "Dad", and the different variants of "doctor" can be uniformly replaced by the standard professional title "doctor".
In addition, the term frequency-inverse document frequency algorithm, i.e. the TF-IDF algorithm, is a statistical method for evaluating how important a word is to one file in a file set or corpus. The importance of a word increases in proportion to the number of times it appears in the file, but decreases with the frequency with which it appears across the corpus. In the embodiments of the present application, the set of all labeled participles of one user can be regarded as a file, and the set of all labeled participles of a plurality of users, or of all users, can be regarded as the corpus. The importance of each labeled participle is then determined by the TF-IDF algorithm, and a specified number of labeled participles with the highest importance are selected as custom labeled participles, which also belong to the standard labeled participles. The specified number can be set flexibly according to actual requirements and is not limited in the embodiments of the present application.
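As an illustration of the TF-IDF screening described above, the following is a minimal Python sketch. It treats one user's participles as the file and a list of users' participles as the corpus. The function name `tfidf_top_k`, the smoothed IDF formula and the sample tokens are assumptions made for this sketch; the source does not specify a formula.

```python
import math
from collections import Counter

def tfidf_top_k(user_tokens, corpus, k=3):
    """Rank one user's participles by TF-IDF and return the top k.

    user_tokens: participles of one user (the "file").
    corpus: list of participle lists, one per user (the "corpus").
    """
    tf = Counter(user_tokens)
    n_docs = len(corpus)
    # Document frequency: number of users whose participles contain the token.
    df = Counter(tok for doc in corpus for tok in set(doc))
    scores = {}
    for tok, count in tf.items():
        # Smoothed IDF; this particular formula is an assumption of the sketch.
        idf = math.log((1 + n_docs) / (1 + df[tok])) + 1.0
        scores[tok] = (count / len(user_tokens)) * idf
    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k]

# Hypothetical sample data: three users, each with a list of participles.
corpus = [
    ["teacher", "Wang", "teacher", "school"],
    ["Wang", "courier"],
    ["Wang", "neighbor", "school"],
]
top = tfidf_top_k(corpus[0], corpus, k=2)
```

In this sample, a surname that appears for every user scores low, while a participle that is frequent for one user and rare elsewhere ranks first.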
It is easy to understand that, after data cleaning, the standard labeled participles corresponding to the first user may still contain repeated participles; a deduplication operation can therefore be further performed on the standard labeled participles to obtain the first labeling information of the first user.
This embodiment yields multiple pieces of non-repeated first labeling information that can represent the age of the first user, so that accurate and concise first labeling information is available for subsequently determining the user's age characteristic labels.
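The segmentation, cleaning and deduplication sub-steps of step S101 can be sketched as follows. This is a simplified illustration under stated assumptions: the normalization tables, the `keep` parameter and all function names are hypothetical, and the regular-expression splitter merely stands in for a real Chinese word segmenter.

```python
import re

# Hypothetical normalization tables; real ones would be curated manually.
FAMILY_TITLES = {"dad": "dad", "daddy": "dad", "father": "dad", "grandpa": "grandpa"}
PROFESSIONAL_TITLES = {"dr": "doctor", "doctor": "doctor", "manager": "manager"}

def segment(label):
    # Stand-in for a word segmenter: split on runs of non-word characters.
    return [t for t in re.split(r"[^\w]+", label.lower()) if t]

def clean(tokens, keep=frozenset()):
    """Normalize titles and drop participles that are not age-related."""
    out = []
    for tok in tokens:
        if tok in FAMILY_TITLES:
            out.append(FAMILY_TITLES[tok])
        elif tok in PROFESSIONAL_TITLES:
            out.append(PROFESSIONAL_TITLES[tok])
        elif tok in keep:  # e.g. participles selected by TF-IDF screening
            out.append(tok)
    return out

def first_labeling_info(custom_labels, keep=frozenset()):
    tokens = [tok for label in custom_labels for tok in segment(label)]
    # dict.fromkeys removes duplicates while preserving first-seen order.
    return list(dict.fromkeys(clean(tokens, keep)))

# Hypothetical custom labels given to one user by several contacts.
labels = ["Daddy", "Father", "Manager Wang - XX Company", "Dr. Wang"]
result = first_labeling_info(labels)  # ["dad", "manager", "doctor"]
```

Preserving first-seen order during deduplication is a design choice of the sketch; the source only requires that repeated participles be removed.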
Step S102: classify the first labeling information of the first user to obtain second labeling information of the first user.
To better determine the age characteristic labels corresponding to the first user, the first labeling information is classified into second labeling information, which is more closely and directly associated with age characteristics. For example, first labeling information such as husband and wife can be classified as married; first labeling information such as dad and mom can be classified as child (that is, the user has a child); and first labeling information such as director, president and academician can be classified as high income.
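The classification in step S102 amounts to a lookup from first labeling information to a coarser category. A minimal sketch, assuming a hand-built mapping table (the table `SECOND_LABEL_OF` and the function `classify` are hypothetical names, populated only with the examples given above):

```python
# Hypothetical mapping from first labeling information to second labeling
# information, filled in from the examples in the text.
SECOND_LABEL_OF = {
    "husband": "married",
    "wife": "married",
    "dad": "child",  # being labeled "dad" implies the user has a child
    "mom": "child",
    "director": "high income",
    "president": "high income",
}

def classify(first_labels):
    """Return the sorted, de-duplicated second labeling information."""
    return sorted({SECOND_LABEL_OF[l] for l in first_labels if l in SECOND_LABEL_OF})

second = classify(["dad", "husband", "neighbor"])  # ["child", "married"]
```

Labels with no entry in the table, such as "neighbor" here, are simply ignored in this sketch.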
Step S103: determine at least one age characteristic label corresponding to the first user according to the second labeling information.
At least one age characteristic label corresponding to the first user may then be determined according to the second labeling information; for example, the age characteristic labels of the first user may include married, child, high income, descendants, etc.
Step S104: query, in a preset age probability distribution information set, the age probability distribution information corresponding to each age characteristic label, wherein the age probability distribution information comprises the distribution probability of the age characteristic label in each age group.
For example, in some modified embodiments of the present application, before step S104, the method may further include:
obtaining sample data, wherein the sample data comprises user-defined labeling information of social accounts of a plurality of sample users in other user address lists and actual age information of the plurality of sample users;
determining an age characteristic label corresponding to each sample user according to the user-defined labeling information;
and generating an age probability distribution information set according to the corresponding relation between the actual age information of all the sample users and the age characteristic labels, wherein the age probability distribution information set comprises the age probability distribution information corresponding to each age characteristic label.
Through the embodiment, the age probability distribution information corresponding to each age characteristic label can be determined according to a large amount of sample data.
On the basis of the foregoing embodiment, in some modified embodiments, the determining, according to the customized labeling information, the age characteristic label corresponding to each sample user includes:
performing word segmentation processing on the custom annotation information corresponding to each sample user to obtain a plurality of labeled words;
according to a preset data cleaning rule, performing data cleaning on the plurality of labeled participles to obtain a standard labeled participle corresponding to each sample user;
carrying out duplication removal operation on the standard labeling participles to obtain first labeling information corresponding to each sample user;
classifying the first labeling information corresponding to each sample user to obtain second labeling information corresponding to each sample user;
and determining the age characteristic label corresponding to each sample user according to the second labeling information corresponding to each sample user based on the mapping relation between the preset second labeling information and the age characteristic label.
This embodiment can be understood by referring to the related description in step S101, and the description thereof is omitted here.
In addition to the foregoing embodiments, in some modified embodiments, the generating an age probability distribution information set according to the correspondence between the actual age information of all the sample users and the age feature labels includes:
calculating the distribution probability of each age characteristic label corresponding to each age group according to the corresponding relation between the actual age information of all the sample users and the age characteristic labels;
generating age probability distribution information corresponding to each age characteristic label according to the distribution probability of each age characteristic label in each age group;
and generating an age probability distribution information set according to the age probability distribution information corresponding to all the age characteristic labels.
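One way to build the age probability distribution information set from sample data is sketched below. Two points are assumptions of the sketch rather than statements of the source: the age-group boundaries in `AGE_GROUPS` are invented for illustration, and the "distribution probability" is interpreted as the share of sample users carrying a label who fall in each age group.

```python
from collections import Counter, defaultdict

# Hypothetical age groups; the source does not give the boundaries.
AGE_GROUPS = ["<18", "18-24", "25-34", "35-44", "45-54", "55+"]

def build_distribution_set(samples):
    """samples: iterable of (age_group, age_characteristic_labels) pairs."""
    counts = defaultdict(Counter)
    for age_group, labels in samples:
        for label in set(labels):  # count each label once per sample user
            counts[label][age_group] += 1
    dist_set = {}
    for label, by_group in counts.items():
        total = sum(by_group.values())
        # Normalize per label so the probabilities over age groups sum to 1.
        dist_set[label] = {g: by_group[g] / total for g in AGE_GROUPS}
    return dist_set

# Hypothetical sample users with known age groups.
samples = [
    ("25-34", ["married"]),
    ("35-44", ["married", "child"]),
    ("35-44", ["married", "child"]),
    ("45-54", ["child", "high income"]),
]
dist_set = build_distribution_set(samples)
```

With these samples, two of the three users labeled "married" fall in the "35-44" group, so that group receives probability 2/3 for the "married" label.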
On the basis of any of the foregoing embodiments, in some modified embodiments, the performing data cleaning on the plurality of labeled participles according to a preset data cleaning rule includes at least one of:
for labeled participles that are family appellation participles, replacing them with standard family appellation participles;
for labeled participles that are professional title participles, replacing them with standard professional title participles;
for labeled participles that are neither family appellation participles nor professional title participles, calculating the importance of each labeled participle with the term frequency-inverse document frequency algorithm, and selecting a specified number of top-ranked labeled participles by importance as custom labeled participles.
This embodiment can be understood by referring to the related description in step S101, and the description thereof is omitted here.
Step S105: perform an age deviation weighted calculation based on the at least one age characteristic label corresponding to the first user and the corresponding age probability distribution information, and predict the age group of the first user.
Specifically, in some embodiments, the step S105 may include:
determining the distribution probability of each age group corresponding to each age characteristic label based on at least one age characteristic label corresponding to the first user and corresponding age probability distribution information;
for each age group, carrying out weighted calculation on the distribution probability of the age group corresponding to each age characteristic label to obtain the prediction probability corresponding to the age group;
and selecting the age group with the highest prediction probability to determine the age group as the predicted age group of the first user.
In this embodiment, the distribution probability of each age group under each age characteristic label can be determined from the age probability distribution information corresponding to that label. Because different age characteristic labels indicate age to different degrees, a corresponding weight can be set for each age characteristic label. Then, for each age group, the distribution probabilities of that age group under the individual age characteristic labels are weighted by these weights to obtain the prediction probability of the age group, and the age group with the highest prediction probability is selected as the predicted age group of the first user.
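The weighted calculation of step S105 can be sketched as a weighted sum of per-label distributions followed by an arg-max. The function name, the default weight of 1.0, the normalization of the scores and the three coarse age groups in the example are assumptions of this sketch; the source does not state how the weights are chosen.

```python
def predict_age_group(labels, dist_set, weights=None):
    """Return (predicted_group, prediction_probabilities).

    labels: age characteristic labels of the user.
    dist_set: label -> {age_group: distribution probability}.
    weights: optional label -> weight; missing labels default to 1.0.
    """
    weights = weights or {}
    scores = {}
    for label in labels:
        dist = dist_set.get(label)
        if dist is None:
            continue  # no distribution information for this label
        w = weights.get(label, 1.0)
        for group, p in dist.items():
            scores[group] = scores.get(group, 0.0) + w * p
    total = sum(scores.values())
    if total == 0:
        return None, {}
    # Normalize so the prediction probabilities sum to 1.
    probs = {g: s / total for g, s in scores.items()}
    return max(probs, key=probs.get), probs

# Hypothetical distributions over three coarse age groups.
example_dist = {
    "married": {"18-34": 0.3, "35-54": 0.5, "55+": 0.2},
    "descendants": {"18-34": 0.0, "35-54": 0.2, "55+": 0.8},
}
group, probs = predict_age_group(
    ["married", "descendants"], example_dist, weights={"descendants": 2.0}
)
```

Here the "descendants" label is given twice the weight of "married", so the weighted scores are 0.3, 0.9 and 1.8, and the "55+" group is selected with a normalized probability of 0.6.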
The user age information prediction method provided by the embodiment of the application can achieve at least the following beneficial effects: first labeling information of a first user is obtained and classified to obtain second labeling information of the first user; at least one age characteristic label corresponding to the first user is determined according to the second labeling information; the age probability distribution information corresponding to each age characteristic label is queried in a preset age probability distribution information set, wherein the age probability distribution information comprises the distribution probability of the age characteristic label in each age group; and an age deviation weighted calculation is performed based on the at least one age characteristic label and the corresponding age probability distribution information, so that the age group of the first user can be predicted. Because the first labeling information of the first user, such as dad or grandpa, reflects the user's age, classifying the first labeling information makes it possible to determine the user's age characteristic labels, and the age probability distribution information corresponding to those labels can then be used to predict the user's age group. The user's age information can thus be accurately predicted without the user having to provide it.
The following description is given in conjunction with specific examples. The specific examples and the embodiments described above may be understood with reference to each other. Some specific examples may include the following steps:
step A, label selection: through integrating 382 career labels and 504 family appellations which are manually sorted, and mining 500 high-frequency word segments of address book labeling content Top based on TF-IDF algorithm, removing meaningless words and removing duplication, 1010 career and family appellation labels are obtained; mapping 113 family title tags (dad, grand, daughter, etc.) to married tags; mapping 51 family appellation labels (milk, Yue father, family mother and the like) to child labels; mapping professional labels such as board of directors, bank leaders and hospital yardage to top1 income labels, and mapping professional labels such as boss, manager and master without top1 income labels to top10 income labels; in addition, millions of names (except for those containing specific keywords, such as cat, dog, bad, head, store, and the like) are extracted based on the Jieba segmentation and are taken as name labels.
Step B, label distribution statistics: the distribution of the labels obtained in step A (namely, the age characteristic labels) over the age groups of the numbers is counted, which gives the probability distribution information of each label in each age group.
Step C, prediction: based on the age characteristic labels of each number and the corresponding age probability distribution information, the age group bias probability is calculated by weighting, and the age group of the number is then predicted.
Step D, evaluation: the accuracy is evaluated against third-party age group reference data for the numbers.
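The label mapping in step A can be illustrated with a minimal Python sketch. The lookup tables below are hypothetical placeholders (the full lists of 113 and 51 family appellation labels and of the occupation labels are not given in the text), and the derived label names `married`, `child`, `top1_income` and `top10_income` are illustrative only.

```python
# Hypothetical lookup tables; the full label lists are not given in the text.
MARRIED_APPELLATIONS = {"dad", "daughter"}
CHILD_APPELLATIONS = {"grandma"}
TOP1_OCCUPATIONS = {"bank president"}
TOP10_OCCUPATIONS = {"boss", "manager"}


def derive_labels(raw_labels):
    """Return the raw labels plus any derived marriage, child and income labels."""
    derived = set(raw_labels)
    for label in raw_labels:
        if label in MARRIED_APPELLATIONS:
            derived.add("married")
        if label in CHILD_APPELLATIONS:
            derived.add("child")
        if label in TOP1_OCCUPATIONS:
            derived.add("top1_income")
        elif label in TOP10_OCCUPATIONS:
            # The top10 label applies only to occupations without the top1 label.
            derived.add("top10_income")
    return derived
```

For instance, `derive_labels(["dad", "manager"])` adds `married` and `top10_income` to the two raw labels.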
According to the user age information prediction method provided by the embodiment of the application, address book data is used as the basis. From the relatively objective labeling information that users attach to numbers in their address books (namely, the self-defined labeling information), the labels associated with each labeled number are mined (age characteristic labels covering appellation, occupation, name, marriage and childbearing status, income and the like), and the probability distribution of each label over 6 age groups is counted, so that each number can be assigned to one of the 6 age groups. By mining and calculating over the collected billions of address book records, age data for more than 10 billion numbers was predicted with an accuracy of 45 percent (the accuracy of random prediction is 1/6, or about 17 percent).
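The evaluation in step D and the random baseline can be sketched as follows. This is an illustrative assumption of how accuracy might be computed against reference data; the function name and the dictionary layout (number mapped to age group) are hypothetical and do not come from the application.

```python
def accuracy(predicted, reference):
    """Fraction of numbers whose predicted age group matches the reference.

    Both arguments map a number to an age group; only numbers present in
    both mappings are scored.
    """
    shared = [number for number in predicted if number in reference]
    if not shared:
        return 0.0
    hits = sum(1 for number in shared if predicted[number] == reference[number])
    return hits / len(shared)


NUM_AGE_GROUPS = 6
# Uniform random guessing over 6 age groups is right 1 time in 6.
RANDOM_BASELINE = 1 / NUM_AGE_GROUPS
```

Comparing the measured accuracy with `RANDOM_BASELINE` shows how far the prediction improves on guessing.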
In the foregoing embodiment, a method for predicting user age information is provided, and correspondingly, an apparatus for predicting user age information is also provided. The user age information prediction device provided by the embodiment of the application can implement the user age information prediction method, and the user age information prediction device can be implemented in a software, hardware or software and hardware combined mode. For example, the user age information prediction apparatus may include integrated or separate functional modules or units to perform the corresponding steps of the above-described methods. Please refer to fig. 2, which illustrates a schematic diagram of a user age information prediction apparatus according to some embodiments of the present application. Since the apparatus embodiments are substantially similar to the method embodiments, they are described in a relatively simple manner, and reference may be made to some of the descriptions of the method embodiments for relevant points. The device embodiments described below are merely illustrative.
As shown in fig. 2, the user age information prediction apparatus 10 may include:
a first annotation information obtaining module 101, configured to obtain first annotation information of a first user;
a second annotation information obtaining module 102, configured to perform a classification operation on the first annotation information of the first user to obtain second annotation information of the first user;
an age characteristic tag determining module 103, configured to determine, according to the second labeling information, at least one age characteristic tag corresponding to the first user;
a probability distribution information query module 104, configured to query age probability distribution information corresponding to each age feature tag in a preset age probability distribution information set, where the age probability distribution information includes distribution probabilities of the age feature tags corresponding to each age group;
an age group prediction module 105, configured to perform age bias weighting calculation based on at least one age feature tag corresponding to the first user and corresponding age probability distribution information, and predict an age group in which the first user is located.
In some variations of the embodiments of the present application, the first annotation information obtaining module 101 includes:
the user-defined labeling information acquisition unit is used for acquiring at least one piece of user-defined labeling information of the social account of the first user in the address book of the second user;
and the deduplication processing unit is used for performing an exact-match deduplication operation on the acquired user-defined labeling information to obtain the first labeling information of the first user.
In some variations of the embodiments of the present application, the deduplication processing unit includes:
the word segmentation processing subunit is used for carrying out word segmentation processing on the acquired user-defined tagging information to obtain a plurality of tagged words;
the data cleaning subunit is used for performing data cleaning on the plurality of labeled participles according to a preset data cleaning rule to obtain standard labeled participles corresponding to the first user;
and the word segmentation and duplication removal subunit is used for performing a deduplication operation on the standard labeled participles to obtain the first labeling information of the first user.
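The three subunits above (word segmentation, data cleaning, deduplication) can be sketched in Python as below. This is a minimal illustration under stated assumptions: `segment` is a stand-in tokenizer that splits on whitespace and punctuation (the specific example in this application mentions Jieba segmentation for Chinese text), and `STANDARD_FORMS` is a hypothetical cleaning rule table.

```python
import re

# Hypothetical cleaning rule: map variant forms to a standard participle.
STANDARD_FORMS = {"daddy": "dad", "papa": "dad", "mgr": "manager"}


def segment(annotation):
    """Stand-in tokenizer: lowercase, then split on whitespace and punctuation."""
    return [tok for tok in re.split(r"[\s,;/\-]+", annotation.lower()) if tok]


def first_labeling_info(annotations):
    """Segment, clean and deduplicate annotations, keeping first-seen order."""
    seen = {}
    for annotation in annotations:
        for token in segment(annotation):
            standard = STANDARD_FORMS.get(token, token)
            seen.setdefault(standard, None)  # dict keys preserve insertion order
    return list(seen)
```

Given the annotations `"Daddy"`, `"papa Zhang"` and `"Mgr Zhang"` for the same account, the result contains each standard participle once.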
In some variations of the embodiments of the present application, the apparatus 10 further comprises:
the sample data acquisition module is used for acquiring sample data, wherein the sample data comprises user-defined labeling information of social accounts of a plurality of sample users in other user address lists and actual age information of the plurality of sample users;
the sample age tag determining module is used for determining an age characteristic tag corresponding to each sample user according to the user-defined labeling information;
and the age distribution information generating module is used for generating an age probability distribution information set according to the corresponding relation between the actual age information of all the sample users and the age characteristic labels, wherein the age probability distribution information set comprises the age probability distribution information corresponding to each age characteristic label.
In some variations of embodiments of the present application, the sample age label determining module includes:
the word segmentation processing unit is used for carrying out word segmentation processing on the custom marking information corresponding to each sample user to obtain a plurality of marked words;
the data cleaning unit is used for performing data cleaning on the plurality of labeled participles according to a preset data cleaning rule to obtain a standard labeled participle corresponding to each sample user;
the word segmentation and duplication removal unit is used for performing duplication removal operation on the standard labeled words to obtain first labeling information corresponding to each sample user;
the classification operation unit is used for performing classification operation on the first marking information corresponding to each sample user to obtain second marking information corresponding to each sample user;
and the sample age label determining unit is used for determining the age characteristic label corresponding to each sample user according to the second labeling information corresponding to each sample user based on the mapping relation between the preset second labeling information and the age characteristic label.
In some variations of embodiments of the present application, the data cleansing subunit or the data cleansing unit includes at least one of:
the family title replacing subunit is used for replacing the labeled participles belonging to the family title participles with standard family title participles;
the professional title replacing subunit is used for replacing the labeled participles belonging to the professional title participles with standard professional title participles;
and the other appellation replacing subunit is used for calculating, for labeled participles that belong to neither family title participles nor professional title participles, the importance of each labeled participle by adopting a term frequency-inverse document frequency (TF-IDF) algorithm, and screening out a specified number of labeled participles ranked highest by importance as user-defined labeled participles.
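The TF-IDF screening performed by the last subunit can be sketched as follows. The application does not say how per-document scores are combined, so summing each participle's TF-IDF over all documents is an assumption of this sketch, as are the function and parameter names.

```python
import math
from collections import Counter


def top_tfidf_participles(documents, top_n):
    """Rank participles by summed TF-IDF over all documents and keep top_n.

    documents: list of token lists, one list per user's annotations.
    """
    doc_count = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))  # count each participle once per document

    scores = Counter()
    for tokens in documents:
        if not tokens:
            continue
        for token, count in Counter(tokens).items():
            tf = count / len(tokens)
            idf = math.log(doc_count / doc_freq[token])
            scores[token] += tf * idf
    # Sort by score (descending), then alphabetically for a stable result.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [token for token, _ in ranked[:top_n]]
```

A participle that appears in every document gets an inverse document frequency of zero and therefore ranks last, which matches the goal of removing uninformative words.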
In some modifications of the embodiments of the present application, the age distribution information generation module includes:
the distribution probability calculation unit is used for calculating the distribution probability of each age characteristic label corresponding to each age group according to the corresponding relation between the actual age information of all the sample users and the age characteristic labels;
an age distribution information generating unit, configured to generate age probability distribution information corresponding to each of the age feature tags according to a distribution probability of each of the age feature tags corresponding to each of age groups;
and the distribution information set generating unit is used for generating an age probability distribution information set according to the age probability distribution information corresponding to all the age characteristic labels.
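One way to realize the three units above is sketched below in Python. It assumes that the distribution probability of a label in an age group is the share of sample users carrying that label who fall in that group; the application does not fix this definition, and the function name and data layout are hypothetical.

```python
from collections import Counter, defaultdict


def build_age_distribution_set(samples, age_groups):
    """Build {label: {age_group: probability}} from labeled sample users.

    samples: iterable of (age_group, labels) pairs, one per sample user.
    age_groups: list of all age groups; samples outside it are ignored.
    """
    counts = defaultdict(Counter)
    for age_group, labels in samples:
        if age_group not in age_groups:
            continue
        for label in set(labels):  # count each label once per user
            counts[label][age_group] += 1

    distribution_set = {}
    for label, group_counts in counts.items():
        total = sum(group_counts.values())
        distribution_set[label] = {
            group: group_counts[group] / total for group in age_groups
        }
    return distribution_set
```

Each label's probabilities sum to 1 across the age groups, so labels that are rare overall still contribute comparably at prediction time.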
In some variations of the embodiments of the present application, the age group prediction module 105 includes:
a distribution probability determining unit, configured to determine, based on at least one of the age feature tags corresponding to the first user and corresponding age probability distribution information, the distribution probability of each age group corresponding to each of the age feature tags;
the weighting calculation unit is used for carrying out weighting calculation on the distribution probability of the age group corresponding to each age characteristic label to obtain the prediction probability corresponding to the age group;
and the probability selecting unit is used for selecting the age group with the highest prediction probability to determine as the predicted age group in which the first user is positioned.
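The weighted calculation and selection performed by these units can be sketched as follows. The application does not state how the weights are chosen, so this sketch assumes equal weights unless a hypothetical `weights` mapping is supplied, and it normalizes by the total weight so the scores remain probabilities.

```python
def predict_age_group(labels, distribution_set, age_groups, weights=None):
    """Return (predicted_group, scores) for a user's age characteristic labels.

    distribution_set: {label: {age_group: probability}}.
    weights: optional {label: weight}; labels default to weight 1.0.
    """
    weights = weights or {}
    scores = {group: 0.0 for group in age_groups}
    total_weight = 0.0
    for label in labels:
        distribution = distribution_set.get(label)
        if distribution is None:
            continue  # no known distribution for this label; skip it
        weight = weights.get(label, 1.0)
        total_weight += weight
        for group in age_groups:
            scores[group] += weight * distribution.get(group, 0.0)
    if total_weight == 0:
        return None, scores  # nothing to base a prediction on
    scores = {group: score / total_weight for group, score in scores.items()}
    best_group = max(age_groups, key=lambda group: scores[group])
    return best_group, scores
```

With equal weights this reduces to averaging the label distributions and taking the age group with the highest average probability.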
The user age information prediction apparatus 10 according to the embodiment of the present application has the same advantageous effects as the user age information prediction method according to the foregoing embodiment of the present application.
The embodiment of the present application further provides an electronic device corresponding to the user age information prediction method provided in the foregoing embodiment. The electronic device may be a server-side electronic device, such as an independent server or a distributed server cluster, and is used to execute the user age information prediction method.
Please refer to fig. 3, which illustrates a schematic diagram of an electronic device according to some embodiments of the present application. As shown in fig. 3, the electronic device 20 may include: a processor 200, a memory 201, a bus 202 and a communication interface 203, wherein the processor 200, the communication interface 203 and the memory 201 are connected through the bus 202; the memory 201 stores a computer program that can be executed on the processor 200, and the processor 200 executes the user age information prediction method provided in any of the foregoing embodiments when executing the computer program.
The Memory 201 may include a high-speed Random Access Memory (RAM), and may also include a non-volatile Memory (non-volatile Memory), such as at least one disk Memory. The communication connection between the network element of the system and at least one other network element is realized through at least one communication interface 203 (which may be wired or wireless), and the internet, a wide area network, a local network, a metropolitan area network, and the like can be used.
The processor 200 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in the processor 200 or by instructions in the form of software. The processor 200 may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components. The processor may implement or perform the various methods, steps, and logic blocks disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor or any conventional processor. The steps of the method disclosed in connection with the embodiments of the present application may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor. The software module may be located in a storage medium well known in the art, such as RAM, flash memory, ROM, PROM, EPROM, or registers. The storage medium is located in the memory 201, and the processor 200 reads the information in the memory 201 and completes the steps of the method in combination with its hardware.
The electronic device provided by the embodiment of the application and the user age information prediction method provided by the embodiment of the application have the same inventive concept and have the same beneficial effects as the method adopted, operated or realized by the electronic device.
Referring to fig. 4, the computer readable storage medium is an optical disc 30, on which a computer program (i.e., a program product) is stored, and when the computer program is executed by a processor, the computer program executes the method for predicting the age information of the user according to any of the embodiments.
It should be noted that examples of the computer-readable storage medium may also include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory, or other optical and magnetic storage media, which are not described in detail herein.
The computer-readable storage medium provided by the above-mentioned embodiment of the present application and the user age information prediction method provided by the embodiment of the present application have the same beneficial effects as the method adopted, run or implemented by the application program stored in the computer-readable storage medium.
It should be noted that the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
Finally, it should be noted that: the above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; such modifications and substitutions do not depart from the spirit and scope of the present disclosure, and the present disclosure should be construed as being covered by the claims and the specification.
Claims (11)
1. A user age information prediction method is characterized by comprising the following steps:
obtaining first annotation information of a first user;
classifying the first labeling information of the first user to obtain second labeling information of the first user;
determining at least one age characteristic label corresponding to the first user according to the second labeling information;
inquiring age probability distribution information corresponding to each age characteristic label in a preset age probability distribution information set, wherein the age probability distribution information comprises the distribution probability of the age characteristic labels corresponding to each age group;
and carrying out age deviation weighting calculation based on at least one age characteristic label corresponding to the first user and corresponding age probability distribution information, and predicting the age bracket where the first user is located.
2. The method of claim 1, wherein obtaining the first annotation information of the first user comprises:
acquiring at least one piece of self-defined labeling information of a social account of a first user in a second user address list;
and performing an exact-match deduplication operation on the acquired user-defined labeling information to obtain the first labeling information of the first user.
3. The method of claim 2, wherein the performing a de-duplication operation on the obtained customized annotation information to obtain the first annotation information of the first user comprises:
performing word segmentation processing on the obtained custom annotation information to obtain a plurality of labeled words;
according to a preset data cleaning rule, performing data cleaning on the plurality of labeled participles to obtain standard labeled participles corresponding to the first user;
and carrying out duplication removing operation on the standard annotation participle to obtain first annotation information of the first user.
4. The method according to claim 1, wherein before querying age probability distribution information corresponding to each of the age feature tags in a preset age probability distribution information set, the method further comprises:
obtaining sample data, wherein the sample data comprises user-defined labeling information of social accounts of a plurality of sample users in other user address lists and actual age information of the plurality of sample users;
determining an age characteristic label corresponding to each sample user according to the user-defined labeling information;
and generating an age probability distribution information set according to the corresponding relation between the actual age information of all the sample users and the age characteristic labels, wherein the age probability distribution information set comprises the age probability distribution information corresponding to each age characteristic label.
5. The method of claim 4, wherein the determining the age characteristic label corresponding to each of the sample users according to the customized labeling information comprises:
performing word segmentation processing on the custom annotation information corresponding to each sample user to obtain a plurality of labeled words;
according to a preset data cleaning rule, performing data cleaning on the plurality of labeled participles to obtain a standard labeled participle corresponding to each sample user;
carrying out duplication removal operation on the standard labeling participles to obtain first labeling information corresponding to each sample user;
classifying the first labeling information corresponding to each sample user to obtain second labeling information corresponding to each sample user;
and determining the age characteristic label corresponding to each sample user according to the second labeling information corresponding to each sample user based on the mapping relation between the preset second labeling information and the age characteristic label.
6. The method according to claim 3 or 5, wherein the performing data cleaning on the plurality of labeled participles according to a preset data cleaning rule comprises at least one of the following:
aiming at the labeled participles belonging to the family appellation participles, replacing the labeled participles with standard family appellation participles;
aiming at the labeled participles belonging to the professional title participles, replacing the labeled participles with standard professional title participles;
aiming at the labeled participles which do not belong to family appellation participles and professional appellation participles, calculating the importance degree of each labeled participle by adopting a word frequency-inverse document frequency algorithm, and screening out a specified number of labeled participles with the top importance degree as user-defined labeled participles according to the importance degree of each labeled participle.
7. The method according to claim 4, wherein generating an age probability distribution information set according to the correspondence between the actual age information of all the sample users and the age feature labels comprises:
calculating the distribution probability of each age characteristic label corresponding to each age group according to the corresponding relation between the actual age information of all the sample users and the age characteristic labels;
generating age probability distribution information corresponding to each age characteristic label according to the distribution probability of each age characteristic label in each age group;
and generating an age probability distribution information set according to the age probability distribution information corresponding to all the age characteristic labels.
8. The method of claim 1, wherein performing an age bias weighting calculation based on at least one of the age characteristic labels corresponding to the first user and corresponding age probability distribution information to predict an age group of the first user comprises:
determining the distribution probability of each age group corresponding to each age characteristic label based on at least one age characteristic label corresponding to the first user and corresponding age probability distribution information;
for each age group, carrying out weighted calculation on the distribution probability of the age group corresponding to each age characteristic label to obtain the prediction probability corresponding to the age group;
and selecting the age group with the highest prediction probability to determine the age group as the predicted age group of the first user.
9. A user age information prediction apparatus, comprising:
the first annotation information acquisition module is used for acquiring first annotation information of a first user;
the second labeling information obtaining module is used for classifying the first labeling information of the first user to obtain second labeling information of the first user;
the age characteristic label determining module is used for determining at least one age characteristic label corresponding to the first user according to the second labeling information;
a probability distribution information query module, configured to query age probability distribution information corresponding to each age feature tag in a preset age probability distribution information set, where the age probability distribution information includes distribution probabilities of the age feature tags corresponding to each age group;
and the age group prediction module is used for carrying out age deviation weighting calculation based on at least one age characteristic label corresponding to the first user and corresponding age probability distribution information and predicting the age group of the first user.
10. An electronic device, comprising: memory, processor and computer program stored on the memory and executable on the processor, characterized in that the processor executes the computer program to implement the method according to any of claims 1 to 8.
11. A computer-readable storage medium having computer-readable instructions stored thereon, the computer-readable instructions being executable by a processor to implement the method of any one of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010425817.9A CN111639966A (en) | 2020-05-19 | 2020-05-19 | User age information prediction method, device, electronic equipment and medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010425817.9A CN111639966A (en) | 2020-05-19 | 2020-05-19 | User age information prediction method, device, electronic equipment and medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111639966A (en) | 2020-09-08 |
Family
ID=72332114
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010425817.9A Pending CN111639966A (en) | 2020-05-19 | 2020-05-19 | User age information prediction method, device, electronic equipment and medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111639966A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150213370A1 (en) * | 2014-01-27 | 2015-07-30 | Facebook, Inc. | Label inference in a social network |
CN106651057A (en) * | 2017-01-03 | 2017-05-10 | 有米科技股份有限公司 | Mobile terminal user age prediction method based on installation package sequence table |
CN107918825A (en) * | 2017-11-13 | 2018-04-17 | 珠海金山网络游戏科技有限公司 | A kind of method and apparatus that age of user section is judged based on application installation preference |
CN109376927A (en) * | 2018-10-24 | 2019-02-22 | 阿里巴巴集团控股有限公司 | A kind of age of user prediction technique, device and equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
TA01 | Transfer of patent application right | Effective date of registration: 20211223; Address after: 200131 Zone E, 9th floor, No.1 Lane 666, zhangheng Road, China (Shanghai) pilot Free Trade Zone, Pudong New Area, Shanghai; Applicant after: Shanghai Shangxiang Network Technology Co.,Ltd.; Address before: 201306 N2025 room 24, 2 New Town Road, mud town, Pudong New Area, Shanghai; Applicant before: SHANGHAI LIANSHANG NETWORK TECHNOLOGY Co.,Ltd. |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20200908 |