CN111639966A - User age information prediction method, device, electronic equipment and medium

Info

Publication number
CN111639966A
Authority
CN
China
Prior art keywords
age
user
information
participles
probability distribution
Legal status
Pending
Application number
CN202010425817.9A
Other languages
Chinese (zh)
Inventor
许文龙
Current Assignee
Shanghai Shangxiang Network Technology Co.,Ltd.
Original Assignee
Shanghai Lianshang Network Technology Co Ltd
Application filed by Shanghai Lianshang Network Technology Co Ltd
Priority to CN202010425817.9A
Publication of CN111639966A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G06Q30/0202 Market predictions or forecasting for commercial activities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Strategic Management (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Development Economics (AREA)
  • Finance (AREA)
  • Databases & Information Systems (AREA)
  • Accounting & Taxation (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Health & Medical Sciences (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Tourism & Hospitality (AREA)
  • Primary Health Care (AREA)
  • Human Resources & Organizations (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Game Theory and Decision Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides a user age information prediction method and apparatus, an electronic device, and a computer-readable storage medium. The method comprises: obtaining first annotation information of a first user; classifying the first annotation information of the first user to obtain second annotation information of the first user; determining at least one age feature tag corresponding to the first user according to the second annotation information; querying, in a preset age probability distribution information set, the age probability distribution information corresponding to each age feature tag; and performing an age-tendency weighted calculation based on the at least one age feature tag of the first user and the corresponding age probability distribution information, to predict the age group of the first user. With this scheme, the user's age can be predicted accurately without the user having to provide any age information.

Description

User age information prediction method, device, electronic equipment and medium
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a method and an apparatus for predicting user age information, an electronic device, and a computer-readable storage medium.
Background
With the rapid development and popularization of Internet technology, personalized recommendation has become a mainstream technique by which information service providers recommend information and services to users. Because users of different ages have very different information preferences, age can serve as one of the reference factors for personalized recommendation, allowing more accurate personalized content to be recommended.
In the prior art, because user privacy is protected, most information providers find it difficult to obtain a user's age unless the user provides it voluntarily, and therefore cannot make recommendations based on age.
Therefore, a technical solution is needed that accurately predicts a user's age without requiring the user to provide it.
Disclosure of Invention
The application aims to provide a user age information prediction method and device, an electronic device and a computer readable storage medium.
A first aspect of the present application provides a method for predicting user age information, including:
obtaining first annotation information of a first user;
classifying the first annotation information of the first user to obtain second annotation information of the first user;
determining at least one age feature tag corresponding to the first user according to the second annotation information;
querying, in a preset age probability distribution information set, the age probability distribution information corresponding to each age feature tag, where the age probability distribution information comprises the distribution probability of the age feature tag in each age group; and
performing an age-tendency weighted calculation based on the at least one age feature tag corresponding to the first user and the corresponding age probability distribution information, to predict the age group of the first user.
A second aspect of the present application provides a user age information prediction apparatus, including:
a first annotation information acquisition module, configured to acquire first annotation information of a first user;
a second annotation information obtaining module, configured to classify the first annotation information of the first user to obtain second annotation information of the first user;
an age feature tag determining module, configured to determine at least one age feature tag corresponding to the first user according to the second annotation information;
a probability distribution information query module, configured to query, in a preset age probability distribution information set, the age probability distribution information corresponding to each age feature tag, where the age probability distribution information comprises the distribution probability of the age feature tag in each age group; and
an age group prediction module, configured to perform an age-tendency weighted calculation based on the at least one age feature tag corresponding to the first user and the corresponding age probability distribution information, to predict the age group of the first user.
A third aspect of the present application provides an electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, performs the method of the first aspect of the present application.
A fourth aspect of the present application provides a computer readable storage medium having computer readable instructions stored thereon which are executable by a processor to implement the method of the first aspect of the present application.
At least one technical solution provided by the application achieves the following beneficial effects. First annotation information of a first user is obtained and classified to obtain second annotation information; at least one age feature tag of the first user is determined from the second annotation information; the age probability distribution information of each age feature tag, that is, the distribution probability of the tag in each age group, is queried in a preset age probability distribution information set; and an age-tendency weighted calculation over these tags and distributions predicts the age group of the first user. Because first annotation information such as "dad" or "grandpa" reflects the user's age, classifying it yields the user's age feature tags, and the age probability distribution of each tag then allows the user's age group to be predicted. The user therefore does not need to provide age information, and the user's age can still be predicted accurately.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the application. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
FIG. 1 illustrates a flowchart of a user age information prediction method provided by some embodiments of the present application;
FIG. 2 illustrates a schematic diagram of a user age information prediction apparatus provided by some embodiments of the present application;
FIG. 3 illustrates a schematic diagram of an electronic device provided by some embodiments of the present application;
FIG. 4 illustrates a schematic diagram of a computer-readable storage medium provided by some embodiments of the present application.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
It is to be noted that, unless otherwise specified, technical or scientific terms used herein shall have the ordinary meaning as understood by those skilled in the art to which this application belongs.
In addition, the terms "first" and "second", etc. are used to distinguish different objects, rather than to describe a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
The present application provides a method and an apparatus for predicting age information of a user, an electronic device, and a computer-readable storage medium, which are described below with reference to embodiments.
Referring to FIG. 1, which illustrates a flowchart of a user age information prediction method according to some embodiments of the present application, the method may include the following steps:
Step S101: obtain first annotation information of a first user.
In some variations of the embodiments of the present application, step S101 may include:
acquiring at least one piece of custom annotation information given to the social account of the first user in the address books of second users; and
refining and deduplicating the acquired custom annotation information to obtain the first annotation information of the first user.
The first user and a second user have a social relationship with each other; for example, both are registered users of the same application and are friends in it. Correspondingly, the address book may be the contact list within that application or a mobile phone address book, which is not limited in the embodiments of the present application. In an address book, users can give each other custom annotations; the custom annotation that one user gives another user in his or her own address book is the custom annotation information, for example: "Dad", "Grandpa", "Manager Wang - XX Company", or "Captain Wang - XX Bureau of XX City".
There may be one or more second users. When there are several, several pieces of custom annotation information are obtained. Because different users annotate the same user differently, and some annotations contain information unrelated to age, the acquired custom annotation information needs to be deduplicated and otherwise processed. This yields first annotation information that is more useful for determining the user's age feature tags and prevents invalid and redundant data from interfering with the final prediction.
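To make the data flow concrete, here is a minimal Python sketch of the collection step, assuming a hypothetical in-memory model in which each second user's address book maps a contact's social-account ID to the annotation typed for it. The names `AddressBooks` and `collect_custom_annotations` are illustrative and are not taken from the application.

```python
from typing import Dict, List

# Hypothetical data model: owner ID -> {contact social-account ID -> custom annotation}.
AddressBooks = Dict[str, Dict[str, str]]


def collect_custom_annotations(address_books: AddressBooks, account_id: str) -> List[str]:
    """Gather every non-empty custom annotation that other users gave to `account_id`."""
    annotations: List[str] = []
    for owner, contacts in address_books.items():
        if owner == account_id:
            continue  # the user's own address book says nothing about the user
        note = contacts.get(account_id)
        if note and note.strip():
            annotations.append(note.strip())
    return annotations


if __name__ == "__main__":
    books = {
        "u_son": {"u_dad": "Dad"},
        "u_colleague": {"u_dad": "Manager Wang - XX Company"},
        "u_neighbor": {"u_other": "Plumber"},
    }
    print(collect_custom_annotations(books, "u_dad"))
```

The raw annotations collected this way still need the segmentation, cleaning, and deduplication steps described next.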
On the basis of the foregoing embodiment, in some variations, refining and deduplicating the acquired custom annotation information to obtain the first annotation information of the first user includes:
performing word segmentation on the acquired custom annotation information to obtain a plurality of annotation tokens;
performing data cleaning on the plurality of annotation tokens according to preset data cleaning rules to obtain the standard annotation tokens corresponding to the first user; and
deduplicating the standard annotation tokens to obtain the first annotation information of the first user.
Word segmentation splits a sequence of Chinese characters into individual words. Because some custom annotations are phrases or short sentences made up of several words, the custom annotation information must first be segmented.
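The specific example later in this document uses Jieba for segmentation. As a dependency-free stand-in, the sketch below swaps in forward maximum matching, a simple dictionary-based Chinese segmentation technique; the function name, the separator set, and the toy vocabulary are assumptions for illustration.

```python
import re
from typing import Iterable, List


def fmm_segment(text: str, vocab: Iterable[str], max_len: int = 4) -> List[str]:
    """Forward maximum matching: at each position take the longest vocabulary word.

    The text is first split on whitespace and common separators. Characters that
    start no vocabulary word are emitted as single-character tokens.
    """
    words = set(vocab)
    tokens: List[str] = []
    for chunk in re.split(r"[\s\-_/,，、]+", text):
        i = 0
        while i < len(chunk):
            # Try the longest candidate first, shrinking down to one character.
            for size in range(min(max_len, len(chunk) - i), 0, -1):
                piece = chunk[i:i + size]
                if size == 1 or piece in words:
                    tokens.append(piece)
                    i += size
                    break
    return tokens
```

For instance, with the vocabulary {"亲爱", "爸爸"}, the annotation 亲爱的爸爸 ("dear dad") becomes ["亲爱", "的", "爸爸"]. A production system would use a full lexicon or a library such as Jieba instead.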
Segmentation yields a plurality of annotation tokens, on which data cleaning can then be performed. Data cleaning may include, but is not limited to, deleting tokens unrelated to age and converting non-standard words into unified standard words, which improves the efficiency of subsequent processing.
Specifically, in some embodiments, performing data cleaning on the plurality of annotation tokens according to the preset data cleaning rules may include at least one of the following:
for annotation tokens that are family titles, replacing them with standard family titles;
for annotation tokens that are professional titles, replacing them with standard professional titles; and
for annotation tokens that are neither family titles nor professional titles, calculating the importance of each token with the term frequency-inverse document frequency (TF-IDF) algorithm and, according to that importance, selecting a specified number of the most important tokens as custom annotation tokens.
Standard words such as standard family titles and standard professional titles can be set flexibly according to actual needs; for example, variants of "Dad" can all be replaced by the standard family title "Dad", and variants of "doctor" by the standard professional title "doctor".
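The first two cleaning rules amount to dictionary lookups. A minimal sketch, assuming hypothetical English lookup tables (a real deployment would hold curated Chinese title variants); tokens that match neither table are returned separately for the TF-IDF rule.

```python
from typing import Dict, List, Tuple

# Hypothetical lookup tables: lower-cased surface variant -> standard title.
FAMILY_TITLES: Dict[str, str] = {
    "dad": "dad", "daddy": "dad", "papa": "dad",
    "grandpa": "grandpa", "granddad": "grandpa",
}
PROFESSIONAL_TITLES: Dict[str, str] = {
    "doctor": "doctor", "doc": "doctor", "physician": "doctor",
    "manager": "manager",
}


def normalize_tokens(tokens: List[str]) -> Tuple[List[str], List[str]]:
    """Apply the family-title and professional-title replacement rules.

    Returns (standardized titles, leftover tokens); the leftovers are what the
    TF-IDF rule would score next.
    """
    standard: List[str] = []
    leftover: List[str] = []
    for tok in tokens:
        key = tok.lower()
        if key in FAMILY_TITLES:
            standard.append(FAMILY_TITLES[key])
        elif key in PROFESSIONAL_TITLES:
            standard.append(PROFESSIONAL_TITLES[key])
        else:
            leftover.append(tok)
    return standard, leftover
```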
The term frequency-inverse document frequency (TF-IDF) algorithm is a statistical method for evaluating how important a word is to one document in a collection or corpus: the importance of a word increases with the number of times it appears in the document but decreases with how frequently it appears across the corpus. In the embodiments of the present application, the set of all annotation tokens of one user can be treated as a document, and the set of annotation tokens of multiple users (or of all users) as the corpus. The importance of each annotation token is then determined with TF-IDF, and a specified number of the most important tokens are selected as custom annotation tokens, which also count as standard annotation tokens. The specified number can be set flexibly according to actual needs and is not limited in the embodiments of the present application.
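A minimal sketch of the third cleaning rule, treating each user's leftover tokens as one document and all users as the corpus, as described above. It assumes the plain tf × log(N/df) variant of TF-IDF; the application does not specify a variant, and libraries often add smoothing. The function name is hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf_top_k(user_tokens: Dict[str, List[str]], k: int) -> Dict[str, List[str]]:
    """Keep each user's k tokens with the highest TF-IDF score.

    tf = count / document length, idf = log(N / df), where N is the number of users
    and df is the number of users whose tokens contain the term.
    """
    n_docs = len(user_tokens)
    df: Counter = Counter()
    for tokens in user_tokens.values():
        df.update(set(tokens))  # count each token at most once per user
    top: Dict[str, List[str]] = {}
    for user, tokens in user_tokens.items():
        tf = Counter(tokens)
        total = len(tokens)
        scores = {t: (c / total) * math.log(n_docs / df[t]) for t, c in tf.items()}
        # Descending score; ties broken alphabetically so results are deterministic.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        top[user] = ranked[:k]
    return top
```

A token that every user has (for example "gym" in a three-user corpus where all three mention it) gets idf = log(1) = 0 and is never preferred, which is the corpus-wide down-weighting the paragraph describes.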
After data cleaning, the standard annotation tokens corresponding to the first user may still contain duplicates, so they are further deduplicated to obtain the first annotation information of the first user.
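Deduplication can keep the first occurrence of each token so that results stay deterministic. A minimal Python sketch (the function name is hypothetical):

```python
from typing import Iterable, List


def dedupe_preserving_order(tokens: Iterable[str]) -> List[str]:
    """Remove repeated standard annotation tokens, keeping first occurrences."""
    # dict keys are unique and, since Python 3.7, preserve insertion order.
    return list(dict.fromkeys(tokens))
```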
With this embodiment, multiple non-repeating pieces of first annotation information that reflect the first user's age can be obtained, so that this accurate and concise first annotation information can subsequently be used to determine the user's age feature tags accurately.
Step S102: classify the first annotation information of the first user to obtain second annotation information of the first user.
To better determine the age feature tags of the first user, the first annotation information is classified; the resulting second annotation information is more closely and directly associated with age characteristics. For example, first annotation information such as "husband" and "wife" can be classified as "married", first annotation information such as "dad" and "mom" as "has children", and first annotation information such as "director", "president", and "academician" as "high income".
Step S103: determine at least one age feature tag corresponding to the first user according to the second annotation information.
Next, at least one age feature tag of the first user can be determined from the second annotation information; for example, the age feature tags of the first user may include "married", "has children", "high income", "descendants", and so on.
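Steps S102 and S103 can be sketched as a lookup from first annotation information to categories, using only the examples given in the text. The mapping table is a hypothetical excerpt, and treating category names directly as age feature tags (an identity mapping between second annotation information and tags) is an assumption.

```python
from typing import Dict, List, Set

# Hypothetical excerpt: first annotation information -> second annotation information.
CATEGORY_OF: Dict[str, str] = {
    "husband": "married", "wife": "married",
    "dad": "has children", "mom": "has children",
    "director": "high income", "president": "high income", "academician": "high income",
}


def age_feature_tags(first_annotations: List[str]) -> Set[str]:
    """Classify first annotation information and return the resulting age feature tags.

    Annotations with no known category are ignored; category names are used
    directly as tags here.
    """
    return {CATEGORY_OF[a] for a in first_annotations if a in CATEGORY_OF}
```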
Step S104: query, in a preset age probability distribution information set, the age probability distribution information corresponding to each age feature tag, where the age probability distribution information comprises the distribution probability of the age feature tag in each age group.
For example, in some variations of the embodiments of the present application, before step S104 the method may further include:
obtaining sample data, where the sample data includes the custom annotation information given to the social accounts of a plurality of sample users in other users' address books, and the actual age information of those sample users;
determining the age feature tags corresponding to each sample user according to the custom annotation information; and
generating the age probability distribution information set according to the correspondence between the actual ages of all sample users and their age feature tags, where the set includes the age probability distribution information corresponding to each age feature tag.
With this embodiment, the age probability distribution information of each age feature tag can be determined from a large amount of sample data.
On the basis of the foregoing embodiment, in some variations, determining the age feature tags corresponding to each sample user according to the custom annotation information includes:
performing word segmentation on the custom annotation information of each sample user to obtain a plurality of annotation tokens;
performing data cleaning on the annotation tokens according to the preset data cleaning rules to obtain the standard annotation tokens of each sample user;
deduplicating the standard annotation tokens to obtain the first annotation information of each sample user;
classifying the first annotation information of each sample user to obtain the second annotation information of each sample user; and
determining the age feature tags of each sample user from the second annotation information, based on a preset mapping between second annotation information and age feature tags.
This embodiment can be understood by reference to the description of step S101, and is not repeated here.
On the basis of the foregoing embodiments, in some variations, generating the age probability distribution information set according to the correspondence between the actual ages of all sample users and their age feature tags includes:
calculating the distribution probability of each age feature tag in each age group from the correspondence between the actual ages of all sample users and their age feature tags;
generating the age probability distribution information of each age feature tag from its distribution probabilities across the age groups; and
generating the age probability distribution information set from the age probability distribution information of all age feature tags.
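A minimal sketch of building the age probability distribution information set from labeled samples. It assumes that the "distribution probability of a tag in an age group" means the share of sample users carrying that tag whose actual age falls in the group, i.e. P(age group | tag); the age-group labels are hypothetical, since the text only states that 6 groups are used.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Dict, List, Tuple

# One sample = (actual age group of the sample user, that user's age feature tags).
Sample = Tuple[str, List[str]]


def build_distribution_set(samples: List[Sample]) -> Dict[str, Dict[str, float]]:
    """Return {tag: {age_group: probability}} estimated from labeled samples."""
    counts: DefaultDict[str, Counter] = defaultdict(Counter)
    for age_group, tags in samples:
        for tag in set(tags):  # count a tag at most once per sample user
            counts[tag][age_group] += 1
    dist_set: Dict[str, Dict[str, float]] = {}
    for tag, per_group in counts.items():
        total = sum(per_group.values())
        # Normalize so each tag's probabilities over age groups sum to 1.
        dist_set[tag] = {group: n / total for group, n in per_group.items()}
    return dist_set
```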
On the basis of any of the foregoing embodiments, in some variations, performing data cleaning on the plurality of annotation tokens according to the preset data cleaning rules includes at least one of the following:
for annotation tokens that are family titles, replacing them with standard family titles;
for annotation tokens that are professional titles, replacing them with standard professional titles; and
for annotation tokens that are neither family titles nor professional titles, calculating the importance of each token with the TF-IDF algorithm and selecting a specified number of the most important tokens as custom annotation tokens.
This embodiment can be understood by reference to the description of step S101, and is not repeated here.
Step S105: perform an age-tendency weighted calculation based on the at least one age feature tag corresponding to the first user and the corresponding age probability distribution information, to predict the age group of the first user.
Specifically, in some embodiments, step S105 may include:
determining, from the at least one age feature tag of the first user and the corresponding age probability distribution information, the distribution probability of each age group for each age feature tag;
for each age group, computing a weighted combination of that age group's distribution probabilities across the age feature tags to obtain the predicted probability of the age group; and
selecting the age group with the highest predicted probability as the predicted age group of the first user.
In this embodiment, the age probability distribution information of each age feature tag gives the distribution probability of each age group for that tag. Because different age feature tags indicate age to different degrees, a weight can be set for each tag. Then, for each age group, the distribution probabilities of that age group across the tags are combined using these weights to obtain the group's predicted probability, and the age group with the highest predicted probability is taken as the predicted age group of the first user.
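The weighted calculation in step S105 can be read as score(g) = Σ_t w_t · P(g | t) over the user's tags t, followed by an argmax over age groups g. A minimal sketch under that reading, with per-tag weights defaulting to 1.0 as an assumption (the application does not give weight values); the function name and age-group labels are hypothetical.

```python
from typing import Dict, Iterable, Optional


def predict_age_group(
    tags: Iterable[str],
    dist_set: Dict[str, Dict[str, float]],
    weights: Optional[Dict[str, float]] = None,
) -> Optional[str]:
    """Weighted sum of P(group | tag) over the user's tags; return the top-scoring group.

    `weights` maps a tag to its weight (default 1.0). Tags absent from `dist_set`
    are skipped; returns None when no tag has a distribution.
    """
    weights = weights or {}
    scores: Dict[str, float] = {}
    for tag in tags:
        dist = dist_set.get(tag)
        if dist is None:
            continue
        w = weights.get(tag, 1.0)
        for group, p in dist.items():
            scores[group] = scores.get(group, 0.0) + w * p
    if not scores:
        return None
    return max(scores, key=scores.get)
```

Normalizing the scores into probabilities would not change which group has the maximum, so the sketch skips that step.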
The user age information prediction method provided by the embodiments of the application achieves at least the following beneficial effects: the first user's annotation information is classified into age feature tags, the age probability distribution of each tag is looked up in a preset age probability distribution information set, and an age-tendency weighted calculation over these distributions predicts the first user's age group. Because first annotation information such as "dad" or "grandpa" reflects the user's age, the user's age group can be predicted accurately without the user providing any age information.
The following specific example can be understood with reference to any of the embodiments above, and those embodiments can in turn be understood with reference to it. In some specific examples, the method may include the following steps:
Step A, tag selection: 382 manually curated occupation tags and 504 manually curated family titles were combined with the top 500 high-frequency tokens mined from address book annotations with the TF-IDF algorithm; after meaningless words were removed and duplicates eliminated, 1,010 occupation and family title tags were obtained. 113 family title tags (dad, grandpa, daughter, etc.) are mapped to the "married" tag; 51 family title tags (grandma, father-in-law, mother, etc.) are mapped to the "has children" tag; occupation tags such as chairman of the board, bank president, and hospital director are mapped to the "top1 income" tag, and occupation tags such as boss, manager, and master that are not top1 income tags are mapped to the "top10 income" tag. In addition, millions of names are extracted with Jieba segmentation and used as name tags, excluding names that contain specific keywords such as cat, dog, bad, head, or store.
Step B, tag distribution statistics: for the tags obtained in step A (that is, the age feature tags), count their distribution across the age groups of the phone numbers, that is, the probability distribution information of each tag in each age group.
Step C, prediction: based on the age feature tags of each number and their age probability distribution information, compute the age-group tendency probabilities with a weighted calculation, from which the number's age group is predicted.
Step D, evaluation: evaluate the accuracy against third-party age-group reference data for the numbers.
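The tag mapping in step A can be sketched as follows. The tag lists are small hypothetical excerpts of the curated lists, and the reading that the top10 income tag applies only when no top1 income occupation is present is an assumption based on the wording of step A.

```python
from typing import List, Set

# Hypothetical excerpts of the curated tag lists named in step A.
MARRIED_TITLES = {"dad", "grandpa", "daughter"}
HAS_CHILDREN_TITLES = {"grandma", "father-in-law", "mother"}
TOP1_INCOME_JOBS = {"chairman of the board", "bank president", "hospital director"}
TOP10_INCOME_JOBS = {"boss", "manager", "master"}
NAME_STOPWORDS = ("cat", "dog", "bad", "head", "store")


def map_step_a_tags(tokens: List[str]) -> Set[str]:
    """Map cleaned annotation tokens of one number to step-A age feature tags."""
    tags: Set[str] = set()
    if any(t in MARRIED_TITLES for t in tokens):
        tags.add("married")
    if any(t in HAS_CHILDREN_TITLES for t in tokens):
        tags.add("has children")
    # A top1 income occupation takes precedence; top10 applies only without one.
    if any(t in TOP1_INCOME_JOBS for t in tokens):
        tags.add("top1 income")
    elif any(t in TOP10_INCOME_JOBS for t in tokens):
        tags.add("top10 income")
    return tags


def is_name_candidate(token: str) -> bool:
    """Reject an extracted name if it contains any excluded keyword (case-insensitive)."""
    lowered = token.lower()
    return not any(word in lowered for word in NAME_STOPWORDS)
```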
The user age information prediction method provided by the embodiments of the application works on address book data. From relatively objective annotations of user numbers in address books (that is, the custom annotation information), it mines the tags associated with each annotated number (age feature tags covering titles, occupation, name, marriage and education, income, and so on), counts the probability distribution of each tag over 6 age groups, and predicts which of the 6 age groups a number belongs to. By mining and computing over billions of collected address book records, age data for more than 10 billion numbers were predicted with an accuracy of 45 percent (random prediction has an accuracy of 1/6, about 17 percent).
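Step D and the accuracy figures above reduce to a simple hit rate. A minimal sketch, assuming that predictions and third-party reference data are both keyed by phone number (hypothetical data shapes and function name):

```python
from typing import Dict


def accuracy(predicted: Dict[str, str], reference: Dict[str, str]) -> float:
    """Share of numbers present in both maps whose predicted age group matches the reference."""
    common = predicted.keys() & reference.keys()
    if not common:
        return 0.0
    hits = sum(predicted[n] == reference[n] for n in common)
    return hits / len(common)


# Uniform random guessing over 6 age groups is right 1/6 of the time.
RANDOM_BASELINE = 1 / 6  # about 0.167, i.e. roughly 17 percent
```

Against this baseline, the reported 45 percent accuracy is roughly 2.7 times better than random guessing.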
The foregoing embodiment provides a user age information prediction method; correspondingly, the present application also provides a user age information prediction apparatus. The apparatus can implement the method and may be implemented in software, hardware, or a combination of both. For example, it may include integrated or separate functional modules or units that perform the corresponding steps of the method. Refer to Fig. 2, which shows a schematic diagram of a user age information prediction apparatus according to some embodiments of the present application. Because the apparatus embodiments are substantially similar to the method embodiments, they are described briefly; for relevant details, refer to the corresponding parts of the method embodiments. The apparatus embodiments described below are merely illustrative.
As shown in fig. 2, the user age information prediction apparatus 10 may include:
a first annotation information obtaining module 101, configured to obtain first annotation information of a first user;
a second annotation information obtaining module 102, configured to perform a classification operation on the first annotation information of the first user to obtain second annotation information of the first user;
an age feature tag determining module 103, configured to determine, according to the second annotation information, at least one age feature tag corresponding to the first user;
a probability distribution information query module 104, configured to query, in a preset age probability distribution information set, the age probability distribution information corresponding to each age feature tag, where the age probability distribution information comprises the distribution probability of the age feature tag in each age group;
an age group prediction module 105, configured to perform an age-bias weighted calculation based on the at least one age feature tag corresponding to the first user and the corresponding age probability distribution information, and to predict the age group in which the first user is located.
In some variations of the embodiments of the present application, the first annotation information obtaining module 101 includes:
a custom annotation information acquisition unit, configured to acquire at least one piece of custom annotation information given to the social account of the first user in the address book of a second user;
and a deduplication processing unit, configured to perform exact deduplication on the acquired custom annotation information to obtain the first annotation information of the first user.
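A minimal sketch of gathering one number's custom annotations from many address books and removing exact duplicates. The data layout `{owner_id: {phone_number: annotation_text}}` and the function `collect_annotations` are hypothetical, and "exact deduplication" is read here as dropping identical annotation strings after trimming surrounding whitespace.

```python
from collections import defaultdict


def collect_annotations(address_books):
    """Gather every custom annotation that other users gave to each number.

    address_books: {owner_id: {phone_number: annotation_text}}.
    Returns {phone_number: [annotation, ...]} with exact duplicates removed
    (after trimming whitespace), keeping first-seen order.
    """
    result = defaultdict(list)
    seen = defaultdict(set)
    for book in address_books.values():
        for number, note in book.items():
            note = note.strip()
            if note and note not in seen[number]:
                seen[number].add(note)
                result[number].append(note)
    return dict(result)
```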
In some variations of the embodiments of the present application, the deduplication processing unit includes:
a word segmentation subunit, configured to perform word segmentation on the acquired custom annotation information to obtain a plurality of labeled word segments;
a data cleaning subunit, configured to clean the plurality of labeled word segments according to a preset data cleaning rule to obtain standard labeled word segments corresponding to the first user;
and a deduplication subunit, configured to deduplicate the standard labeled word segments to obtain the first annotation information of the first user.
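The segmentation, cleaning and deduplication chain can be sketched as follows. This is a simplified stand-in: it tokenizes English text with a regular expression, whereas Chinese annotations would need a real word segmenter such as Jieba (`jieba.lcut`); the normalization table `STANDARD_FORMS` and the function names are hypothetical.

```python
import re

# Hypothetical normalization table standing in for the preset data cleaning
# rule (variant spellings -> standard family/professional titles).
STANDARD_FORMS = {"mom": "mother", "mum": "mother", "mgr": "manager"}


def tokenize(text):
    """Stand-in segmenter for English text; Chinese annotations would need a
    real word segmenter such as Jieba (jieba.lcut)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def first_annotation_info(annotations):
    """Segment, clean and deduplicate one number's custom annotations."""
    tokens, seen = [], set()
    for note in annotations:
        for tok in tokenize(note):
            tok = STANDARD_FORMS.get(tok, tok)  # cleaning: map to standard form
            if tok not in seen:  # deduplication, keeping first-seen order
                seen.add(tok)
                tokens.append(tok)
    return tokens
```

For example, `first_annotation_info(["Mom", "mum (home)", "Mgr Li"])` returns `["mother", "home", "manager", "li"]`: both spellings of mother collapse to one standard token before deduplication.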
In some variations of the embodiments of the present application, the apparatus 10 further comprises:
a sample data acquisition module, configured to acquire sample data, the sample data comprising the custom annotation information given to the social accounts of a plurality of sample users in the address books of other users, and the actual age information of the plurality of sample users;
a sample age tag determining module, configured to determine the age feature tags corresponding to each sample user according to the custom annotation information;
and an age distribution information generating module, configured to generate the age probability distribution information set according to the correspondence between the actual age information of all sample users and the age feature tags, the age probability distribution information set comprising the age probability distribution information corresponding to each age feature tag.
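Before tag distributions can be counted, each sample user's actual age has to be placed in one of the six age groups. The application does not state where the groups are split, so the boundaries in `BOUNDS` and `GROUPS` below are purely hypothetical, as are the helper names; the sketch only shows the binning step.

```python
import bisect

# Hypothetical split points: the application uses six age groups but does not
# state their boundaries. Each bound is the first age of the next group.
BOUNDS = [18, 25, 35, 45, 55]
GROUPS = ["<18", "18-24", "25-34", "35-44", "45-54", "55+"]


def age_to_group(age):
    """Map an actual age (in years) to one of the six age groups."""
    return GROUPS[bisect.bisect_right(BOUNDS, age)]


def to_training_pairs(samples):
    """samples: iterable of (age_feature_tags, actual_age) for sample users.
    Returns (tags, age_group) pairs suitable for per-tag distribution counting."""
    return [(tags, age_to_group(age)) for tags, age in samples]
```

Using `bisect_right` places an age equal to a boundary into the upper group, so 18 falls in `18-24` rather than `<18`.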
In some variations of the embodiments of the present application, the sample age tag determining module includes:
a word segmentation unit, configured to perform word segmentation on the custom annotation information corresponding to each sample user to obtain a plurality of labeled word segments;
a data cleaning unit, configured to clean the plurality of labeled word segments according to a preset data cleaning rule to obtain standard labeled word segments corresponding to each sample user;
a word segment deduplication unit, configured to deduplicate the standard labeled word segments to obtain first annotation information corresponding to each sample user;
a classification unit, configured to perform a classification operation on the first annotation information corresponding to each sample user to obtain second annotation information corresponding to each sample user;
and a sample age tag determining unit, configured to determine, based on a preset mapping between second annotation information and age feature tags, the age feature tags corresponding to each sample user according to that user's second annotation information.
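The classification operation is not specified in detail. A minimal sketch, assuming the second annotation information is simply the first-annotation tokens grouped by category (family title, professional title, name, other) before the preset category-to-tag mapping is applied; the category names, the vocabularies `FAMILY_TITLES` and `PROFESSIONAL_TITLES`, and `classify_tokens` are hypothetical.

```python
# Hypothetical vocabularies; the application's actual family-title and
# professional-title lists are not published.
FAMILY_TITLES = {"mother", "father", "dad", "grandma"}
PROFESSIONAL_TITLES = {"manager", "teacher", "doctor", "boss"}


def classify_tokens(tokens, known_names):
    """Group first-annotation tokens into categories (the second annotation info)."""
    second = {"family_title": [], "professional_title": [], "name": [], "other": []}
    for tok in tokens:
        if tok in FAMILY_TITLES:
            second["family_title"].append(tok)
        elif tok in PROFESSIONAL_TITLES:
            second["professional_title"].append(tok)
        elif tok in known_names:
            second["name"].append(tok)
        else:
            second["other"].append(tok)
    return second
```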
In some variations of the embodiments of the present application, the data cleaning subunit or the data cleaning unit includes at least one of the following:
a family-title replacement subunit, configured to replace labeled word segments that are family titles with standard family-title word segments;
a professional-title replacement subunit, configured to replace labeled word segments that are professional titles with standard professional-title word segments;
and an other-title replacement subunit, configured to, for labeled word segments that are neither family titles nor professional titles, calculate the importance of each labeled word segment using the term frequency-inverse document frequency (TF-IDF) algorithm, and select a specified number of the most important labeled word segments as custom labeled word segments.
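The TF-IDF screening of the remaining word segments can be sketched as below. The application does not say how per-term importances are aggregated across annotations, so this sketch assumes the sum of per-document TF-IDF scores with smoothed IDF, log((1 + N) / (1 + df)) + 1 (the default formula in scikit-learn's `TfidfVectorizer`); `top_k_tfidf` and the one-document-per-number layout are hypothetical.

```python
import math
from collections import Counter


def top_k_tfidf(docs, k):
    """Rank terms by TF-IDF summed over documents and return the top k.

    docs: list of token lists (for example, one list per phone number).
    Uses smoothed IDF, log((1 + N) / (1 + df)) + 1, so that terms present in
    every document still get a non-zero score.
    """
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    scores = Counter()
    for doc in docs:
        if not doc:
            continue
        tf = Counter(doc)
        for term, count in tf.items():
            idf = math.log((1 + n_docs) / (1 + df[term])) + 1
            scores[term] += (count / len(doc)) * idf
    return [term for term, _ in scores.most_common(k)]
```

Summing across documents rewards terms that appear often, which matches the "high-frequency word segments" goal but also surfaces generic words such as `home`; this is consistent with the manual removal of meaningless words described for Step A.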
In some modifications of the embodiments of the present application, the age distribution information generation module includes:
a distribution probability calculation unit, configured to calculate the distribution probability of each age feature tag in each age group according to the correspondence between the actual age information of all sample users and the age feature tags;
an age distribution information generating unit, configured to generate the age probability distribution information corresponding to each age feature tag according to that tag's distribution probability in each age group;
and a distribution information set generating unit, configured to generate the age probability distribution information set from the age probability distribution information corresponding to all age feature tags.
In some variations of the embodiments of the present application, the age group prediction module 105 includes:
a distribution probability determining unit, configured to determine, based on the at least one age feature tag corresponding to the first user and the corresponding age probability distribution information, the distribution probability of each age group for each age feature tag;
a weighting calculation unit, configured to, for each age group, perform a weighted calculation on the distribution probabilities of that age group across the age feature tags to obtain the prediction probability of the age group;
and a probability selecting unit, configured to select the age group with the highest prediction probability as the predicted age group in which the first user is located.
The user age information prediction apparatus 10 according to the embodiment of the present application has the same advantageous effects as the user age information prediction method according to the foregoing embodiment of the present application.
An embodiment of the present application further provides an electronic device corresponding to the user age information prediction method provided in the foregoing embodiments. The electronic device may be a server-side device, such as a standalone server or a distributed server cluster, and is used to execute the user age information prediction method.
Refer to Fig. 3, which shows a schematic diagram of an electronic device according to some embodiments of the present application. As shown in Fig. 3, the electronic device 20 may include a processor 200, a memory 201, a bus 202 and a communication interface 203, where the processor 200, the communication interface 203 and the memory 201 are connected through the bus 202. The memory 201 stores a computer program executable on the processor 200, and when executing the computer program, the processor 200 performs the user age information prediction method provided in any of the foregoing embodiments.
The memory 201 may include high-speed random access memory (RAM) and may also include non-volatile memory, such as at least one disk memory. Communication between this network element of the system and at least one other network element is realized through at least one communication interface 203 (wired or wireless), which may use the Internet, a wide area network, a local area network, a metropolitan area network, and the like.
Bus 202 can be an ISA bus, PCI bus, EISA bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. The memory 201 is used for storing a program, and the processor 200 executes the program after receiving an execution instruction, and the user age information prediction method disclosed in any of the foregoing embodiments of the present application may be applied to the processor 200, or implemented by the processor 200.
The processor 200 may be an integrated circuit chip with signal processing capability. During implementation, the steps of the above method may be completed by integrated hardware logic circuits in the processor 200 or by instructions in the form of software. The processor 200 may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor or any conventional processor. The steps of the method disclosed in connection with the embodiments of the present application may be performed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software module may reside in a storage medium well established in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or registers. The storage medium is located in the memory 201, and the processor 200 reads the information in the memory 201 and completes the steps of the above method in combination with its hardware.
The electronic device provided by the embodiment of the present application and the user age information prediction method provided by the embodiment of the present application are based on the same inventive concept, and the device has the same beneficial effects as the method it adopts, runs or implements.
Referring to Fig. 4, the computer-readable storage medium shown is an optical disc 30 storing a computer program (i.e., a program product); when executed by a processor, the computer program performs the user age information prediction method provided in any of the foregoing embodiments.
It should be noted that examples of the computer-readable storage medium may also include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory, or other optical and magnetic storage media, which are not described in detail herein.
The computer-readable storage medium provided by the above embodiment of the present application and the user age information prediction method provided by the embodiment of the present application are based on the same inventive concept, and the storage medium has the same beneficial effects as the method adopted, run or implemented by the application program stored on it.
It should be noted that the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as a stand-alone product, they may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, or the part of it that contributes to the prior art, may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage media include a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, and other media capable of storing program code.
Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be equivalently replaced; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present application, and shall all be covered by the scope of the claims and the specification of the present application.

Claims (11)

1. A user age information prediction method is characterized by comprising the following steps:
obtaining first annotation information of a first user;
classifying the first annotation information of the first user to obtain second annotation information of the first user;
determining at least one age feature tag corresponding to the first user according to the second annotation information;
querying, in a preset age probability distribution information set, the age probability distribution information corresponding to each age feature tag, wherein the age probability distribution information comprises the distribution probability of the age feature tag in each age group;
and performing an age-bias weighted calculation based on the at least one age feature tag corresponding to the first user and the corresponding age probability distribution information, to predict the age group in which the first user is located.
2. The method of claim 1, wherein obtaining the first annotation information of the first user comprises:
acquiring at least one piece of custom annotation information given to a social account of the first user in the address book of a second user;
and performing exact deduplication on the acquired custom annotation information to obtain the first annotation information of the first user.
3. The method of claim 2, wherein performing exact deduplication on the acquired custom annotation information to obtain the first annotation information of the first user comprises:
performing word segmentation on the acquired custom annotation information to obtain a plurality of labeled word segments;
cleaning the plurality of labeled word segments according to a preset data cleaning rule to obtain standard labeled word segments corresponding to the first user;
and deduplicating the standard labeled word segments to obtain the first annotation information of the first user.
4. The method according to claim 1, wherein before querying age probability distribution information corresponding to each of the age feature tags in a preset age probability distribution information set, the method further comprises:
obtaining sample data, wherein the sample data comprises the custom annotation information given to the social accounts of a plurality of sample users in the address books of other users, and the actual age information of the plurality of sample users;
determining the age feature tags corresponding to each sample user according to the custom annotation information;
and generating the age probability distribution information set according to the correspondence between the actual age information of all the sample users and the age feature tags, wherein the age probability distribution information set comprises the age probability distribution information corresponding to each age feature tag.
5. The method of claim 4, wherein determining the age feature tags corresponding to each sample user according to the custom annotation information comprises:
performing word segmentation on the custom annotation information corresponding to each sample user to obtain a plurality of labeled word segments;
cleaning the plurality of labeled word segments according to a preset data cleaning rule to obtain standard labeled word segments corresponding to each sample user;
deduplicating the standard labeled word segments to obtain first annotation information corresponding to each sample user;
classifying the first annotation information corresponding to each sample user to obtain second annotation information corresponding to each sample user;
and determining, based on a preset mapping between second annotation information and age feature tags, the age feature tags corresponding to each sample user according to that user's second annotation information.
6. The method according to claim 3 or 5, wherein cleaning the plurality of labeled word segments according to a preset data cleaning rule comprises at least one of the following:
for labeled word segments that are family titles, replacing them with standard family-title word segments;
for labeled word segments that are professional titles, replacing them with standard professional-title word segments;
for labeled word segments that are neither family titles nor professional titles, calculating the importance of each labeled word segment using the term frequency-inverse document frequency (TF-IDF) algorithm, and selecting a specified number of the most important labeled word segments as custom labeled word segments.
7. The method according to claim 4, wherein generating the age probability distribution information set according to the correspondence between the actual age information of all the sample users and the age feature tags comprises:
calculating the distribution probability of each age feature tag in each age group according to the correspondence between the actual age information of all the sample users and the age feature tags;
generating the age probability distribution information corresponding to each age feature tag according to that tag's distribution probability in each age group;
and generating the age probability distribution information set from the age probability distribution information corresponding to all the age feature tags.
8. The method of claim 1, wherein performing the age-bias weighted calculation based on the at least one age feature tag corresponding to the first user and the corresponding age probability distribution information to predict the age group of the first user comprises:
determining, based on the at least one age feature tag corresponding to the first user and the corresponding age probability distribution information, the distribution probability of each age group for each age feature tag;
for each age group, performing a weighted calculation on the distribution probabilities of that age group across the age feature tags to obtain the prediction probability of the age group;
and selecting the age group with the highest prediction probability as the predicted age group of the first user.
9. A user age information prediction apparatus, comprising:
a first annotation information acquisition module, configured to acquire first annotation information of a first user;
a second annotation information acquisition module, configured to classify the first annotation information of the first user to obtain second annotation information of the first user;
an age feature tag determining module, configured to determine at least one age feature tag corresponding to the first user according to the second annotation information;
a probability distribution information query module, configured to query, in a preset age probability distribution information set, the age probability distribution information corresponding to each age feature tag, wherein the age probability distribution information comprises the distribution probability of the age feature tag in each age group;
and an age group prediction module, configured to perform an age-bias weighted calculation based on the at least one age feature tag corresponding to the first user and the corresponding age probability distribution information, and to predict the age group of the first user.
10. An electronic device, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor executes the computer program to implement the method according to any one of claims 1 to 8.
11. A computer-readable storage medium having computer-readable instructions stored thereon, the computer-readable instructions being executable by a processor to implement the method of any one of claims 1 to 8.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010425817.9A CN111639966A (en) 2020-05-19 2020-05-19 User age information prediction method, device, electronic equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010425817.9A CN111639966A (en) 2020-05-19 2020-05-19 User age information prediction method, device, electronic equipment and medium

Publications (1)

Publication Number Publication Date
CN111639966A true CN111639966A (en) 2020-09-08

Family

ID=72332114

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010425817.9A Pending CN111639966A (en) 2020-05-19 2020-05-19 User age information prediction method, device, electronic equipment and medium

Country Status (1)

Country Link
CN (1) CN111639966A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150213370A1 (en) * 2014-01-27 2015-07-30 Facebook, Inc. Label inference in a social network
CN106651057A (en) * 2017-01-03 2017-05-10 有米科技股份有限公司 Mobile terminal user age prediction method based on installation package sequence table
CN107918825A (en) * 2017-11-13 2018-04-17 珠海金山网络游戏科技有限公司 Method and apparatus for determining a user's age group based on application installation preference
CN109376927A (en) * 2018-10-24 2019-02-22 阿里巴巴集团控股有限公司 User age prediction method, apparatus and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20211223

Address after: 200131 Zone E, 9th Floor, No. 1, Lane 666, Zhangheng Road, China (Shanghai) Pilot Free Trade Zone, Pudong New Area, Shanghai

Applicant after: Shanghai Shangxiang Network Technology Co.,Ltd.

Address before: 201306 Room N2025, Building 24, No. 2 Xincheng Road, Nicheng Town, Pudong New Area, Shanghai

Applicant before: SHANGHAI LIANSHANG NETWORK TECHNOLOGY Co.,Ltd.

RJ01 Rejection of invention patent application after publication

Application publication date: 20200908