CN103218355B - Method and apparatus for generating labels for a user - Google Patents


Info

Publication number
CN103218355B
Authority
CN
China
Prior art keywords
user
key word
label
information
operation behavior
Prior art date
Legal status
Active
Application number
CN201210015741.8A
Other languages
Chinese (zh)
Other versions
CN103218355A (en)
Inventor
席晓鸣
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
2012-01-18
Filing date
2012-01-18
Publication date
2016-08-31
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201210015741.8A
Publication of CN103218355A
Application granted
Publication of CN103218355B
Status: Active
Anticipated expiration


Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method and apparatus for generating labels for a user. For any user X, the operation behavior information generated after user X logs in to the network is obtained in real time, and each time a piece of operation behavior information is obtained, the following processing is performed once: the keywords in that operation behavior information are extracted and saved, and keywords that meet the requirements are selected from all saved keywords as the labels of user X. By applying the scheme of the invention, a user's personalized information can be obtained conveniently and efficiently.

Description

A method and apparatus for generating labels for a user
Technical field
The present invention relates to network technology, and in particular to a method and apparatus for generating labels for a user.
Background art
In the prior art, labels (tags) can be generated for an article from keywords extracted from it, so that readers can grasp the content of the article conveniently and efficiently.
Correspondingly, it is also desirable to generate labels for users, so that a user's personalized information can be obtained conveniently and efficiently and the user can be served better, for example by pushing information that may interest the user according to the user's labels.
However, the prior art provides no way of generating labels for users.
Summary of the invention
In view of this, the present invention provides a method and apparatus for generating labels for a user, so that the user's personalized information can be obtained conveniently and quickly.
To achieve the above purpose, the technical solution of the present invention is implemented as follows:
A method for generating labels for a user, comprising:
for any user X, obtaining in real time the operation behavior information generated after user X logs in to the network, and each time a piece of operation behavior information is obtained, performing the following processing once:
extracting the keywords in that operation behavior information and saving them;
selecting keywords that meet the requirements from all saved keywords as the labels of user X.
An apparatus for generating labels for a user, comprising:
an acquisition module, configured to obtain, for any user X, in real time the operation behavior information generated after user X logs in to the network, and to send each piece of obtained operation behavior information to a processing module;
the processing module, comprising a first processing unit and a second processing unit, wherein the first processing unit is configured to extract, each time a piece of operation behavior information is received, the keywords in that operation behavior information and save them; the second processing unit is configured to select keywords that meet the requirements from all saved keywords as the labels of user X, and, for each selected keyword Y, to calculate the similarity between keyword Y and each piece of category information in a preset classification training set, take the category information corresponding to the largest calculated similarity as the category information corresponding to keyword Y, and use the selected keywords together with their respective category information as the labels of user X.
It can be seen that, by generating labels for users, the scheme of the present invention makes it possible to obtain users' personalized information conveniently and efficiently and thus to serve users better; moreover, the scheme is simple and convenient to implement and easy to popularize.
Brief description of the drawings
Fig. 1 is a flowchart of an embodiment of the method for generating labels for a user according to the present invention.
Fig. 2 is a schematic diagram of user X subscribing to a piece of information.
Fig. 3 is a schematic diagram of user X sharing a piece of information.
Fig. 4 is a schematic structural diagram of an embodiment of the apparatus for generating labels for a user according to the present invention.
Detailed description of the invention
To address the problems in the prior art, the present invention proposes a scheme for generating labels for users.
To make the technical solution of the present invention clearer, the scheme is described in further detail below with reference to the drawings and embodiments.
Fig. 1 is a flowchart of an embodiment of the method for generating labels for a user according to the present invention. As shown in Fig. 1, the method comprises the following steps:
Step 11: for any user X, obtain in real time the operation behavior information generated after user X logs in to the network.
For convenience of description, user X denotes an arbitrary user; every user can be processed in the manner described in the present invention.
After logging in to the network, user X may perform various operations, such as clicking to subscribe to, share, or follow a piece of information. In practical applications, user X's operation behavior information can be monitored in real time.
Fig. 2 is a schematic diagram of user X subscribing to a piece of information, and Fig. 3 is a schematic diagram of user X sharing a piece of information. As shown in Figs. 2 and 3, user X can subscribe to and share the corresponding information by clicking the "Subscribe to this column" and "Share with friends" buttons.
Step 12: each time a piece of operation behavior information is obtained, perform the following processing once: extract the keywords in that operation behavior information and save them; select keywords that meet the requirements from all saved keywords as the labels of user X.
In the present invention, labels are generated anew each time a piece of operation behavior information is obtained, and the newly generated labels are used to update the previously generated ones.
Specifically, in this step, each piece of obtained operation behavior information can be processed as follows:
1) Extract the keywords in the operation behavior information and save them.
Specifically, keyword extraction based on term frequency (TF) multiplied by inverse document frequency (IDF), i.e. TF*IDF, can be used.
TF is the number of times a given word occurs in a given file, and it can be normalized by the file's length. IDF measures the general importance of a word: the IDF of a given word is obtained by dividing the total number of files by the number of files containing that word and taking the logarithm of the quotient. In this embodiment, each piece of information can be regarded as a file, and the words with higher TF*IDF scores are extracted as keywords.
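A minimal sketch of TF*IDF keyword extraction as described above, assuming each piece of information has already been tokenized into words (segmenting Chinese text into words is not shown). The function name `tf_idf_keywords`, the `top_k` cut-off and the toy corpus are illustrative assumptions, not part of the patent.

```python
import math
from collections import Counter


def tf_idf_keywords(doc_tokens, corpus, top_k=3):
    """Return the top_k words of one document ranked by TF*IDF.

    doc_tokens: list of words from one piece of information (one "file").
    corpus: list of token lists used for IDF; assumed to contain doc_tokens
    itself, so every word has a document frequency of at least 1.
    """
    counts = Counter(doc_tokens)
    doc_sets = [set(doc) for doc in corpus]
    scores = {}
    for word, count in counts.items():
        tf = count / len(doc_tokens)                    # TF normalized by file length
        df = sum(1 for doc in doc_sets if word in doc)  # files containing the word
        idf = math.log(len(corpus) / df)                # log(total files / df)
        scores[word] = tf * idf
    # Highest score first; ties broken alphabetically so results are stable
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:top_k]]


corpus = [
    ["nokia", "phone", "review", "phone"],
    ["winter", "coat", "review"],
    ["spring", "jacket", "review"],
]
print(tf_idf_keywords(corpus[0], corpus, top_k=2))  # ['phone', 'nokia']
```

Because "review" occurs in every file, its IDF is log(1) = 0, so it never outranks words specific to the file.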
For this purpose, every piece of obtained operation behavior information needs to be saved, as do the keywords extracted from each piece of operation behavior information.
In practical applications, other keyword extraction methods can also be used, such as keyword extraction based on N-gram statistics.
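The patent does not describe its N-gram variant. One simple reading, sketched here under that assumption, counts character n-grams across the saved pieces of information and keeps those that recur, which avoids word segmentation for Chinese text. The name `frequent_ngrams` and the `min_count` parameter are hypothetical.

```python
from collections import Counter


def frequent_ngrams(texts, n=2, min_count=2):
    """Count character n-grams across all texts and return those seen at
    least min_count times, most frequent first (ties broken alphabetically)."""
    counts = Counter()
    for text in texts:
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    frequent = [(gram, c) for gram, c in counts.items() if c >= min_count]
    frequent.sort(key=lambda item: (-item[1], item[0]))
    return [gram for gram, _ in frequent]


# "诺基亚" (Nokia) recurs across two pieces of information; other trigrams do not.
print(frequent_ngrams(["诺基亚手机", "诺基亚新品", "冬季棉衣"], n=3, min_count=2))
# ['诺基亚']
```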
2) Select keywords that meet the requirements from all saved keywords as the labels of user X.
Specifically, the weight of each saved keyword can be determined, the keywords sorted in descending order of weight, and the top N keywords after sorting taken as the labels of user X, where N is a positive integer greater than 1.
As described above, the keywords of every saved piece of operation behavior information are saved as well. A weight can then be determined for each saved keyword, the keywords sorted in descending order of weight, and the top N keywords after sorting taken as the labels of user X. The specific value of N can be chosen according to actual requirements, for example 3.
How the weight of each keyword is determined can likewise be decided according to actual requirements. For example, the longer the interval between the acquisition time of the corresponding operation behavior information and the current time, the smaller the weight. To illustrate: if the interval is within 5 days, the weight is 10; if it is 5 to 10 days, the weight is 9; and so on. On this basis, other factors can be combined as well. For example, a keyword may correspond to two pieces of operation behavior information (i.e. the keyword was extracted from both; identical keywords are treated as a single keyword when sorting). If the weight corresponding to one piece is 10 and that of the other is 9, the final weight of the keyword can be determined as 19. Of course, other factors can also be combined to determine the weight of each keyword; they are not enumerated here.
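A minimal sketch of the recency weighting and per-keyword summation in the example above. The patent only gives the first two steps (10 within 5 days, 9 for 5 to 10 days) and says "and so on"; this sketch assumes the weight keeps dropping by 1 per further 5-day bucket and never goes below 0. The function names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def recency_weight(acquired_at, now, bucket_days=5, max_weight=10):
    """Weight of one piece of operation behavior information by its age.

    Assumption: age < 5 days -> 10, 5 to <10 days -> 9, and so on,
    losing 1 per 5-day bucket and never going below 0.
    """
    age_days = (now - acquired_at).total_seconds() / 86400
    bucket = int(age_days // bucket_days)
    return max(max_weight - bucket, 0)


def keyword_weights(records, now):
    """records: iterable of (acquired_at, keywords), one per piece of
    operation behavior information. Identical keywords are merged and the
    weights of all records they were extracted from are summed."""
    totals = defaultdict(int)
    for acquired_at, keywords in records:
        w = recency_weight(acquired_at, now)
        for kw in dict.fromkeys(keywords):  # count a keyword once per record
            totals[kw] += w
    return dict(totals)


now = datetime(2012, 1, 18)
records = [
    (now - timedelta(days=2), ["nokia", "phone"]),  # within 5 days -> 10
    (now - timedelta(days=7), ["nokia"]),           # 5 to 10 days -> 9
]
print(keyword_weights(records, now))  # {'nokia': 19, 'phone': 10}
```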
Alternatively, other label generation methods can be used. For example, after the weight of each saved keyword is determined, the keywords whose weights are greater than a predetermined threshold are taken as the labels of user X; the specific value of the threshold can be chosen according to actual requirements.
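The two selection rules (top N after sorting by weight, or weight above a predetermined threshold) can be sketched as follows. The function names, alphabetical tie-breaking and the example weights are assumptions for illustration.

```python
def select_top_n(weights, n=3):
    """Sort keywords by weight in descending order and keep the first n."""
    ranked = sorted(weights.items(), key=lambda item: (-item[1], item[0]))
    return [kw for kw, _ in ranked[:n]]


def select_above_threshold(weights, threshold):
    """Keep every keyword whose weight is strictly greater than threshold."""
    return sorted(kw for kw, w in weights.items() if w > threshold)


weights = {"nokia": 19, "phone": 10, "coat": 4, "review": 9}
print(select_top_n(weights, 3))            # ['nokia', 'phone', 'review']
print(select_above_threshold(weights, 9))  # ['nokia', 'phone']
```

Top-N always yields exactly N labels (if enough keywords exist), while the threshold rule lets the label count vary with how strongly the user's behavior concentrates on a few keywords.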
In addition to the selected keywords, the labels of user X may further include one or any combination of the following: the category information corresponding to each selected keyword; the time at which the labels were generated (which to some extent reflects how active user X is); and the normalized weight value corresponding to each selected keyword.
The normalized weight value of each keyword can be obtained by standard-score normalization. Its implementation is prior art and can follow, for example, the standard scores used in the college entrance examination.
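A minimal sketch of standard-score (z-score) normalization of keyword weights, computing (weight - mean) / standard deviation. Using the population standard deviation and mapping the all-equal case to 0.0 are assumptions; examination systems often apply a further linear rescaling, which is not shown.

```python
import statistics


def standard_scores(weights):
    """Normalize keyword weights to standard scores: (w - mean) / std.

    Uses the population standard deviation; if all weights are equal
    (std == 0), every keyword gets 0.0.
    """
    values = list(weights.values())
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return {kw: 0.0 for kw in weights}
    return {kw: (w - mean) / std for kw, w in weights.items()}


z = standard_scores({"a": 2, "b": 4, "c": 6})
print({kw: round(v, 3) for kw, v in z.items()})  # {'a': -1.225, 'b': 0.0, 'c': 1.225}
```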
In addition, a classification training set containing various pieces of category information can be preset, for example e-commerce digital products, winter cotton-padded clothing, and spring/autumn unlined clothing. Correspondingly, each selected keyword Y can be processed as follows: calculate the similarity between keyword Y and each piece of category information in the preset classification training set, and take the category information corresponding to the largest calculated similarity as the category information corresponding to keyword Y.
The similarity between each keyword and each piece of category information can be determined by lexical similarity, ontology-based similarity, corpus-based similarity, or other methods; the specific implementation is likewise prior art.
For example, suppose a keyword is "Nokia" and calculation shows that its similarity to the category "e-commerce digital products" is the largest; then "e-commerce digital products" is taken as the category information corresponding to the keyword "Nokia".
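The patent leaves the similarity measure open. This sketch swaps in a simple corpus-style measure, cosine similarity between bag-of-words vectors, and assigns keyword Y the category with the largest similarity. The context words for "Nokia" and the word lists describing each category are hypothetical stand-ins for a real classification training set.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify_keyword(keyword_context, training_set):
    """Return the category most similar to keyword Y.

    keyword_context: words co-occurring with keyword Y (assumed available).
    training_set: {category name: words describing that category}.
    """
    kw_vec = Counter(keyword_context)
    return max(training_set, key=lambda cat: cosine(kw_vec, Counter(training_set[cat])))


training_set = {
    "e-commerce digital products": ["phone", "camera", "digital", "laptop"],
    "winter cotton-padded clothing": ["coat", "winter", "warm", "cotton"],
    "spring/autumn unlined clothing": ["jacket", "spring", "autumn", "thin"],
}
nokia_context = ["phone", "mobile", "digital", "phone"]  # hypothetical
print(classify_keyword(nokia_context, training_set))  # e-commerce digital products
```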
Once the labels of user X are obtained, a real-time query service for them can be provided, and corresponding services can be provided to user X according to the labels, such as pushing information that may interest user X.
For example, if the labels of user X include the keyword "Nokia", then after user X logs in to a shopping website, product information related to "Nokia" can be pushed to user X; furthermore, since the category information corresponding to "Nokia" is e-commerce digital products, information on other e-commerce digital products can be pushed at the same time.
In addition, association relationships between different words can be established in advance. Then, when the labels of user X contain a given keyword, information related to the words associated with that keyword can be pushed to user X along with information related to the keyword itself.
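The pre-built association relationships can be represented, for instance, as a mapping from a keyword to its associated words, which expands the set of words to push information for. The mapping contents and the function name below are hypothetical.

```python
def expand_with_associations(labels, associations):
    """Return the label keywords plus every word associated with them,
    keeping first-seen order and dropping duplicates."""
    result = []
    for kw in labels:
        for word in [kw] + associations.get(kw, []):
            if word not in result:
                result.append(word)
    return result


associations = {"nokia": ["lumia", "symbian"]}  # hypothetical association table
print(expand_with_associations(["nokia"], associations))  # ['nokia', 'lumia', 'symbian']
```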
Because the labels of user X are updated in real time, user X's needs can be tracked promptly and the information currently of most interest can be pushed to user X.
In practical applications, the following situation may occur: during some earlier period, user X frequently followed a certain keyword, such as information related to Nokia, so many related pieces of operation behavior information were obtained. When the keywords are sorted, although "Nokia" does not appear in the recently obtained operation behavior information, it appeared often in earlier operation behavior information, so its final weight may still be large enough to place it in the top N. Information related to Nokia would then continue to be pushed to user X even though user X no longer needs such information, making the push results inaccurate.
To overcome this problem, at every predetermined interval, the saved operation behavior information for which the interval between its acquisition time and the current time exceeds a predetermined duration can be deleted, and keyword extraction, saving, and selection are performed again on the remaining operation behavior information, with the selected keywords that meet the requirements taken as the labels of user X. The specific value of the predetermined duration can be chosen according to actual requirements, for example 1 month.
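A minimal sketch of the periodic pruning step: records older than the predetermined duration are deleted, then keywords are re-extracted and re-selected from what remains. For brevity it assumes plain occurrence counts as weights and whitespace splitting as the extractor; the function names and example data are hypothetical.

```python
from datetime import datetime, timedelta


def prune_and_relabel(records, now, max_age, extract, select):
    """Delete records older than max_age, then redo extraction and selection.

    records: list of (acquired_at, info) pairs, one per saved piece of
             operation behavior information.
    extract: callable info -> list of keywords (stand-in for TF*IDF etc.).
    select:  callable {keyword: weight} -> list of labels.
    Weights here are simple occurrence counts, an assumption for brevity.
    """
    kept = [(t, info) for t, info in records if now - t <= max_age]
    weights = {}
    for _, info in kept:
        for kw in extract(info):
            weights[kw] = weights.get(kw, 0) + 1
    return kept, select(weights)


def top_two(weights):
    """Hypothetical selector: two highest weights, ties broken alphabetically."""
    return sorted(weights, key=lambda kw: (-weights[kw], kw))[:2]


now = datetime(2012, 3, 1)
records = [
    (datetime(2012, 1, 10), "nokia phone"),  # 51 days old: deleted
    (datetime(2012, 2, 20), "winter coat"),
    (datetime(2012, 2, 25), "winter boots"),
]
kept, labels = prune_and_relabel(records, now, timedelta(days=30), str.split, top_two)
print(labels)  # ['winter', 'boots']
```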
After information related to its labels has been pushed to user X, the labels of user X can be updated according to user X's degree of interest in the pushed information.
For example, if a piece of information is pushed to user X according to a keyword in the labels and user X does not click to read it, the normalized weight value of the corresponding keyword can be reduced, or the keyword can be deleted directly; conversely, if user X clicks to read the information, the normalized weight value of the corresponding keyword can be increased, and information is pushed preferentially according to keywords with larger normalized weight values.
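A minimal sketch of the click-feedback update. The patent does not say by how much a weight changes or when a keyword is deleted; the fixed `step` and the optional `drop_below` floor are assumptions, as are the function names.

```python
def update_on_feedback(norm_weights, keyword, clicked, step=0.1, drop_below=None):
    """Adjust the normalized weight of the keyword a push was based on.

    clicked=True raises the weight by step; clicked=False lowers it by step,
    and if drop_below is given and the new weight falls below it, the keyword
    is removed from the labels altogether.
    """
    updated = dict(norm_weights)
    if keyword not in updated:
        return updated
    if clicked:
        updated[keyword] += step
    else:
        updated[keyword] -= step
        if drop_below is not None and updated[keyword] < drop_below:
            del updated[keyword]
    return updated


def push_order(norm_weights):
    """Keywords to push for, highest normalized weight first."""
    return sorted(norm_weights, key=lambda kw: (-norm_weights[kw], kw))


w = {"nokia": 1.0, "coat": 0.95}
w = update_on_feedback(w, "nokia", clicked=False)  # pushed item was not read
print(push_order(w))  # ['coat', 'nokia']
```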
This completes the description of the method embodiment of the present invention.
Based on the above description, Fig. 4 is a schematic structural diagram of an embodiment of the apparatus for generating labels for a user according to the present invention. As shown in Fig. 4, the apparatus comprises:
an acquisition module, configured to obtain, for any user X, in real time the operation behavior information generated after user X logs in to the network, and to send each piece of obtained operation behavior information to a processing module;
the processing module, configured to perform, each time a piece of operation behavior information is received, the following processing once: extract the keywords in that operation behavior information and save them; select keywords that meet the requirements from all saved keywords as the labels of user X.
The processing module may specifically comprise:
a first processing unit, configured to extract the keywords in received operation behavior information and save them;
a second processing unit, configured to determine the weight of each saved keyword, sort the keywords in descending order of weight, and take the top N keywords after sorting as the labels of user X, where N is a positive integer greater than 1; or to determine the weight of each saved keyword and take the keywords whose weights are greater than a predetermined threshold as the labels of user X.
The second processing unit may be further configured to process each selected keyword Y as follows: calculate the similarity between keyword Y and each piece of category information in the preset classification training set, take the category information corresponding to the largest calculated similarity as the category information corresponding to keyword Y, and use the selected keywords together with their respective category information as the labels of user X.
The labels may further include one or both of the following: the time at which the labels were generated, and the normalized weight value corresponding to each selected keyword.
In addition, the first processing unit may be further configured to save every piece of received operation behavior information;
correspondingly, the second processing unit may be further configured to delete, at every predetermined interval, the saved operation behavior information for which the interval between its acquisition time and the current time exceeds a predetermined duration, and to perform keyword extraction, saving, and selection again on the remaining operation behavior information, taking the selected keywords that meet the requirements as the labels of user X.
The second processing unit may be further configured to push information related to the labels of user X to user X, and to update the labels of user X according to user X's degree of interest in the pushed information.
For the specific workflow of the apparatus embodiment shown in Fig. 4, refer to the corresponding description of the method embodiment shown in Fig. 1; it is not repeated here.
The foregoing are merely preferred embodiments of the present invention and are not intended to limit the present invention. Any modification, equivalent substitution, improvement, or the like made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (10)

1. A method for generating labels for a user, characterized by comprising:
for any user X, obtaining in real time the operation behavior information generated after user X logs in to the network, and each time a piece of operation behavior information is obtained, performing the following processing once:
extracting the keywords in that operation behavior information and saving them;
selecting keywords that meet the requirements from all saved keywords as the labels of user X;
wherein, after the keywords that meet the requirements are selected, the method further comprises:
for each selected keyword Y, performing the following processing:
calculating the similarity between keyword Y and each piece of category information in a preset classification training set, and taking the category information corresponding to the largest calculated similarity as the category information corresponding to keyword Y;
using the selected keywords together with their respective category information as the labels of user X.
2. The method according to claim 1, characterized in that selecting keywords that meet the requirements from all saved keywords comprises:
determining the weight of each saved keyword, sorting the keywords in descending order of weight, and taking the top N keywords after sorting as the labels of user X, N being a positive integer greater than 1;
or, determining the weight of each saved keyword, and taking the keywords whose weights are greater than a predetermined threshold as the labels of user X.
3. The method according to claim 2, characterized in that the labels further include one or both of the following: the time at which the labels were generated, and the normalized weight value corresponding to each selected keyword.
4. The method according to claim 1, characterized in that the method further comprises:
each time a piece of operation behavior information is obtained, saving that operation behavior information;
and, at every predetermined interval, deleting the saved operation behavior information for which the interval between its acquisition time and the current time exceeds a predetermined duration, and performing keyword extraction, saving, and selection again on the remaining operation behavior information, taking the selected keywords that meet the requirements as the labels of user X.
5. The method according to claim 1 or 4, characterized in that the method further comprises: pushing information related to the labels of user X to user X, and updating the labels of user X according to user X's degree of interest in the pushed information.
6. An apparatus for generating labels for a user, characterized by comprising:
an acquisition module, configured to obtain, for any user X, in real time the operation behavior information generated after user X logs in to the network, and to send each piece of obtained operation behavior information to a processing module;
the processing module, comprising a first processing unit and a second processing unit, wherein
the first processing unit is configured to extract, each time a piece of operation behavior information is received, the keywords in that operation behavior information and save them;
the second processing unit is configured to select keywords that meet the requirements from all saved keywords as the labels of user X; and, for each selected keyword Y, to calculate the similarity between keyword Y and each piece of category information in a preset classification training set, take the category information corresponding to the largest calculated similarity as the category information corresponding to keyword Y, and use the selected keywords together with their respective category information as the labels of user X.
7. The apparatus according to claim 6, characterized in that the second processing unit is configured to determine the weight of each saved keyword, sort the keywords in descending order of weight, and take the top N keywords after sorting as the labels of user X, N being a positive integer greater than 1; or to determine the weight of each saved keyword and take the keywords whose weights are greater than a predetermined threshold as the labels of user X.
8. The apparatus according to claim 7, characterized in that the labels further include one or both of the following: the time at which the labels were generated, and the normalized weight value corresponding to each selected keyword.
9. The apparatus according to claim 7, characterized in that
the first processing unit is further configured to save every piece of received operation behavior information;
the second processing unit is further configured to delete, at every predetermined interval, the saved operation behavior information for which the interval between its acquisition time and the current time exceeds a predetermined duration, and to perform keyword extraction, saving, and selection again on the remaining operation behavior information, taking the selected keywords that meet the requirements as the labels of user X.
10. The apparatus according to claim 7, characterized in that the second processing unit is further configured to push information related to the labels of user X to user X, and to update the labels of user X according to user X's degree of interest in the pushed information.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210015741.8A CN103218355B (en) 2012-01-18 2012-01-18 Method and apparatus for generating labels for a user


Publications (2)

Publication Number Publication Date
CN103218355A 2013-07-24
CN103218355B 2016-08-31

Family

ID=48816159

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210015741.8A Active CN103218355B (en) 2012-01-18 2012-01-18 Method and apparatus for generating labels for a user

Country Status (1)

Country Link
CN (1) CN103218355B (en)

Families Citing this family (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104216881A (en) * 2013-05-29 2014-12-17 腾讯科技(深圳)有限公司 Method and device for recommending individual labels
CN104376010B (en) * 2013-08-14 2021-12-14 腾讯科技(深圳)有限公司 User recommendation method and device
CN104572733B (en) * 2013-10-22 2019-03-15 腾讯科技(深圳)有限公司 The method and device of user interest labeling
CN104159071A (en) * 2014-07-11 2014-11-19 深圳瞭望通达科技有限公司 Intelligent target identification device, system and method based on cloud service
CN104133906B (en) * 2014-08-06 2018-07-31 深圳市英威诺科技有限公司 A kind of information filters the technical method of simultaneously intelligent sequencing
CN104572951B (en) * 2014-12-29 2018-07-17 微梦创科网络科技(中国)有限公司 A kind of determination method and device of ability label
CN105005587A (en) * 2015-06-26 2015-10-28 深圳市腾讯计算机系统有限公司 User portrait updating method, apparatus and system
CN105260877A (en) * 2015-09-22 2016-01-20 世纪龙信息网络有限责任公司 E-mail-based method for acquiring data of user portrait
CN105893407A (en) * 2015-11-12 2016-08-24 乐视云计算有限公司 Individual user portraying method and system
TWI605353B (en) * 2016-05-30 2017-11-11 Chunghwa Telecom Co Ltd File classification system, method and computer program product based on lexical statistics
CN107526741B (en) * 2016-06-21 2021-05-18 华为技术有限公司 User label generation method and device
WO2018023685A1 (en) * 2016-08-05 2018-02-08 吴晓敏 Method for recognizing user's interests and recognition system
CN106294744A (en) * 2016-08-11 2017-01-04 上海动云信息科技有限公司 Interest recognition methods and system
CN106528851A (en) * 2016-11-24 2017-03-22 腾讯科技(深圳)有限公司 Intelligent recommendation method and device
CN108153752B (en) * 2016-12-02 2022-02-11 腾讯科技(北京)有限公司 Method and device for determining text keywords
CN107038213B (en) * 2017-02-28 2021-06-15 华为技术有限公司 Video recommendation method and device
CN107180098B (en) * 2017-05-16 2019-11-12 武汉斗鱼网络科技有限公司 Keyword eliminates method and device in a kind of information search
CN108304539A (en) * 2018-01-30 2018-07-20 平安科技(深圳)有限公司 Qualified database method for building up, device and storage medium
CN109447717A (en) * 2018-11-12 2019-03-08 万惠投资管理有限公司 A kind of determination method and system of label
CN109522467A (en) * 2018-11-14 2019-03-26 江苏中威科技软件系统有限公司 A kind of analysis method and device of the label time based on big data platform
CN110097407A (en) * 2019-05-10 2019-08-06 宁波奥克斯电气股份有限公司 A kind of generation method and system of user tag
CN111079026B (en) * 2019-11-28 2023-11-24 北京秒针人工智能科技有限公司 Method, storage medium and device for determining character impression data

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101243442A (en) * 2005-08-18 2008-08-13 微软公司 Annotating shared contacts with public descriptors
CN101547162A (en) * 2008-03-28 2009-09-30 国际商业机器公司 Method and device for tagging user based on user state information

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100849272B1 (en) * 2001-11-23 2008-07-29 주식회사 엘지이아이 Method for automatically summarizing Markup-type documents


Also Published As

Publication number Publication date
CN103218355A (en) 2013-07-24


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant