CN103218355B - A kind of method and apparatus generating label for user - Google Patents
- Publication number
- CN103218355B (application number CN201210015741.8A)
- Authority
- CN
- China
- Prior art keywords
- user
- key word
- label
- information
- operation behavior
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a method and apparatus for generating a label for a user. For any user X, operation behavior information is obtained in real time after the user logs in to the network, and each time a piece of operation behavior information is obtained, the following processing is performed once: the keywords in that operation behavior information are extracted and saved; keywords that meet the requirements are then selected from all saved keywords and used as the label of user X. With the scheme of the invention, the personalized information of a user can be obtained conveniently and quickly.
Description
Technical field
The present invention relates to network technology, and in particular to a method and apparatus for generating a label for a user.
Background
In the prior art, a label (tag) can be generated for an article from the keywords extracted from it, so that readers can grasp the content of the article conveniently and quickly.
Correspondingly, it is also desirable to generate labels for users, so that a user's personalized information can be obtained conveniently and quickly and better service can be provided, for example by pushing information that the user may be interested in according to the user's label.
However, the prior art does not yet provide a way to generate a label for a user.
Summary of the invention
In view of this, the present invention provides a method and apparatus for generating a label for a user, so that the personalized information of the user can be obtained conveniently and quickly.
To achieve the above object, the technical solution of the present invention is implemented as follows:
A method for generating a label for a user, comprising:
for any user X, obtaining in real time the operation behavior information produced after the user logs in to the network, and, each time a piece of operation behavior information is obtained, performing the following processing once:
extracting the keywords in that operation behavior information and saving them;
selecting keywords that meet the requirements from all saved keywords as the label of user X.
An apparatus for generating a label for a user, comprising:
an acquisition module, configured to, for any user X, obtain in real time the operation behavior information produced after the user logs in to the network, and to send each piece of operation behavior information obtained to a processing module;
the processing module, comprising a first processing unit and a second processing unit, wherein the first processing unit is configured to, each time a piece of operation behavior information is received, extract the keywords in that operation behavior information and save them; and the second processing unit is configured to select keywords that meet the requirements from all saved keywords as the label of user X, and, for each selected keyword Y, to calculate the similarity between keyword Y and each item of category information in a preset classification training set, take the category information with the largest calculated similarity as the category information corresponding to keyword Y, and use the selected keywords together with their corresponding category information as the label of user X.
It can be seen that, with the scheme of the present invention, generating a label for a user makes it possible to obtain the user's personalized information conveniently and quickly, so that better service can be provided to the user; moreover, the scheme is simple to implement and easy to popularize.
Brief description of the drawings
Fig. 1 is a flowchart of an embodiment of the method of the present invention for generating a label for a user.
Fig. 2 is a schematic diagram of user X subscribing to an item of information.
Fig. 3 is a schematic diagram of user X sharing an item of information.
Fig. 4 is a schematic structural diagram of an embodiment of the apparatus of the present invention for generating a label for a user.
Detailed description of the invention
To address the problems in the prior art, the present invention proposes a scheme for generating a label for a user.
To make the technical solution of the present invention clearer, the scheme is described in further detail below with reference to the drawings and embodiments.
Fig. 1 is a flowchart of an embodiment of the method of the present invention for generating a label for a user. As shown in Fig. 1, the method includes the following steps:
Step 11: for any user X, obtain in real time the operation behavior information produced after the user logs in to the network.
For ease of description, user X denotes any user; every user can be processed in the manner described in the present invention.
After logging in to the network, user X may perform various operations, such as subscribing to an item of information, sharing an item of information, or following an item of information. In practical applications, the operation behavior information of user X can be obtained in real time.
Fig. 2 is a schematic diagram of user X subscribing to an item of information; Fig. 3 is a schematic diagram of user X sharing an item of information. As shown in Figs. 2 and 3, user X can subscribe to and share the corresponding information by clicking the "subscribe to this column" and "share with friends" buttons.
Step 12: each time a piece of operation behavior information is obtained, perform the following processing once: extract the keywords in that operation behavior information and save them; select keywords that meet the requirements from all saved keywords as the label of user X.
In the present invention, a label is generated each time a piece of operation behavior information is obtained, and the newly generated label is used to update the previously generated label.
Specifically, in this step, each piece of operation behavior information obtained may be processed as follows:
1) Extract the keywords in the operation behavior information and save them.
Specifically, a keyword extraction method based on term frequency (TF) * inverse document frequency (IDF) may be used.
Here, TF is the number of times a given word appears in a given document, and may be normalized by the document length; IDF measures the general importance of a word, and the IDF of a given word is obtained by dividing the total number of documents by the number of documents containing that word and then taking the logarithm of the quotient. In this embodiment, each piece of information may be regarded as one document, and the words with higher TF*IDF scores are extracted as keywords.
To this end, each piece of operation behavior information obtained needs to be saved; likewise, the keywords extracted from each piece of operation behavior information need to be saved.
In practical applications, other keyword extraction methods may also be used, for example a keyword extraction method based on N-gram statistics.
2) Select keywords that meet the requirements from all saved keywords as the label of user X.
Specifically, the weight of each saved keyword may be determined, the keywords sorted in descending order of weight, and the top N keywords after sorting used as the label of user X, where N is a positive integer greater than 1.
As described above, the keywords of each saved piece of operation behavior information are saved along with it. A weight can therefore be determined for each saved keyword, the keywords sorted in descending order of weight, and the top N keywords used as the label of user X. The specific value of N may be chosen according to actual needs, for example 3.
How the weight of each keyword is determined may likewise be decided according to actual needs. For example, the longer the interval between the acquisition time of the corresponding operation behavior information and the current time, the smaller the weight. As an illustration: if the interval between the acquisition time of the corresponding operation behavior information and the current time is within 5 days, the weight is 10; if it is between 5 and 10 days, the weight is 9; and so on. Other factors may be combined on this basis. For instance, suppose a keyword corresponds to two pieces of operation behavior information (that is, the keyword was extracted from both; when sorting, identical keywords are treated as a single keyword), and the weight associated with one piece is 10 while the other is 9; the final weight of the keyword can then be determined to be 19. Other factors may of course also be used to determine keyword weights, and they are not enumerated here.
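The time-based weighting example above can be sketched like this. The names `record_weight` and `keyword_weights` are hypothetical, and two details are assumptions because the text leaves them open: an interval of exactly 5 days is given weight 9, and the weight is floored at 0 for very old records.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def record_weight(acquired_at, now):
    """Weight of one piece of operation behavior information.

    10 for an age under 5 days, 9 for 5 to under 10 days, and so on.
    The floor at 0 is an assumption; the source does not state one.
    """
    age_days = (now - acquired_at).days
    return max(10 - age_days // 5, 0)


def keyword_weights(saved, now):
    """Sum per-record weights for each keyword.

    saved: iterable of (acquired_at, keywords) pairs, one per saved record.
    Identical keywords from different records are merged into one entry.
    """
    totals = defaultdict(int)
    for acquired_at, keywords in saved:
        weight = record_weight(acquired_at, now)
        for keyword in set(keywords):  # count a keyword once per record
            totals[keyword] += weight
    return dict(totals)


now = datetime(2012, 1, 18)
saved = [
    (now - timedelta(days=2), ["nokia", "phone"]),  # record weight 10
    (now - timedelta(days=7), ["nokia"]),           # record weight 9
]
weights = keyword_weights(saved, now)  # "nokia" sums to 19, as in the text
```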
Alternatively, other ways of generating the label may be used. For example, after the weight of each saved keyword has been determined, the keywords whose weight exceeds a predetermined threshold are used as the label of user X; the specific value of the threshold may be chosen according to actual needs.
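The two selection strategies, top N by weight and weight above a threshold, might look like the following sketch. It assumes the keyword weights are already available as a dictionary; the function names and the alphabetical tie-break are illustrative choices, not part of the source.

```python
def select_top_n(weights, n=3):
    """Return the n keywords with the largest weights, heaviest first."""
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [keyword for keyword, _ in ranked[:n]]


def select_above_threshold(weights, threshold):
    """Return every keyword whose weight is strictly above the threshold."""
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [keyword for keyword, weight in ranked if weight > threshold]


example_weights = {"nokia": 19, "phone": 10, "coat": 4, "case": 10}
label_by_rank = select_top_n(example_weights, n=3)
label_by_threshold = select_above_threshold(example_weights, threshold=10)
```

Top-N selection always yields a label of fixed size, while threshold selection lets the label grow or shrink with the user's recent activity.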
In addition to the selected keywords, the label of user X may further include one or any combination of the following: the category information corresponding to each selected keyword, the time at which the label was generated (which to some extent reflects how active user X is), and the normalized weight corresponding to each selected keyword.
The normalized weight of each keyword may be obtained by standard-score normalization. Its implementation is prior art; the way standard scores are computed for the college entrance examination may be consulted.
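One common reading of standard-score normalization is the z-score, sketched below. The source does not give a formula, so the use of the population standard deviation and the handling of the all-equal case are assumptions, and `normalize_weights` is a hypothetical name. Any further linear rescaling of the score onto a reporting scale is not shown.

```python
from statistics import mean, pstdev


def normalize_weights(weights):
    """Map each keyword weight to its standard score (w - mean) / stdev."""
    values = list(weights.values())
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (assumption)
    if sigma == 0:
        # All weights equal: every keyword sits exactly at the mean.
        return {keyword: 0.0 for keyword in weights}
    return {keyword: (w - mu) / sigma for keyword, w in weights.items()}


z = normalize_weights({"a": 2, "b": 4, "c": 6})
```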
In addition, a classification training set may be preset that contains various items of category information, such as e-commerce digital products, winter cotton-padded clothing, and spring and autumn unlined clothing. Correspondingly, for each selected keyword Y, the following processing may be performed: calculate the similarity between keyword Y and each item of category information in the preset classification training set, and take the category information with the largest calculated similarity as the category information corresponding to keyword Y.
The similarity between each keyword and each item of category information may be determined by a lexical similarity method, an ontology-based similarity method, a corpus-based similarity method, or the like; the specific implementations are likewise prior art.
As an illustration: suppose a keyword is "Nokia" and the calculation finds that its similarity with the category "e-commerce digital products" is the largest; then "e-commerce digital products" is taken as the category information corresponding to the keyword "Nokia".
Once the label of user X has been obtained, a real-time query service for the label of user X can be provided, and corresponding services can be offered to user X according to the label, such as pushing information that user X may be interested in.
As an illustration: suppose the label of user X includes the keyword "Nokia". When user X logs in to a shopping website, product information related to "Nokia" can be pushed to the user. Furthermore, since the category information corresponding to "Nokia" is e-commerce digital products, information about other e-commerce digital products can be pushed at the same time.
In addition, association relationships between different words may be established in advance. Then, when the label of user X contains a certain keyword, information related to the words associated with that keyword can be pushed to user X together with the information related to the keyword itself.
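Expanding a label through pre-established word associations could be sketched as follows. The association table and the name `push_topics` are hypothetical; the source does not say how associations are built or stored.

```python
# Hypothetical pre-established associations between words.
ASSOCIATIONS = {"nokia": ["smartphone", "phone charger"]}


def push_topics(label_keywords, associations=ASSOCIATIONS):
    """Return the label keywords followed by their associated words.

    Order is preserved and duplicates are dropped, so each topic is
    pushed at most once.
    """
    topics = []
    for keyword in label_keywords:
        for term in [keyword, *associations.get(keyword, [])]:
            if term not in topics:
                topics.append(term)
    return topics


topics = push_topics(["nokia", "coat"])
```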
Because the label of user X is updated in real time, the needs of user X can be tracked promptly, and the information the user is currently most interested in can be pushed.
In practical applications, the following situation may arise: during some earlier period, user X frequently followed a certain keyword, for example information related to Nokia, so that many related pieces of operation behavior information were obtained. When the keywords are sorted, even though the keyword "Nokia" does not appear in the most recently obtained operation behavior information, it appears often in the earlier operation behavior information, so its final weight is likely to be large and it remains in the top N after sorting. Information related to Nokia will consequently continue to be pushed to user X, although user X no longer needs this kind of information, and the push results become inaccurate.
To overcome this problem, each time a predetermined duration elapses, the saved operation behavior information whose acquisition time is separated from the current time by more than the predetermined duration may be deleted, and keyword extraction, saving, and selection may be performed again on the remaining operation behavior information, with the selected keywords that meet the requirements used as the label of user X. The specific value of the predetermined duration may be chosen according to actual needs, for example 1 month.
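The periodic cleanup described above amounts to dropping stale records before the label is recomputed. A minimal sketch follows, assuming each saved record is an `(acquired_at, text)` pair and that "1 month" is approximated as 30 days; `prune_records` is a hypothetical name.

```python
from datetime import datetime, timedelta


def prune_records(saved, now, max_age=timedelta(days=30)):
    """Keep only the records acquired within max_age of now.

    saved: list of (acquired_at, text) pairs. The caller is expected to
    re-run keyword extraction and selection on the returned records.
    """
    return [(acquired_at, text) for acquired_at, text in saved
            if now - acquired_at <= max_age]


now = datetime(2012, 3, 1)
saved = [
    (datetime(2012, 1, 1), "old Nokia article"),   # 60 days old, dropped
    (datetime(2012, 2, 20), "recent coat article"),  # 10 days old, kept
]
remaining = prune_records(saved, now)
```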
After information related to the label has been pushed to user X, the label of user X may be updated according to how interested user X is in the pushed information.
For example, after an item of information has been pushed to user X based on a keyword in the label, if user X does not click to read it, the normalized weight of the keyword corresponding to that information may be reduced, or the keyword may be deleted directly. Conversely, if user X clicks and reads the information, the normalized weight of the corresponding keyword may be increased, and information is pushed preferentially for keywords with larger normalized weights.
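The feedback rule above can be sketched as a small update function. The step size, the optional deletion floor, and the name `apply_feedback` are assumptions; the source only says the normalized weight is raised on a click and lowered (or the keyword deleted) otherwise.

```python
def apply_feedback(normalized, keyword, clicked, step=0.1, floor=None):
    """Return a copy of the normalized weights updated for one pushed item.

    clicked=True raises the keyword's weight by step; clicked=False lowers
    it by step and, if a floor is given and the weight falls below it,
    removes the keyword from the label.
    """
    updated = dict(normalized)
    if keyword not in updated:
        return updated
    if clicked:
        updated[keyword] += step
    else:
        updated[keyword] -= step
        if floor is not None and updated[keyword] < floor:
            del updated[keyword]
    return updated


label_weights = {"nokia": 1.0, "coat": 0.05}
after_click = apply_feedback(label_weights, "nokia", clicked=True)
after_ignore = apply_feedback(label_weights, "coat", clicked=False, floor=0.0)
```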
This completes the description of the method embodiment of the present invention.
Based on the above description, Fig. 4 is a schematic structural diagram of an embodiment of the apparatus of the present invention for generating a label for a user. As shown in Fig. 4, the apparatus includes:
an acquisition module, configured to, for any user X, obtain in real time the operation behavior information produced after the user logs in to the network, and to send each piece of operation behavior information obtained to a processing module;
a processing module, configured to, each time a piece of operation behavior information is received, perform the following processing once: extract the keywords in that operation behavior information and save them; select keywords that meet the requirements from all saved keywords as the label of user X.
The processing module may specifically include:
a first processing unit, configured to extract the keywords in the received operation behavior information and save them;
a second processing unit, configured to determine the weight of each saved keyword, sort the keywords in descending order of weight, and use the top N keywords after sorting as the label of user X, where N is a positive integer greater than 1; or to determine the weight of each saved keyword and use the keywords whose weight exceeds a predetermined threshold as the label of user X.
The second processing unit may be further configured to, for each selected keyword Y, perform the following processing: calculate the similarity between keyword Y and each item of category information in a preset classification training set, and take the category information with the largest calculated similarity as the category information corresponding to keyword Y; and to use the selected keywords together with their corresponding category information as the label of user X.
The label may further include one or both of the following: the time at which the label was generated, and the normalized weight corresponding to each selected keyword.
In addition, the first processing unit may be further configured to save each piece of operation behavior information received.
Correspondingly, the second processing unit may be further configured to, each time a predetermined duration elapses, delete the saved operation behavior information whose acquisition time is separated from the current time by more than the predetermined duration, perform keyword extraction, saving, and selection again on the remaining operation behavior information, and use the selected keywords that meet the requirements as the label of user X.
The second processing unit may be further configured to push information related to the label to user X, and to update the label of user X according to how interested user X is in the pushed information.
For the specific workflow of the apparatus embodiment shown in Fig. 4, refer to the corresponding description of the method embodiment shown in Fig. 1; details are not repeated here.
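The module structure of Fig. 4 can be sketched as two cooperating classes. This is only an illustration of the data flow: the class names, method names, and the toy extraction and selection functions passed in are hypothetical, and real units would use the weighting and extraction approaches described in the method embodiment.

```python
from collections import Counter


class ProcessingModule:
    """Holds the first (extraction) and second (selection) processing units."""

    def __init__(self, extract, select):
        self.extract = extract      # first processing unit
        self.select = select        # second processing unit
        self.saved_keywords = []
        self.label = []

    def receive(self, record):
        # Runs once per received piece of operation behavior information.
        self.saved_keywords.extend(self.extract(record))
        self.label = self.select(self.saved_keywords)
        return self.label


class AcquisitionModule:
    """Forwards each obtained piece of information to the processing module."""

    def __init__(self, processing):
        self.processing = processing

    def on_operation(self, record):
        return self.processing.receive(record)


# Toy units: split on whitespace, keep the two most frequent words.
processing = ProcessingModule(
    extract=lambda record: record.lower().split(),
    select=lambda kws: [w for w, _ in Counter(kws).most_common(2)],
)
acquisition = AcquisitionModule(processing)
acquisition.on_operation("Nokia phone")
label = acquisition.on_operation("Nokia case")
```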
The foregoing describes only preferred embodiments of the present invention and is not intended to limit it. Any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.
Claims (10)
1. A method for generating a label for a user, characterized by comprising:
for any user X, obtaining in real time the operation behavior information produced after the user logs in to the network, and, each time a piece of operation behavior information is obtained, performing the following processing once:
extracting the keywords in that operation behavior information and saving them;
selecting keywords that meet the requirements from all saved keywords as the label of user X;
wherein, after the keywords that meet the requirements are selected, the method further comprises:
for each selected keyword Y, performing the following processing:
calculating the similarity between keyword Y and each item of category information in a preset classification training set, and taking the category information with the largest calculated similarity as the category information corresponding to keyword Y;
using the selected keywords together with their corresponding category information as the label of user X.
2. The method according to claim 1, characterized in that selecting keywords that meet the requirements from all saved keywords comprises:
determining the weight of each saved keyword, sorting the keywords in descending order of weight, and using the top N keywords after sorting as the label of user X, where N is a positive integer greater than 1;
or, determining the weight of each saved keyword, and using the keywords whose weight exceeds a predetermined threshold as the label of user X.
3. The method according to claim 2, characterized in that the label further comprises one or both of the following: the time at which the label was generated, and the normalized weight corresponding to each selected keyword.
4. The method according to claim 1, characterized in that the method further comprises:
each time a piece of operation behavior information is obtained, saving that operation behavior information;
and, each time a predetermined duration elapses, deleting the saved operation behavior information whose acquisition time is separated from the current time by more than the predetermined duration, performing keyword extraction, saving, and selection again on the remaining operation behavior information, and using the selected keywords that meet the requirements as the label of user X.
5. The method according to claim 1 or 4, characterized in that the method further comprises: pushing information related to the label to user X, and updating the label of user X according to how interested user X is in the pushed information.
6. An apparatus for generating a label for a user, characterized by comprising:
an acquisition module, configured to, for any user X, obtain in real time the operation behavior information produced after the user logs in to the network, and to send each piece of operation behavior information obtained to a processing module;
the processing module, comprising a first processing unit and a second processing unit, wherein
the first processing unit is configured to, each time a piece of operation behavior information is received, extract the keywords in that operation behavior information and save them;
the second processing unit is configured to select keywords that meet the requirements from all saved keywords as the label of user X; for each selected keyword Y, to perform the following processing: calculate the similarity between keyword Y and each item of category information in a preset classification training set, and take the category information with the largest calculated similarity as the category information corresponding to keyword Y; and to use the selected keywords together with their corresponding category information as the label of user X.
7. The apparatus according to claim 6, characterized in that the second processing unit is configured to determine the weight of each saved keyword, sort the keywords in descending order of weight, and use the top N keywords after sorting as the label of user X, where N is a positive integer greater than 1; or to determine the weight of each saved keyword and use the keywords whose weight exceeds a predetermined threshold as the label of user X.
8. The apparatus according to claim 7, characterized in that the label further comprises one or both of the following: the time at which the label was generated, and the normalized weight corresponding to each selected keyword.
9. The apparatus according to claim 7, characterized in that
the first processing unit is further configured to save each piece of operation behavior information received;
the second processing unit is further configured to, each time a predetermined duration elapses, delete the saved operation behavior information whose acquisition time is separated from the current time by more than the predetermined duration, perform keyword extraction, saving, and selection again on the remaining operation behavior information, and use the selected keywords that meet the requirements as the label of user X.
10. The apparatus according to claim 7, characterized in that the second processing unit is further configured to push information related to the label to user X, and to update the label of user X according to how interested user X is in the pushed information.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210015741.8A CN103218355B (en) | 2012-01-18 | 2012-01-18 | A kind of method and apparatus generating label for user |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210015741.8A CN103218355B (en) | 2012-01-18 | 2012-01-18 | A kind of method and apparatus generating label for user |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103218355A CN103218355A (en) | 2013-07-24 |
CN103218355B true CN103218355B (en) | 2016-08-31 |
Family
ID=48816159
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201210015741.8A Active CN103218355B (en) | 2012-01-18 | 2012-01-18 | A kind of method and apparatus generating label for user |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103218355B (en) |
Families Citing this family (22)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104216881A (en) * | 2013-05-29 | 2014-12-17 | 腾讯科技(深圳)有限公司 | Method and device for recommending individual labels |
CN104376010B (en) * | 2013-08-14 | 2021-12-14 | 腾讯科技(深圳)有限公司 | User recommendation method and device |
CN104572733B (en) * | 2013-10-22 | 2019-03-15 | 腾讯科技(深圳)有限公司 | The method and device of user interest labeling |
CN104159071A (en) * | 2014-07-11 | 2014-11-19 | 深圳瞭望通达科技有限公司 | Intelligent target identification device, system and method based on cloud service |
CN104133906B (en) * | 2014-08-06 | 2018-07-31 | 深圳市英威诺科技有限公司 | A kind of information filters the technical method of simultaneously intelligent sequencing |
CN104572951B (en) * | 2014-12-29 | 2018-07-17 | 微梦创科网络科技(中国)有限公司 | A kind of determination method and device of ability label |
CN105005587A (en) * | 2015-06-26 | 2015-10-28 | 深圳市腾讯计算机系统有限公司 | User portrait updating method, apparatus and system |
CN105260877A (en) * | 2015-09-22 | 2016-01-20 | 世纪龙信息网络有限责任公司 | E-mail-based method for acquiring data of user portrait |
CN105893407A (en) * | 2015-11-12 | 2016-08-24 | 乐视云计算有限公司 | Individual user portraying method and system |
TWI605353B (en) * | 2016-05-30 | 2017-11-11 | Chunghwa Telecom Co Ltd | File classification system, method and computer program product based on lexical statistics |
CN107526741B (en) * | 2016-06-21 | 2021-05-18 | 华为技术有限公司 | User label generation method and device |
WO2018023685A1 (en) * | 2016-08-05 | 2018-02-08 | 吴晓敏 | Method for recognizing user's interests and recognition system |
CN106294744A (en) * | 2016-08-11 | 2017-01-04 | 上海动云信息科技有限公司 | Interest recognition methods and system |
CN106528851A (en) * | 2016-11-24 | 2017-03-22 | 腾讯科技(深圳)有限公司 | Intelligent recommendation method and device |
CN108153752B (en) * | 2016-12-02 | 2022-02-11 | 腾讯科技(北京)有限公司 | Method and device for determining text keywords |
CN107038213B (en) * | 2017-02-28 | 2021-06-15 | 华为技术有限公司 | Video recommendation method and device |
CN107180098B (en) * | 2017-05-16 | 2019-11-12 | 武汉斗鱼网络科技有限公司 | Keyword eliminates method and device in a kind of information search |
CN108304539A (en) * | 2018-01-30 | 2018-07-20 | 平安科技(深圳)有限公司 | Qualified database method for building up, device and storage medium |
CN109447717A (en) * | 2018-11-12 | 2019-03-08 | 万惠投资管理有限公司 | A kind of determination method and system of label |
CN109522467A (en) * | 2018-11-14 | 2019-03-26 | 江苏中威科技软件系统有限公司 | A kind of analysis method and device of the label time based on big data platform |
CN110097407A (en) * | 2019-05-10 | 2019-08-06 | 宁波奥克斯电气股份有限公司 | A kind of generation method and system of user tag |
CN111079026B (en) * | 2019-11-28 | 2023-11-24 | 北京秒针人工智能科技有限公司 | Method, storage medium and device for determining character impression data |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101243442A (en) * | 2005-08-18 | 2008-08-13 | 微软公司 | Annotating shared contacts with public descriptors |
CN101547162A (en) * | 2008-03-28 | 2009-09-30 | 国际商业机器公司 | Method and device for tagging user based on user state information |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR100849272B1 (en) * | 2001-11-23 | 2008-07-29 | 주식회사 엘지이아이 | Method for automatically summarizing Markup-type documents |
Also Published As
Publication number | Publication date |
---|---|
CN103218355A (en) | 2013-07-24 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant |