CN107526741A - user tag generation method and device - Google Patents
- Publication number: CN107526741A (application CN201610454113.8A)
- Authority
- CN
- China
- Prior art keywords
- client
- characteristic attribute
- user
- classification
- weight information
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
- G06F16/285—Clustering or classification
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
A user tag generation method and device, relating to the field of communication technologies. The method includes: for each client that has recorded user data on user equipment of the same type, obtaining at least one feature attribute that is identical to one of n preset feature attributes; determining the weight information of each of the at least one feature attribute of each client, to obtain the weight information of each of the n feature attributes; clustering all clients that have recorded user data on the user equipment of the same type, using a preset clustering index k and the weight information of each feature attribute; extracting at least one feature client from the k resulting categories; and generating a first user tag from the user data recorded by each client and a second user tag from the at least one feature client. This solves the problem that a server generating user tags from a small amount of user data produces few user tags, and increases the number of user tags generated.
Description
Technical field
The present invention relates to the field of communication technologies, and in particular to a user tag generation method and device.
Background
At present, a server can determine the target group of a product by building user portraits. A user portrait characterizes a user; the features of a user include age, gender, interests, habits, location, and so on. Because a user tag can describe a feature of at least one user, the server can obtain a user portrait by generating user tags. Examples of user tags include karaoke enthusiast, artsy young woman, Beijing resident, and night owl.
The server generates a user tag as follows: it collects user data from at least one user equipment; analyzes the user data to obtain the features of the users of the at least one user equipment; and generates the user tag from those features. The user data may be the user's location data, age data, behavioral habit data, hobby data, health status data, and so on.
Because the server can generate user tags only from the user data it collects, it generates few user tags when little user data has been collected.
Summary of the invention
To solve the problem that few user tags are generated when little user data has been collected, this application provides a user tag generation method and device.
In a first aspect, a user tag generation method is provided. The method includes: for each client that has recorded user data on user equipment of the same type, obtaining, from the feature attributes of that client, at least one feature attribute identical to one of n preset feature attributes; determining, according to the number of feature attributes of each client, the weight information of each of the at least one feature attribute of each client, to obtain the weight information of each of the n feature attributes; clustering all clients that have recorded user data on the user equipment of the same type, using a preset clustering index k and the weight information of each feature attribute, to obtain k categories, where each category includes at least one client and the user equipment to which each client belongs, and k is a positive integer; extracting at least one feature client from the k categories, where a feature client reflects a common interest of the target user group of the user equipment; and generating a first user tag from the user data recorded by each client and a second user tag from the at least one feature client. Here, user data reflects the operations that a user of a client performs on that client, a feature attribute reflects a feature shared by the target user group of a client, and n is a positive integer.
The server determines the weight information of each feature attribute of each client that has recorded user data on the user equipment, obtains the weight information of each of the n feature attributes, clusters all clients that have recorded user data on the user equipment of the same type using the preset clustering index k and the weight information of each feature attribute to obtain k categories, and extracts at least one feature client from the k categories. As a result, the server can generate user tags not only from user data but also from the at least one feature client. This solves the problem that a server generating user tags only from user data produces few user tags when little user data is available, and increases the number of user tags generated.
With reference to the first aspect, in a first implementation of the first aspect, the user data of each client includes the running frequency of the client, and determining, according to the number of feature attributes of each client, the weight information of each of the at least one feature attribute of each client, to obtain the weight information of each of the n feature attributes, includes: setting the weight of each feature attribute of each client according to a preset total weight score and the number of feature attributes of that client, where the weight is negatively correlated with the number of feature attributes of the client; determining the weight information of each feature attribute of each client according to the running frequency of that client and the weight of each of its feature attributes; and, for all clients that have recorded user data on each user equipment, adding up the weight information of identical feature attributes to obtain the weight information of the n feature attributes.
A weight is set for each feature attribute of each client, and the weight is negatively correlated with the number of feature attributes of the client. The weight information of each feature attribute of each client is determined from the running frequency of the client and the weight of each of its feature attributes. For all clients that have recorded user data on each user equipment, the weight information of identical feature attributes is added up to obtain the weight information of the n feature attributes. The weight information of the n feature attributes obtained by the server is therefore positively correlated with the running frequency of the clients and reflects how the user habitually uses each client, which ensures the accuracy of the generated second user tag.
With reference to the first aspect, in a second implementation of the first aspect, the user data of each client includes the running frequency and the running time period of the client, and determining, according to the number of feature attributes of each client, the weight information of each of the at least one feature attribute of each client, to obtain the weight information of each of the n feature attributes, includes: setting the weight of each feature attribute of each client according to a preset total weight score and the number of feature attributes of that client, where the weight is negatively correlated with the number of feature attributes of the client; determining the preset time period to which the running time period of each client belongs, and determining the running frequency of each client in the corresponding preset time period, where each preset time period corresponds to the n feature attributes; for each client, determining the weight information of each feature attribute of the client for each preset time period according to the running frequency of the client in that preset time period and the weight of each of its feature attributes; and, for all clients that have recorded user data on each user equipment, adding up the weight information of identical feature attributes within the same preset time period to obtain the weight information of the n feature attributes for each preset time period.
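The per-period computation described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the attribute labels, the period names, the record layout, and the function name `period_weight_info` are all assumptions made for the example.

```python
# Hypothetical labels; the method only requires n preset attributes and q preset periods.
PRESET = ("child", "teenager", "adult", "elderly")   # n = 4
PERIODS = ("morning", "evening")                     # q = 2


def period_weight_info(clients, preset=PRESET, periods=PERIODS, total_score=1.0):
    """Return one row of n * q weight values for a single user equipment.

    `clients` is a list of (attributes, frequency_by_period) pairs, where
    `attributes` are the client's preset feature attributes and
    `frequency_by_period` maps a preset period to a running frequency.
    """
    index = {attr: i for i, attr in enumerate(preset)}
    info = {period: [0.0] * len(preset) for period in periods}
    for attributes, frequency_by_period in clients:
        if not attributes:
            continue
        weight = total_score / len(attributes)  # fewer attributes, larger weight
        for period, frequency in frequency_by_period.items():
            for attr in attributes:
                # Identical attributes in the same period are added up.
                info[period][index[attr]] += frequency * weight
    # Concatenate the periods in a fixed order to get a row of length n * q.
    return [value for period in periods for value in info[period]]


device_clients = [
    (["teenager", "adult"], {"evening": 4}),
    (["child"], {"morning": 2, "evening": 1}),
]
row = period_weight_info(device_clients)
```

With these made-up inputs the row has n × q = 8 entries: the first four belong to the morning period and the last four to the evening period.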
A weight is set for each feature attribute of each client, and the weight is negatively correlated with the number of feature attributes of the client. For each client, the weight information of each feature attribute for each preset time period is determined from the running frequency of the client in that preset time period and the weight of each of its feature attributes. For all clients that have recorded user data on each user equipment, the weight information of identical feature attributes within the same preset time period is added up to obtain the weight information of the n feature attributes for each preset time period. The weight information of the n feature attributes obtained by the server therefore reflects more accurately how the user habitually uses each client, and the server can draw on more types of user data when generating the second user tag.
With reference to the first or second implementation of the first aspect, in a third implementation of the first aspect, clustering all clients that have recorded user data on the user equipment of the same type, using the preset clustering index k and the weight information of each feature attribute, to obtain k categories, includes: when the user equipment of the same type includes m user equipments, generating an m × p feature matrix from the weight information of the n feature attributes, where p = n when the user data does not include the running time period of each client, and p = n × q when the user data includes the running time period of each client and the number of preset time periods is q; normalizing the feature matrix to obtain an m × p normalized matrix; and clustering the normalized matrix using the clustering index k to obtain k categories.
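The normalization and clustering steps above can be sketched as follows. The source does not name a normalization method or a clustering algorithm, so min-max scaling and plain k-means are assumptions chosen for illustration; the function names and the sample matrix are hypothetical.

```python
def min_max_normalize(matrix):
    """Scale every column of an m x p matrix to [0, 1] (one possible normalization)."""
    columns = list(zip(*matrix))
    lows = [min(col) for col in columns]
    spans = [max(col) - min(col) for col in columns]
    return [
        [(value - low) / span if span else 0.0
         for value, low, span in zip(row, lows, spans)]
        for row in matrix
    ]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(rows, k, iterations=20):
    """Plain k-means; the first k rows seed the centroids (an arbitrary choice)."""
    centroids = [list(row) for row in rows[:k]]
    labels = [0] * len(rows)
    for _ in range(iterations):
        # Assign every row (one user equipment) to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(row, centroids[c]))
            for row in rows
        ]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [row for row, label in zip(rows, labels) if label == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# m = 4 user equipments, p = 2 weight columns (made-up values).
features = [[4.0, 0.0], [0.0, 3.0], [5.0, 1.0], [1.0, 4.0]]
normalized = min_max_normalize(features)
labels = k_means(normalized, k=2)
```

Each row of the matrix stands for one user equipment, so the labels group user equipments (and, through them, the clients they host) into k categories. A production system would more likely rely on an established clustering library than on this hand-written loop.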
With reference to the third implementation of the first aspect, in a fourth implementation of the first aspect, clustering the normalized matrix using the clustering index k to obtain k categories includes: performing dimensionality reduction on the m × p normalized matrix using a preset dimensionality reduction algorithm and a preset dimensionality reduction index l, to obtain an m × l reduced matrix; and clustering the reduced matrix using the clustering index k to obtain k categories.
Performing dimensionality reduction on the normalized matrix both reduces the amount of computation when the server runs the clustering algorithm with the clustering index k, which improves the efficiency of clustering, and removes redundant data from the normalized matrix, which improves the stability of the data during clustering.
With reference to the third implementation of the first aspect, in a fifth implementation of the first aspect, extracting at least one feature client from the k categories includes: determining the center clients of each of the k categories, where, for a center client, the number of user equipments to which the center client belongs divided by the number of user equipments included in the category is greater than a first preset threshold; and, among the j categories that have a center client, determining the category that includes the largest number of user equipments and taking the at least one center client of that category as the at least one feature client, where 0 < j ≤ k.
The server determines the center clients of each category, determines, among the j categories that have a center client, the category that includes the largest number of user equipments, and takes the at least one center client of that category as the at least one feature client. The at least one feature client obtained by the server is therefore a client that most users use and can reflect the common interest of most users, which ensures the accuracy of the feature clients determined by the server.
With reference to the first aspect or any one of the first to fifth implementations of the first aspect, in a sixth implementation of the first aspect, obtaining, for each client that has recorded user data on user equipment of the same type, at least one feature attribute identical to one of the n preset feature attributes from the feature attributes of the client, includes: for each client that has recorded user data on user equipment of the same type, collecting the user data recorded by the client; filtering all the user data recorded by the client according to a preset rule, where the preset rule filters out user data for which the running duration of the recording client is less than a second preset threshold or greater than a third preset threshold; and, when user data remains after filtering, obtaining, from the feature attributes of the client, at least one feature attribute identical to one of the n preset feature attributes.
Filtering the user data recorded by the clients allows the server to discard user data that does not match actual usage, which improves the accuracy with which the server calculates the weight information of the n feature attributes and, in turn, the accuracy of clustering.
With reference to the first aspect or any one of the first to sixth implementations of the first aspect, in a seventh implementation of the first aspect, after the at least one feature client is extracted from the k categories, the method further includes: when the number of the at least one feature client is r, obtaining the identifier of each feature client and treating the identifier of each of the r clients as a feature attribute, to obtain n + r feature attributes, where r is a positive integer; and updating n to n + r and triggering again the steps of determining, according to the number of feature attributes of each client, the weight information of each of the at least one feature attribute of each client to obtain the weight information of each of the n feature attributes, clustering all clients that have recorded user data on the user equipment of the same type using the preset clustering index k and the weight information of each feature attribute to obtain k categories, and extracting at least one feature client from the k categories, until extraction of a feature client fails.
The identifiers of the r extracted feature clients are added to the n feature attributes as r feature attributes, n is updated to n + r, and the step of extracting at least one feature client is performed in a loop. The server can thus keep extracting feature clients and generate more user tags from them, which further increases the number of user tags generated by the server.
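The loop described above can be sketched as follows. The function `extract` is a hypothetical stand-in for the whole "weight, cluster, extract" pipeline, and the `max_rounds` cap and the skipping of already known identifiers are safety assumptions added for the sketch; the source only states that the loop stops when extraction fails.

```python
def grow_attributes(initial_attributes, extract, max_rounds=10):
    """Repeat extraction, adding each round's feature clients as new attributes.

    `extract(attributes)` returns the identifiers of the feature clients found
    with the current attribute list, or an empty list when extraction fails.
    """
    attributes = list(initial_attributes)
    found = []
    for _ in range(max_rounds):  # assumed cap so the sketch always terminates
        new_clients = [c for c in extract(attributes) if c not in attributes]
        if not new_clients:
            break  # extraction failed, so the loop stops
        attributes.extend(new_clients)  # n becomes n + r
        found.extend(new_clients)
    return attributes, found


def fake_extract(attributes):
    """Toy pipeline: finds one client per round for two rounds, then fails."""
    if len(attributes) == 4:
        return ["xxKTV"]
    if len(attributes) == 5:
        return ["client_b"]
    return []


attributes, found = grow_attributes(
    ["child", "teenager", "adult", "elderly"], fake_extract
)
```

Here the attribute list grows from 4 to 6 entries over two rounds, and the third round returns nothing, which ends the loop.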
With reference to the first aspect, in an eighth implementation of the first aspect, after all clients that have recorded user data on the user equipment of the same type are clustered using the preset clustering index k and the weight information of each feature attribute to obtain k categories, the method further includes: for each of the k categories, counting the number of user equipments in the category to which each client belongs; determining a client for which that number is greater than a fourth preset threshold as a client to be recommended; and recommending the client to be recommended to the user equipments in the category on which it is not installed.
The server determines the clients to be recommended for each category and recommends them to the user equipments in the category on which they are not installed. The server can thus recommend clients that may interest the user, which makes it easier for the user to obtain those clients.
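The recommendation step can be sketched as follows, under the assumption that a category is available as a mapping from a device identifier to the set of clients on that device and that the fourth preset threshold is an absolute device count. The function name `recommendations` and the sample data are hypothetical.

```python
def recommendations(category, threshold):
    """Map each device to the popular clients of its category that it lacks.

    A client is popular when it appears on more than `threshold` devices
    of the category. Devices that already have every popular client are omitted.
    """
    counts = {}
    for clients in category.values():
        for client in clients:
            counts[client] = counts.get(client, 0) + 1
    popular = sorted(c for c, n in counts.items() if n > threshold)
    result = {}
    for device, clients in category.items():
        missing = [c for c in popular if c not in clients]
        if missing:
            result[device] = missing
    return result


category = {"d1": {"xxKTV", "news"}, "d2": {"xxKTV"}, "d3": {"news"}}
to_recommend = recommendations(category, threshold=1)
```

In this made-up category both clients appear on two devices, so each is recommended to the one device that lacks it.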
In a second aspect, a user tag generation device is provided. The device includes at least one unit, and the at least one unit is configured to implement the user tag generation method provided in the first aspect or in at least one implementation of the first aspect.
In a third aspect, a server is provided. The server includes a processor and a wireless transceiver connected to the processor. The wireless transceiver is configured to be controlled by the processor, and the processor is configured to implement the user tag generation method provided in the first aspect or in at least one implementation of the first aspect.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings needed for describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic architecture diagram of the communication system provided by an exemplary embodiment of the present invention;
Fig. 2 is a schematic structural diagram of the user equipment provided by an exemplary embodiment of the present invention;
Fig. 3 is a flowchart of the user tag generation method provided by an exemplary embodiment of the present invention;
Fig. 4 is a structural diagram of the user tag generation device provided by an exemplary embodiment of the present invention.
Detailed description
To make the objectives, technical solutions, and advantages of the present invention clearer, the embodiments of the present invention are described in further detail below with reference to the accompanying drawings.
A "unit" as referred to herein is a logically divided functional structure. A "unit" may be implemented purely in hardware or in a combination of software and hardware.
Refer to Fig. 1, which shows a schematic structural diagram of the communication system 100 provided by an exemplary embodiment of the present invention. The communication system 100 includes a server 120 and multiple user equipments 140.
The server 120 is connected to each user equipment 140 through a communication network and collects the user data recorded by the clients on each user equipment 140.
At least one client is installed on the user equipment 140. Each client can record user data, which reflects the operations performed by the user of the client. For example, the client records the number of times the user starts the client, the time periods in which it runs, and so on. The user equipment 140 may be a set-top box, a mobile phone (cellphone), a smartphone, a computer, a tablet computer, a wearable device, a personal digital assistant (PDA), a mobile internet device (MID), an e-book reader, and so on.
In this embodiment, the multiple user equipments 140 are of the same type. For example, the user equipments 140 are all set-top boxes, or the user equipments 140 are all smartphones.
Refer to Fig. 2, which shows a schematic structural diagram of the server 200 according to another exemplary embodiment of the present invention. The server 200 may be the server 120 shown in Fig. 1. The server includes a processor 220 and a wireless transceiver 240 connected to the processor 220.
The wireless transceiver 240 may consist of one or more antennas, which enable the server 200 to send and receive radio signals.
The wireless transceiver 240 may be connected to the processor 220. The processor 220 is the control center of the server and may be a central processing unit (CPU), a network processor (NP), or a combination of a CPU and an NP. The processor 220 may further include a hardware chip. The hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof. The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.
Optionally, the server 200 further includes a memory 260, which is connected to the processor 220 by a bus or by other means. The memory 260 may be a volatile memory, a non-volatile memory, or a combination thereof. The volatile memory may be a random-access memory (RAM), such as a static random-access memory (SRAM) or a dynamic random-access memory (DRAM). The non-volatile memory may be a read-only memory (ROM), such as a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), or an electrically erasable programmable read-only memory (EEPROM). The non-volatile memory may also be a flash memory or a magnetic memory, such as a magnetic tape, a floppy disk, or a hard disk. The non-volatile memory may also be an optical disc.
The memory 260 can store user data. Optionally, the memory 260 can also store the feature clients, the first user tag, the second user tag, and so on determined by the processor 220. For the specific determination process, see the description of step 304 and step 305 below.
Refer to Fig. 3, which shows a flowchart of the user tag generation method provided by an exemplary embodiment of the present invention. This embodiment is described using the example of the method being applied in the communication system shown in Fig. 1, with the server performing the following steps. The method includes:
Step 301: for each client that has recorded user data on user equipment of the same type, obtain, from the feature attributes of the client, at least one feature attribute identical to one of n preset feature attributes, where n is a positive integer.
Multiple clients are installed on the user equipment. When a user uses a client, that client holds recorded user data; when the user has never used a client, that client holds no recorded user data. User data reflects the operations that a user of a client performs on the client, such as how often the client is clicked, the time periods in which the client runs, and the web content browsed with the client.
For each client that has recorded user data on user equipment of the same type, the server obtains at least one feature attribute of the client. A feature attribute reflects a feature shared by the target user group of the client. For example, the feature shared by the target user group of the client "Nursery Rhymes Collection" is that they are children, so the feature attribute of "Nursery Rhymes Collection" is children; the feature shared by the target user group of the client "Square Dance Collection" is that they are elderly, so the feature attribute of "Square Dance Collection" is elderly.
For a client that has recorded user data, the recorded user data may not match actual usage, for example a running time of less than 1 second or more than 24 hours. If the server obtained the feature attributes of such a client, subsequent results might not accurately reflect the common interest of the target user group of the user equipment. In this embodiment, the server filters all the user data recorded by the client according to a preset rule. The preset rule filters out user data for which the running duration of the recording client is less than a second preset threshold or greater than a third preset threshold. When user data remains after filtering, the server obtains at least one feature attribute of the client. In this way, the server discards user data that does not match actual usage, which improves the accuracy of subsequent results.
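The filtering rule can be sketched as follows. The thresholds of 1 second and 24 hours come from the example in the text; the record layout (a dictionary with a `run_seconds` field) and the function name `filter_records` are assumptions made for illustration.

```python
MIN_RUN_SECONDS = 1              # stands in for the second preset threshold
MAX_RUN_SECONDS = 24 * 60 * 60   # stands in for the third preset threshold


def filter_records(records, low=MIN_RUN_SECONDS, high=MAX_RUN_SECONDS):
    """Drop records whose running duration is below `low` or above `high`."""
    return [r for r in records if low <= r["run_seconds"] <= high]


records = [
    {"client": "xxKTV", "run_seconds": 0.4},     # too short, likely a mis-tap
    {"client": "xxKTV", "run_seconds": 1800},    # plausible half-hour session
    {"client": "xxKTV", "run_seconds": 90000},   # longer than 24 hours
]
kept = filter_records(records)
has_usable_data = bool(kept)  # feature attributes are obtained only if this is True
```

Only the half-hour session survives in this made-up example, so the client still has usable data and its feature attributes would be obtained.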
Each client may have multiple feature attributes, but only some of them are identical to the n preset feature attributes. The server therefore needs to select, from the multiple feature attributes of the client, at least one that is identical to one of the n preset feature attributes. The n preset feature attributes are determined according to the business objective. For example, if the business objective is to determine the age of the target group of the user equipment of the same type, the n preset feature attributes may be child, teenager, adult, and elderly.
Assume that the n preset feature attributes are child, teenager, adult, and elderly, and that the client "xx Ninja" has the feature attributes game, leisure, adult, and teenager. The server then selects adult and teenager from game, leisure, adult, and teenager as the feature attributes of "xx Ninja".
It should be noted that, in the description below, the feature attributes of a client always refer to those feature attributes that are identical to the n preset feature attributes.
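The selection in step 301 amounts to intersecting a client's attributes with the preset list. A minimal sketch, in which the attribute labels and the function name `select_preset_attributes` are illustrative assumptions:

```python
# Hypothetical labels for the n = 4 preset feature attributes.
PRESET_ATTRIBUTES = ("child", "teenager", "adult", "elderly")


def select_preset_attributes(client_attributes, preset=PRESET_ATTRIBUTES):
    """Keep only the attributes of a client that also appear in the preset list."""
    owned = set(client_attributes)
    # Iterate over the preset tuple so the result has a stable order.
    return [attr for attr in preset if attr in owned]


example_attributes = ["game", "leisure", "adult", "teenager"]
selected = select_preset_attributes(example_attributes)
```

For the example attributes from the text, `selected` holds the teenager and adult attributes, in the order of the preset tuple.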
Step 302: according to the number of feature attributes of each client, determine the weight information of each of the at least one feature attribute of each client, to obtain the weight information of each of the n feature attributes.
The weight information of a feature attribute reflects whether the user prefers clients that have that feature attribute. The weight information is usually expressed as a number; the larger the weight information of a feature attribute, the more the user prefers clients that have that feature attribute.
In one implementation, the user data of each client includes the running frequency of the client, and the server determining, according to the number of feature attributes of each client, the weight information of each of the at least one feature attribute of each client, to obtain the weight information of each of the n feature attributes, includes: setting the weight of each feature attribute of each client according to a preset total weight score and the number of feature attributes of that client, where the weight is negatively correlated with the number of feature attributes of the client; determining the weight information of each feature attribute of each client according to the running frequency of that client and the weight of each of its feature attributes; and, for all clients that have recorded user data on each user equipment, adding up the weight information of identical feature attributes to obtain the weight information of the n feature attributes.
The running frequency of a client is the number of times the user uses the client. A client may take the number of times the user used it within a preset duration as its running frequency, or it may take the total number of times the user has used it as its running frequency. This embodiment does not limit how the running frequency of each client is determined.
Setting the weight of each feature attribute of each client according to the preset total weight score and the number of feature attributes of that client means the following: when a client has a feature attributes and the preset total weight score is b, the weight of each feature attribute of that client is b/a. Note that in an actual implementation b/a may be computed as b × (1/a); this embodiment does not limit how the server calculates the weight of each feature attribute. The preset total weight score may be any reasonable value, such as 1 point, 2 points, or 100 points; this embodiment does not limit its specific value.
Determining, according to the running frequency of each client and the weight of every characteristic attribute possessed by each client, the weight information of every characteristic attribute possessed by each client means: when the running frequency of a client is c and the weight of every characteristic attribute possessed by that client is b/a, the weight information of every characteristic attribute possessed by that client is c × b/a.
Assume that the 4 preset characteristic attributes are child, teenager, adult, and old age, and that the preset weight total score is 1 point. For the clients "the xx persons of bearing" and "xxKTV" that have recorded user data in the same user equipment, the server obtains that the characteristic attributes of "the xx persons of bearing" are teenager, adult, and old age, so the quantity of its characteristic attributes is 3. The weight set for the teenager characteristic attribute of "the xx persons of bearing" is then 1/3 = 0.33 points; the weight set for its adult characteristic attribute is 1/3 = 0.33 points; and the weight set for its old-age characteristic attribute is 1/3 = 0.33 points. The server obtains that the characteristic attributes of "xxKTV" are teenager and adult, so the quantity of its characteristic attributes is 2. The weight set for the teenager characteristic attribute of "xxKTV" is then 1/2 = 0.5 points, and the weight set for its adult characteristic attribute is 1/2 = 0.5 points.
If the running frequency of "the xx persons of bearing" is 1, the weight information of its teenager characteristic attribute is 1*0.33 = 0.33 points, that of its adult characteristic attribute is 1*0.33 = 0.33 points, and that of its old-age characteristic attribute is 1*0.33 = 0.33 points.
If the running frequency of "xxKTV" is 2, the weight information of its teenager characteristic attribute is 2*0.5 = 1 point, and that of its adult characteristic attribute is 2*0.5 = 1 point.
The server adds the weight information of the child characteristic attribute of each client, obtaining 0 points for the child characteristic attribute; adds the weight information of the teenager characteristic attribute of each client, obtaining 0.33 + 1 = 1.33 points for the teenager characteristic attribute; adds the weight information of the adult characteristic attribute of each client, obtaining 0.33 + 1 = 1.33 points for the adult characteristic attribute; and adds the weight information of the old-age characteristic attribute of each client, obtaining 0.33 points for the old-age characteristic attribute.
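The calculation above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name, the list-of-tuples input layout, and the PRESET_ATTRIBUTES constant are assumptions introduced here, not part of the embodiment.

```python
# Illustrative sketch; names and data layout are assumptions, not from the patent.
PRESET_ATTRIBUTES = ["child", "teenager", "adult", "old age"]


def attribute_weight_info(clients, total_score=1.0):
    """Sum c * b / a per characteristic attribute for one user equipment.

    clients: list of (attributes, running_frequency) pairs, where attributes
    is the list of preset characteristic attributes the client possesses.
    """
    totals = {attribute: 0.0 for attribute in PRESET_ATTRIBUTES}
    for attributes, running_frequency in clients:
        if not attributes:
            continue  # a client with no preset attribute contributes nothing
        weight = total_score / len(attributes)  # b / a
        for attribute in attributes:
            totals[attribute] += running_frequency * weight  # c * b / a
    return totals


device = [
    (["teenager", "adult", "old age"], 1),  # "the xx persons of bearing", run once
    (["teenager", "adult"], 2),             # "xxKTV", run twice
]
info = attribute_weight_info(device)
```

With the two clients of the example, the teenager and adult totals come to 1/3 + 1, which rounds to the 1.33 points stated above, and the child total stays at 0.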
In another implementation, the user data of each client includes the running frequency and the run time section of the client. Determining, according to the quantity of characteristic attributes possessed by each client, the weight information of every characteristic attribute in the at least one characteristic attribute possessed by each client, and obtaining the weight information of every characteristic attribute in the n kinds of characteristic attributes, includes: setting, according to the preset weight total score and the quantity of characteristic attributes possessed by each client, the weight of every characteristic attribute possessed by each client, where the weight is negatively correlated with the quantity of characteristic attributes possessed by the client; determining the preset time period to which the run time section of each client belongs, and determining the running frequency of each client in the corresponding preset time period, where each preset time period corresponds to the n kinds of characteristic attributes; for each client, determining, according to the running frequency of the client in each preset time period and the weight of every characteristic attribute possessed by the client, the weight information of every characteristic attribute of the client in each preset time period; and, for all clients that have recorded user data in each user equipment, adding the weight information of characteristic attributes of the same kind within the same preset time period to obtain the weight information of the n kinds of characteristic attributes corresponding to each preset time period.
The run time section of each client may be the period during which the client runs in the foreground, or the period from when the client starts running until it stops running, which includes both the foreground running period and the background running period; this embodiment does not limit the manner of determining the run time section of each client.
Assume that 4 preset time periods are set in the server, as shown in Table one below, that the 4 preset characteristic attributes are child, teenager, adult, and old age, and that the preset weight total score is 1 point. For the clients "the xx persons of bearing" and "xxKTV" that have recorded user data in the same user equipment, the server obtains that the characteristic attributes of "the xx persons of bearing" are teenager, adult, and old age, that the quantity of its characteristic attributes is 3, and that it ran once in the daytime; and that the characteristic attributes of "xxKTV" are teenager and adult, that the quantity of its characteristic attributes is 2, and that it ran once in the daytime and once in the evening. The server sets the weights of "the xx persons of bearing" and "xxKTV" respectively and, according to the set weights, obtains their weight information as follows. For "the xx persons of bearing" in the daytime, the weight information of the teenager characteristic attribute is 1*0.33 = 0.33 points, that of the adult characteristic attribute is 1*0.33 = 0.33 points, and that of the old-age characteristic attribute is 1*0.33 = 0.33 points. For "xxKTV" in the daytime, the weight information of the teenager characteristic attribute is 1*0.5 = 0.5 points and that of the adult characteristic attribute is 1*0.5 = 0.5 points; in the evening, the weight information of the teenager characteristic attribute is 1*0.5 = 0.5 points and that of the adult characteristic attribute is 1*0.5 = 0.5 points.
The server adds the daytime weight information of the child characteristic attribute of each client, obtaining 0 points for the daytime child characteristic attribute; adds the daytime weight information of the teenager characteristic attribute of each client, obtaining 0.33 + 0.5 = 0.83 points for the daytime teenager characteristic attribute; adds the daytime weight information of the adult characteristic attribute of each client, obtaining 0.33 + 0.5 = 0.83 points for the daytime adult characteristic attribute; and adds the daytime weight information of the old-age characteristic attribute of each client, obtaining 0.33 points for the daytime old-age characteristic attribute.
The server adds the evening weight information of the child characteristic attribute of each client, obtaining 0 points for the evening child characteristic attribute; adds the evening weight information of the teenager characteristic attribute of each client, obtaining 0.5 points for the evening teenager characteristic attribute, because only "xxKTV" ran in the evening; adds the evening weight information of the adult characteristic attribute of each client, obtaining 0.5 points for the evening adult characteristic attribute; and adds the evening weight information of the old-age characteristic attribute of each client, obtaining 0 points for the evening old-age characteristic attribute.
Table one:
Preset time period | Time range |
Morning | [0:00,4:00) |
Daytime | [4:00,18:00) |
Evening | [18:00,21:00) |
Late night | [21:00,0:00) |
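A minimal Python sketch of the per-period variant follows, using the time ranges of Table one. The period labels, the function names, and the choice of representing each run by the hour at which it happened are assumptions made for illustration; the embodiment does not prescribe a data layout.

```python
# Illustrative sketch; labels, names, and the hour-based input are assumptions.
PERIODS = [
    ("morning", 0, 4),
    ("daytime", 4, 18),
    ("evening", 18, 21),
    ("late night", 21, 24),
]


def period_of(hour):
    """Return the preset time period whose half-open range contains hour."""
    for name, start, end in PERIODS:
        if start <= hour < end:
            return name
    raise ValueError("hour must be in [0, 24)")


def weight_info_by_period(clients, preset_attributes, total_score=1.0):
    """clients: list of (attributes, run_hours); each run counts once."""
    info = {name: {a: 0.0 for a in preset_attributes} for name, _, _ in PERIODS}
    for attributes, run_hours in clients:
        if not attributes:
            continue
        weight = total_score / len(attributes)  # b / a
        for hour in run_hours:
            for attribute in attributes:
                info[period_of(hour)][attribute] += weight
    return info


attributes = ["child", "teenager", "adult", "old age"]
device = [
    (["teenager", "adult", "old age"], [10]),  # one daytime run (hour assumed)
    (["teenager", "adult"], [10, 19]),         # one daytime and one evening run
]
by_period = weight_info_by_period(device, attributes)
```

For the daytime period this yields 1/3 + 1/2 for the teenager and adult attributes, which rounds to the 0.83 points of the example.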
Step 303: using a preset clustering target k and the weight information of every characteristic attribute, cluster all clients that have recorded user data in user equipment of the same type to obtain k classes, where each class includes at least one client and the user equipment to which each client belongs, and k is a positive integer.
The server uses the preset clustering target k and the weight information of every characteristic attribute to cluster all clients that have recorded user data in user equipment of the same type. The k classes obtained reflect the association between different clients: clients that are strongly associated fall into the same class, and clients that are weakly associated fall into different classes. The clustering algorithm used by the server may be the spectral clustering (Spectral Clustering, SC) algorithm or the k-means clustering algorithm; this embodiment does not limit it. The preset clustering target k is the clustering target with the best clustering effect, selected after the server has performed clustering multiple times with different clustering targets.
The server using the preset clustering target k and the weight information of every characteristic attribute to cluster all clients that have recorded user data in user equipment of the same type, and obtaining k classes, includes: when the user equipment of the same type includes m user equipment, generating an m × p feature matrix according to the weight information of the n kinds of characteristic attributes, where p = n when the user data does not include the run time section of each client, and p = n × q when the user data includes the run time section of each client and the quantity of preset time periods is q; normalizing the feature matrix to obtain an m × p normalization matrix; and clustering the normalization matrix using the clustering target k to obtain the k classes.
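The final clustering step can be illustrated with a small pure-Python k-means. This is a sketch under stated assumptions rather than the embodiment's implementation: each row is taken to be one user equipment, the first k rows seed the centroids (an arbitrary deterministic choice made here), and squared Euclidean distance is used.

```python
# Illustrative k-means sketch; seeding and distance choice are assumptions.
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(row, centroids):
    """Index of the centroid closest to row (first one wins on ties)."""
    return min(range(len(centroids)), key=lambda c: squared_distance(row, centroids[c]))


def kmeans(rows, k, max_iterations=100):
    """Return one class index per row of the normalization matrix."""
    centroids = [list(row) for row in rows[:k]]
    assignment = None
    for _ in range(max_iterations):
        new_assignment = [nearest(row, centroids) for row in rows]
        if new_assignment == assignment:
            break  # converged: no row changed class
        assignment = new_assignment
        for c in range(k):
            members = [rows[i] for i, a in enumerate(assignment) if a == c]
            if members:  # keep the old centroid if a class became empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


rows = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.0]]
labels = kmeans(rows, k=2)
```

On the four sample rows the first two end up in one class and the last two in the other. A production system would more likely use a library implementation with better seeding.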
When the user data does not include the run time section of each client, each user equipment corresponds to n data items obtained from the weight information of the n kinds of characteristic attributes, so the generated feature matrix is m × n. When the user data includes the run time section of each client and the quantity of preset time periods is q, each preset time period corresponds to the n kinds of characteristic attributes, so each user equipment corresponds to n × q data items obtained from the weight information of the n kinds of characteristic attributes of each preset time period, and the generated matrix is m × p, where p = n × q.
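How the two cases lead to p = n or p = n × q can be sketched as follows. The function name and the dictionary layout of the per-equipment weight information are assumptions made for illustration only.

```python
# Illustrative sketch; the input layout is an assumption, not from the patent.
def feature_matrix(devices, attributes, periods=None):
    """Build one row per user equipment.

    Without periods, each device is {attribute: weight information} and the
    row has n entries. With periods, each device is
    {period: {attribute: weight information}} and the row has n * q entries.
    """
    matrix = []
    for info in devices:
        if periods is None:
            row = [info.get(a, 0.0) for a in attributes]
        else:
            row = [info.get(p, {}).get(a, 0.0) for p in periods for a in attributes]
        matrix.append(row)
    return matrix


attributes = ["child", "teenager", "adult", "old age"]
flat = feature_matrix([{"teenager": 1.33, "adult": 1.33, "old age": 0.33}], attributes)
timed = feature_matrix(
    [{"daytime": {"teenager": 0.83, "adult": 0.83, "old age": 0.33}}],
    attributes,
    periods=["daytime", "evening"],
)
```

Here flat has rows of length n = 4, and timed has rows of length n × q = 8 because two periods are passed in.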
Normalizing the feature matrix means mapping every element of the feature matrix into the interval [0, 1]. The server may use the max-min normalization algorithm when normalizing the feature matrix; this embodiment does not limit it.
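A minimal sketch of max-min normalization is given below. The text does not say whether the maximum and minimum are taken per column or over the whole matrix; this sketch assumes per-column scaling and maps a constant column to 0, both of which are choices made here for illustration.

```python
# Illustrative sketch; per-column scaling is an assumption.
def min_max_normalize(matrix):
    """Scale every column of matrix into [0, 1] as (v - min) / (max - min)."""
    columns = list(zip(*matrix))
    lows = [min(column) for column in columns]
    highs = [max(column) for column in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, lows, highs)]
        for row in matrix
    ]


normalized = min_max_normalize([[0.0, 2.0], [5.0, 2.0], [10.0, 2.0]])
```

In the sample, the first column becomes 0, 0.5, 1 and the constant second column becomes all zeros.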
Assume that the user data collected by each user equipment among the user equipment of the same type is as shown in Table two below, and that the characteristic attributes possessed by each client are as shown in Table three below. If the preset weight total score is 1 point, the server calculates, from Table two and Table three, the weight information of the n kinds of characteristic attributes corresponding to each user equipment, as shown in Table four below. The feature matrix obtained from Table four is:
[feature matrix not reproduced in the source text]
The normalization matrix obtained by normalizing this feature matrix with the max-min normalization algorithm is:
[normalization matrix not reproduced in the source text]
The normalization matrix is then clustered using the k-means clustering algorithm and the clustering target k to obtain k classes.
Table two:
User equipment | Client | Running frequency |
User equipment 1 | Complete nursery rhymes | 3 |
User equipment 1 | The xx persons of bearing | 1 |
User equipment 2 | Heroic x | 4 |
User equipment 3 | xxKTV | 2 |
User equipment 3 | One-key cleanup | 6 |
User equipment 4 | The xx persons of bearing | 2 |
Table three:
Table four:
To improve the accuracy of extracting feature clients, the server usually collects the user data recorded by each client in a large number of user equipment, for example the user data recorded by each client in 23640 user equipment. As a result, the dimension of the generated normalization matrix is very high: when the server clusters the normalization matrix with a clustering algorithm, the amount of calculation is large, and the normalization matrix contains redundant data, so the stability of its data is low. In this embodiment, after obtaining the normalization matrix, the server may also use a preset dimension-reduction algorithm and a preset dimension-reduction index l to perform dimension-reduction processing on the m × p normalization matrix, obtaining an m × l dimension-reduction matrix, and then cluster the dimension-reduction matrix using the clustering target k to obtain k classes.
Using the preset dimension-reduction algorithm and the dimension-reduction index l, the server can determine from the normalization matrix a group of linearly uncorrelated feature vectors that represent the effective information of the normalization matrix to the greatest extent. The dimension of this group of feature vectors is smaller than that of the normalization matrix, which reduces the complexity of the clustering algorithm when the server performs clustering, eliminates the redundant data in the normalization matrix, and improves the stability of the data used in clustering. The preset dimension-reduction algorithm may be the principal component analysis (Principal Component Analysis, PCA) algorithm or non-negative matrix factorization (Non-negative Matrix Factorization, NMF); this embodiment does not limit it. The preset dimension-reduction index l is selected as follows: for each of several candidate dimension-reduction indices, the variance of the m × l data derived from the m × p normalization matrix is calculated, and the candidate whose result reaches 90% of the variance of the normalization matrix is selected.
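One common way to read the 90% criterion, assuming PCA is the dimension-reduction algorithm, is to pick the smallest l whose leading components together retain at least 90% of the total variance. The sketch below assumes the per-component variances have already been computed elsewhere; the function name and this cumulative-variance interpretation are assumptions, not statements of the embodiment.

```python
# Illustrative sketch; the cumulative-variance reading is an assumption.
def choose_reduction_index(component_variances, target_ratio=0.9):
    """Smallest l whose top-l component variances reach target_ratio of the total."""
    ordered = sorted(component_variances, reverse=True)
    total = sum(ordered)
    if total <= 0:
        raise ValueError("total variance must be positive")
    cumulative = 0.0
    for l, variance in enumerate(ordered, start=1):
        cumulative += variance
        if cumulative / total >= target_ratio:
            return l
    return len(ordered)


l = choose_reduction_index([4.0, 4.0, 1.5, 0.5])
```

For the sample variances the first two components retain 80% and the first three retain 95%, so l is 3.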
Assume that the preset clustering target k is 12 and that the server clusters the dimension-reduction matrix using the spectral clustering algorithm; the resulting clusters are shown in Table five below.
Table five:
Step 304: extract at least one feature client from the k classes, where the feature client is used to reflect the common interest of the target user group of the user equipment.
Extracting, by the server, at least one feature client from the k classes includes: determining the center client of each of the k classes, where the value obtained by dividing the quantity of user equipment to which a center client belongs by the quantity of user equipment included in the class is greater than a first preset threshold; and determining, among the j classes in which a center client exists, the class that includes the largest quantity of user equipment, and determining at least one center client of the determined class as the at least one feature client, where 0 < j ≤ k.
In this embodiment, a client for which the quantity of user equipment it belongs to, divided by the quantity of user equipment included in the class, is greater than the first preset threshold is taken as a center client, and the feature client is determined from the center clients. The feature client extracted by the server is therefore a client that most users are using, so it reflects the common interest of the target user group of the user equipment more accurately.
Assume that the first preset threshold is 70%; the server then takes as a center client any client for which the quantity of user equipment it belongs to, divided by the quantity of user equipment included in the class, exceeds 70%. Assume that the center clients determined for each class shown in Table five are as shown in Table six below. As Table six shows, the classes cluster-5, cluster-6, cluster-10, cluster-11, and cluster-12 have no center client; that is, in these classes no client exists for which that ratio exceeds 70%. As Table five shows, among the classes in which a center client exists, class cluster-1 has the largest quantity of user equipment, 877, so the server takes the center client "xxKTV" of cluster-1 as the feature client.
Table six:
Class | Center client | Class | Center client |
cluster-1 | xxKTV | cluster-9 | Xx Condors |
cluster-2 | X hits ingenious military move 2 | cluster-9 | Fried xx glacial epoch xx |
cluster-3 | Qxx | cluster-9 | Overlord xx |
cluster-4 | Xx puzzles | cluster-9 | Xx plans |
cluster-7 | xx one-key cleanup | cluster-9 | Xx balls |
cluster-9 | Xx Great War corpse TV versions | cluster-9 | Xx intelligent |
cluster-9 | Xx fish | cluster-9 | Xx knight 2 |
cluster-9 | The xx persons of bearing | cluster-9 | The xx worlds |
cluster-9 | Xx escapes |
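The selection of center clients and feature clients described above can be sketched as follows. The function names and the nested-dictionary layout (class name to user equipment to set of clients) are assumptions for illustration; the 70% default simply mirrors the example.

```python
# Illustrative sketch; names and the input layout are assumptions.
def center_clients(devices, threshold=0.7):
    """Clients used by more than threshold of the user equipment in one class.

    devices: {device_id: set of clients} for a single class.
    """
    if not devices:
        return []
    counts = {}
    for clients in devices.values():
        for client in clients:
            counts[client] = counts.get(client, 0) + 1
    return sorted(c for c, n in counts.items() if n / len(devices) > threshold)


def feature_clients(classes, threshold=0.7):
    """Center clients of the largest class that has at least one center client."""
    best_size, best = 0, []
    for devices in classes.values():
        centers = center_clients(devices, threshold)
        if centers and len(devices) > best_size:
            best_size, best = len(devices), centers
    return best


classes = {
    "cluster-1": {"d1": {"xxKTV", "a"}, "d2": {"xxKTV"}, "d3": {"xxKTV", "b"}, "d4": {"xxKTV"}},
    "cluster-2": {"d5": {"g"}, "d6": {"g"}},
    "cluster-5": {"d7": {"v"}, "d8": {"w"}, "d9": {"x"}, "d10": {"y"}, "d11": {"z"}},
}
selected = feature_clients(classes)
```

In this toy input cluster-5 is the largest class but has no center client, so the result comes from cluster-1, the largest class that does have one.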
Optionally, to extract more feature clients, after the at least one feature client is obtained and when the quantity of the at least one feature client is r, the server obtains the identifier of each feature client and takes the identifier of each of the r clients as one kind of characteristic attribute, obtaining n + r kinds of characteristic attributes, where r is a positive integer. The server then updates n to n + r and performs step 302 to step 304 again, stopping when the extraction of a feature client in step 304 fails.
The identifier of a feature client may be the name of the feature client or the identity (ID) number of the feature client; this embodiment does not limit it.
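The repeat-until-failure loop can be sketched as below. Everything here is hypothetical scaffolding: run_round stands in for one pass of step 302 to step 304 and is supplied by the caller, max_rounds is a safety cap added in this sketch, and skipping identifiers that are already attributes is likewise an assumption.

```python
# Illustrative sketch; run_round and max_rounds are hypothetical, not from the patent.
def extract_iteratively(attributes, run_round, max_rounds=10):
    """Grow the attribute list with feature-client identifiers until a round fails.

    run_round(attributes) performs one pass of steps 302 to 304 and returns
    the identifiers of the feature clients it extracted (empty on failure).
    """
    attributes = list(attributes)
    found = []
    for _ in range(max_rounds):
        new_clients = [c for c in run_round(attributes) if c not in attributes]
        if not new_clients:
            break  # extraction failed, so stop
        found.extend(new_clients)
        attributes.extend(new_clients)  # n becomes n + r
    return attributes, found


# Stub standing in for the real pipeline: succeeds once, then fails.
rounds = iter([["xxKTV"], []])
final_attributes, found = extract_iteratively(
    ["child", "teenager", "adult", "old age"], lambda attrs: next(rounds)
)
```

With the stub, the four preset attributes grow to five after "xxKTV" is added, and the second round ends the loop.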
Assume that the feature client extracted according to Table five and Table six is "xxKTV", and that the weight information of the n kinds of characteristic attributes corresponding to each user equipment is as shown in Table four. The server determines the name of xxKTV as one characteristic attribute, obtaining 4 + 1 = 5 characteristic attributes. According to the updated quantity of characteristic attributes possessed by each client, the server determines the weight information of every characteristic attribute in the at least one characteristic attribute possessed by each client, as shown in Table seven below, where the weight information of user equipment 4, which has used xxKTV, for the xxKTV characteristic attribute in the daytime is 1 point.
Table seven:
Optionally, the server may also intelligently analyze which clients a user may be interested in and push the analysis result to the user equipment. This analysis includes: for each of the k classes, counting the quantity of user equipment to which each client in the class belongs; determining a client for which that quantity is greater than a fourth preset threshold as a client to be recommended; and recommending the client to be recommended to the user equipment in the class that has not installed it.
Assume that more than 50% of the user equipment in a certain class has run xx Great War corpse TV version; the server then recommends xx Great War corpse TV version to the user equipment in that class that has not installed it.
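A sketch of this in-class recommendation follows. The text states the threshold as a quantity while the example uses a proportion; this sketch assumes a proportion, and the function name and input layout are likewise assumptions.

```python
# Illustrative sketch; the proportional threshold is an assumption.
def recommend_in_class(devices, threshold=0.5):
    """Map each user equipment to the popular clients it does not yet have.

    devices: {device_id: set of clients} for a single class.
    """
    if not devices:
        return {}
    counts = {}
    for clients in devices.values():
        for client in clients:
            counts[client] = counts.get(client, 0) + 1
    popular = {c for c, n in counts.items() if n / len(devices) > threshold}
    return {
        device: sorted(popular - clients)
        for device, clients in devices.items()
        if popular - clients
    }


recommendations = recommend_in_class({
    "d1": {"xx Great War corpse TV version", "xxKTV"},
    "d2": {"xx Great War corpse TV version"},
    "d3": {"xx puzzles"},
})
```

Two of the three devices have the popular client, so only d3 receives a recommendation for it.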
Optionally, the server may also analyze, according to the timing at which more than a preset quantity of user equipment run clients, whether such user equipment needs to be optimized.
Assume that 122 user equipment run xx one-key cleanup after running xxKTV; from this timing analysis the server learns that user equipment produces a large amount of cache after running xxKTV, so such user equipment needs to be optimized.
Optionally, the server may also, according to at least two clients run by more than a preset quantity of user equipment, recommend the remaining clients to user equipment that has run only some of the at least two clients.
Assume that the clients run by 153 user equipment include both xx Great War corpse TV version and xx ingenious military moves 2; the server may then recommend xx ingenious military moves 2 to user equipment that has run only xx Great War corpse TV version.
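The co-run recommendation can be sketched for the simplest case of client pairs. Restricting the analysis to pairs, the function name, and the input layout are all assumptions of this sketch; the text only requires "at least two clients".

```python
# Illustrative sketch; limiting the analysis to pairs is an assumption.
from collections import Counter
from itertools import combinations


def co_run_recommendations(devices, min_devices):
    """Recommend the missing half of any client pair run by more than min_devices.

    devices: {device_id: set of clients the user equipment has run}.
    """
    pair_counts = Counter()
    for clients in devices.values():
        pair_counts.update(combinations(sorted(clients), 2))
    frequent = [pair for pair, n in pair_counts.items() if n > min_devices]
    result = {}
    for device, clients in devices.items():
        for a, b in frequent:
            if a in clients and b not in clients:
                result.setdefault(device, set()).add(b)
            elif b in clients and a not in clients:
                result.setdefault(device, set()).add(a)
    return {device: sorted(missing) for device, missing in result.items()}


co_run = co_run_recommendations(
    {
        "d1": {"corpse TV", "moves 2"},
        "d2": {"corpse TV", "moves 2"},
        "d3": {"corpse TV", "moves 2"},
        "d4": {"corpse TV"},
    },
    min_devices=2,
)
```

Three devices run both clients, which exceeds the preset quantity of 2, so d4 is recommended the client it has not run.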
Step 305: generate a first user tag according to the user data recorded by each client, and generate a second user tag according to the at least one feature client.
The server generates the first user tag according to the user data recorded by each client, for example "night owl" or "Beijinger"; the server generates the second user tag according to the at least one feature client, for example "karaoke expert" or "cleanup expert". Because the first user tag is generated from the user data recorded by each client, it reflects the association among the target group of user equipment of the same type only poorly. The second user tag is generated from the feature client, which can reflect the hidden association among the target group of user equipment of the same type, so the second user tag reflects that association better.
In summary, the user tag generation method provided by the embodiment of the present invention determines the weight information of every characteristic attribute possessed by each client that has recorded user data in user equipment, obtaining the weight information of every characteristic attribute in the n kinds of characteristic attributes; clusters, according to the preset clustering target k and the weight information of every characteristic attribute, all clients that have recorded user data in user equipment of the same type, obtaining k classes; and extracts at least one feature client from the k classes. The server can therefore generate user tags not only from the user data but also from the at least one feature client. This solves the problem that, when the server generates user tags only from user data and the user data is scarce, few user tags are generated, and it achieves the effect of increasing the quantity of generated user tags.
In addition, the weight of every characteristic attribute possessed by each client is set so that it is negatively correlated with the quantity of characteristic attributes possessed by the client; the weight information of every characteristic attribute possessed by each client is determined according to the running frequency of each client and the weight of every characteristic attribute possessed by each client; and, for all clients that have recorded user data in each user equipment, the weight information of characteristic attributes of the same kind is added to obtain the weight information of the n kinds of characteristic attributes. The weight information of the n kinds of characteristic attributes obtained by the server is thus positively correlated with the running frequency of the clients and embodies the user's habits in using each client, which ensures the accuracy of the generated second user tag.
In addition, the weight of every characteristic attribute possessed by each client is set so that it is negatively correlated with the quantity of characteristic attributes possessed by the client; for each client, the weight information of every characteristic attribute of the client in each preset time period is determined according to the running frequency of the client in each preset time period and the weight of every characteristic attribute possessed by the client; and, for all clients that have recorded user data in each user equipment, the weight information of characteristic attributes of the same kind within the same preset time period is added to obtain the weight information of the n kinds of characteristic attributes corresponding to each preset time period. The weight information of the n kinds of characteristic attributes obtained by the server thus reflects the user's habits in using each client more accurately, and the server can refer to more types of user data when generating the second user tag.
In addition, performing dimension-reduction processing on the normalization matrix both reduces the amount of calculation when the server performs the clustering algorithm with the clustering target k, improving the efficiency with which the server performs the clustering algorithm, and deletes the redundant data in the normalization matrix, improving the stability of the data the server uses when performing the clustering algorithm.
In addition, the center client of each class is determined, the class that includes the largest quantity of user equipment is determined among the j classes in which a center client exists, and at least one center client of that class is determined as the at least one feature client. The at least one feature client obtained by the server is therefore a client that most users are using and can reflect the common interest of most users, which ensures the accuracy of the feature client determined by the server.
Referring to Fig. 4, it shows a block diagram of a user tag generating apparatus provided by one embodiment of the present invention. The user tag generating apparatus may be implemented as all or part of user equipment by software, hardware, or a combination of both. The user tag generating apparatus may include: an acquiring unit 410, a determining unit 420, a cluster unit 430, an extraction unit 440, a generation unit 450, an updating unit 460, a statistic unit 470, and a recommendation unit 480.
The acquiring unit 410 is configured to implement the function of step 301 and, in step 304, the function of obtaining the identifier of each feature client and taking the identifier of each of the r clients as one kind of characteristic attribute to obtain n + r kinds of characteristic attributes.
The determining unit 420 is configured to implement the function of step 302 and, in step 304, the function of determining a client for which the quantity of user equipment it belongs to is greater than the fourth preset threshold as a client to be recommended.
The cluster unit 430 is configured to implement the function of step 303.
The extraction unit 440 is configured to implement the function of step 304.
The generation unit 450 is configured to implement the function of step 305.
The updating unit 460 is configured to implement the function of updating n to n + r in step 304.
The statistic unit 470 is configured to implement, in step 304, the function of counting, for each of the k classes, the quantity of user equipment to which each client in the class belongs.
The recommendation unit 480 is configured to implement, in step 304, the function of recommending the client to be recommended to the user equipment in the class that has not installed it.
For related details, refer to the method embodiment described with reference to Fig. 3.
It should be noted that the acquiring unit 410, determining unit 420, cluster unit 430, extraction unit 440, generation unit 450, updating unit 460, statistic unit 470, and recommendation unit 480 may be implemented by a processor in the user equipment.
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented by electronic hardware or by a combination of computer software and electronic hardware. Whether these functions are performed by hardware or by software depends on the specific application and the design constraints of the technical solution.
Those of ordinary skill in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the apparatus and units described above may refer to the corresponding processes in the foregoing method embodiment and are not repeated here.
In the embodiments provided herein, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the apparatus embodiment described above is only schematic: the division into units may be only a division by logical function, and other divisions are possible in an actual implementation, for example multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
The foregoing is only a specific embodiment of the present invention, but the protection scope of the present invention is not limited thereto. Any change or replacement that a person skilled in the art can readily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (18)
1. A user tag generation method, characterized in that the method comprises:
for each client that has recorded user data in user equipment of the same type, obtaining, from the characteristic attributes possessed by the client, at least one characteristic attribute that is identical to one of n preset kinds of characteristic attributes, where the user data is used to reflect the operations performed on the client by the user using the client, the characteristic attribute is used to reflect a feature that the target user group of the client has in common, and n is a positive integer;
determining, according to the quantity of characteristic attributes possessed by each client, the weight information of every characteristic attribute in the at least one characteristic attribute possessed by each client, to obtain the weight information of every characteristic attribute in the n kinds of characteristic attributes;
clustering, using a preset clustering target k and the weight information of every characteristic attribute, all clients that have recorded user data in the user equipment of the same type, to obtain k classes, where each class includes at least one client and the user equipment to which each client belongs, and k is a positive integer;
extracting at least one feature client from the k classes, where the feature client is used to reflect the common interest of the target user group of the user equipment; and
generating a first user tag according to the user data recorded by each client, and generating a second user tag according to the at least one feature client.
2. The method according to claim 1, characterized in that the user data of each client includes the running frequency of the client, and the determining, according to the number of characteristic attributes of each client, weight information of each of the at least one characteristic attribute of each client, to obtain weight information of each of the n characteristic attributes, comprises:
setting, according to a preset weight total score and the number of characteristic attributes of each client, a weight of each characteristic attribute of each client, wherein the weight is negatively correlated with the number of characteristic attributes of the client;
determining, according to the running frequency of each client and the weight of each characteristic attribute of each client, the weight information of each characteristic attribute of each client;
for all clients that have recorded user data in each user equipment, adding up the weight information of the same characteristic attribute, to obtain the weight information of the n characteristic attributes.
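The weighting of claim 2 can be illustrated with a minimal Python sketch. It rests on assumptions the claim does not fix: the weight of each attribute is taken as the preset weight total score divided evenly by the number of matched attributes of the client (one simple way to get the required negative correlation), and the weight information is taken as that weight multiplied by the running frequency. The names `TOTAL_SCORE`, `attribute_weights` and `device_weight_info` are hypothetical.

```python
from collections import defaultdict

TOTAL_SCORE = 1.0  # assumed preset weight total score


def attribute_weights(attributes, total_score=TOTAL_SCORE):
    """Split the total score evenly: more attributes means a lower weight each."""
    return {attr: total_score / len(attributes) for attr in attributes}


def device_weight_info(clients, preset_attributes):
    """Sum per-attribute weight information over all clients on one device.

    clients: list of (attributes, running_frequency) pairs, one per client.
    Returns one value per preset attribute, in the order of preset_attributes.
    """
    totals = defaultdict(float)
    for attributes, frequency in clients:
        matched = [a for a in attributes if a in preset_attributes]
        if not matched:
            continue  # the client has none of the n preset attributes
        for attr, weight in attribute_weights(matched).items():
            totals[attr] += weight * frequency  # assumed combination rule
    return [totals[a] for a in preset_attributes]


preset = ["sports", "music", "finance"]
clients = [(["sports", "music"], 10), (["finance"], 4)]
print(device_weight_info(clients, preset))  # [5.0, 5.0, 4.0]
```

In this example the first client spreads its score over two attributes, so each launch contributes 0.5 per attribute, while the single-attribute client contributes its full score per launch.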
3. The method according to claim 1, characterized in that the user data of each client includes the running frequency and the running time period of the client, and the determining, according to the number of characteristic attributes of each client, weight information of each of the at least one characteristic attribute of each client, to obtain weight information of each of the n characteristic attributes, comprises:
setting, according to a preset weight total score and the number of characteristic attributes of each client, a weight of each characteristic attribute of each client, wherein the weight is negatively correlated with the number of characteristic attributes of the client;
determining the preset time period to which the running time period of each client belongs, and determining the running frequency of each client in the corresponding preset time period, wherein each preset time period corresponds to the n characteristic attributes;
for each client, determining, according to the running frequency of the client in each preset time period and the weight of each characteristic attribute of the client, the weight information of each characteristic attribute of the client in each preset time period;
for all clients that have recorded user data in each user equipment, adding up the weight information of the same characteristic attribute in the same preset time period, to obtain the weight information of the n characteristic attributes corresponding to each preset time period.
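The per-period weighting of claim 3 can be sketched in Python as follows. The three time periods, the representation of running records as launch hours, the even split of the weight total score, and the names `PERIODS`, `period_of` and `client_weight_info_by_period` are all assumptions made for illustration; the claim does not prescribe them.

```python
from collections import Counter

# Assumed preset time periods as (name, start hour, end hour), end exclusive.
PERIODS = [("morning", 6, 12), ("afternoon", 12, 18), ("evening", 18, 24)]


def period_of(hour):
    """Map a launch hour to the preset time period it falls in."""
    for name, start, end in PERIODS:
        if start <= hour < end:
            return name
    return None  # hours outside every preset period are ignored


def client_weight_info_by_period(attributes, launch_hours, total_score=1.0):
    """Weight information for every (period, attribute) pair of one client."""
    weight = total_score / len(attributes)  # fewer attributes, higher weight
    frequency = Counter(p for p in map(period_of, launch_hours) if p is not None)
    return {
        (name, attr): weight * frequency[name]
        for name, _, _ in PERIODS
        for attr in attributes
    }


info = client_weight_info_by_period(["sports", "music"], [7, 8, 13, 20, 21, 22, 3])
print(info[("evening", "music")])  # 1.5
```

With q preset periods and n attributes, each device ends up with n × q values once the per-client results are summed, which matches the p = n × q column count used in claim 4.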
4. The method according to claim 2 or 3, characterized in that the clustering, using a preset clustering index k and the weight information of each characteristic attribute, all clients that have recorded user data in the user equipment of the same type, to obtain k categories, comprises:
when the user equipment of the same type includes m user equipments, generating an m × p feature matrix according to the weight information of the n characteristic attributes, wherein p = n when the user data does not include the running time period of each client, and p = n × q when the user data includes the running time period of each client and the number of preset time periods is q;
normalizing the feature matrix to obtain an m × p normalized matrix;
clustering the normalized matrix using the clustering index k to obtain the k categories.
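A minimal Python sketch of the normalize-then-cluster steps of claim 4 follows. The claim names neither the normalization nor the clustering algorithm, so column-wise min-max scaling and plain Lloyd's k-means are assumptions, as is seeding the centers with the first k rows (chosen only to keep the example deterministic). The function names are hypothetical.

```python
import math


def normalize(matrix):
    """Min-max scale each column of an m x p matrix to [0, 1]."""
    columns = list(zip(*matrix))
    lows = [min(c) for c in columns]
    spans = [(max(c) - min(c)) or 1.0 for c in columns]  # avoid dividing by zero
    return [[(v - lo) / s for v, lo, s in zip(row, lows, spans)] for row in matrix]


def kmeans(rows, k, iterations=20):
    """Lloyd's k-means; returns one category label per row (per device)."""
    centers = [list(r) for r in rows[:k]]  # assumed deterministic seeding
    labels = [0] * len(rows)
    for _ in range(iterations):
        # Assign every row to its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(row, centers[c])) for row in rows]
        # Move every center to the mean of its members.
        for c in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# m = 4 devices, p = 2 weighted attributes.
features = [[5.0, 0.0], [4.0, 1.0], [0.0, 6.0], [1.0, 5.0]]
labels = kmeans(normalize(features), k=2)
print(labels)  # [0, 0, 1, 1]
```

The first two devices, dominated by the first attribute, fall into one category and the last two into the other. A production system would more likely use a library implementation with better seeding.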
5. The method according to claim 4, characterized in that the clustering the normalized matrix using the clustering index k to obtain the k categories comprises:
performing dimensionality reduction on the m × p normalized matrix using a preset dimensionality reduction algorithm and a preset dimensionality reduction index l, to obtain an m × l reduced matrix;
clustering the reduced matrix using the clustering index k to obtain the k categories.
6. The method according to claim 4, characterized in that the extracting at least one feature client from the k categories comprises:
determining the center client of each of the k categories, wherein the number of user equipments to which the center client belongs divided by the number of user equipments included in the category is greater than a first preset threshold;
determining, among the j categories in which a center client exists, the category that includes the largest number of user equipments, and determining the at least one center client of the determined category as the at least one feature client, where 0 < j ≤ k.
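The extraction rule of claim 6 can be sketched in Python. The data layout (each category as a mapping from a device identifier to the set of clients installed on it) and the names `center_clients` and `feature_clients` are assumptions made for the example.

```python
def center_clients(category, threshold):
    """Clients installed on more than `threshold` of the category's devices.

    category: dict mapping a device id to the set of clients on that device.
    """
    device_count = len(category)
    installs = {}
    for clients in category.values():
        for client in clients:
            installs[client] = installs.get(client, 0) + 1
    return sorted(c for c, n in installs.items() if n / device_count > threshold)


def feature_clients(categories, threshold):
    """Center clients of the largest category that has any center client."""
    candidates = [(len(cat), center_clients(cat, threshold)) for cat in categories]
    candidates = [(size, centers) for size, centers in candidates if centers]
    if not candidates:
        return []  # extraction failed: no category has a center client
    return max(candidates, key=lambda item: item[0])[1]


categories = [
    {"d1": {"run", "chat"}, "d2": {"run"}, "d3": {"run", "news"}},
    {"d4": {"chat"}, "d5": {"news"}},
]
print(feature_clients(categories, threshold=0.8))  # ['run']
```

Here only the first category has a center client ("run" is on all three of its devices), so j = 1 and that category is selected.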
7. The method according to any one of claims 1 to 6, characterized in that the obtaining, for each client that has recorded user data in user equipment of the same type, from the characteristic attributes of the client, at least one characteristic attribute identical to one of n preset characteristic attributes, comprises:
for each client that has recorded user data in user equipment of the same type, collecting the user data recorded by the client;
filtering all the user data recorded by the client according to a preset rule, wherein the preset rule is that the running duration of the client that recorded the user data is less than a second preset threshold, or that the running duration of the client that recorded the user data is greater than a third preset threshold;
when user data remains after the filtering, obtaining, from the characteristic attributes of the client, at least one characteristic attribute identical to one of the n preset characteristic attributes.
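The filtering step of claim 7 can be sketched in Python. The sketch assumes that the preset rule identifies records to discard (runs shorter than the second threshold, such as accidental launches, or longer than the third threshold, such as a client left running unattended). The record layout and the name `matching_attributes` are hypothetical.

```python
def matching_attributes(client, preset_attributes, min_duration, max_duration):
    """Return the client's preset attributes, or [] if no user data survives.

    client: {"attributes": [...], "records": [{"duration": seconds}, ...]}
    min_duration / max_duration play the role of the second and third
    preset thresholds.
    """
    kept = [
        r for r in client["records"]
        if min_duration <= r["duration"] <= max_duration
    ]
    if not kept:
        return []  # everything was filtered out, so the client is skipped
    return [a for a in client["attributes"] if a in preset_attributes]


preset = {"sports", "music"}
active = {"attributes": ["sports", "travel"], "records": [{"duration": 2}, {"duration": 300}]}
accidental = {"attributes": ["music"], "records": [{"duration": 2}]}

print(matching_attributes(active, preset, 5, 36000))      # ['sports']
print(matching_attributes(accidental, preset, 5, 36000))  # []
```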
8. The method according to any one of claims 1 to 7, characterized in that, after the extracting at least one feature client from the k categories, the method further comprises:
when the number of the at least one feature client is r, obtaining the identifier of each feature client, and taking the identifier of each of the r clients as a characteristic attribute, to obtain n+r characteristic attributes, where r is a positive integer;
updating n to n+r, and triggering execution of the steps of determining, according to the number of characteristic attributes of each client, weight information of each of the at least one characteristic attribute of each client, to obtain weight information of each of the n characteristic attributes; clustering, using the preset clustering index k and the weight information of each characteristic attribute, all clients that have recorded user data in the user equipment of the same type, to obtain k categories; and extracting at least one feature client from the k categories; and stopping when extraction of the feature client fails.
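The iteration of claim 8 can be outlined in Python. In this sketch the whole weighting, clustering and extraction pipeline is represented by a caller-supplied function, and the `client:` prefix, the round limit and the "nothing new" stop condition are assumptions added so that the example terminates; the claim itself only states that iteration stops when extraction fails.

```python
def expand_attributes(preset_attributes, extract_feature_clients, max_rounds=10):
    """Repeat the pipeline with a growing attribute set.

    extract_feature_clients(attributes) stands in for weighting, clustering
    and extraction, and returns the identifiers of the feature clients found.
    """
    attributes = list(preset_attributes)
    for _ in range(max_rounds):  # safety bound, not part of the claim
        found = extract_feature_clients(attributes)
        if not found:
            break  # extraction failed: stop iterating
        new = [f"client:{cid}" for cid in found if f"client:{cid}" not in attributes]
        if not new:
            break  # nothing new to add (assumed extra stop condition)
        attributes.extend(new)  # n becomes n + r
    return attributes


def fake_pipeline(attributes):
    # Hypothetical stand-in: finds the client "run" once, then fails.
    return [] if "client:run" in attributes else ["run"]


print(expand_attributes(["sports", "music"], fake_pipeline))
# ['sports', 'music', 'client:run']
```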
9. The method according to claim 1, characterized in that, after the clustering, using a preset clustering index k and the weight information of each characteristic attribute, all clients that have recorded user data in the user equipment of the same type, to obtain k categories, the method comprises:
for each of the k categories, counting the number of user equipments to which each client in the category belongs;
determining a client for which the number of user equipments to which it belongs is greater than a fourth preset threshold as a client to be recommended;
recommending the client to be recommended to the user equipment in the category on which the client to be recommended is not installed.
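The recommendation step of claim 9 can be sketched in Python. The category layout (device identifier mapped to the set of installed clients) and the name `recommendations` are assumptions made for illustration.

```python
def recommendations(category, threshold):
    """Map each device to the popular clients it has not installed.

    category: dict mapping a device id to the set of clients on that device.
    threshold: the fourth preset threshold, an absolute device count.
    """
    counts = {}
    for clients in category.values():
        for client in clients:
            counts[client] = counts.get(client, 0) + 1
    popular = {c for c, n in counts.items() if n > threshold}
    return {
        device: sorted(popular - clients)
        for device, clients in category.items()
        if popular - clients
    }


category = {"d1": {"run", "chat"}, "d2": {"run"}, "d3": {"run", "chat"}}
print(recommendations(category, threshold=1))  # {'d2': ['chat']}
```

Because the devices in one category already share similar usage, a client that is widespread within the category is a plausible recommendation for the remaining devices.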
10. A user tag generation device, characterized in that the device comprises:
an acquiring unit, configured to obtain, for each client that has recorded user data in user equipment of the same type, from the characteristic attributes of the client, at least one characteristic attribute identical to one of n preset characteristic attributes, wherein the user data reflects the operations performed on the client by the user of the client, a characteristic attribute reflects a feature shared by the target user group of the client, and n is a positive integer;
a determining unit, configured to determine, according to the number of characteristic attributes of each client, weight information of each of the at least one characteristic attribute of each client, to obtain weight information of each of the n characteristic attributes;
a clustering unit, configured to cluster, using a preset clustering index k and the weight information of each characteristic attribute obtained by the determining unit, all clients that have recorded user data in the user equipment of the same type, to obtain k categories, wherein each category includes at least one client and the user equipment to which each client belongs, and k is a positive integer;
an extraction unit, configured to extract at least one feature client from the k categories obtained by the clustering unit, wherein the feature client reflects a common interest of the target user group of the user equipment;
a generation unit, configured to generate a first user tag according to the user data recorded by each client, and generate a second user tag according to the at least one feature client extracted by the extraction unit.
11. The device according to claim 10, characterized in that the user data of each client includes the running frequency of the client, and the determining unit is configured to:
set, according to a preset weight total score and the number of characteristic attributes of each client, a weight of each characteristic attribute of each client, wherein the weight is negatively correlated with the number of characteristic attributes of the client;
determine, according to the running frequency of each client and the weight of each characteristic attribute of each client, the weight information of each characteristic attribute of each client;
for all clients that have recorded user data in each user equipment, add up the weight information of the same characteristic attribute, to obtain the weight information of the n characteristic attributes.
12. The device according to claim 10, characterized in that the user data of each client includes the running frequency and the running time period of the client, and the determining unit is configured to:
set, according to a preset weight total score and the number of characteristic attributes of each client, a weight of each characteristic attribute of each client, wherein the weight is negatively correlated with the number of characteristic attributes of the client;
determine the preset time period to which the running time period of each client belongs, and determine the running frequency of each client in the corresponding preset time period, wherein each preset time period corresponds to the n characteristic attributes;
for each client, determine, according to the running frequency of the client in each preset time period and the weight of each characteristic attribute of the client, the weight information of each characteristic attribute of the client in each preset time period;
for all clients that have recorded user data in each user equipment, add up the weight information of the same characteristic attribute in the same preset time period, to obtain the weight information of the n characteristic attributes corresponding to each preset time period.
13. The device according to claim 11 or 12, characterized in that the clustering unit is configured to:
when the user equipment of the same type includes m user equipments, generate an m × p feature matrix according to the weight information of the n characteristic attributes, wherein p = n when the user data does not include the running time period of each client, and p = n × q when the user data includes the running time period of each client and the number of preset time periods is q;
normalize the feature matrix to obtain an m × p normalized matrix;
cluster the normalized matrix using the clustering index k to obtain the k categories.
14. The device according to claim 13, characterized in that the clustering unit is configured to:
perform dimensionality reduction on the m × p normalized matrix using a preset dimensionality reduction algorithm and a preset dimensionality reduction index l, to obtain an m × l reduced matrix;
cluster the reduced matrix using the clustering index k to obtain the k categories.
15. The device according to claim 13, characterized in that the extraction unit is configured to:
determine the center client of each of the k categories, wherein the number of user equipments to which the center client belongs divided by the number of user equipments included in the category is greater than a first preset threshold;
determine, among the j categories in which a center client exists, the category that includes the largest number of user equipments, and determine the at least one center client of the determined category as the at least one feature client, where 0 < j ≤ k.
16. The device according to any one of claims 10 to 15, characterized in that the acquiring unit is configured to:
for each client that has recorded user data in user equipment of the same type, collect the user data recorded by the client;
filter all the user data recorded by the client according to a preset rule, wherein the preset rule is that the running duration of the client that recorded the user data is less than a second preset threshold, or that the running duration of the client that recorded the user data is greater than a third preset threshold;
when user data remains after the filtering, obtain, from the characteristic attributes of the client, at least one characteristic attribute identical to one of the n preset characteristic attributes.
17. The device according to any one of claims 10 to 16, characterized in that:
the acquiring unit is configured to, after the at least one feature client is extracted from the k categories, when the number of the at least one feature client is r, obtain the identifier of each feature client, and take the identifier of each of the r clients as a characteristic attribute, to obtain n+r characteristic attributes, where r is a positive integer;
the device further comprises:
an updating unit, configured to update n to n+r, and trigger execution of the steps of determining, according to the number of characteristic attributes of each client, weight information of each of the at least one characteristic attribute of each client, to obtain weight information of each of the n characteristic attributes; clustering, using the preset clustering index k and the weight information of each characteristic attribute, all clients that have recorded user data in the user equipment of the same type, to obtain k categories; and extracting at least one feature client from the k categories; and stop when extraction of the feature client fails.
18. The device according to claim 10, characterized in that the device further comprises:
a statistics unit, configured to, after all clients that have recorded user data in the user equipment of the same type are clustered using the preset clustering index k and the weight information of each characteristic attribute to obtain k categories, count, for each of the k categories, the number of user equipments to which each client in the category belongs;
a determining unit, configured to determine a client for which the number of user equipments to which it belongs is greater than a fourth preset threshold as a client to be recommended;
a recommendation unit, configured to recommend the client to be recommended to the user equipment in the category on which the client to be recommended is not installed.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610454113.8A CN107526741B (en) | 2016-06-21 | 2016-06-21 | User label generation method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107526741A (en) | 2017-12-29 |
CN107526741B (en) | 2021-05-18 |
Family
ID=60735282
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610454113.8A Active CN107526741B (en) | 2016-06-21 | 2016-06-21 | User label generation method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107526741B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6049777A (en) * | 1995-06-30 | 2000-04-11 | Microsoft Corporation | Computer-implemented collaborative filtering based method for recommending an item to a user |
CN103198418A (en) * | 2013-03-15 | 2013-07-10 | 北京亿赞普网络技术有限公司 | Application recommendation method and application recommendation system |
CN103218355A (en) * | 2012-01-18 | 2013-07-24 | 腾讯科技(深圳)有限公司 | Method and device for generating tags for user |
CN104750789A (en) * | 2015-03-12 | 2015-07-01 | 百度在线网络技术(北京)有限公司 | Label recommendation method and device |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109214435A (en) * | 2018-08-21 | 2019-01-15 | 北京睦合达信息技术股份有限公司 | A kind of data classification method and device |
CN111125506A (en) * | 2018-11-01 | 2020-05-08 | 百度在线网络技术(北京)有限公司 | Interest circle subject determination method, device, server and medium |
CN111382343A (en) * | 2018-12-27 | 2020-07-07 | 方正国际软件(北京)有限公司 | Label system generation method and device |
CN111382343B (en) * | 2018-12-27 | 2023-11-28 | 方正国际软件(北京)有限公司 | Label system generation method and device |
Also Published As
Publication number | Publication date |
---|---|
CN107526741B (en) | 2021-05-18 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
TA01 | Transfer of patent application right | Effective date of registration: 20200210. Address after: 518129 Bantian HUAWEI headquarters office building, Longgang District, Guangdong, Shenzhen. Applicant after: HUAWEI TECHNOLOGIES Co.,Ltd. Address before: 210000 Ande Gate No. 94, Yuhuatai District, Jiangsu, Nanjing. Applicant before: Huawei Technologies Co.,Ltd. ||
GR01 | Patent grant | ||