CN107526741A - user tag generation method and device - Google Patents

Publication number
CN107526741A
Authority
CN
China
Prior art keywords
client
characteristic attribute
user
classification
weight information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610454113.8A
Other languages
Chinese (zh)
Other versions
CN107526741B (en
Inventor
熊安斌
张锋
张旭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Priority to CN201610454113.8A
Publication of CN107526741A
Application granted
Publication of CN107526741B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A user tag generation method and device, relating to the field of communication technology. The method includes: for each client that has recorded user data in user equipment of the same type, obtaining at least one characteristic attribute identical to one of n preset kinds of characteristic attributes; determining the weight information of each characteristic attribute among the at least one characteristic attribute possessed by each client, to obtain the weight information of each of the n kinds of characteristic attributes; clustering all clients that have recorded user data in the user equipment of the same type, using a preset clustering index k and the weight information of each characteristic attribute; extracting at least one feature client from the k resulting categories; and generating a first user tag according to the user data recorded by each client and a second user tag according to the at least one feature client. This solves the problem that a server generates few user tags when it generates them from a small amount of user data, and achieves the effect of increasing the number of user tags generated.

Description

User tag generation method and device
Technical field
The present invention relates to the field of communication technology, and in particular to a user tag generation method and device.
Background
At present, a server can determine the target group of a product by building user portraits. A user portrait characterizes a user; the characteristics of a user include the user's age, gender, interests, habits, location, and so on. Because a user tag can describe a characteristic of at least one user, the server can obtain a user portrait by generating user tags. Examples of user tags include karaoke enthusiast, artsy young woman, Beijing resident, and night owl.
The server generates user tags as follows: it collects the user data on at least one user equipment; analyzes the user data to obtain the characteristics of the user of the at least one user equipment; and generates user tags according to those characteristics. The user data may be the user's location data, age data, behavioral habit data, hobby data, health status data, and so on.
Because the server can only generate user tags from the user data it has collected, it generates few user tags when little user data has been collected.
Summary of the invention
To solve the problem that few user tags are generated when little user data has been collected, this application provides a user tag generation method and device.
In a first aspect, a user tag generation method is provided. The method includes: for each client that has recorded user data in user equipment of the same type, obtaining, from the characteristic attributes possessed by the client, at least one characteristic attribute identical to one of n preset kinds of characteristic attributes; determining, according to the number of characteristic attributes possessed by each client, the weight information of each characteristic attribute among the at least one characteristic attribute possessed by each client, to obtain the weight information of each of the n kinds of characteristic attributes; clustering all clients that have recorded user data in the user equipment of the same type, using a preset clustering index k and the weight information of each characteristic attribute, to obtain k categories, where each category includes at least one client and the user equipment to which each client belongs, and k is a positive integer; extracting at least one feature client from the k categories, where a feature client reflects the common interest of the target user group of the user equipment; and generating a first user tag according to the user data recorded by each client, and a second user tag according to the at least one feature client. Here, user data reflects the operations that a user of a client performs on that client, a characteristic attribute reflects a characteristic shared by the target user group of a client, and n is a positive integer.
The weight information of each characteristic attribute possessed by each client that has recorded user data in the user equipment is determined, giving the weight information of each of the n kinds of characteristic attributes; all clients that have recorded user data in the user equipment of the same type are clustered according to the preset clustering index k and the weight information of each characteristic attribute, giving k categories; and at least one feature client is extracted from the k categories. As a result, the server can generate user tags not only from user data but also from the at least one feature client. This solves the problem that few user tags are generated when the server generates them from user data alone and little user data is available, and achieves the effect of increasing the number of user tags generated.
With reference to the first aspect, in a first implementation of the first aspect, the user data of each client includes the running frequency of the client, and determining, according to the number of characteristic attributes possessed by each client, the weight information of each characteristic attribute among the at least one characteristic attribute possessed by each client, to obtain the weight information of each of the n kinds of characteristic attributes, includes: setting, according to a preset weight total score and the number of characteristic attributes possessed by each client, the weight of each characteristic attribute possessed by each client, where the weight is negatively correlated with the number of characteristic attributes possessed by the client; determining, according to the running frequency of each client and the weight of each characteristic attribute possessed by each client, the weight information of each characteristic attribute possessed by each client; and, for all clients that have recorded user data in each user equipment, adding up the weight information of identical characteristic attributes to obtain the weight information of the n kinds of characteristic attributes.
A weight is set for each characteristic attribute possessed by each client, the weight being negatively correlated with the number of characteristic attributes possessed by the client; the weight information of each characteristic attribute possessed by each client is determined from the running frequency of the client and that weight; and, for all clients that have recorded user data in each user equipment, the weight information of identical characteristic attributes is added up to obtain the weight information of the n kinds of characteristic attributes. The weight information of the n kinds of characteristic attributes obtained by the server is therefore correlated with the running frequency of the clients and reflects how the user habitually uses each client, which ensures the accuracy of the generated second user tag.
With reference to the first aspect, in a second implementation of the first aspect, the user data of each client includes the running frequency and the running time periods of the client, and determining, according to the number of characteristic attributes possessed by each client, the weight information of each characteristic attribute among the at least one characteristic attribute possessed by each client, to obtain the weight information of each of the n kinds of characteristic attributes, includes: setting, according to a preset weight total score and the number of characteristic attributes possessed by each client, the weight of each characteristic attribute possessed by each client, where the weight is negatively correlated with the number of characteristic attributes possessed by the client; determining the preset time period to which each running time period of each client belongs, and determining the running frequency of each client in the corresponding preset time period, where each preset time period corresponds to the n kinds of characteristic attributes; for each client, determining, according to the running frequency of the client in each preset time period and the weight of each characteristic attribute possessed by the client, the weight information of each characteristic attribute of the client for each preset time period; and, for all clients that have recorded user data in each user equipment, adding up the weight information of identical characteristic attributes within the same preset time period, to obtain the weight information of the n kinds of characteristic attributes for each preset time period.
A weight is set for each characteristic attribute possessed by each client, the weight being negatively correlated with the number of characteristic attributes possessed by the client; for each client, the weight information of each characteristic attribute for each preset time period is determined from the running frequency of the client in that preset time period and the weight of each characteristic attribute; and, for all clients that have recorded user data in each user equipment, the weight information of identical characteristic attributes within the same preset time period is added up, giving the weight information of the n kinds of characteristic attributes for each preset time period. The weight information of the n kinds of characteristic attributes obtained by the server therefore reflects more accurately how the user habitually uses each client, and the server can draw on more types of user data to generate the second user tag.
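The per-time-period weighting described above can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: the function name `period_weight_info`, the dictionary layout, and the period names `morning` and `evening` are hypothetical and do not come from the patent. The weight of each attribute is taken as the weight total score divided by the number of attributes, which is one simple way to make the weight negatively correlated with the attribute count.

```python
from collections import defaultdict


def period_weight_info(clients, weight_total=1.0):
    """Sum weight information per (preset time period, attribute) for one user equipment.

    clients: list of dicts (hypothetical layout) with
      "attributes": attributes the client shares with the n preset kinds
      "runs": {period name: running frequency in that period}
    """
    info = defaultdict(float)
    for client in clients:
        attrs = client["attributes"]
        if not attrs:
            continue  # a client without matching attributes contributes nothing
        weight = weight_total / len(attrs)  # fewer attributes -> larger weight
        for period, freq in client["runs"].items():
            for attr in attrs:
                # weight information = running frequency in the period x weight
                info[(period, attr)] += freq * weight
    return dict(info)


# Hypothetical data for a single user equipment.
device = [
    {"attributes": ["teenager", "adult"], "runs": {"morning": 2, "evening": 4}},
    {"attributes": ["adult"], "runs": {"evening": 3}},
]
result = period_weight_info(device)
```

With q preset time periods and n attributes, each user equipment ends up with up to n × q values, one per (period, attribute) pair.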
With reference to the first or second implementation of the first aspect, in a third implementation of the first aspect, clustering all clients that have recorded user data in the user equipment of the same type, using the preset clustering index k and the weight information of each characteristic attribute, to obtain k categories, includes: when the user equipment of the same type includes m user equipments, generating an m × p feature matrix according to the weight information of the n kinds of characteristic attributes, where p = n when the user data does not include the running time periods of each client, and p = n × q when the user data includes the running time periods of each client and the number of preset time periods is q; normalizing the feature matrix to obtain an m × p normalized matrix; and clustering the normalized matrix using the clustering index k, to obtain k categories.
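The matrix construction, normalization, and clustering steps can be sketched as below. The patent text here does not name a normalization method or a clustering algorithm, so min-max column scaling and plain k-means are assumptions chosen for illustration; all function names and the sample data are hypothetical.

```python
def build_feature_matrix(devices, attributes):
    """One row per user equipment, one column per characteristic attribute (p = n)."""
    return [[dev.get(attr, 0.0) for attr in attributes] for dev in devices]


def normalize_columns(matrix):
    """Min-max scale each column to [0, 1] (assumed method); constant columns become 0."""
    cols = list(zip(*matrix))
    lows = [min(c) for c in cols]
    spans = [max(c) - min(c) for c in cols]
    return [
        [(v - lo) / sp if sp else 0.0 for v, lo, sp in zip(row, lows, spans)]
        for row in matrix
    ]


def kmeans(rows, k, iterations=20):
    """Plain k-means (assumed algorithm); the first k rows are the initial centroids."""
    centroids = [list(r) for r in rows[:k]]
    labels = [0] * len(rows)
    for _ in range(iterations):
        # Assignment step: each row goes to its nearest centroid.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(row, centroids[c])))
            for row in rows
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


attributes = ["child", "teenager", "adult", "old age"]
devices = [  # hypothetical weight information per user equipment (m = 4)
    {"child": 5.0, "teenager": 1.0},
    {"adult": 4.0, "old age": 2.0},
    {"child": 4.0, "teenager": 2.0},
    {"adult": 5.0, "old age": 1.0},
]
matrix = build_feature_matrix(devices, attributes)
normalized = normalize_columns(matrix)
labels = kmeans(normalized, k=2)
```

In this toy data the two child-oriented devices and the two adult-oriented devices fall into separate categories.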
With reference to the third implementation of the first aspect, in a fourth implementation of the first aspect, clustering the normalized matrix using the clustering index k, to obtain k categories, includes: performing dimensionality reduction on the m × p normalized matrix using a preset dimensionality reduction algorithm and a preset dimensionality reduction index l, to obtain an m × l reduced matrix; and clustering the reduced matrix using the clustering index k, to obtain k categories.
Performing dimensionality reduction on the normalized matrix both reduces the amount of computation when the server runs the clustering algorithm with the clustering index k, which improves the efficiency of the clustering algorithm, and removes redundant data from the normalized matrix, which improves the stability of the data when the server runs the clustering algorithm.
With reference to the third implementation of the first aspect, in a fifth implementation of the first aspect, extracting at least one feature client from the k categories includes: determining the center clients of each of the k categories, where, for a center client, the number of user equipments to which the center client belongs divided by the number of user equipments included in the category is greater than a first preset threshold; and, among the j categories in which a center client exists, determining the category that includes the largest number of user equipments, and taking the at least one center client of that category as the at least one feature client, where 0 < j ≤ k.
The center clients of each category are determined, the category that includes the largest number of user equipments is determined among the j categories in which a center client exists, and the at least one center client of that category is taken as the at least one feature client. The at least one feature client obtained by the server is therefore a client that most users use and can reflect the common interest of most users, which ensures the accuracy of the feature clients determined by the server.
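The selection of center clients and feature clients can be sketched as follows. The data layout (a set of device ids per category and, per client, the set of devices it belongs to), the function name, and the sample values are hypothetical; the threshold test and the "largest category" rule follow the text above.

```python
def extract_feature_clients(categories, first_threshold):
    """Return the center clients of the largest category that has any center client.

    categories: list of dicts (hypothetical layout) with
      "devices": set of user equipment ids in the category
      "clients": {client name: set of user equipment ids the client belongs to}
    """
    best_centers, best_size = [], 0
    for cat in categories:
        size = len(cat["devices"])
        if size == 0:
            continue
        # A center client is one used on more than first_threshold of the devices.
        centers = sorted(
            name
            for name, devs in cat["clients"].items()
            if len(devs & cat["devices"]) / size > first_threshold
        )
        # Among categories that have center clients, keep the one with most devices.
        if centers and size > best_size:
            best_centers, best_size = centers, size
    return best_centers


categories = [
    {"devices": {1, 2, 3, 4}, "clients": {"client_a": {1, 2, 3}, "client_b": {1}}},
    {"devices": {5, 6}, "clients": {"client_c": {5, 6}}},
]
loose = extract_feature_clients(categories, first_threshold=0.6)
strict = extract_feature_clients(categories, first_threshold=0.9)
```

With the looser threshold the larger category supplies the feature client; with the stricter threshold only the smaller category has a center client, so it is selected instead.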
With reference to the first aspect or any one of the first to fifth implementations of the first aspect, in a sixth implementation of the first aspect, for each client that has recorded user data in user equipment of the same type, obtaining, from the characteristic attributes possessed by the client, at least one characteristic attribute identical to one of the n preset kinds of characteristic attributes, includes: for each client that has recorded user data in the user equipment of the same type, collecting the user data recorded by the client; filtering all user data recorded by the client according to a preset rule, where the preset rule is that the running duration of the client recording the user data is less than a second preset threshold, or that the running duration of the client recording the user data is greater than a third preset threshold; and, when user data remains after filtering, obtaining, from the characteristic attributes possessed by the client, at least one characteristic attribute identical to one of the n preset kinds of characteristic attributes.
Filtering the user data recorded by the clients lets the server filter out user data that does not match actual usage, which improves the accuracy with which the server calculates the weight information of the n kinds of characteristic attributes and thus improves the accuracy of the clustering.
With reference to the first aspect or any one of the first to sixth implementations of the first aspect, in a seventh implementation of the first aspect, after extracting at least one feature client from the k categories, the method further includes: when the number of the at least one feature client is r, obtaining the identifier of each feature client and taking the identifier of each of the r clients as one kind of characteristic attribute, to obtain n + r kinds of characteristic attributes, where r is a positive integer; and updating n to n + r and triggering execution again of the steps of determining, according to the number of characteristic attributes possessed by each client, the weight information of each characteristic attribute among the at least one characteristic attribute possessed by each client, to obtain the weight information of each of the n kinds of characteristic attributes, clustering all clients that have recorded user data in the user equipment of the same type using the preset clustering index k and the weight information of each characteristic attribute, to obtain k categories, and extracting at least one feature client from the k categories, until extracting a feature client fails.
The identifiers of the r feature clients obtained are added to the n kinds of characteristic attributes as r characteristic attributes, n is updated to n + r, and the step of extracting at least one feature client is executed in a loop. The server can therefore keep extracting feature clients and generate more user tags from them, which further increases the number of user tags generated by the server.
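The loop in which n grows to n + r can be sketched as below. The weighting, clustering, and extraction steps are hidden behind a caller-supplied function `extract_round`, which is a hypothetical stand-in and not part of the patent. The `max_rounds` cap and the skipping of identifiers that are already attributes are added assumptions to keep the sketch from looping forever.

```python
def iterate_feature_extraction(attributes, extract_round, max_rounds=10):
    """Repeat extraction, adding each new feature client id as a characteristic attribute.

    extract_round(attributes) is a hypothetical callback that performs weighting,
    clustering and extraction for the current attribute list and returns the
    identifiers of the extracted feature clients (empty when extraction fails).
    """
    attributes = list(attributes)
    feature_clients = []
    for _ in range(max_rounds):  # safety cap; an assumption, not from the source
        new_ids = [c for c in extract_round(attributes) if c not in attributes]
        if not new_ids:  # extraction failed: stop
            break
        feature_clients.extend(new_ids)
        attributes.extend(new_ids)  # n becomes n + r
    return attributes, feature_clients


# Stub that simulates three rounds; the third round extracts nothing.
_rounds = iter([["client_a"], ["client_b", "client_c"], []])
attrs, found = iterate_feature_extraction(
    ["child", "teenager", "adult", "old age"],
    lambda current: next(_rounds),
)
```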
With reference to the first aspect, in an eighth implementation of the first aspect, after clustering all clients that have recorded user data in the user equipment of the same type using the preset clustering index k and the weight information of each characteristic attribute, to obtain k categories, the method further includes: for each of the k categories, counting the number of user equipments in the category to which each client belongs; determining each client for which that number is greater than a fourth preset threshold as a client to be recommended; and recommending the client to be recommended to the user equipments in the category that have not installed it.
The clients to be recommended of each category are determined and recommended to the user equipments in the category that have not installed them. The server can therefore recommend to users clients they may be interested in, which makes it easier for users to obtain those clients.
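The recommendation step can be sketched as follows for a single category. The data layout (device id mapped to the set of clients on that device) and the function name are hypothetical, and the sketch treats "has the client on the device" as equivalent to "has installed the client", which is a simplifying assumption.

```python
def recommend_clients(category, fourth_threshold):
    """Map each user equipment to the popular clients it does not yet have.

    category: {device id: set of client names on that device} (hypothetical layout)
    """
    # Count, for each client, how many user equipments in the category it belongs to.
    counts = {}
    for clients in category.values():
        for name in clients:
            counts[name] = counts.get(name, 0) + 1
    # Clients on more than fourth_threshold devices are the clients to be recommended.
    to_recommend = {name for name, n in counts.items() if n > fourth_threshold}
    return {
        device: sorted(to_recommend - clients)
        for device, clients in category.items()
        if to_recommend - clients
    }


category = {
    "d1": {"client_a", "client_b"},
    "d2": {"client_a"},
    "d3": {"client_a", "client_b"},
    "d4": {"client_c"},
}
recommendations = recommend_clients(category, fourth_threshold=2)
```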
In a second aspect, a user tag generating device is provided. The device includes at least one unit, and the at least one unit is used to implement the user tag generation method provided in the first aspect or in at least one implementation of the first aspect.
In a third aspect, a server is provided. The server includes a processor and a wireless transceiver connected to the processor.
The wireless transceiver is configured to be controlled by the processor, and the processor is used to implement the user tag generation method provided in the first aspect or in at least one implementation of the first aspect.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings needed for describing the embodiments are briefly introduced below. The drawings in the following description show only some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic diagram of the architecture of a communication system provided by an exemplary embodiment of the present invention;
Fig. 2 is a schematic structural diagram of a server provided by an exemplary embodiment of the present invention;
Fig. 3 is a flow chart of a user tag generation method provided by an exemplary embodiment of the present invention;
Fig. 4 is a structural diagram of a user tag generating device provided by an exemplary embodiment of the present invention.
Detailed description
To make the objectives, technical solutions, and advantages of the present invention clearer, embodiments of the present invention are described in further detail below with reference to the accompanying drawings.
A "unit" as referred to herein is a functional structure divided logically; a "unit" may be implemented purely in hardware, or by a combination of software and hardware.
Refer to Fig. 1, which shows a schematic structural diagram of a communication system 100 provided by an exemplary embodiment of the present invention. The communication system 100 includes a server 120 and multiple user equipments 140.
Server 120 is connected to each user equipment 140 through a communication network and is used to collect the user data recorded by the clients in each user equipment 140.
At least one client is installed in user equipment 140. Each client can record user data, and the user data reflects the operations performed by the user of the client. For example, a client records the number of times the user starts the client, the running time periods, and so on. User equipment 140 may be a set top box, a mobile phone (cellphone), a smartphone, a computer, a tablet computer, a wearable device, a personal digital assistant (PDA), a mobile internet device (MID), an e-book reader, and so on.
In this embodiment, the multiple user equipments 140 are of the same type. For example, the multiple user equipments 140 are all set top boxes, or the multiple user equipments 140 are all smartphones.
Refer to Fig. 2, which shows a schematic structural diagram of a server 200 according to another exemplary embodiment of the present invention. The server 200 may be the server 120 shown in Fig. 1, and includes a processor 220 and a wireless transceiver 240 connected to processor 220.
The wireless transceiver 240 may consist of one or more antennas, which enable server 200 to send or receive radio signals.
Wireless transceiver 240 may be connected to processor 220. Processor 220 is the control center of the server and may be a central processing unit (CPU), a network processor (NP), or a combination of a CPU and an NP. Processor 220 may further include a hardware chip. The hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof. The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.
Optionally, server 200 also includes a memory 260, which is connected to processor 220 by a bus or other means. Memory 260 may be volatile memory, non-volatile memory, or a combination thereof. Volatile memory may be random-access memory (RAM), such as static random-access memory (SRAM) or dynamic random-access memory (DRAM). Non-volatile memory may be read-only memory (ROM), such as programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), or electrically erasable programmable read-only memory (EEPROM). Non-volatile memory may also be flash memory or magnetic memory, such as magnetic tape, a floppy disk, or a hard disk. Non-volatile memory may also be an optical disc.
User data may be stored in memory 260. Optionally, memory 260 may also store the feature clients, first user tags, and second user tags determined by processor 220; for the specific determination process, see the description of steps 304 and 305 below.
Refer to Fig. 3, which shows a flow chart of a user tag generation method provided by an exemplary embodiment of the present invention. This embodiment is described using the example of the method being applied in the communication system shown in Fig. 1, with the server performing the following steps. The method includes the following steps:
Step 301: for each client that has recorded user data in user equipment of the same type, obtain, from the characteristic attributes possessed by the client, at least one characteristic attribute identical to one of n preset kinds of characteristic attributes, where n is a positive integer.
Multiple clients are installed in a user equipment. When the user has used a client, recorded user data exists in that client; when the user has never used a client, that client has no recorded user data. User data reflects the operations that the user of a client performs on the client, such as the frequency with which the client is clicked, the time periods in which the client runs, and the web page content browsed with the client.
For each client that has recorded user data in the user equipment of the same type, the server obtains at least one characteristic attribute of the client. A characteristic attribute reflects a characteristic shared by the target user group of the client. For example, the characteristic shared by the target user group of the client "Nursery Rhymes Collection" is children, so the characteristic attribute of "Nursery Rhymes Collection" is children; the characteristic shared by the target user group of the client "Square Dance Collection" is the elderly, so the characteristic attribute of "Square Dance Collection" is the elderly.
For a client that has recorded user data, the recorded user data may not match actual usage; for example, the running time is less than 1 second or more than 24 hours. If the server obtained the characteristic attributes of such a client, subsequent calculation results might not accurately reflect the common interest of the target user group of the user equipment. In this embodiment, the server filters all user data recorded by the client according to a preset rule, where the preset rule is that the running duration of the client recording the user data is less than a second preset threshold, or that the running duration of the client recording the user data is greater than a third preset threshold. When user data remains after filtering, the server obtains at least one characteristic attribute of the client. In this way, the server filters out user data that does not match actual usage, which improves the accuracy of subsequent calculation results.
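The filtering rule can be sketched as follows. The record layout and the function name are hypothetical; the default thresholds of 1 second and 24 hours are taken from the example in the text, and real threshold values are a deployment choice.

```python
def filter_records(records, second_threshold_s=1.0, third_threshold_s=24 * 3600.0):
    """Drop records whose running duration is below the second threshold
    or above the third threshold; keep the rest."""
    return [
        r for r in records
        if second_threshold_s <= r["duration_s"] <= third_threshold_s
    ]


records = [  # hypothetical user data records
    {"client": "client_a", "duration_s": 0.4},      # too short: dropped
    {"client": "client_a", "duration_s": 1800.0},   # kept
    {"client": "client_b", "duration_s": 90000.0},  # longer than 24 h: dropped
]
kept = filter_records(records)
# Only clients that still have user data go on to attribute extraction.
clients_with_data = {r["client"] for r in kept}
```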
Each client may have multiple characteristic attributes, but only some of them are identical to the n preset kinds of characteristic attributes. The server therefore needs to obtain, from the multiple characteristic attributes of the client, at least one that is identical to one of the n preset kinds. The n preset kinds of characteristic attributes are determined according to the business objective. For example, if the business objective is to determine the age of the target group of the user equipment of the same type, the n preset kinds of characteristic attributes may be child, teenager, adult, and old age.
Assume the n preset kinds of characteristic attributes are child, teenager, adult, and old age, and the client "xx Ninja" has the characteristic attributes game, leisure, adult, and teenager. The server then obtains, from game, leisure, adult, and teenager, the two characteristic attributes adult and teenager as the characteristic attributes of "xx Ninja".
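The attribute selection in this example amounts to keeping only the client attributes that appear among the n preset kinds. A minimal sketch, with a hypothetical function name:

```python
PRESET_ATTRIBUTES = ["child", "teenager", "adult", "old age"]


def matching_attributes(client_attributes, preset=PRESET_ATTRIBUTES):
    """Keep the client's attributes that are among the n preset kinds, in client order."""
    wanted = set(preset)
    return [a for a in client_attributes if a in wanted]


selected = matching_attributes(["game", "leisure", "adult", "teenager"])
```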
It should be noted that the characteristic attributes that a client has, as described below, refer only to those characteristic attributes that are identical to the preset n kinds of characteristic attributes.
Step 302: according to the quantity of characteristic attributes that each client has, determine the weight information of each characteristic attribute among the at least one characteristic attribute that each client has, and obtain the weight information of each of the n kinds of characteristic attributes.
Here, the weight information of each characteristic attribute reflects whether the user prefers to use clients that have that characteristic attribute. The weight information is usually represented by a number; the larger the weight information of a characteristic attribute, the more the user prefers to use clients that have that characteristic attribute.
In one implementation, the user data of each client includes the running frequency of the client. The server determining, according to the quantity of characteristic attributes that each client has, the weight information of each characteristic attribute among the at least one characteristic attribute that each client has, and obtaining the weight information of each of the n kinds of characteristic attributes, includes: setting the weight of each characteristic attribute that each client has according to a preset weight total score and the quantity of characteristic attributes that the client has, where the weight is negatively correlated with the quantity of characteristic attributes that the client has; determining the weight information of each characteristic attribute that each client has according to the running frequency of the client and the weight of each characteristic attribute that the client has; and, for all clients that have recorded user data in each user equipment, adding up the weight information of the same characteristic attribute to obtain the weight information of the n kinds of characteristic attributes.
The running frequency of each client refers to the number of times the user uses the client. Each client may take the counted number of times the user used the client within a preset duration as its running frequency, or may take the total number of times the user has used the client as its running frequency; this embodiment does not limit how the running frequency of each client is determined.
Setting the weight of each characteristic attribute that each client has according to the preset weight total score and the quantity of characteristic attributes that the client has means: when the quantity of characteristic attributes that a client has is a and the preset weight total score is b, the weight of each characteristic attribute that the client has is b/a. It should be noted that, in an actual implementation, b/a may be computed as b × (1/a); this embodiment does not limit the calculation process by which the server computes the weight of each characteristic attribute. The preset weight total score may be any reasonable value, such as 1 point, 2 points, or 100 points; this embodiment does not limit the specific value of the weight total score.
Determining the weight information of each characteristic attribute that each client has according to the running frequency of the client and the weight of each characteristic attribute that the client has means: when the running frequency of a client is c and the weight of each characteristic attribute that the client has is b/a, the weight information of each characteristic attribute that the client has is c × b/a.
Assume that the 4 preset characteristic attributes are child, teenager, adult, and old age, and that the preset weight total score is 1 point. For the clients "the xx persons of bearing" and "xxKTV" that have recorded user data in the same user equipment, the server obtains that the characteristic attributes of "the xx persons of bearing" are teenager, adult, and old age, so the quantity of characteristic attributes is 3. The weight set for the teenager characteristic attribute of "the xx persons of bearing" is then 1/3 = 0.33 points, the weight set for its adult characteristic attribute is 1/3 = 0.33 points, and the weight set for its old-age characteristic attribute is 1/3 = 0.33 points. The server obtains that the characteristic attributes of "xxKTV" are teenager and adult, so the quantity of characteristic attributes is 2; the weight set for the teenager characteristic attribute of "xxKTV" is then 1/2 = 0.5 points, and the weight set for its adult characteristic attribute is 1/2 = 0.5 points.
If the running frequency of "the xx persons of bearing" is 1, the weight information of the teenager characteristic attribute of "the xx persons of bearing" is 1 × 0.33 = 0.33 points, the weight information of its adult characteristic attribute is 1 × 0.33 = 0.33 points, and the weight information of its old-age characteristic attribute is 1 × 0.33 = 0.33 points.
If the running frequency of "xxKTV" is 2, the weight information of the teenager characteristic attribute of "xxKTV" is 2 × 0.5 = 1 point, and the weight information of its adult characteristic attribute is 2 × 0.5 = 1 point.
The server adds up the weight information of the child characteristic attribute of each client, obtaining 0 points for the child characteristic attribute; adds up the weight information of the teenager characteristic attribute of each client, obtaining 0.33 + 1 = 1.33 points for the teenager characteristic attribute; adds up the weight information of the adult characteristic attribute of each client, obtaining 0.33 + 1 = 1.33 points for the adult characteristic attribute; and adds up the weight information of the old-age characteristic attribute of each client, obtaining 0.33 points for the old-age characteristic attribute.
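The calculation in this implementation can be sketched in Python as follows. This is only an illustrative sketch: the function name `attribute_weight_info` and the data layout (a list of attribute lists paired with running frequencies) are assumptions made for the example and are not part of the embodiment.

```python
def attribute_weight_info(clients, preset_attributes, total_score=1.0):
    """Return the summed weight information per preset characteristic attribute.

    clients: list of (attributes, running_frequency) pairs for one user
    equipment, where attributes lists the preset attributes the client has.
    """
    info = {attr: 0.0 for attr in preset_attributes}
    for attributes, running_frequency in clients:
        if not attributes:
            continue  # a client with no preset attribute contributes nothing
        weight = total_score / len(attributes)          # b / a
        for attr in attributes:
            info[attr] += running_frequency * weight    # c * b / a
    return info


preset = ["child", "teenager", "adult", "old age"]
clients = [
    (["teenager", "adult", "old age"], 1),  # "the xx persons of bearing", run once
    (["teenager", "adult"], 2),             # "xxKTV", run twice
]
result = attribute_weight_info(clients, preset)
print({attr: round(value, 2) for attr, value in result.items()})
# {'child': 0.0, 'teenager': 1.33, 'adult': 1.33, 'old age': 0.33}
```

The printed values match the sums in the example above; the sketch keeps exact fractions internally and rounds only for display.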
In another implementation, the user data of each client includes the running frequency and the running time period of the client. Determining, according to the quantity of characteristic attributes that each client has, the weight information of each characteristic attribute among the at least one characteristic attribute that each client has, and obtaining the weight information of each of the n kinds of characteristic attributes, includes: setting the weight of each characteristic attribute that each client has according to a preset weight total score and the quantity of characteristic attributes that the client has, where the weight is negatively correlated with the quantity of characteristic attributes that the client has; determining the preset time period to which the running time period of each client belongs, and determining the running frequency of each client in the corresponding preset time period, where each preset time period corresponds to the n kinds of characteristic attributes; for each client, determining the weight information of each characteristic attribute of the client in each preset time period according to the running frequency of the client in that preset time period and the weight of each characteristic attribute that the client has; and, for all clients that have recorded user data in each user equipment, adding up the weight information of the same characteristic attribute within the same preset time period to obtain the weight information of the n kinds of characteristic attributes corresponding to each preset time period.
The running time period of each client may be the period during which the client runs in the foreground, or the period spanning from when the client starts running to when it stops running, which includes both the foreground running period and the background running period; this embodiment does not limit how the running time period of each client is determined.
Assume that 4 preset time periods are set in the server, as shown in Table one below, that the 4 preset characteristic attributes are child, teenager, adult, and old age, and that the preset weight total score is 1 point. For the clients "the xx persons of bearing" and "xxKTV" that have recorded user data in the same user equipment, the server obtains that the characteristic attributes of "the xx persons of bearing" are teenager, adult, and old age, so the quantity of characteristic attributes is 3, and that it ran once in the daytime; the characteristic attributes of "xxKTV" are teenager and adult, so the quantity of characteristic attributes is 2, and it ran once in the daytime and once in the evening. The server sets the weights of "the xx persons of bearing" and "xxKTV" respectively and, according to the set weights, obtains their weight information as follows: for "the xx persons of bearing" in the daytime, the weight information of the teenager characteristic attribute is 1 × 0.33 = 0.33 points, that of the adult characteristic attribute is 1 × 0.33 = 0.33 points, and that of the old-age characteristic attribute is 1 × 0.33 = 0.33 points; for "xxKTV" in the daytime, the weight information of the teenager characteristic attribute is 1 × 0.5 = 0.5 points and that of the adult characteristic attribute is 1 × 0.5 = 0.5 points; for "xxKTV" in the evening, the weight information of the teenager characteristic attribute is 1 × 0.5 = 0.5 points and that of the adult characteristic attribute is 1 × 0.5 = 0.5 points.
The server adds up the daytime weight information of the child characteristic attribute of each client, obtaining 0 points for the child characteristic attribute in the daytime; adds up the daytime weight information of the teenager characteristic attribute of each client, obtaining 0.33 + 0.5 = 0.83 points; adds up the daytime weight information of the adult characteristic attribute of each client, obtaining 0.33 + 0.5 = 0.83 points; and adds up the daytime weight information of the old-age characteristic attribute of each client, obtaining 0.33 points.
The server adds up the evening weight information of the child characteristic attribute of each client, obtaining 0 points for the child characteristic attribute in the evening. Because only "xxKTV" ran in the evening, adding up the evening weight information of the teenager characteristic attribute of each client gives 0.5 points; adding up the evening weight information of the adult characteristic attribute of each client gives 0.5 points; and adding up the evening weight information of the old-age characteristic attribute of each client gives 0 points.
Table one:
Preset time period    Time range
Early morning         [0:00, 4:00)
Daytime               [4:00, 18:00)
Evening               [18:00, 21:00)
Late night            [21:00, 0:00)
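The time-period variant described above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: the period boundaries follow Table one, the names `PERIODS`, `period_of`, and `weight_info_by_period` are hypothetical, and the run hours (10, 11, and 19 o'clock) are invented for the example, since the text only states in which period each client ran.

```python
# Preset time periods as (name, start hour, end hour); the last period is
# written here as ending at hour 24 so that every hour 0-23 falls in one period.
PERIODS = [
    ("early morning", 0, 4),
    ("daytime", 4, 18),
    ("evening", 18, 21),
    ("late night", 21, 24),
]


def period_of(hour):
    """Return the name of the preset time period that contains `hour` (0-23)."""
    for name, start, end in PERIODS:
        if start <= hour < end:
            return name
    raise ValueError(f"hour out of range: {hour}")


def weight_info_by_period(runs, attributes_of, preset_attributes, total_score=1.0):
    """Sum b / a per (period, attribute) over every recorded run.

    runs: list of (client, hour) pairs, one entry per run of a client.
    attributes_of: dict mapping a client to the preset attributes it has.
    """
    info = {(name, attr): 0.0 for name, _, _ in PERIODS for attr in preset_attributes}
    for client, hour in runs:
        attributes = attributes_of[client]
        weight = total_score / len(attributes)
        for attr in attributes:
            info[(period_of(hour), attr)] += weight
    return info


preset = ["child", "teenager", "adult", "old age"]
attributes_of = {
    "the xx persons of bearing": ["teenager", "adult", "old age"],
    "xxKTV": ["teenager", "adult"],
}
# Hypothetical run hours: both clients once in the daytime, xxKTV once in the evening.
runs = [("the xx persons of bearing", 10), ("xxKTV", 11), ("xxKTV", 19)]
info = weight_info_by_period(runs, attributes_of, preset)
print(round(info[("daytime", "teenager")], 2))  # 0.83
```

Counting one entry per run is equivalent to multiplying the weight by the running frequency within each preset time period.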
Step 303: cluster all clients that have recorded user data in the user equipment of the same kind using the preset clustering target k and the weight information of each characteristic attribute, obtaining k categories, where each category includes at least one client and the user equipment to which each client belongs, and k is a positive integer.
The server clusters all clients that have recorded user data in the user equipment of the same kind using the preset clustering target k and the weight information of each characteristic attribute. The k categories obtained reflect the relevance between different clients: if the relevance between different clients is large, they may fall in the same category; if the relevance is small, they may fall in different categories. The clustering algorithm that the server uses may be the spectral clustering (Spectral Clustering, SC) algorithm or the k-means clustering algorithm; this embodiment does not limit this. The preset clustering target k is the clustering target that gives the best clustering effect, selected after the server has performed clustering multiple times with different clustering targets.
The server clustering all clients that have recorded user data in the user equipment of the same kind using the preset clustering target k and the weight information of each characteristic attribute to obtain k categories includes: when the user equipment of the same kind include m user equipment, generating an m × p feature matrix according to the weight information of the n kinds of characteristic attributes, where p = n when the user data does not include the running time period of each client, and p = n × q when the user data includes the running time period of each client and the quantity of preset time periods is q; normalizing the feature matrix to obtain an m × p normalized matrix; and clustering the normalized matrix using the clustering target k to obtain the k categories.
When the user data does not include the running time period of each client, each user equipment corresponds to n data items obtained from the weight information of the n kinds of characteristic attributes, so the generated feature matrix is m × n. When the user data includes the running time period of each client and the quantity of preset time periods is q, each preset time period corresponds to the n kinds of characteristic attributes, so each user equipment corresponds to n × q data items obtained from the weight information of the n kinds of characteristic attributes of each preset time period; the generated matrix is therefore m × p, where p = n × q.
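The shape of the feature matrix described above can be illustrated with a short Python sketch. The function name `build_feature_matrix`, the dictionary-based input layout, the period-major column order, and the sample values are all assumptions made for this example; the embodiment does not prescribe a column order.

```python
def build_feature_matrix(device_info, attributes, periods=None):
    """Build an m x p feature matrix as a list of rows.

    device_info: one dict per user equipment. Keys are attribute names when
    periods is None (p = n), or (period, attribute) pairs otherwise (p = n * q).
    Missing entries are treated as weight information 0.
    """
    if periods is None:
        columns = list(attributes)
    else:
        columns = [(period, attr) for period in periods for attr in attributes]
    return [[info.get(column, 0.0) for column in columns] for info in device_info]


attributes = ["child", "teenager", "adult", "old age"]                 # n = 4
periods = ["early morning", "daytime", "evening", "late night"]        # q = 4
device_info = [                                                        # m = 2
    {("daytime", "teenager"): 0.83, ("evening", "teenager"): 0.5},
    {("late night", "adult"): 1.0},
]
matrix = build_feature_matrix(device_info, attributes, periods)
print(len(matrix), len(matrix[0]))  # 2 16
```

Calling the same function without `periods` yields an m × n matrix, which corresponds to the case where the user data carries no running time period.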
Normalizing the feature matrix means mapping every element of the feature matrix into the interval [0, 1]. The server may use the max-min normalization algorithm when normalizing the feature matrix; this embodiment does not limit this.
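A minimal Python sketch of max-min normalization is given below. It assumes that the minimum and maximum are taken per column (that is, per characteristic attribute), which the text does not specify, and it maps a constant column to 0 to avoid division by zero; both choices, and the name `min_max_normalize`, are assumptions of this sketch.

```python
def min_max_normalize(matrix):
    """Scale each column of a list-of-rows matrix into the interval [0, 1]."""
    columns = list(zip(*matrix))
    lows = [min(column) for column in columns]
    highs = [max(column) for column in columns]
    return [
        [
            (value - low) / (high - low) if high > low else 0.0
            for value, low, high in zip(row, lows, highs)
        ]
        for row in matrix
    ]


feature_matrix = [[0.0, 2.0], [1.0, 4.0], [3.0, 2.0]]  # hypothetical values
normalized = min_max_normalize(feature_matrix)
print([[round(value, 2) for value in row] for row in normalized])
# [[0.0, 0.0], [0.33, 1.0], [1.0, 0.0]]
```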
Assume that the user data collected by each user equipment among the user equipment of the same kind is as shown in Table two below, and that the characteristic attributes that each client has are as shown in Table three below. If the preset weight total score is 1 point, the server calculates, according to Table two and Table three, the weight information of the n kinds of characteristic attributes corresponding to each user equipment, as shown in Table four below; the feature matrix obtained from Table four is:
The normalized matrix obtained by normalizing this feature matrix with the max-min normalization algorithm is:
The normalized matrix is clustered using the k-means clustering algorithm and the clustering target k to obtain k categories.
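The clustering step can be sketched with a plain k-means routine in Python. This is a generic textbook k-means written for illustration, not the server's actual implementation; the function name `k_means`, the random initialization with a fixed seed, the iteration limit, and the sample rows are assumptions.

```python
import random


def k_means(rows, k, iterations=100, seed=0):
    """Cluster equal-length numeric rows into k groups; return one label per row."""
    rng = random.Random(seed)
    centers = [list(row) for row in rng.sample(rows, k)]

    def nearest(row):
        # Index of the center with the smallest squared Euclidean distance.
        return min(
            range(k),
            key=lambda j: sum((x - c) ** 2 for x, c in zip(row, centers[j])),
        )

    labels = None
    for _ in range(iterations):
        new_labels = [nearest(row) for row in rows]
        if new_labels == labels:
            break  # assignments are stable, so the centers are too
        labels = new_labels
        for j in range(k):
            members = [row for row, label in zip(rows, labels) if label == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = [sum(column) / len(members) for column in zip(*members)]
    return labels


# Hypothetical normalized rows, one per user equipment.
normalized = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.0]]
labels = k_means(normalized, k=2)
print(labels[0] == labels[1], labels[2] == labels[3], labels[0] != labels[2])
# True True True
```

The label numbers themselves depend on the initialization, so the example only checks which rows end up together.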
Table two:
User equipment      Client                           Running frequency
User equipment 1    Nursery Rhymes Complete Works    3
User equipment 1    The xx persons of bearing        1
User equipment 2    Heroic x                         4
User equipment 3    xxKTV                            2
User equipment 3    One-key cleanup                  6
User equipment 4    The xx persons of bearing        2
Table three:
Table four:
In order to improve the accuracy of extracting feature clients, the server usually collects the user data recorded by each client in a large number of user equipment, for example the user data recorded by each client in 23640 user equipment. As a result, the dimension of the generated normalized matrix is very high; when the server clusters the normalized matrix with a clustering algorithm, the amount of calculation is very large, and the normalized matrix contains redundant data, so the stability of the data is low. In this embodiment, after obtaining the normalized matrix, the server may further perform dimension reduction on the m × p normalized matrix using a preset dimension reduction algorithm and a preset dimensionality reduction index l, obtaining an m × l reduced matrix, and then cluster the reduced matrix using the clustering target k to obtain k categories.
Using the preset dimension reduction algorithm and the dimensionality reduction index l, the server can determine from the normalized matrix a group of linearly uncorrelated feature vectors that represent the effective information of the normalized matrix to the greatest extent. The dimension of this group of feature vectors is smaller than that of the normalized matrix, which reduces the complexity of the clustering algorithm when the server performs clustering, removes the redundant data in the normalized matrix, and improves the stability of the data used when the server performs clustering. The preset dimension reduction algorithm may be the principal component analysis (Principal Component Analysis, PCA) algorithm or non-negative matrix factorization (Non-negative Matrix Factorization, NMF); this embodiment does not limit this. The preset dimensionality reduction index l is selected as follows: for different candidate dimensionality reduction indexes, the variance of the m × l data obtained from the m × p normalized matrix is calculated, and the dimensionality reduction index whose result reaches 90% of the variance of the normalized matrix is selected.
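The selection of the dimensionality reduction index l can be sketched as follows. The sketch assumes a PCA-style setting in which the variance carried by each component has already been computed, and it picks the smallest l that reaches the 90% target; the function name `choose_reduction_index`, the "smallest l" rule, and the sample variances are assumptions, and the decomposition itself is not shown.

```python
def choose_reduction_index(component_variances, target=0.90):
    """Smallest l whose leading components retain at least `target` of the variance."""
    ordered = sorted(component_variances, reverse=True)
    total = sum(ordered)
    if total <= 0:
        raise ValueError("total variance must be positive")
    cumulative = 0.0
    for l, variance in enumerate(ordered, start=1):
        cumulative += variance
        if cumulative / total >= target:
            return l
    return len(ordered)


# Hypothetical per-component variances of a normalized matrix with p = 5.
variances = [5.0, 3.0, 1.5, 0.4, 0.1]
print(choose_reduction_index(variances))  # 3
```

With these sample values the first two components retain 80% of the variance and the first three retain 95%, so l = 3 is the first index that reaches the target.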
Assume that the preset clustering target k is 12 and that the server clusters the reduced matrix using the spectral clustering algorithm; the clustering result obtained is shown in Table five below.
Table five:
Step 304: extract at least one feature client from the k categories, where the feature client is used to reflect the common interest of the target user group of the user equipment.
The server extracting at least one feature client from the k categories includes: determining the center clients of each of the k categories, where, for a center client, the quantity of user equipment to which the center client belongs divided by the quantity of user equipment included in the category is greater than a first preset threshold; determining, among the j categories that have center clients, the category that includes the largest quantity of user equipment; and determining the at least one center client of the determined category as the at least one feature client, where 0 < j ≤ k.
In this embodiment, a client for which the quantity of user equipment to which it belongs divided by the quantity of user equipment included in the category is greater than the first preset threshold is taken as a center client, and the feature client is determined from the center clients. The feature client extracted by the server is therefore a client that most users are using, so the common interest of the target user group of the user equipment is reflected by the feature client with higher accuracy.
Assume that the first preset threshold is 70%; the server then takes as a center client any client for which the quantity of user equipment to which it belongs divided by the quantity of user equipment included in the category exceeds 70%. Assume that the center clients determined for each category shown in Table five are as shown in Table six below. According to Table six, the categories cluster-5, cluster-6, cluster-10, cluster-11, and cluster-12 have no center client; that is, in these categories there is no client for which the quantity of user equipment to which it belongs divided by the quantity of user equipment included in the category exceeds 70%. According to Table five, among the categories that have center clients, category cluster-1 has the largest quantity of user equipment, 877, so the server takes the center client "xxKTV" of cluster-1 as the feature client.
Table six:
Category     Center client
cluster-1    xxKTV
cluster-2    X hits ingenious military move 2
cluster-3    Qxx
cluster-4    Xx puzzles
cluster-7    xx one-key cleanup
cluster-9    Xx Great War corpse TV versions
cluster-9    Xx fish
cluster-9    The xx persons of bearing
cluster-9    Xx escapes
cluster-9    Xx Condors
cluster-9    Fried xx glacial epoch xx
cluster-9    Overlord xx
cluster-9    Xx plans
cluster-9    Xx balls
cluster-9    Xx intelligent
cluster-9    Xx knight 2
cluster-9    The xx worlds
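The extraction of center clients and feature clients described in step 304 can be sketched in Python as follows. The data layout (each category as a dictionary holding a set of device identifiers and, per client, the set of devices on which it recorded user data), the function names `center_clients` and `feature_clients`, and the sample categories are assumptions made for this sketch.

```python
def center_clients(category, threshold=0.70):
    """Clients whose share of the category's user equipment exceeds the threshold.

    category: dict with "devices" (set of device ids in the category) and
    "clients" (dict mapping a client name to the set of device ids it belongs to).
    """
    size = len(category["devices"])
    return [
        name
        for name, devices in category["clients"].items()
        if size and len(devices & category["devices"]) / size > threshold
    ]


def feature_clients(categories, threshold=0.70):
    """Center clients of the largest category that has at least one center client."""
    candidates = [(len(c["devices"]), center_clients(c, threshold)) for c in categories]
    candidates = [(size, centers) for size, centers in candidates if centers]
    if not candidates:
        return []  # extraction fails: no category has a center client
    return max(candidates, key=lambda item: item[0])[1]


# Hypothetical categories: the largest one has no center client at all.
categories = [
    {"devices": set(range(1, 11)),
     "clients": {"xxKTV": set(range(1, 9)), "other client": {1, 2}}},
    {"devices": {11, 12, 13, 14},
     "clients": {"small client": {11, 12, 13, 14}}},
    {"devices": set(range(15, 41)),
     "clients": {"rare client": set(range(15, 21))}},
]
print(feature_clients(categories))  # ['xxKTV']
```

In the sample data "xxKTV" is used on 8 of 10 devices (80%), so it is a center client, and its category is the largest one among those that have a center client.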
Optionally, in order to extract more feature clients, after the at least one feature client is obtained, when the quantity of the at least one feature client is r, the server obtains the identifier of each feature client and takes the identifier of each of the r clients as one characteristic attribute, obtaining n + r kinds of characteristic attributes, where r is a positive integer; n is then updated to n + r, and step 302 to step 304 are performed again, until extraction of a feature client fails in step 304.
The identifier of a feature client may be the name of the feature client or the identity (identity, ID) of the feature client; this embodiment does not limit this.
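The iterative procedure above can be outlined with a small Python sketch. The callable `extract_feature_clients` is a hypothetical stand-in for steps 302 to 304, and both the `max_rounds` safety bound and the rule of stopping when no new identifier is returned are assumptions of this sketch rather than part of the embodiment.

```python
def extract_iteratively(preset_attributes, extract_feature_clients, max_rounds=10):
    """Repeat extraction, adding each new feature client identifier as an attribute.

    extract_feature_clients: callable standing in for steps 302 to 304; it
    takes the current attribute list and returns feature client identifiers
    (an empty list when extraction fails).
    """
    attributes = list(preset_attributes)
    for _ in range(max_rounds):  # max_rounds is a safety bound, an assumption
        new_ids = [c for c in extract_feature_clients(attributes) if c not in attributes]
        if not new_ids:
            break  # extraction failed or found nothing new
        attributes.extend(new_ids)  # n becomes n + r
    return attributes


def fake_extract(attributes):
    # Hypothetical stand-in: finds "xxKTV" once, then fails.
    return [] if "xxKTV" in attributes else ["xxKTV"]


print(extract_iteratively(["child", "teenager", "adult", "old age"], fake_extract))
# ['child', 'teenager', 'adult', 'old age', 'xxKTV']
```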
Assume that the feature client extracted according to Table five and Table six is "xxKTV", and that the weight information of the n kinds of characteristic attributes corresponding to each user equipment is as shown in Table four. The server determines the name of xxKTV as one characteristic attribute, obtaining 4 + 1 = 5 characteristic attributes. According to the updated quantity of characteristic attributes that each client has, the server determines the weight information of each characteristic attribute among the at least one characteristic attribute that each client has, as shown in Table seven below, where the weight information of user equipment 4, which has used xxKTV, in the daytime xxKTV characteristic attribute is 1 point.
Table seven:
Optionally, the server may also intelligently analyze which clients a user may be interested in and push the analysis result to the user equipment. The server intelligently analyzing which clients a user may be interested in includes: for each of the k categories, counting the quantity of user equipment in the category to which each client belongs; determining a client for which the quantity of user equipment to which it belongs is greater than a fourth preset threshold as a client to be recommended; and recommending the client to be recommended to the user equipment in the category that have not installed it.
Assume that more than 50% of the user equipment in a category have run xx Great War corpse TV versions; the server then recommends xx Great War corpse TV versions to the user equipment in that category that have not installed it.
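The recommendation logic for one category can be sketched in Python as follows. The function name `recommend_clients`, the input layout (a mapping from each device to the set of clients installed on it), the sample data, and the use of half the category size as the threshold are assumptions made for this example.

```python
def recommend_clients(category_devices, installed, threshold):
    """Return, per device, the clients to recommend within one category.

    category_devices: device ids in the category.
    installed: dict mapping a device id to the set of clients installed on it.
    threshold: a client is recommended when more than this many devices have it.
    """
    counts = {}
    for device in category_devices:
        for client in installed.get(device, set()):
            counts[client] = counts.get(client, 0) + 1
    to_recommend = {client for client, count in counts.items() if count > threshold}
    recommendations = {}
    for device in category_devices:
        missing = sorted(to_recommend - installed.get(device, set()))
        if missing:
            recommendations[device] = missing
    return recommendations


devices = [1, 2, 3, 4]
installed = {
    1: {"xx Great War corpse TV versions"},
    2: {"xx Great War corpse TV versions", "other client"},
    3: {"xx Great War corpse TV versions"},
    4: {"other client"},
}
threshold = 0.5 * len(devices)  # more than 50% of the category, as in the example
print(recommend_clients(devices, installed, threshold))
# {4: ['xx Great War corpse TV versions']}
```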
Optionally, the server may also analyze, according to the timing with which more than a preset quantity of user equipment run clients, whether such user equipment needs to be optimized.
Assume that there are 122 user equipment that run xx one-key cleanup after running xxKTV; by analyzing the timing with which the user equipment run clients, the server learns that the user equipment produces a relatively large amount of cache after running xxKTV and that such user equipment needs to be optimized.
Optionally, the server may also, according to at least two clients run by more than a preset quantity of user equipment, recommend the remaining clients to the user equipment that have run only some of the at least two clients.
Assume that the clients run by 153 user equipment include xx Great War corpse TV versions and xx ingenious military moves 2; the server may then recommend xx ingenious military moves 2 to the user equipment that have run only xx Great War corpse TV versions.
Step 305: generate a first user tag according to the user data recorded by each client, and generate a second user tag according to the at least one feature client.
The server generates the first user tag according to the user data recorded by each client, for example "night owl" or "Beijinger"; the server generates the second user tag according to the at least one feature client, for example "K-song expert" or "cleanup expert". Because the first user tag is generated from the user data recorded by each client, it reflects the relevance between the target groups of the user equipment of the same kind poorly. The second user tag is generated from the feature client, and the feature client can reflect the hidden relevance between the target groups of the user equipment of the same kind; the second user tag therefore reflects the relevance between the target groups of the user equipment of the same kind well.
In summary, in the user tag generation method provided by the embodiment of the present invention, the weight information of each characteristic attribute that each client that has recorded user data in the user equipment has is determined, and the weight information of each of the n kinds of characteristic attributes is obtained; all clients that have recorded user data in the user equipment of the same kind are clustered according to the preset clustering target k and the weight information of each characteristic attribute to obtain k categories; and at least one feature client is extracted from the k categories. The server can thus generate user tags not only from user data but also from the at least one feature client. This solves the problem that, when the server generates user tags only from user data, few user tags are generated if the user data is scarce, and achieves the effect of increasing the quantity of user tags generated.
In addition, a weight is set for each characteristic attribute that each client has, the weight being negatively correlated with the quantity of characteristic attributes that the client has; the weight information of each characteristic attribute that each client has is determined according to the running frequency of the client and the weight of each characteristic attribute that the client has; and, for all clients that have recorded user data in each user equipment, the weight information of the same characteristic attribute is added up to obtain the weight information of the n kinds of characteristic attributes. The weight information of the n kinds of characteristic attributes obtained by the server is therefore positively correlated with the running frequency of the clients and reflects the user's habits in using each client, which ensures the accuracy of the generated second user tag.
In addition, a weight is set for each characteristic attribute that each client has, the weight being negatively correlated with the quantity of characteristic attributes that the client has; for each client, the weight information of each characteristic attribute of the client in each preset time period is determined according to the running frequency of the client in that preset time period and the weight of each characteristic attribute that the client has; and, for all clients that have recorded user data in each user equipment, the weight information of the same characteristic attribute within the same preset time period is added up to obtain the weight information of the n kinds of characteristic attributes corresponding to each preset time period. The weight information of the n kinds of characteristic attributes obtained by the server therefore reflects the user's habits in using each client more accurately, and the server can refer to more types of user data when generating the second user tag.
In addition, performing dimension reduction on the normalized matrix reduces the amount of calculation when the server executes the clustering algorithm with the clustering target k, which improves the efficiency with which the server executes the clustering algorithm; it also removes the redundant data in the normalized matrix, which improves the stability of the data used when the server executes the clustering algorithm.
In addition, the center clients of each category are determined, the category that includes the largest quantity of user equipment is determined among the j categories that have center clients, and the at least one center client of that category is determined as the at least one feature client. The at least one feature client obtained by the server is therefore a client that most users are using and can reflect the common interest of most users, which ensures the accuracy of the feature client determined by the server.
Please refer to Fig. 4, which shows a block diagram of the user tag generating apparatus provided by an embodiment of the present invention. The user tag generating apparatus may be implemented as all or part of a user equipment by software, hardware, or a combination of both. The user tag generating apparatus may include: an acquiring unit 410, a determining unit 420, a clustering unit 430, an extraction unit 440, a generation unit 450, an updating unit 460, a statistics unit 470, and a recommendation unit 480.
The acquiring unit 410 is configured to implement the function of step 301 above, and the functions in step 304 above of obtaining the identifier of each feature client and taking the identifier of each of the r clients as one characteristic attribute to obtain n + r kinds of characteristic attributes.
The determining unit 420 is configured to implement the function of step 302 above, and the function in step 304 above of determining a client for which the quantity of user equipment to which it belongs is greater than the fourth preset threshold as a client to be recommended.
The clustering unit 430 is configured to implement the function of step 303 above.
The extraction unit 440 is configured to implement the function of step 304 above.
The generation unit 450 is configured to implement the function of step 305 above.
The updating unit 460 is configured to implement the function in step 304 above of updating n to n + r.
The statistics unit 470 is configured to implement the function in step 304 above of counting, for each of the k categories, the quantity of user equipment to which each client in the category belongs.
The recommendation unit 480 is configured to implement the function in step 304 above of recommending the client to be recommended to the user equipment in the category that have not installed it.
For related details, refer to the method embodiment described with reference to Fig. 3.
It should be noted that the acquiring unit 410, determining unit 420, clustering unit 430, extraction unit 440, generation unit 450, updating unit 460, statistics unit 470, and recommendation unit 480 described above may be implemented by a processor in the user equipment.
A person of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented by electronic hardware or by a combination of computer software and electronic hardware. Whether these functions are performed in hardware or in software depends on the particular application and the design constraints of the technical solution.
A person of ordinary skill in the art can clearly understand that, for convenience and brevity of description, for the specific working processes of the apparatus and units described above, reference may be made to the corresponding processes in the foregoing method embodiment, which are not repeated here.
In the embodiments provided in this application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the apparatus embodiment described above is merely illustrative: the division into units may be merely a division by logical function, and other divisions are possible in an actual implementation; for instance, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or may be distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
The above is only a specific embodiment of the present invention, but the protection scope of the present invention is not limited thereto. Any change or replacement that a person skilled in the art can readily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (18)

1. A user tag generation method, characterized in that the method includes:
for each client that has recorded user data in user equipment of the same kind, obtaining, from the characteristic attributes that the client has, at least one characteristic attribute identical to preset n kinds of characteristic attributes, where the user data is used to reflect operations performed on the client by a user using the client, the characteristic attribute is used to reflect a feature shared by the target user group of the client, and n is a positive integer;
determining, according to the quantity of characteristic attributes that each client has, the weight information of each characteristic attribute among the at least one characteristic attribute that each client has, and obtaining the weight information of each of the n kinds of characteristic attributes;
clustering all clients that have recorded user data in the user equipment of the same kind using a preset clustering target k and the weight information of each characteristic attribute to obtain k categories, where each category includes at least one client and the user equipment to which each client belongs, and k is a positive integer;
extracting at least one feature client from the k categories, where the feature client is used to reflect the common interest of the target user group of the user equipment; and
generating a first user tag according to the user data recorded by each client, and generating a second user tag according to the at least one feature client.
2. The method according to claim 1, characterized in that the user data of each client includes the running frequency of the client, and the determining, according to the number of characteristic attributes of each client, the weight information of each characteristic attribute among the at least one characteristic attribute of each client, to obtain the weight information of each of the n characteristic attributes, comprises:
setting the weight of each characteristic attribute of each client according to a preset total weight score and the number of characteristic attributes of the client, wherein the weight is negatively correlated with the number of characteristic attributes of the client;
determining the weight information of each characteristic attribute of each client according to the running frequency of the client and the weight of each characteristic attribute of the client;
for all clients that have recorded user data in each user equipment, adding up the weight information of the same characteristic attribute, to obtain the weight information of the n characteristic attributes.
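The weighting steps of claim 2 can be sketched as follows. This is a minimal illustration, not the patented implementation: the claim only requires that the weight be negatively correlated with the attribute count, so the equal split of the total score and the multiplication by running frequency are assumptions, and the function name `attribute_weights` and the sample attributes are hypothetical.

```python
from collections import defaultdict


def attribute_weights(clients, total_score=1.0):
    """Sum per-attribute weight information for one user equipment.

    clients: list of (attributes, running_frequency) pairs, one per client.
    """
    totals = defaultdict(float)
    for attributes, frequency in clients:
        if not attributes:
            continue
        # Assumption: split the total score equally, so a client with more
        # attributes contributes a smaller weight per attribute.
        per_attribute = total_score / len(attributes)
        for attribute in attributes:
            # Assumption: weight information = weight * running frequency.
            totals[attribute] += per_attribute * frequency
    return dict(totals)


# Hypothetical device with two clients.
device = [(["sports", "male"], 4), (["sports"], 3)]
weights = attribute_weights(device)
```

Here the first client spreads its score over two attributes and the second puts its whole score on one, so an attribute shared by focused, frequently used clients accumulates the largest total.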
3. The method according to claim 1, characterized in that the user data of each client includes the running frequency and the running time period of the client, and the determining, according to the number of characteristic attributes of each client, the weight information of each characteristic attribute among the at least one characteristic attribute of each client, to obtain the weight information of each of the n characteristic attributes, comprises:
setting the weight of each characteristic attribute of each client according to a preset total weight score and the number of characteristic attributes of the client, wherein the weight is negatively correlated with the number of characteristic attributes of the client;
determining the preset time period to which the running time period of each client belongs, and determining the running frequency of each client within the corresponding preset time period, wherein each preset time period corresponds to the n characteristic attributes;
for each client, determining the weight information of each characteristic attribute of the client in each preset time period according to the running frequency of the client in that preset time period and the weight of each characteristic attribute of the client;
for all clients that have recorded user data in each user equipment, adding up the weight information of the same characteristic attribute within the same preset time period, to obtain the weight information of the n characteristic attributes corresponding to each preset time period.
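A hedged sketch of the per-period variant in claim 3 follows. The three periods in `PERIODS`, the use of the start hour of each run to pick a period, and the equal-split weighting are all assumptions made for illustration; the claim itself does not fix any of them.

```python
from collections import defaultdict

# Hypothetical preset time periods as (name, start_hour, end_hour).
PERIODS = [("morning", 6, 12), ("afternoon", 12, 18), ("night", 18, 24)]


def period_of(hour):
    """Return the preset period containing the hour, or None if uncovered."""
    for name, start, end in PERIODS:
        if start <= hour < end:
            return name
    return None


def weights_by_period(clients, total_score=1.0):
    """clients: list of (attributes, run_hours); one hour entry per run."""
    totals = defaultdict(lambda: defaultdict(float))
    for attributes, run_hours in clients:
        if not attributes:
            continue
        per_attribute = total_score / len(attributes)
        # Running frequency of this client inside each preset period.
        frequency = defaultdict(int)
        for hour in run_hours:
            period = period_of(hour)
            if period is not None:
                frequency[period] += 1
        for period, count in frequency.items():
            for attribute in attributes:
                totals[period][attribute] += per_attribute * count
    return {period: dict(attrs) for period, attrs in totals.items()}


# Hypothetical device with two clients.
device = [(["news", "finance"], [7, 8, 20]), (["news"], [9])]
result = weights_by_period(device)
```

With q preset periods and n attributes, each device ends up with n × q values, which matches the p = n × q column count used when the feature matrix is built.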
4. The method according to claim 2 or 3, characterized in that the clustering all clients that have recorded user data in the user equipment of the same type by using a preset clustering index k and the weight information of each characteristic attribute, to obtain k categories, comprises:
when the user equipment of the same type includes m user equipments, generating a feature matrix of m × p dimensions according to the weight information of the n characteristic attributes, wherein p = n when the user data does not include the running time period of each client, and p = n × q when the user data includes the running time period of each client and the number of preset time periods is q;
normalizing the feature matrix to obtain a normalized matrix of m × p dimensions;
clustering the normalized matrix by using the clustering index k, to obtain the k categories.
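The matrix steps of claim 4 can be illustrated with a small example. The claim does not name a normalization method or a clustering algorithm, so column-wise min-max scaling and a plain k-means with a deterministic start (the first k rows) are assumptions chosen only to keep the sketch short and reproducible.

```python
def normalize_columns(matrix):
    """Min-max scale each column to [0, 1]; constant columns become 0."""
    columns = list(zip(*matrix))
    lows = [min(column) for column in columns]
    spans = [max(column) - min(column) for column in columns]
    return [
        [(value - low) / span if span else 0.0
         for value, low, span in zip(row, lows, spans)]
        for row in matrix
    ]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(rows, k, iterations=20):
    """Assign each row to one of k clusters; returns one label per row."""
    centers = [list(row) for row in rows[:k]]  # assumption: first k rows
    labels = [0] * len(rows)
    for _ in range(iterations):
        labels = [
            min(range(k), key=lambda c: squared_distance(row, centers[c]))
            for row in rows
        ]
        for c in range(k):
            members = [row for row, label in zip(rows, labels) if label == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# m = 4 devices, p = 2 attribute weights per device (hypothetical values).
features = [[5.0, 0.0], [4.0, 1.0], [0.0, 6.0], [1.0, 5.0]]
normalized = normalize_columns(features)
labels = k_means(normalized, k=2)
```

Normalizing first keeps an attribute with a large raw weight range from dominating the distance computation.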
5. The method according to claim 4, characterized in that the clustering the normalized matrix by using the clustering index k, to obtain the k categories, comprises:
performing dimensionality reduction on the normalized matrix of m × p dimensions by using a preset dimensionality reduction algorithm and a preset dimensionality reduction index l, to obtain a reduced matrix of m × l dimensions;
clustering the reduced matrix by using the clustering index k, to obtain the k categories.
6. The method according to claim 4, characterized in that the extracting at least one feature client from the k categories comprises:
determining the central client of each of the k categories, wherein the number of user equipments to which the central client belongs, divided by the number of user equipments included in the category, is greater than a first preset threshold;
determining, among the j categories in which a central client exists, the category that includes the largest number of user equipments, and determining at least one central client of the determined category as the at least one feature client, where 0 < j ≤ k.
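The selection rule of claim 6 can be sketched as below. The data layout (each category as a mapping from a device identifier to the set of clients on that device), the function names, and the sample data are hypothetical; only the ratio test against the first threshold and the choice of the largest category come from the claim.

```python
def center_clients(category, threshold):
    """Clients installed on more than `threshold` of a category's devices.

    category: dict mapping device id -> set of client ids on that device.
    """
    counts = {}
    for clients in category.values():
        for client in clients:
            counts[client] = counts.get(client, 0) + 1
    size = len(category)
    return sorted(c for c, n in counts.items() if n / size > threshold)


def feature_clients(categories, threshold):
    """Centre clients of the largest category that has any centre client."""
    best_size, best = 0, []
    for category in categories:
        centers = center_clients(category, threshold)
        if centers and len(category) > best_size:
            best_size, best = len(category), centers
    return best


# Two hypothetical categories produced by clustering.
categories = [
    {"d1": {"chat", "maps"}, "d2": {"chat"}, "d3": {"chat", "game"}},
    {"d4": {"video"}, "d5": {"video", "maps"}},
]
selected = feature_clients(categories, threshold=0.8)
```

An empty result corresponds to the case where no category has a centre client, which is the stopping condition used by the iterative variant in claim 8.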
7. The method according to any one of claims 1 to 6, characterized in that the obtaining, for each client that has recorded user data in user equipment of the same type, from the characteristic attributes of the client, at least one characteristic attribute identical to one of n preset characteristic attributes, comprises:
for each client that has recorded user data in user equipment of the same type, collecting the user data recorded by the client;
filtering all user data recorded by the client according to a preset rule, wherein the preset rule is that the running duration of the client recording the user data is less than a second preset threshold, or that the running duration of the client recording the user data is greater than a third preset threshold;
when user data remains after the filtering, obtaining, from the characteristic attributes of the client, at least one characteristic attribute identical to one of the n preset characteristic attributes.
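A minimal sketch of the filtering in claim 7, assuming each user-data record carries a `duration` field in seconds (the record layout, field names, and threshold values are hypothetical): records shorter than the second threshold or longer than the third are discarded, and attributes are matched against the preset list only if something survives.

```python
def filter_records(records, min_seconds, max_seconds):
    """Drop records whose running duration is too short or too long."""
    return [r for r in records if min_seconds <= r["duration"] <= max_seconds]


def matching_attributes(client_attributes, preset_attributes, records):
    """Attributes shared with the preset list, or [] if no data survived."""
    if not records:
        return []
    return [a for a in preset_attributes if a in client_attributes]


# Hypothetical records for one client.
records = [
    {"client": "chat", "duration": 2},      # likely an accidental launch
    {"client": "chat", "duration": 300},
    {"client": "chat", "duration": 90000},  # likely left running unattended
]
kept = filter_records(records, min_seconds=5, max_seconds=12 * 3600)
attributes = matching_attributes({"social", "young"}, ["social", "sports"], kept)
```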
8. The method according to any one of claims 1 to 7, characterized in that, after the extracting at least one feature client from the k categories, the method further comprises:
when the number of the at least one feature client is r, obtaining the identifier of each feature client, and taking the identifier of each of the r clients as one characteristic attribute, to obtain n+r characteristic attributes, where r is a positive integer;
updating n to n+r, and triggering execution of the steps of: determining, according to the number of characteristic attributes of each client, the weight information of each characteristic attribute among the at least one characteristic attribute of each client, to obtain the weight information of each of the n characteristic attributes; clustering all clients that have recorded user data in the user equipment of the same type by using the preset clustering index k and the weight information of each characteristic attribute, to obtain k categories; and extracting at least one feature client from the k categories; and stopping when extraction of a feature client fails.
9. The method according to claim 1, characterized in that, after the clustering all clients that have recorded user data in the user equipment of the same type by using a preset clustering index k and the weight information of each characteristic attribute, to obtain k categories, the method comprises:
for each of the k categories, counting the number of user equipments to which each client in the category belongs;
determining a client for which the number of user equipments is greater than a fourth preset threshold as a client to be recommended;
recommending the client to be recommended to the user equipments in the category on which the client to be recommended is not installed.
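The recommendation step of claim 9 can be sketched as follows, again with a hypothetical data layout (a category as a mapping from device identifier to the set of installed clients) and a hypothetical function name. The fourth threshold is compared against a raw device count, as the claim states.

```python
def recommendations(category, threshold):
    """Map each device to the popular clients it has not installed.

    category: dict mapping device id -> set of installed client ids.
    """
    counts = {}
    for clients in category.values():
        for client in clients:
            counts[client] = counts.get(client, 0) + 1
    # Clients installed on more than `threshold` devices in this category.
    popular = sorted(c for c, n in counts.items() if n > threshold)
    result = {}
    for device, clients in category.items():
        missing = [c for c in popular if c not in clients]
        if missing:
            result[device] = missing
    return result


# Hypothetical category of three devices.
category = {"d1": {"chat", "maps"}, "d2": {"chat", "maps"}, "d3": {"chat"}}
suggested = recommendations(category, threshold=1)
```

Because the devices in one category were clustered on similar attribute weights, a client that is common within the category is a plausible suggestion for the remaining devices.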
10. A user tag generation device, characterized in that the device comprises:
an obtaining unit, configured to, for each client that has recorded user data in user equipment of the same type, obtain, from the characteristic attributes of the client, at least one characteristic attribute identical to one of n preset characteristic attributes, wherein the user data reflects the operations performed on the client by the user of the client, the characteristic attributes reflect the features shared by the target user population of the client, and n is a positive integer;
a determining unit, configured to determine, according to the number of characteristic attributes of each client, the weight information of each characteristic attribute among the at least one characteristic attribute of each client, to obtain the weight information of each of the n characteristic attributes;
a clustering unit, configured to cluster all clients that have recorded user data in the user equipment of the same type by using a preset clustering index k and the weight information of each characteristic attribute obtained by the determining unit, to obtain k categories, wherein each category includes at least one client and the user equipment to which each client belongs, and k is a positive integer;
an extraction unit, configured to extract at least one feature client from the k categories obtained by the clustering unit, wherein the feature client reflects the common interest of the target user population of the user equipment;
a generation unit, configured to generate a first user tag according to the user data recorded by each client, and to generate a second user tag according to the at least one feature client extracted by the extraction unit.
11. The device according to claim 10, characterized in that the user data of each client includes the running frequency of the client, and the determining unit is configured to:
set the weight of each characteristic attribute of each client according to a preset total weight score and the number of characteristic attributes of the client, wherein the weight is negatively correlated with the number of characteristic attributes of the client;
determine the weight information of each characteristic attribute of each client according to the running frequency of the client and the weight of each characteristic attribute of the client;
for all clients that have recorded user data in each user equipment, add up the weight information of the same characteristic attribute, to obtain the weight information of the n characteristic attributes.
12. The device according to claim 10, characterized in that the user data of each client includes the running frequency and the running time period of the client, and the determining unit is configured to:
set the weight of each characteristic attribute of each client according to a preset total weight score and the number of characteristic attributes of the client, wherein the weight is negatively correlated with the number of characteristic attributes of the client;
determine the preset time period to which the running time period of each client belongs, and determine the running frequency of each client within the corresponding preset time period, wherein each preset time period corresponds to the n characteristic attributes;
for each client, determine the weight information of each characteristic attribute of the client in each preset time period according to the running frequency of the client in that preset time period and the weight of each characteristic attribute of the client;
for all clients that have recorded user data in each user equipment, add up the weight information of the same characteristic attribute within the same preset time period, to obtain the weight information of the n characteristic attributes corresponding to each preset time period.
13. The device according to claim 11 or 12, characterized in that the clustering unit is configured to:
when the user equipment of the same type includes m user equipments, generate a feature matrix of m × p dimensions according to the weight information of the n characteristic attributes, wherein p = n when the user data does not include the running time period of each client, and p = n × q when the user data includes the running time period of each client and the number of preset time periods is q;
normalize the feature matrix to obtain a normalized matrix of m × p dimensions;
cluster the normalized matrix by using the clustering index k, to obtain the k categories.
14. The device according to claim 13, characterized in that the clustering unit is configured to:
perform dimensionality reduction on the normalized matrix of m × p dimensions by using a preset dimensionality reduction algorithm and a preset dimensionality reduction index l, to obtain a reduced matrix of m × l dimensions;
cluster the reduced matrix by using the clustering index k, to obtain the k categories.
15. The device according to claim 13, characterized in that the extraction unit is configured to:
determine the central client of each of the k categories, wherein the number of user equipments to which the central client belongs, divided by the number of user equipments included in the category, is greater than a first preset threshold;
determine, among the j categories in which a central client exists, the category that includes the largest number of user equipments, and determine at least one central client of the determined category as the at least one feature client, where 0 < j ≤ k.
16. The device according to any one of claims 10 to 15, characterized in that the obtaining unit is configured to:
for each client that has recorded user data in user equipment of the same type, collect the user data recorded by the client;
filter all user data recorded by the client according to a preset rule, wherein the preset rule is that the running duration of the client recording the user data is less than a second preset threshold, or that the running duration of the client recording the user data is greater than a third preset threshold;
when user data remains after the filtering, obtain, from the characteristic attributes of the client, at least one characteristic attribute identical to one of the n preset characteristic attributes.
17. The device according to any one of claims 10 to 16, characterized in that
the obtaining unit is further configured to, after the at least one feature client is extracted from the k categories and when the number of the at least one feature client is r, obtain the identifier of each feature client and take the identifier of each of the r clients as one characteristic attribute, to obtain n+r characteristic attributes, where r is a positive integer;
the device further comprises:
an updating unit, configured to update n to n+r and trigger execution of the steps of: determining, according to the number of characteristic attributes of each client, the weight information of each characteristic attribute among the at least one characteristic attribute of each client, to obtain the weight information of each of the n characteristic attributes; clustering all clients that have recorded user data in the user equipment of the same type by using the preset clustering index k and the weight information of each characteristic attribute, to obtain k categories; and extracting at least one feature client from the k categories; and to stop when extraction of a feature client fails.
18. The device according to claim 10, characterized in that the device further comprises:
a statistics unit, configured to, after all clients that have recorded user data in the user equipment of the same type are clustered by using the preset clustering index k and the weight information of each characteristic attribute to obtain k categories, count, for each of the k categories, the number of user equipments to which each client in the category belongs;
a determining unit, configured to determine a client for which the number of user equipments is greater than a fourth preset threshold as a client to be recommended;
a recommendation unit, configured to recommend the client to be recommended to the user equipments in the category on which the client to be recommended is not installed.
CN201610454113.8A 2016-06-21 2016-06-21 User label generation method and device Active CN107526741B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610454113.8A CN107526741B (en) 2016-06-21 2016-06-21 User label generation method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610454113.8A CN107526741B (en) 2016-06-21 2016-06-21 User label generation method and device

Publications (2)

Publication Number Publication Date
CN107526741A (en) 2017-12-29
CN107526741B CN107526741B (en) 2021-05-18

Family

ID=60735282

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610454113.8A Active CN107526741B (en) 2016-06-21 2016-06-21 User label generation method and device

Country Status (1)

Country Link
CN (1) CN107526741B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6049777A (en) * 1995-06-30 2000-04-11 Microsoft Corporation Computer-implemented collaborative filtering based method for recommending an item to a user
CN103198418A (en) * 2013-03-15 2013-07-10 北京亿赞普网络技术有限公司 Application recommendation method and application recommendation system
CN103218355A (en) * 2012-01-18 2013-07-24 腾讯科技(深圳)有限公司 Method and device for generating tags for user
CN104750789A (en) * 2015-03-12 2015-07-01 百度在线网络技术(北京)有限公司 Label recommendation method and device

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109214435A (en) * 2018-08-21 2019-01-15 北京睦合达信息技术股份有限公司 A kind of data classification method and device
CN111125506A (en) * 2018-11-01 2020-05-08 百度在线网络技术(北京)有限公司 Interest circle subject determination method, device, server and medium
CN111382343A (en) * 2018-12-27 2020-07-07 方正国际软件(北京)有限公司 Label system generation method and device
CN111382343B (en) * 2018-12-27 2023-11-28 方正国际软件(北京)有限公司 Label system generation method and device

Also Published As

Publication number Publication date
CN107526741B (en) 2021-05-18

Similar Documents

Publication Publication Date Title
Sharma et al. Early diagnosis of rice plant disease using machine learning techniques
Fawagreh et al. Random forests: from early developments to recent advancements
Vizentin‐Bugoni et al. Including rewiring in the estimation of the robustness of mutualistic networks
CN109960763B (en) Photography community personalized friend recommendation method based on user fine-grained photography preference
Gomez et al. Evolution of pollination niches and floral divergence in the generalist plant Erysimum mediohispanicum
WO2020224128A1 (en) News recommendation method and apparatus based on short-term interest of user, and electronic device and medium
CN109816535A (en) Cheat recognition methods, device, computer equipment and storage medium
CN110012060A (en) Information-pushing method, device, storage medium and the server of mobile terminal
CN104221015B (en) Image retrieving apparatus, image search method, program and computer-readable storage medium
CN114359738B (en) Cross-scene robust indoor people number wireless detection method and system
CN108228844A (en) A kind of picture screening technique and device, storage medium, computer equipment
CN107526741A (en) user tag generation method and device
KR101082589B1 (en) System for providing Aspect Level News Browsing Service that reduce Media-Bias Effect and Method therefor
CN109241392A (en) Recognition methods, device, system and the storage medium of target word
Song et al. A non-cooperative game with incomplete information to improve patient hospital choice
Ma et al. An improved SVM model for relevance feedback in remote sensing image retrieval
CN110309143A (en) Data similarity determines method, apparatus and processing equipment
CN110489175A (en) Service processing method, device, server and storage medium
CN108647739A (en) A kind of myspace discovery method based on improved density peaks cluster
CN112765367B (en) Method and device for constructing topic knowledge graph
CN109376287B (en) House property map construction method, device, computer equipment and storage medium
He et al. Multi-objective spatially constrained clustering for regionalization with particle swarm optimization
CN116415658A (en) Searching method, searching device and computer storage medium of neural network architecture
CN109657950A (en) Hierarchy Analysis Method, device, equipment and computer readable storage medium
Facco et al. Comparison of PBIA and GEOBIA classification methods in classifying turbidity in reservoirs

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20200210

Address after: 518129 Bantian HUAWEI headquarters office building, Longgang District, Guangdong, Shenzhen

Applicant after: HUAWEI TECHNOLOGIES Co.,Ltd.

Address before: 210000 Ande Gate No. 94, Yuhuatai District, Jiangsu, Nanjing

Applicant before: Huawei Technologies Co.,Ltd.

GR01 Patent grant