CN107526741B

CN107526741B - User label generation method and device

Info

Publication number: CN107526741B
Application number: CN201610454113.8A
Authority: CN
Inventors: 熊安斌; 张锋; 张旭
Original assignee: Huawei Technologies Co Ltd
Current assignee: Huawei Technologies Co Ltd
Priority date: 2016-06-21
Filing date: 2016-06-21
Publication date: 2021-05-18
Anticipated expiration: 2036-06-21
Also published as: CN107526741A

Abstract

A user label generation method and a device relate to the technical field of communication, and the method comprises the following steps: for each client recorded with user data in the same user equipment, acquiring at least one characteristic attribute which is the same as the preset n characteristic attributes; determining the weight information of each characteristic attribute in at least one characteristic attribute of each client to obtain the weight information of each characteristic attribute in the n characteristic attributes; clustering all clients recording user data in the same user equipment by using a preset clustering index k and the weight information of each characteristic attribute; extracting at least one characteristic client from the obtained k categories; the first user label is generated according to the user data recorded by each client, and the second user label is generated according to at least one characteristic client, so that the problem that the generated user labels are few when the server generates the user labels according to less user data is solved, and the effect of increasing the number of the generated user labels is achieved.

Description

User label generation method and device

Technical Field

The present invention relates to the field of communications technologies, and in particular, to a method and an apparatus for generating a user tag.

Background

Currently, a server may determine the target population of a product by constructing a user portrait. A user portrait describes the characteristics of a user, such as the user's age, sex, interests, habits, and location. Because a user tag characterizes at least one feature of a user, the server may derive the user portrait by generating user tags. Examples of user tags include karaoke enthusiast ("Maiba"), artsy youth, Beijing, and night owl.

The server generates a user tag as follows: it collects user data from at least one user device; analyzes the user data to obtain the characteristics of the user of the at least one user device; and generates a user tag from those characteristics. The user data may be location data, age data, behavior and habit data, hobby data, health data, and the like of the user.

Because the server can generate user tags only from the collected user data, few user tags are generated when little user data is collected.

Disclosure of Invention

In order to solve the problem that few user tags are generated when little user data is collected, the application provides a user tag generation method and device.

In a first aspect, a user tag generation method is provided, where the method includes: for each client in which user data is recorded in user equipment of the same type, acquiring, from the characteristic attributes of the client, at least one characteristic attribute that is the same as one of n preset characteristic attributes; determining weight information of each of the at least one characteristic attribute of each client according to the number of characteristic attributes of each client, to obtain weight information of each of the n characteristic attributes; clustering all clients in which user data is recorded in the user equipment of the same type by using a preset clustering index k and the weight information of each characteristic attribute to obtain k categories, wherein each category comprises at least one client and the user equipment to which each client belongs, and k is a positive integer; extracting at least one feature client from the k categories, wherein the feature client reflects the common interests of a target user group of the user equipment; and generating a first user tag according to the user data recorded by each client, and generating a second user tag according to the at least one feature client. The user data reflects the operations that a user of the client performs on the client, a characteristic attribute reflects a characteristic shared by the target user group of the client, and n is a positive integer.

The method comprises the steps of obtaining weight information of each characteristic attribute of each client recorded with user data in the user equipment, clustering all the clients recorded with the user data in the same type of user equipment according to a preset clustering index k and the weight information of each characteristic attribute to obtain k categories, and extracting at least one characteristic client from the k categories, so that the server can generate a user label according to the user data and can generate the user label according to at least one characteristic client.

With reference to the first aspect, in a first implementation of the first aspect, the user data of each client includes an operating frequency of the client, and determining the weight information of each of the at least one characteristic attribute of each client according to the number of characteristic attributes of each client, to obtain the weight information of each of the n characteristic attributes, includes: setting the weight of each characteristic attribute of each client according to a preset total weight score and the number of characteristic attributes of the client, wherein the weight is negatively correlated with the number of characteristic attributes of the client; determining the weight information of each characteristic attribute of each client according to the operating frequency of the client and the weight of each characteristic attribute of the client; and, for all clients in which user data is recorded in each user device, adding the weight information of the same characteristic attribute to obtain the weight information of the n characteristic attributes.

Setting a weight for each characteristic attribute of each client, wherein the weight is in a negative correlation relation with the number of the characteristic attributes of the client; determining the weight information of each characteristic attribute of each client according to the running frequency of each client and the weight of each characteristic attribute of each client; for all the clients recording user data in each user device, the weight information of the same characteristic attribute is added to obtain the weight information of n characteristic attributes, so that the weight information of the n characteristic attributes obtained by the server is in positive correlation with the operating frequency of the client, the using habit of each client used by a user is reflected, and the accuracy of the generated second user label is ensured.

With reference to the first aspect, in a second implementation of the first aspect, the user data of each client includes an operating frequency and an operation time period of the client, and determining the weight information of each of the at least one characteristic attribute of each client according to the number of characteristic attributes of each client, to obtain the weight information of each of the n characteristic attributes, includes: setting the weight of each characteristic attribute of each client according to a preset total weight score and the number of characteristic attributes of the client, wherein the weight is negatively correlated with the number of characteristic attributes of the client; determining the preset time period to which the operation time period of each client belongs, and determining the operating frequency of each client in the corresponding preset time period, wherein each preset time period corresponds to the n characteristic attributes; for each client, determining the weight information of each characteristic attribute of the client for each preset time period according to the operating frequency of the client in that preset time period and the weight of each characteristic attribute of the client; and, for all clients in which user data is recorded in each user device, adding the weight information of the same characteristic attribute in the same preset time period to obtain the weight information of the n characteristic attributes corresponding to each preset time period.

A weight is set for each characteristic attribute of each client, the weight being negatively correlated with the number of characteristic attributes of the client; for each client, the weight information of each characteristic attribute for each preset time period is determined according to the operating frequency of the client in that preset time period and the weight of each characteristic attribute of the client; and, for all clients in which user data is recorded in each user device, the weight information of the same characteristic attribute in the same preset time period is added to obtain the weight information of the n characteristic attributes corresponding to each preset time period. The weight information of the n characteristic attributes obtained by the server therefore reflects more accurately the user's habits in using each client, and the server can generate the second user tag by referring to more types of user data.

With reference to the first implementation or the second implementation of the first aspect, in a third implementation of the first aspect, clustering all clients in which user data is recorded in the user equipment of the same type by using the preset clustering index k and the weight information of each characteristic attribute to obtain k categories includes: when the user equipment of the same type comprises m user devices, generating an m × p feature matrix according to the weight information of the n characteristic attributes, where p = n when the user data does not include the operation time period of each client, and p = n × q when the user data includes the operation time period of each client and the number of preset time periods is q; normalizing the feature matrix to obtain an m × p normalized matrix; and clustering the normalized matrix by using the clustering index k to obtain the k categories.
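The matrix construction and normalization steps above can be sketched as follows. This is only an illustration: the text does not name a normalization method, so per-column min-max scaling is an assumption, and the function names `build_feature_matrix` and `normalize_columns` and the sample data are hypothetical.

```python
def build_feature_matrix(device_weights, attributes, periods=None):
    """Return an m x p matrix (list of rows), one row per user device.

    device_weights: one dict per device. Without periods the keys are
    attribute names; with periods the keys are (period, attribute) pairs.
    """
    if periods is None:
        columns = list(attributes)                                # p = n
    else:
        columns = [(t, a) for t in periods for a in attributes]  # p = n * q
    return [[float(w.get(col, 0.0)) for col in columns] for w in device_weights]


def normalize_columns(matrix):
    """Min-max scale each column to [0, 1] (an assumed normalization)."""
    if not matrix:
        return []
    lows = [min(col) for col in zip(*matrix)]
    highs = [max(col) for col in zip(*matrix)]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in matrix
    ]


# Hypothetical weight information for m = 2 devices and n = 4 attributes.
attributes = ["child", "teenager", "adult", "elderly"]
devices = [
    {"teenager": 1.0, "adult": 2.0},
    {"child": 3.0, "adult": 1.0},
]
features = build_feature_matrix(devices, attributes)
normalized = normalize_columns(features)
```

A constant column is mapped to 0.0 here to avoid division by zero; a clustering algorithm such as k-means could then be run on `normalized` with the clustering index k.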

With reference to the third implementation of the first aspect, in a fourth implementation of the first aspect, clustering the normalized matrix by using the clustering index k to obtain k categories includes: performing dimensionality reduction on the m × p normalized matrix by using a preset dimensionality reduction algorithm and a preset dimensionality reduction index l to obtain an m × l reduced matrix; and clustering the reduced matrix by using the clustering index k to obtain the k categories.

By carrying out dimension reduction processing on the normalized matrix, the calculation amount of the server when the server executes the clustering algorithm by using the clustering index k is reduced, and the efficiency of the server in executing the clustering algorithm is improved; redundant data in the normalized matrix is deleted, and the stability of the data when the server carries out a clustering algorithm is improved.

With reference to the third implementation of the first aspect, in a fifth implementation of the first aspect, extracting at least one feature client from the k categories includes: determining the central client of each of the k categories, wherein a central client of a category is a client for which the number of user devices to which the client belongs, divided by the number of user devices included in the category, is greater than a first preset threshold; and, among the j categories that have a central client, determining the category that includes the largest number of user devices, and determining at least one central client of that category as the at least one feature client, where 0 < j ≤ k.

By determining the central client of each category, determining, among the j categories that have a central client, the category with the largest number of user devices, and determining at least one central client of that category as the at least one feature client, the server obtains feature clients that are used by most users. The feature clients can therefore reflect the common interests of most users, which ensures the accuracy of the feature clients determined by the server.
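The extraction of feature clients described above can be sketched as follows, assuming each category is represented as a mapping from a device identifier to the set of clients recorded on that device. The representation, the function names, and the sample data are hypothetical and only illustrate the ratio test and the choice of the largest category.

```python
def central_clients(category, threshold):
    """Clients whose share of the category's devices exceeds the threshold.

    category: dict mapping device id -> set of client names on that device.
    threshold: the first preset threshold (a ratio between 0 and 1).
    """
    total = len(category)
    counts = {}
    for clients in category.values():
        for client in clients:
            counts[client] = counts.get(client, 0) + 1
    return sorted(c for c, n in counts.items() if n / total > threshold)


def feature_clients(categories, threshold):
    """Central clients of the largest category that has a central client."""
    candidates = [(len(cat), central_clients(cat, threshold)) for cat in categories]
    candidates = [(size, centers) for size, centers in candidates if centers]
    if not candidates:
        return []  # no category has a central client: extraction fails
    return max(candidates, key=lambda item: item[0])[1]


# Hypothetical clustering result with k = 2 categories.
categories = [
    {"d1": {"xx ninja", "xxKTV"}, "d2": {"xx ninja"}, "d3": {"xx ninja", "xxKTV"}},
    {"d4": {"xxKTV"}, "d5": {"news"}},
]
selected = feature_clients(categories, threshold=0.6)
```

With a threshold of 0.6, only the first category has central clients, so both of its central clients are returned.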

With reference to the first aspect and any one of the first to fifth implementations of the first aspect, in a sixth implementation of the first aspect, acquiring, for each client in which user data is recorded in the user equipment of the same type, at least one characteristic attribute that is the same as one of the n preset characteristic attributes from the characteristic attributes of the client includes: for each client in which user data is recorded in the user equipment of the same type, acquiring the user data recorded by the client; filtering all user data recorded by the client according to a preset rule, wherein the preset rule filters out user data for which the running time of the client is less than a second preset threshold or greater than a third preset threshold; and, when user data remains after the filtering, acquiring at least one characteristic attribute that is the same as one of the n preset characteristic attributes from the characteristic attributes of the client.

By filtering the user data recorded by the client, the server can filter the user data which do not conform to the actual use condition, so that the accuracy of the server in calculating the weight information of the n characteristic attributes is improved, and the clustering accuracy is improved.

With reference to the first aspect and any one of the first to sixth implementations of the first aspect, in a seventh implementation of the first aspect, after extracting at least one feature client from the k categories, the method further includes: when the number of the at least one feature client is r, acquiring an identifier of each feature client, and taking the identifier of each of the r feature clients as a characteristic attribute to obtain n + r characteristic attributes, where r is a positive integer; and updating n to n + r, and triggering re-execution of the steps of determining the weight information of each of the at least one characteristic attribute of each client according to the number of characteristic attributes of each client to obtain the weight information of each of the n characteristic attributes, clustering all clients in which user data is recorded in the user equipment of the same type by using the preset clustering index k and the weight information of each characteristic attribute to obtain k categories, and extracting at least one feature client from the k categories, until extraction of a feature client fails.

The obtained identifications of the r feature clients are used as the r feature attributes to be added into the n feature attributes, n is updated to be n + r, and the step of extracting at least one feature client is executed in a circulating mode, so that the server can continuously extract the feature clients, more user tags are generated according to the feature clients, and the number of the user tags generated by the server is further increased.
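The loop described above can be sketched as follows. The callable `extract_once` is a hypothetical stand-in for the whole weighting, clustering, and extraction pipeline, and skipping identifiers that are already attributes is an added safeguard against endless repetition that the text does not state.

```python
def extract_all_feature_clients(preset_attributes, extract_once):
    """Repeat extraction, growing the attribute set from n to n + r each round.

    extract_once: hypothetical callable that takes the current attribute list
    and returns the identifiers of the feature clients found in one round.
    """
    attributes = list(preset_attributes)
    found = []
    while True:
        new_clients = []
        for client in extract_once(attributes):
            # Assumption: ignore identifiers that are already attributes.
            if client not in attributes and client not in new_clients:
                new_clients.append(client)
        if not new_clients:  # extraction failed, so the loop ends
            break
        found.extend(new_clients)
        attributes.extend(new_clients)  # n is updated to n + r
    return found, attributes


def demo_extract(attributes):
    # Hypothetical pipeline: yields one feature client in each of two rounds.
    rounds = {4: ["xx ninja"], 5: ["xxKTV"]}
    return rounds.get(len(attributes), [])


found, attributes = extract_all_feature_clients(
    ["child", "teenager", "adult", "elderly"], demo_extract
)
```

In this demo the loop runs three times: two rounds each add one feature client identifier, and the third round finds nothing and stops.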

With reference to the first aspect, in an eighth implementation of the first aspect, after clustering all clients in which user data is recorded in the user equipment of the same type by using the preset clustering index k and the weight information of each characteristic attribute to obtain k categories, the method further includes: for each of the k categories, counting the number of user devices to which each client in the category belongs; determining each client whose number of user devices is greater than a fourth preset threshold as a client to be recommended; and recommending the client to be recommended to the user devices in the category on which the client to be recommended is not installed.

By determining the to-be-recommended clients of each category and recommending the to-be-recommended clients to the user equipment which is not provided with the to-be-recommended clients in the category, the server can recommend the clients which are possibly interested by the user to the user, and the difficulty of obtaining the clients by the user is reduced.
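The recommendation step above can be sketched as follows for one category, again assuming a hypothetical mapping from device identifier to the set of installed clients. The function name and the sample data are illustrative only.

```python
def clients_to_recommend(category, min_devices):
    """Map each device to the popular clients it does not have yet.

    category: dict mapping device id -> set of client names on that device.
    min_devices: the fourth preset threshold (a device count).
    """
    counts = {}
    for clients in category.values():
        for client in clients:
            counts[client] = counts.get(client, 0) + 1
    popular = {c for c, n in counts.items() if n > min_devices}
    return {
        device: sorted(popular - installed)
        for device, installed in category.items()
        if popular - installed
    }


# Hypothetical category with three devices.
category = {"d1": {"A", "B"}, "d2": {"A"}, "d3": {"A", "B"}}
recommendations = clients_to_recommend(category, min_devices=1)
```

Here clients "A" and "B" are both on more than one device, and only device "d2" lacks "B", so it is the only device that receives a recommendation.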

In a second aspect, an apparatus for generating a user tag is provided, where the apparatus includes at least one unit, and the at least one unit is configured to implement the method for generating a user tag provided in the first aspect or at least one implementation of the first aspect.

In a third aspect, a server is provided, where the server includes a processor and a wireless transceiver connected to the processor;

the wireless transceiver is configured to be controlled by the processor, and the processor is configured to implement the user tag generation method provided in the first aspect or at least one implementation of the first aspect.

Drawings

In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed for the description of the embodiments are briefly introduced below. The drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from these drawings without creative effort.

Fig. 1 is a schematic diagram of a communication system according to an exemplary embodiment of the present invention;

Fig. 2 is a schematic structural diagram of a user equipment according to an exemplary embodiment of the present invention;

Fig. 3 is a flowchart of a user tag generation method provided by an exemplary embodiment of the present invention;

Fig. 4 is a block diagram of a user tag generation apparatus according to an exemplary embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail with reference to the accompanying drawings.

Reference herein to "a unit" is to a logically partitioned functional structure, and the "unit" may be implemented by pure hardware or a combination of hardware and software.

Referring to fig. 1, a schematic structural diagram of a communication system 100 according to an exemplary embodiment of the present invention is shown. The communication system 100 includes a server 120 and a plurality of user devices 140.

The server 120 is connected to each user device 140 via a communication network and is configured to collect user data recorded by the clients in each user device 140.

The user device 140 has at least one client installed therein, and each client can record user data reflecting the operations that a user of the client performs on the client. For example, the client records the number of times the user starts the client, the time periods in which it runs, and the like. The user device 140 may be a set-top box, a mobile phone, a smartphone, a computer, a tablet computer, a wearable device, a personal digital assistant (PDA), a mobile Internet device (MID), an e-book reader, or the like.

In the present embodiment, the plurality of user equipments 140 belong to the same category. For example, the plurality of user devices 140 are all set-top boxes, or the plurality of user devices 140 are all smartphones, and so on.

Referring to fig. 2, a schematic structural diagram of a server 200 according to another exemplary embodiment of the present invention is shown. The server 200 may be the server 120 shown in fig. 1, the server comprising: a processor 220, and a wireless transceiver 240 coupled to the processor 220.

The wireless transceiver 240 may be comprised of one or more antennas that enable the server 200 to send or receive radio signals.

The wireless transceiver 240 may be connected to the processor 220. The processor 220 is the control center of the server, and may be a central processing unit (CPU), a network processor (NP), or a combination of a CPU and an NP. The processor 220 may further include a hardware chip. The hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof. The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.

Optionally, the server 200 further includes a memory 260, which is connected to the processor 220 by a bus or other means. The memory 260 may be a volatile memory, a non-volatile memory, or a combination thereof. The volatile memory may be a random-access memory (RAM), such as a static random-access memory (SRAM) or a dynamic random-access memory (DRAM). The non-volatile memory may be a read-only memory (ROM), such as a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), or an electrically erasable programmable read-only memory (EEPROM). The non-volatile memory may also be a flash memory; a magnetic memory, such as a magnetic tape, a floppy disk, or a hard disk; or an optical disc.

The memory 260 may store user data therein. Optionally, the memory 260 may store the feature client, the first user tag, the second user tag, and the like determined by the processor 220, and the specific determination process may be described in step 304 and step 305 below.

Referring to fig. 3, a flowchart of a user tag generation method according to an exemplary embodiment of the present invention is shown. The present embodiment is exemplified by the method being used in a communication system as shown in fig. 1, the following steps being performed by a server, the method comprising the following steps:

Step 301, for each client in which user data is recorded in user equipment of the same type, acquire, from the characteristic attributes of the client, at least one characteristic attribute that is the same as one of n preset characteristic attributes, where n is a positive integer.

A plurality of clients are installed in user equipment, and when a user uses a certain client, recorded user data exists in the client; when a user has not used a client, the client has no recorded user data. The user data is used to reflect operations performed on the client by a user using the client, such as frequency of clicking the client, time period for running the client, web page content browsed by the client, and the like.

For each client in which user data is recorded in user equipment of the same type, the server acquires at least one characteristic attribute of the client. A characteristic attribute reflects a characteristic shared by the target user group of the client. For example, the target user group of the client "child song grand" shares the characteristic of being children, so the characteristic attribute of "child song grand" is children; the target user group of the client "square dance universities" shares the characteristic of being elderly, so the characteristic attribute of "square dance universities" is the elderly.

For a client in which user data is recorded, the recorded user data may not match actual use; for example, the running time may be less than 1 second or greater than 24 hours. If the server acquired the characteristic attributes of the client in that case, the subsequent calculation result might not accurately reflect the common interests of the target user group of the user equipment. In this embodiment, the server filters all user data recorded by the client according to a preset rule, where the preset rule filters out user data for which the running time of the client is less than a second preset threshold or greater than a third preset threshold; when user data remains after the filtering, the server acquires at least one characteristic attribute of the client. In this way, the server filters out user data that does not match actual use, which improves the accuracy of the subsequent calculation result.
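The filtering rule above can be sketched as follows, using the 1 second and 24 hour values from the example as the two thresholds. The record layout (a dict with a `run_seconds` field) and the function name are hypothetical.

```python
MIN_RUN_SECONDS = 1             # second preset threshold (example value)
MAX_RUN_SECONDS = 24 * 60 * 60  # third preset threshold (example value)


def filter_records(records, low=MIN_RUN_SECONDS, high=MAX_RUN_SECONDS):
    """Drop records whose running time is below `low` or above `high`.

    records: list of dicts with a hypothetical "run_seconds" field.
    """
    return [r for r in records if low <= r["run_seconds"] <= high]


# Hypothetical user data recorded by one client.
records = [
    {"client": "xx ninja", "run_seconds": 0.5},     # too short, dropped
    {"client": "xx ninja", "run_seconds": 120},     # kept
    {"client": "xx ninja", "run_seconds": 90_000},  # longer than 24 h, dropped
]
kept = filter_records(records)
has_usable_data = bool(kept)  # attributes are acquired only if data remains
```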

Each client may have a plurality of characteristic attributes, of which only some are the same as the n preset characteristic attributes. The server therefore needs to select, from the characteristic attributes of the client, at least one characteristic attribute that is the same as one of the n preset characteristic attributes. The n preset characteristic attributes are determined according to the service target. For example, if the service target is to determine the age of the target group of the user equipment of the same type, the n preset characteristic attributes may be children, teenagers, adults, and the elderly.

Assume that the n preset characteristic attributes are children, teenagers, adults, and the elderly, and that the client "xx ninja" has the characteristic attributes game, leisure, adult, and teenager. The server then selects the two characteristic attributes adult and teenager from game, leisure, adult, and teenager as the characteristic attributes of "xx ninja".

It should be noted that the feature attributes of the client described below refer to the same feature attributes as the preset n kinds of feature attributes.
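The selection in Step 301 amounts to keeping only the client attributes that appear among the preset ones. A minimal sketch using the "xx ninja" example follows; the function name and the English attribute spellings are assumptions.

```python
def matching_attributes(client_attributes, preset_attributes):
    """Keep the client's attributes that are among the preset attributes."""
    preset = set(preset_attributes)
    return [a for a in client_attributes if a in preset]


preset = ["child", "teenager", "adult", "elderly"]
xx_ninja = ["game", "leisure", "adult", "teenager"]
selected = matching_attributes(xx_ninja, preset)
```

The order of the client's own attribute list is preserved, so `selected` holds adult followed by teenager.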

Step 302, determining the weight information of each characteristic attribute in at least one characteristic attribute of each client according to the number of the characteristic attributes of each client, to obtain the weight information of each characteristic attribute in the n characteristic attributes.

Wherein, the weight information of each feature attribute is used to reflect whether the user prefers to use the client with the feature attribute, usually the weight information is represented by a number, and the greater the weight information, the more the user prefers to use the client with the feature attribute.

In one implementation, the user data of each client includes an operating frequency of the client, and the server determining the weight information of each of the at least one characteristic attribute of each client according to the number of characteristic attributes of each client, to obtain the weight information of each of the n characteristic attributes, includes: setting the weight of each characteristic attribute of each client according to a preset total weight score and the number of characteristic attributes of the client, wherein the weight is negatively correlated with the number of characteristic attributes of the client; determining the weight information of each characteristic attribute of each client according to the operating frequency of the client and the weight of each characteristic attribute of the client; and, for all clients in which user data is recorded in each user device, adding the weight information of the same characteristic attribute to obtain the weight information of the n characteristic attributes.

The operating frequency of each client refers to the number of times the user uses the client. Each client may use the counted number of times that the user uses the client within a preset time length as the operating frequency of the client, or may use the total number of times that the user uses the client as the operating frequency of the client.

Setting the weight of each characteristic attribute of each client according to the preset total weight score and the number of characteristic attributes of the client means the following: when the number of characteristic attributes of a client is a and the preset total weight score is b, the weight of each characteristic attribute of the client is b/a. In an actual implementation, b/a may be computed as b × (1/a); this embodiment does not limit how the server calculates the weight of each characteristic attribute. The preset total weight score can be any reasonable value, such as 1 point, 2 points, or 100 points; this embodiment does not limit its specific value.

Determining the weight information of each characteristic attribute of each client according to the operating frequency of each client and the weight of each characteristic attribute of each client means: when the running frequency of a client is c and the weight of each characteristic attribute of the client is b/a, the weight information of each characteristic attribute of the client is c × b/a.

Assume that the 4 preset characteristic attributes are children, teenagers, adults, and the elderly, and that the preset total weight score is 1 point. For the clients "xx ninja" and "xxKTV" in which user data is recorded in the same user device, the server determines that the characteristic attributes of "xx ninja" are teenagers, adults, and the elderly, so the number of characteristic attributes is 3. The weight set for the teenager characteristic attribute of "xx ninja" is 1/3 ≈ 0.33 points; the weight set for the adult characteristic attribute is 1/3 ≈ 0.33 points; and the weight set for the elderly characteristic attribute is 1/3 ≈ 0.33 points. The server determines that the characteristic attributes of "xxKTV" are teenagers and adults, so the number of characteristic attributes is 2. The weight set for the teenager characteristic attribute of "xxKTV" is 1/2 = 0.5 points, and the weight set for the adult characteristic attribute is 1/2 = 0.5 points.

If the operating frequency of "xx ninja" is 1, the weight information of the teenager characteristic attribute of "xx ninja" is 1 × 0.33 = 0.33 points, the weight information of the adult characteristic attribute is 1 × 0.33 = 0.33 points, and the weight information of the elderly characteristic attribute is 1 × 0.33 = 0.33 points.

If the operating frequency of "xxKTV" is 2, the weight information of the teenager characteristic attribute of "xxKTV" is 2 × 0.5 = 1 point, and the weight information of the adult characteristic attribute is 2 × 0.5 = 1 point.

The server adds the weight information of the children characteristic attribute of each client to obtain 0 points for the children characteristic attribute; adds the weight information of the teenager characteristic attribute of each client to obtain 0.33 + 1 = 1.33 points for the teenager characteristic attribute; adds the weight information of the adult characteristic attribute of each client to obtain 0.33 + 1 = 1.33 points for the adult characteristic attribute; and adds the weight information of the elderly characteristic attribute of each client to obtain 0.33 points for the elderly characteristic attribute.
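The worked example above can be reproduced with a short sketch. The function name and the input layout (a list of attribute lists paired with run counts for one user device) are assumptions; the arithmetic follows the c × b/a rule from the text.

```python
from collections import defaultdict


def attribute_weights(clients, total_score=1.0):
    """Sum c * b / a per attribute over all clients of one user device.

    clients: list of (attributes, run_count) pairs, where `attributes` holds
    only the attributes that match the preset ones.
    """
    totals = defaultdict(float)
    for attributes, run_count in clients:
        if not attributes:
            continue  # a client without matching attributes adds nothing
        per_attribute = total_score / len(attributes)      # b / a
        for attribute in attributes:
            totals[attribute] += run_count * per_attribute  # c * b / a
    return dict(totals)


device = [
    (["teenager", "adult", "elderly"], 1),  # "xx ninja", run once
    (["teenager", "adult"], 2),             # "xxKTV", run twice
]
weights = attribute_weights(device)
```

The sketch keeps exact fractions instead of rounding 1/3 to 0.33, so the teenager and adult totals come out as 4/3 rather than exactly 1.33. An attribute that no client has, such as children, is simply absent and can be read as 0 points.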

In another implementation manner, the user data of each client includes an operating frequency and an operating time period of the client, and determining, according to the number of the characteristic attributes of each client, the weight information of each of the at least one characteristic attribute of each client to obtain the weight information of each of the n characteristic attributes includes: setting the weight of each characteristic attribute of each client according to the preset total weight score and the number of the characteristic attributes of each client, wherein the weight and the number of the characteristic attributes of the client are in a negative correlation relationship; determining the preset time period to which the operating time period of each client belongs, and determining the operating frequency of each client in the corresponding preset time period, wherein each preset time period corresponds to the n characteristic attributes; for each client, determining the weight information of each characteristic attribute of the client corresponding to each preset time period according to the operating frequency of the client in each preset time period and the weight of each characteristic attribute of the client; and for all the clients recording user data in each user device, adding the weight information of the same characteristic attribute in the same preset time period to obtain the weight information of the n characteristic attributes corresponding to each preset time period.

The operating time period of each client may be the time period during which the client runs in the foreground, or may be the whole time period from when the client starts running to when it stops running, which includes both the foreground running time period and the background running time period. This embodiment does not limit the manner of determining the operating time period of each client.

Assume that the 4 preset time periods in the server are as shown in Table one below, the 4 preset feature attributes are infants, teenagers, adults and the elderly, and the preset total weight score is 1 point. For the clients "xx ninja" and "xxKTV" that record user data in the same user equipment, the server obtains that the feature attributes of "xx ninja" are teenagers, adults and the elderly, 3 feature attributes in total, and that "xx ninja" ran once in the daytime; the feature attributes of "xxKTV" are teenagers and adults, 2 feature attributes in total, and "xxKTV" ran once in the daytime and once at night. The server sets the weights for "xx ninja" and "xxKTV", and obtains the following weight information according to the set weights: in the daytime, the weight information of the teenager feature attribute of "xx ninja" is 1 × 0.33 = 0.33 points, the weight information of its adult feature attribute is 1 × 0.33 = 0.33 points, and the weight information of its elderly feature attribute is 1 × 0.33 = 0.33 points; in the daytime, the weight information of the teenager feature attribute of "xxKTV" is 1 × 0.5 = 0.5 points, and the weight information of its adult feature attribute is 1 × 0.5 = 0.5 points; at night, the weight information of the teenager feature attribute of "xxKTV" is 1 × 0.5 = 0.5 points, and the weight information of its adult feature attribute is 1 × 0.5 = 0.5 points.

The server adds the weight information of the infant feature attribute of each client in the daytime to obtain 0 points for the infant feature attribute in the daytime; adds the weight information of the teenager feature attribute of each client in the daytime to obtain 0.33 + 0.5 = 0.83 points for the teenager feature attribute; adds the weight information of the adult feature attribute of each client in the daytime to obtain 0.33 + 0.5 = 0.83 points for the adult feature attribute; and adds the weight information of the elderly feature attribute of each client in the daytime to obtain 0.33 points for the elderly feature attribute.

The server adds the weight information of the infant feature attribute of each client at night to obtain 0 points for the infant feature attribute at night. Because only "xxKTV" ran at night, adding the weight information of the teenager feature attribute of each client at night gives 0.5 points for the teenager feature attribute; adding the weight information of the adult feature attribute of each client at night gives 0.5 points for the adult feature attribute; and adding the weight information of the elderly feature attribute of each client at night gives 0 points for the elderly feature attribute.

Table one:

Preset time period	Time range
Early morning	[0:00, 4:00)
Daytime	[4:00, 18:00)
At night	[18:00, 21:00)
At night	[21:00, 0:00)
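The time-period variant can be sketched in the same way. The Python fragment below is only an illustration under stated assumptions: the function name `period_weight_info`, the period labels used as dictionary keys, and the data layout are hypothetical, and each run is assumed to have already been mapped to its preset time period.

```python
def period_weight_info(clients, total_weight=1.0):
    """Sum weight information per (time period, attribute) for one user device.

    clients: list of (attributes, {period: operating_frequency}) tuples
    (hypothetical layout; runs are already assigned to a preset time period).
    """
    totals = {}
    for attrs, runs in clients:
        if not attrs:
            continue
        per_attr = total_weight / len(attrs)  # weight of each attribute of this client
        for period, freq in runs.items():
            for a in attrs:
                key = (period, a)
                totals[key] = totals.get(key, 0.0) + freq * per_attr
    return totals


device = [
    (["teenager", "adult", "elderly"], {"daytime": 1}),    # "xx ninja"
    (["teenager", "adult"], {"daytime": 1, "night": 1}),   # "xxKTV"
]
totals = period_weight_info(device)
for (period, attribute), value in sorted(totals.items()):
    print(period, attribute, round(value, 2))
```

A (period, attribute) pair that never occurs is simply absent from the result and can be treated as 0 points.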

Step 303, clustering all clients recording user data in the same type of user equipment by using a preset clustering index k and the weight information of each characteristic attribute to obtain k categories, wherein each category comprises at least one client and the user equipment to which each client belongs, and k is a positive integer.

The server clusters all clients recording user data in the same type of user equipment by using the preset clustering index k and the weight information of each characteristic attribute. The obtained k categories reflect the relevance among different clients: if the relevance among different clients is large, the clients are in the same category; if the relevance among different clients is small, the clients are in different categories. The clustering algorithm adopted by the server may be a Spectral Clustering (SC) algorithm or a k-means clustering algorithm, which is not limited in this embodiment. The preset clustering index k is the clustering index with the best clustering effect, selected by the server after the server performs clustering multiple times with different clustering indexes.

Clustering, by the server, all clients recording user data in the same type of user equipment by using the preset clustering index k and the weight information of each characteristic attribute to obtain k categories includes: when the user equipment of the same type comprises m user equipment, generating an m × p-dimensional feature matrix according to the weight information of the n characteristic attributes, where p = n when the user data does not include the operating time period of each client, and p = n × q when the user data includes the operating time period of each client and the number of preset time periods is q; normalizing the feature matrix to obtain an m × p-dimensional normalized matrix; and clustering the normalized matrix by using the clustering index k to obtain the k categories.

When the user data does not include the operating time period of each client, each user device corresponds to n values obtained according to the weight information of the n characteristic attributes, so the generated feature matrix has dimension m × n. When the user data includes the operating time period of each client and the number of preset time periods is q, each preset time period corresponds to the n characteristic attributes, so each user device corresponds to n × q values obtained according to the weight information of the n characteristic attributes of each preset time period, and the generated feature matrix has dimension m × p, where p = n × q.
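The two cases for p can be illustrated with a short Python sketch. It is an assumption-laden illustration only: the function name `feature_matrix`, the use of dictionaries keyed by attribute or by (period, attribute), and the sample values are hypothetical.

```python
def feature_matrix(devices, attributes, periods=None):
    """Build an m x p feature matrix, one row per user device.

    devices: list of per-device weight information dictionaries (hypothetical layout).
    Without periods, keys are attribute names and p = n.
    With periods, keys are (period, attribute) pairs and p = n * q.
    """
    if periods is None:
        columns = list(attributes)
    else:
        columns = [(period, a) for period in periods for a in attributes]
    # Missing entries mean the attribute never received any weight: 0 points.
    return [[device.get(col, 0.0) for col in columns] for device in devices]


attributes = ["infant", "teenager", "adult", "elderly"]  # n = 4
periods = ["daytime", "night"]                           # q = 2 (illustrative)
devices = [
    {("daytime", "teenager"): 0.83, ("daytime", "adult"): 0.83,
     ("daytime", "elderly"): 0.33, ("night", "teenager"): 0.5,
     ("night", "adult"): 0.5},
    {("night", "infant"): 2.0},
]
matrix = feature_matrix(devices, attributes, periods)
print(len(matrix), len(matrix[0]))  # 2 8
```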

Normalizing the feature matrix means scaling each element in the feature matrix into the range [0, 1]. The server may use a max-min normalization algorithm when normalizing the feature matrix, which is not limited in this embodiment.
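A minimal sketch of max-min normalization follows. The text does not state whether the minimum and maximum are taken per column or over the whole matrix; this sketch assumes per-column (per-feature) scaling, and the sample matrix is made up for illustration.

```python
def min_max_normalize(matrix):
    """Scale every column of the matrix into [0, 1] (per-column max-min scaling)."""
    columns = list(zip(*matrix))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [
            # A constant column carries no information; map it to 0.
            (value - low) / (high - low) if high > low else 0.0
            for value, low, high in zip(row, lows, highs)
        ]
        for row in matrix
    ]


features = [  # illustrative values only, not taken from Table four
    [0.0, 1.33, 1.33, 0.33],
    [2.0, 2.0, 0.0, 0.0],
    [0.0, 1.0, 1.0, 0.0],
]
normalized = min_max_normalize(features)
for row in normalized:
    print([round(v, 2) for v in row])
```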

Assume that the user data collected by each user device among user devices of the same type is as shown in Table two below, and the feature attributes of each client are as shown in Table three below. If the preset total weight score is 1 point, the server calculates, according to Table two and Table three, the weight information of the n feature attributes corresponding to each user device, as shown in Table four below. The feature matrix can be obtained according to Table four:

The normalized matrix obtained by normalizing the feature matrix with the max-min normalization algorithm is as follows:

The normalized matrix is then clustered by using a k-means clustering algorithm and the clustering index k to obtain k categories.

Table two:

User equipment	Client	Operating frequency
User equipment 1	Children song medicine	3
User equipment 1	xx ninja	1
User equipment 2	Hero x	4
User equipment 3	xxKTV	2
User equipment 3	One-key cleaning	6
User equipment 4	xx ninja	2

Table three:

Table four:

To improve the accuracy of extracting feature clients, a server usually collects the user data recorded by each client in a large number of user devices, for example the user data recorded by each client in 23640 user devices. The dimension of the generated normalized matrix is therefore very high, and when the server clusters the normalized matrix with a clustering algorithm, the amount of calculation is very large, redundant data exists in the normalized matrix, and the stability of the data is low. In this embodiment, after obtaining the normalized matrix, the server performs dimension reduction processing on the m × p-dimensional normalized matrix by using a preset dimension reduction algorithm and a preset dimension reduction index l to obtain an m × l-dimensional dimension reduction matrix, and clusters the dimension reduction matrix by using the clustering index k to obtain k categories.

By using the preset dimension reduction algorithm and the dimension reduction index l, the server can determine, from the normalized matrix, a group of linearly independent feature vectors that represent the effective information of the normalized matrix to the greatest extent. The dimension of this group of feature vectors is smaller than that of the normalized matrix, so the complexity of the clustering algorithm is reduced, redundant data in the normalized matrix is removed, and the stability of the data used for clustering is improved. The preset dimension reduction algorithm may be a Principal Component Analysis (PCA) algorithm, or may be Non-Negative Matrix Factorization (NMF), which is not limited in this embodiment. The preset dimension reduction index l is selected as follows: for different candidate dimension reduction indexes, the server calculates the variance of the m × l-dimensional data obtained from the m × p normalized matrix, and selects a dimension reduction index for which the calculated variance reaches 90% of the variance of the normalized matrix.
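The selection of l can be sketched as follows, assuming the variance contributed by each candidate component is already known (for PCA, these are the eigenvalues of the covariance matrix of the normalized matrix). Choosing the smallest l that reaches the 90% target is an assumption of this sketch, as are the function name `choose_reduction_index` and the sample variances.

```python
def choose_reduction_index(component_variances, target=0.90):
    """Return the smallest l whose top-l components reach the target share of variance."""
    ordered = sorted(component_variances, reverse=True)
    total = sum(ordered)
    running = 0.0
    for l, variance in enumerate(ordered, start=1):
        running += variance
        if running >= target * total:
            return l
    return len(ordered)


# Illustrative per-component variances (made up): total is 10.0.
variances = [6.0, 2.5, 1.0, 0.3, 0.2]
l = choose_reduction_index(variances)
print(l)  # 3, since 6.0 + 2.5 + 1.0 = 9.5 is the first partial sum above 9.0
```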

Assuming that the preset clustering index k is 12, the server clusters the dimensionality reduction matrix by adopting a spectral clustering method, and the obtained clustering result is shown in the following table five.

Table five:

Step 304, extracting at least one characteristic client from the k categories, wherein the characteristic client is used for reflecting the common interests of the target user group of the user equipment.

Extracting, by the server, at least one characteristic client from the k categories includes: determining the central client of each of the k categories, wherein the value obtained by dividing the number of user equipment to which the central client belongs by the number of user equipment included in the category is greater than a first preset threshold; and determining, among the j categories that have a central client, the category that includes the largest number of user equipment, and determining at least one central client of the determined category as the at least one characteristic client, where 0 < j ≤ k.

In this embodiment, the client whose value obtained by dividing the number of the user devices belonging to the category by the number of the user devices included in the category is greater than the first preset threshold is used as the central client, and the characteristic client is determined from the central client, so that the characteristic client extracted by the server is a client that most users use, and the accuracy of reflecting the common interests of the target user group of the user devices by the characteristic client is high.

Assuming that the first preset threshold is 70%, the server takes, as the central client, a client for which the number of user equipment to which it belongs divided by the number of user equipment included in the category is greater than 70%. Assume that the central clients determined from the categories shown in Table five are as shown in Table six below. It can be seen from Table six that there is no central client in the categories cluster-5, cluster-6, cluster-10, cluster-11 and cluster-12, that is, in these categories no client has a number of user equipment that exceeds 70% of the number of user equipment included in the category. As can be seen from Table five, among the categories that have a central client, cluster-1 includes the largest number of user devices, namely 877, so the server takes the central client "xxKTV" of cluster-1 as the feature client.
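The extraction rule can be sketched in Python as follows. This is only an illustration: the function name `extract_feature_clients`, the representation of a category as a list of per-device client sets, and the toy data are assumptions and do not reproduce Table five or Table six.

```python
def extract_feature_clients(categories, threshold=0.7):
    """Pick the central clients of the largest category that has any central client.

    categories: {category name: [set of clients on each user device]} (hypothetical layout).
    threshold: the first preset threshold, as a fraction of the devices in the category.
    """
    best_name, best_size, best_centrals = None, 0, []
    for name, devices in categories.items():
        counts = {}
        for clients in devices:
            for client in clients:
                counts[client] = counts.get(client, 0) + 1
        # Central clients: share of devices in the category exceeds the threshold.
        centrals = sorted(c for c, n in counts.items() if n / len(devices) > threshold)
        if centrals and len(devices) > best_size:
            best_name, best_size, best_centrals = name, len(devices), centrals
    return best_name, best_centrals


categories = {
    "cluster-1": [{"xxKTV"}, {"xxKTV", "xx ninja"}, {"xxKTV"}, {"xx ninja"}],
    "cluster-2": [{"Qxx"}, {"Qxx"}, {"Qxx"}],
    "cluster-5": [{"a"}, {"b"}, {"c"}, {"d"}, {"e"}],  # largest, but no central client
}
name, feature_clients = extract_feature_clients(categories)
print(name, feature_clients)  # cluster-1 ['xxKTV']
```

In the toy data, "xxKTV" appears on 3 of the 4 devices in cluster-1 (75%, above the 70% threshold), while cluster-5 is skipped because none of its clients exceeds the threshold.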

Table six:

Category	Central client	Category	Central client
cluster-1	xxKTV	cluster-9	xx magic hawk
cluster-2	x attacking qibing 2	cluster-9	Fry xx glacier xx
cluster-3	Qxx	cluster-9	Overlord xx
cluster-4	xx puzzles	cluster-9	xx plan
cluster-7	xx one-key cleanup	cluster-9	xx ball
cluster-9	xx TV edition of large war zombie	cluster-9	xx Da ren
cluster-9	xx fish	cluster-9	xx knight 2
cluster-9	xx ninja	cluster-9	xx world
cluster-9	xx escape and death

Optionally, in order to extract more feature clients, after obtaining the at least one feature client, when the number of the at least one feature client is r, the server obtains an identifier of each feature client and takes the identifier of each of the r feature clients as a feature attribute to obtain n + r feature attributes, where r is a positive integer; the server then updates n to n + r and executes steps 302 to 304 again, until no feature client can be extracted in step 304.

The identifier of the feature client may be a name of the feature client, and may also be an identity Identifier (ID) of the feature client, which is not limited in this embodiment.
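The iteration can be sketched as a simple loop. In this hedged illustration, `run_steps_302_to_304` is a hypothetical callable standing in for steps 302 to 304, and both the `max_rounds` safety limit and the filtering of identifiers that are already feature attributes are assumptions of the sketch rather than features stated in this embodiment.

```python
def iterate_feature_attributes(attributes, run_steps_302_to_304, max_rounds=10):
    """Repeat steps 302-304, adding each extracted feature client as a new attribute.

    run_steps_302_to_304: hypothetical callable that takes the current attribute
    list and returns the identifiers of the feature clients extracted in step 304.
    """
    attributes = list(attributes)
    found = []
    for _ in range(max_rounds):
        new_clients = [c for c in run_steps_302_to_304(attributes) if c not in attributes]
        if not new_clients:  # extraction failed: stop iterating
            break
        attributes.extend(new_clients)  # n becomes n + r
        found.extend(new_clients)
    return attributes, found


def toy_steps(attrs):
    # Toy stand-in: extracts "xxKTV" once, then fails to extract anything.
    return [] if "xxKTV" in attrs else ["xxKTV"]


attrs, found = iterate_feature_attributes(
    ["infant", "teenager", "adult", "elderly"], toy_steps
)
print(len(attrs), found)  # 5 ['xxKTV']
```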

Assume that the feature client extracted according to Table five and Table six is "xxKTV", and that the weight information of the n feature attributes corresponding to each user equipment is as shown in Table four. The server takes the name "xxKTV" as a feature attribute and obtains 4 + 1 = 5 feature attributes. The server then determines the weight information of each of the at least one feature attribute of each client according to the updated number of feature attributes of each client, as shown in Table seven below, where the weight information of the xxKTV feature attribute of user equipment 4 using xxKTV in the daytime is 1 point.

Table seven:

Optionally, the server may also intelligently analyze clients that the user may be interested in, and push the analysis results to the user device. The intelligent analysis includes: for each category in the k categories, counting the number of user equipment to which each client in the category belongs; determining the clients whose number of user equipment is greater than a fourth preset threshold as clients to be recommended; and recommending the clients to be recommended to the user equipment in the category on which the clients to be recommended are not installed.

Assuming that more than 50% of the user devices in a category run the xx catamaran TV edition, the server recommends the xx catamaran TV edition to the user devices in the category that do not have the xx catamaran TV edition installed.
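A minimal sketch of this recommendation rule follows. It assumes the fourth preset threshold is expressed as a fraction of the devices in the category, as in the 50% example, and it treats one set of clients per device as both the clients run and the clients installed; the function name `recommend` and the toy data are hypothetical.

```python
def recommend(devices, threshold=0.5):
    """Recommend widely used clients of one category to devices that lack them.

    devices: {device id: set of clients on that device} (hypothetical layout).
    """
    counts = {}
    for clients in devices.values():
        for client in clients:
            counts[client] = counts.get(client, 0) + 1
    # Clients to be recommended: used by more than the threshold share of devices.
    to_recommend = [c for c, n in counts.items() if n / len(devices) > threshold]
    return {
        device: sorted(c for c in to_recommend if c not in clients)
        for device, clients in devices.items()
    }


category = {
    "device 1": {"xxKTV", "xx ninja"},
    "device 2": {"xxKTV"},
    "device 3": {"xxKTV", "xx ninja"},
    "device 4": {"Qxx"},
}
result = recommend(category)
print(result["device 4"])  # ['xxKTV']
```

Here "xxKTV" is on 3 of 4 devices (75%) and is recommended to device 4, while "xx ninja" sits exactly at 50% and does not exceed the threshold.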

Optionally, the server may further analyze whether the user equipment needs to be optimized according to the time when more than a predetermined number of user equipments operate the client.

Suppose that 122 user devices run "xx one-key cleaning" after running "xxKTV". Based on the timing at which these user devices run the clients, the server learns that a user device of this type generates a large amount of cache after running "xxKTV", and therefore that this type of user device needs to be optimized.

Optionally, the server may further recommend, according to at least two types of clients operated by more than a predetermined number of user devices, the remaining types of clients to the user devices that have operated only a part of the types of clients of the at least two types of clients.

Assuming that the clients run by 153 user devices include both the xx combat zombie TV edition and xx fanciful soldiers 2, the server recommends xx fanciful soldiers 2 to user devices that run only the xx combat zombie TV edition.

Step 305, generating a first user tag according to the user data recorded by each client, and generating a second user tag according to at least one characteristic client.

The server generates a first user tag, such as night cat or Beijing person, according to the user data recorded by each client; the server generates a second user tag, such as Karaoke or Cleaner, according to the at least one feature client. Because the first user tag is generated according to the user data recorded by each client, it reflects the relevance among the target groups of the same type of user equipment poorly. The second user tag is generated according to the feature client, and the feature client can reflect the hidden relevance among the target groups of the same type of user equipment, so the second user tag reflects the relevance among the target groups of the same type of user equipment better.

To sum up, the user tag generation method provided in the embodiment of the present invention determines the weight information of each feature attribute of each client that records user data in the user equipment to obtain the weight information of each of the n feature attributes; clusters all clients that record user data in the user equipment of the same type according to the preset clustering index k and the weight information of each feature attribute to obtain k categories; and extracts at least one feature client from the k categories. The server can then generate user tags according to both the user data and the at least one feature client. This solves the problem that few user tags are generated when the server generates user tags only according to the user data and the user data is scarce, and achieves the effect of increasing the number of generated user tags.

In addition, each characteristic attribute of each client is provided with a weight, and the weight and the number of the characteristic attributes of the client are in a negative correlation relationship; determining the weight information of each characteristic attribute of each client according to the running frequency of each client and the weight of each characteristic attribute of each client; for all the clients recording user data in each user device, the weight information of the same characteristic attribute is added to obtain the weight information of n characteristic attributes, so that the weight information of the n characteristic attributes obtained by the server is in positive correlation with the operating frequency of the client, the using habit of each client used by a user is reflected, and the accuracy of the generated second user label is ensured.

In addition, each characteristic attribute of each client is provided with a weight, and the weight and the number of the characteristic attributes of the client are in a negative correlation relationship; for each client, the weight information of each characteristic attribute of the client corresponding to each preset time period is determined according to the operating frequency of the client in each preset time period and the weight of each characteristic attribute of the client; and for all the clients recording user data in each user device, the weight information of the same characteristic attribute in the same preset time period is added to obtain the weight information of the n characteristic attributes corresponding to each preset time period. In this way, the weight information of the n characteristic attributes obtained by the server reflects more accurately the habit of the user in using each client, and the server can generate the second user tag by referring to more types of user data.

In addition, the dimension reduction processing is carried out on the normalized matrix, so that the calculation amount of the server for executing the clustering algorithm by using the clustering index k is reduced, and the efficiency of the server for executing the clustering algorithm is improved; redundant data in the normalized matrix is deleted, and the stability of the data when the server carries out a clustering algorithm is improved.

In addition, by determining the central client of each category, determining the category with the largest number of user equipment in j categories with the central client, and determining at least one central client of the category as at least one feature client, the at least one feature client obtained by the server is the client used by most users, so that the common interests of the most users can be reflected, and the accuracy of the feature client determined by the server is ensured.

Referring to fig. 4, a block diagram of a user tag generation apparatus according to an embodiment of the present invention is shown. The user tag generation means may be implemented as all or part of the user device in software, hardware or a combination of both. The user tag generation apparatus may include: the device comprises an acquisition unit 410, a determination unit 420, a clustering unit 430, an extraction unit 440, a generation unit 450, an updating unit 460, a statistics unit 470 and a recommendation unit 480.

An obtaining unit 410, configured to implement the function of step 301, and the function in step 304 of obtaining the identifier of each feature client and taking the identifier of each of the r feature clients as a feature attribute to obtain n + r feature attributes.

A determining unit 420, configured to implement the function of step 302, and the function in step 304 of determining the clients whose number of user equipment is greater than a fourth preset threshold as clients to be recommended.

A clustering unit 430, configured to implement the function of step 303.

An extracting unit 440, configured to implement the function of step 304.

A generating unit 450, configured to implement the function of step 305.

An updating unit 460, configured to implement the function of updating n to n + r in step 304.

A counting unit 470, configured to implement the function of counting, for each of the k categories in step 304, the number of user equipments to which each client belongs.

The recommending unit 480 is configured to implement the function of recommending the to-be-recommended client to the user equipment with no to-be-recommended client installed in the category in step 304.

The relevant details may be combined with the method embodiment described with reference to fig. 3.

It should be noted that the obtaining unit 410, the determining unit 420, the clustering unit 430, the extracting unit 440, the generating unit 450, the updating unit 460, the counting unit 470, and the recommending unit 480 may be implemented by a processor in a user equipment.

Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation.

It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the apparatuses and units described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.

In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units may be merely a logical division, and in actual implementation, there may be another division, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. A user tag generation method, the method comprising:

for each client recording user data in the same type of user equipment, respectively obtaining at least one feature attribute which is the same as n preset feature attributes from the feature attributes of the client, wherein the user data is used for reflecting the operation executed on the client by a user using the client, the feature attributes are used for reflecting the features commonly possessed by a target user group of the client, and n is a positive integer;

determining the weight information of each characteristic attribute in at least one characteristic attribute of each client according to the number of the characteristic attributes of each client to obtain the weight information of each characteristic attribute in the n characteristic attributes; clustering all clients recording user data in the same type of user equipment by using a preset clustering index k and weight information of each characteristic attribute to obtain k categories, wherein each category comprises at least one client and the user equipment to which each client belongs, and k is a positive integer;

extracting at least one feature client from the k categories, the feature client being used for reflecting common interests of a target user group of the user equipment; generating a first user label according to the user data recorded by each client, and generating a second user label according to the at least one characteristic client;

after the extracting at least one feature client from the k categories, the method further includes:

when the number of the at least one characteristic client is r, acquiring an identifier of each characteristic client, and taking the identifier of each client in the r clients as a characteristic attribute to obtain n + r characteristic attributes, wherein r is a positive integer;

updating n to n + r, and triggering execution of the steps of: determining, according to the number of the characteristic attributes of each client, the weight information of each characteristic attribute in the at least one characteristic attribute of each client to obtain the weight information of each characteristic attribute in the n characteristic attributes; clustering all clients recording user data in the same type of user equipment by using the preset clustering index k and the weight information of each characteristic attribute to obtain k categories; and extracting at least one characteristic client from the k categories, until extraction of a characteristic client fails.

2. The method according to claim 1, wherein the user data of each client includes an operating frequency of the client, and the determining the weight information of each of the at least one characteristic attribute that the client has according to the number of the characteristic attributes that the client has obtains the weight information of each of the n characteristic attributes includes:

setting the weight of each characteristic attribute of each client according to a preset total weight score and the number of the characteristic attributes of each client, wherein the weight and the number of the characteristic attributes of each client are in a negative correlation relationship;

determining the weight information of each characteristic attribute of each client according to the running frequency of each client and the weight of each characteristic attribute of each client;

and adding the weight information of the same characteristic attribute for all the clients recording the user data in each user device to obtain the weight information of the n characteristic attributes.

3. The method according to claim 1, wherein the user data of each client includes an operating frequency and an operating time period of the client, and the determining the weight information of each of the at least one characteristic attribute of each client according to the number of the characteristic attributes of each client to obtain the weight information of each of the n characteristic attributes comprises:

determining a preset time period to which the operation time period of each client belongs, and determining the operation frequency of each client in the corresponding preset time period, wherein each preset time period corresponds to the n characteristic attributes;

for each client, determining the weight information of each characteristic attribute corresponding to each preset time period of the client according to the operating frequency of the client in each preset time period and the weight of each characteristic attribute of the client;

and for all the clients recording user data in each user device, adding the weight information of the same characteristic attribute in the same preset time period to obtain the weight information of the n characteristic attributes corresponding to each preset time period.

4. The method according to claim 2 or 3, wherein the clustering all the clients in the user equipment of the same type, in which user data is recorded, by using a preset clustering index k and weight information of each feature attribute to obtain k categories, includes:

when the user equipment of the same type comprises m user equipment, generating an m × p-dimensional characteristic matrix according to the weight information of the n characteristic attributes, wherein p = n when the user data does not comprise the operating time period of each client, and p = n × q when the user data comprises the operating time period of each client and the number of preset time periods is q;

normalizing the characteristic matrix to obtain an m × p-dimensional normalized matrix;

and clustering the normalized matrix by using the clustering index k to obtain the k categories.
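The matrix construction, normalization and clustering above can be sketched as follows. The claim names neither the normalization nor the clustering algorithm, so this sketch assumes min-max scaling per column and plain k-means seeded with the first k rows; all function names and sample weights are hypothetical, and the example covers the case p = n.

```python
def build_matrix(device_weights, attributes):
    """One row per device (m rows), one column per attribute (p = n here)."""
    return [[w.get(a, 0.0) for a in attributes] for w in device_weights]


def normalize(matrix):
    """Min-max scale each column to [0, 1]; constant columns become 0."""
    columns = list(zip(*matrix))
    lows = [min(c) for c in columns]
    spans = [max(c) - min(c) for c in columns]
    return [
        [(v - lo) / span if span else 0.0 for v, lo, span in zip(row, lows, spans)]
        for row in matrix
    ]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows, k, iterations=20):
    """Plain k-means; the first k rows seed the centroids for determinism."""
    centroids = [list(r) for r in rows[:k]]
    labels = [0] * len(rows)
    for _ in range(iterations):
        labels = [
            min(range(k), key=lambda c: squared_distance(row, centroids[c]))
            for row in rows
        ]
        for c in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


attributes = ["music", "sports"]
device_weights = [
    {"music": 5.0},
    {"sports": 4.0},
    {"music": 4.0, "sports": 0.5},
    {"music": 0.5, "sports": 5.0},
]
matrix = build_matrix(device_weights, attributes)  # 4 x 2
labels = kmeans(normalize(matrix), k=2)
```

With this data, the two music-heavy devices fall into one category and the two sports-heavy devices into the other.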

5. The method according to claim 4, wherein the clustering the normalized matrix by using the clustering index k to obtain the k categories comprises:

reducing the dimensionality of the m × p-dimensional normalized matrix by using a preset dimensionality reduction algorithm and a preset dimensionality reduction index l to obtain an m × l-dimensional dimensionality reduction matrix;

and clustering the dimensionality reduction matrix by using the clustering index k to obtain the k categories.
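As an illustration of the dimensionality reduction step, the sketch below assumes principal component analysis computed by power iteration with deflation. The claim only requires "a preset dimensionality reduction algorithm", so PCA is a swapped-in choice, and `pca_reduce`, `l_dim` and the sample matrix are hypothetical.

```python
def pca_reduce(rows, l_dim, iterations=200):
    """Project m x p rows onto their top-l principal components (m x l)."""
    m, p = len(rows), len(rows[0])
    means = [sum(col) / m for col in zip(*rows)]
    centered = [[v - mu for v, mu in zip(row, means)] for row in rows]
    # p x p scatter matrix; its eigenvectors are the principal directions.
    scatter = [[sum(r[i] * r[j] for r in centered) for j in range(p)]
               for i in range(p)]
    components = []
    for _ in range(l_dim):
        vec = [1.0 / (i + 1) for i in range(p)]  # fixed start for determinism
        for _ in range(iterations):
            nxt = [sum(scatter[i][j] * vec[j] for j in range(p))
                   for i in range(p)]
            norm = sum(v * v for v in nxt) ** 0.5
            if norm < 1e-12:  # no variance left: emit a zero component
                vec = [0.0] * p
                break
            vec = [v / norm for v in nxt]
        # Rayleigh quotient gives the eigenvalue of the converged direction.
        eig = sum(vec[i] * scatter[i][j] * vec[j]
                  for i in range(p) for j in range(p))
        components.append(vec)
        # Deflate so the next pass finds the next direction.
        scatter = [[scatter[i][j] - eig * vec[i] * vec[j] for j in range(p)]
                   for i in range(p)]
    return [[sum(c * v for c, v in zip(comp, row)) for comp in components]
            for row in centered]


normalized = [
    [0.0, 0.0, 0.0],
    [1.0, 1.0, 0.0],
    [2.0, 2.0, 0.0],
    [3.0, 3.0, 0.0],
]
reduced = pca_reduce(normalized, 1)  # 4 x 3 becomes 4 x 1
```

The reduced matrix keeps one row per user equipment, so it can be passed to the same clustering routine as the normalized matrix.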

6. The method of claim 4, wherein said extracting at least one feature client from said k categories comprises:

determining a central client of each of the k categories, wherein a value obtained by dividing the number of user equipment to which the central client belongs by the number of user equipment included in the category is greater than a first preset threshold;

and determining, among the j categories that have a central client, the category with the largest number of user equipment, and determining at least one central client of the determined category as the at least one feature client, wherein 0 < j ≤ k.
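The extraction steps above can be sketched as follows, assuming each category is represented as a mapping from a device identifier to the set of clients on that device. The representation, the function names and the threshold value are hypothetical.

```python
from collections import Counter


def central_clients(category, threshold):
    """Clients present on more than `threshold` of the category's devices.

    category: dict mapping device id -> set of client names.
    """
    counts = Counter(c for clients in category.values() for c in clients)
    size = len(category)
    return sorted(c for c, n in counts.items() if n / size > threshold)


def feature_clients(categories, threshold):
    """Central clients of the largest category that has any central client."""
    candidates = [(len(cat), central_clients(cat, threshold))
                  for cat in categories]
    candidates = [(size, centers) for size, centers in candidates if centers]
    if not candidates:
        return []  # extraction fails: no category has a central client
    return max(candidates, key=lambda item: item[0])[1]


categories = [
    {"d1": {"A", "B"}, "d2": {"A"}, "d3": {"A", "C"}},
    {"d4": {"X"}, "d5": {"Y"}},
]
extracted = feature_clients(categories, threshold=0.6)
```

In the sample, only client "A" exceeds the threshold (3 of 3 devices in the first category); the second category has no central client, so here j = 1.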

7. The method according to claim 2 or 3, wherein the obtaining, for each client that records user data in the same kind of user equipment, at least one feature attribute that is the same as the preset n feature attributes from the feature attributes of the client respectively comprises:

for each client recorded with user data in the same user equipment, collecting the user data recorded by the client;

filtering all user data recorded by the client according to a preset rule, wherein the preset rule is that the running time of the client recording the user data is less than a second preset threshold, or the running time of the client recording the user data is greater than a third preset threshold;

and when user data remains after the filtering, acquiring at least one characteristic attribute which is the same as the preset n characteristic attributes from the characteristic attributes of the client.
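A minimal sketch of the filtering step, assuming that records matching the preset rule (running time below the second threshold or above the third) are the ones discarded, and that a client with no remaining records is skipped. The threshold values and function names are hypothetical.

```python
MIN_SECONDS = 5          # hypothetical second preset threshold
MAX_SECONDS = 6 * 3600   # hypothetical third preset threshold


def filter_records(durations, low=MIN_SECONDS, high=MAX_SECONDS):
    """Drop running times shorter than `low` or longer than `high`."""
    return [d for d in durations if low <= d <= high]


def matching_attributes(client_attributes, preset_attributes, durations):
    """Return the preset attributes the client has, if usable data remains."""
    if not filter_records(durations):
        return []  # nothing usable remains, so the client is skipped
    return [a for a in preset_attributes if a in client_attributes]


kept = filter_records([2, 60, 30000])
matched = matching_attributes({"music", "games"}, ["music", "youth"], [60])
skipped = matching_attributes({"music", "games"}, ["music", "youth"], [2])
```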

8. The method of claim 1, wherein after clustering all the clients in the user equipment of the same type, in which user data is recorded, by using a preset clustering index k and weight information of each feature attribute, to obtain k categories, the method further comprises:

for each category in the k categories, counting the number of user equipment to which each client in the category belongs;

determining each client for which the number of user equipment is greater than a fourth preset threshold as a client to be recommended;

and recommending the client to be recommended to the user equipment in the category on which the client to be recommended is not installed.
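The recommendation steps above can be sketched as follows, again assuming a category is a mapping from a device identifier to the set of installed clients. The function name and the threshold value are hypothetical.

```python
from collections import Counter


def recommendations(category, threshold):
    """Map each device to the popular clients it has not installed.

    category: dict mapping device id -> set of installed client names.
    threshold: stands in for the fourth preset threshold (a device count).
    """
    counts = Counter(c for clients in category.values() for c in clients)
    popular = {c for c, n in counts.items() if n > threshold}
    return {
        device: sorted(popular - clients)
        for device, clients in category.items()
        if popular - clients  # omit devices that already have everything
    }


category = {"d1": {"A", "B"}, "d2": {"A"}, "d3": {"B", "C"}}
to_recommend = recommendations(category, threshold=1)
```

Clients "A" and "B" are each installed on two devices and so exceed the threshold of 1; device d1 already has both and receives no recommendation.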

9. An apparatus for generating user tags, the apparatus comprising:

an obtaining unit, configured to obtain, for each client that records user data in the same type of user equipment, at least one feature attribute that is the same as n preset feature attributes from feature attributes that the client has, where the user data is used to reflect an operation performed on the client by a user using the client, the feature attributes are used to reflect features that a target user group of the client commonly has, and n is a positive integer;

a determining unit, configured to determine, according to the number of feature attributes of each client, weight information of each feature attribute of at least one feature attribute of each client, to obtain the weight information of each feature attribute of the n feature attributes;

a clustering unit, configured to cluster all clients, in which user data is recorded, in the user equipment of the same type by using a preset clustering index k and the weight information of each feature attribute obtained by the determining unit, to obtain k categories, where each category includes at least one client and the user equipment to which each client belongs, and k is a positive integer;

an extracting unit, configured to extract at least one feature client from the k categories obtained by the clustering unit, where the feature client is used to reflect a common interest of a target user group of the user equipment;

a generating unit, configured to generate a first user label according to the user data recorded by each client, and generate a second user label according to the at least one feature client extracted by the extracting unit;

the obtaining unit is further configured to, after the at least one feature client is extracted from the k categories and when the number of the at least one feature client is r, obtain an identifier of each feature client, and obtain n + r feature attributes by using the identifier of each of the r feature clients as a feature attribute, where r is a positive integer;

the device further comprises:

an updating unit, configured to update n to n + r and trigger re-execution of the following steps until extraction of a feature client fails: determining, according to the number of feature attributes of each client, the weight information of each feature attribute of the at least one feature attribute of each client to obtain the weight information of each of the n feature attributes; clustering all clients that record user data in the user equipment of the same type by using the preset clustering index k and the weight information of each feature attribute to obtain k categories; and extracting at least one feature client from the k categories.
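The iteration performed by the updating unit can be sketched as a loop. This is an outline under assumptions: `extract` is a hypothetical callable standing in for the weighting, clustering and extraction steps, and the round limit and the skipping of identifiers that are already attributes are safety measures added here, not part of the claim.

```python
def iterate_tags(attributes, extract, max_rounds=10):
    """Grow the attribute list with feature-client ids until extraction fails.

    extract: hypothetical callable that takes the current attributes and
    returns a list of feature-client identifiers (empty on failure).
    """
    attributes = list(attributes)
    found = []
    for _ in range(max_rounds):  # safety bound, not part of the claim
        # Assumption: identifiers already used as attributes are ignored.
        clients = [c for c in extract(attributes) if c not in attributes]
        if not clients:
            break  # extraction failed, so the iteration stops
        found.extend(clients)
        attributes.extend(clients)  # n becomes n + r
    return attributes, found


def fake_extract(attrs):
    # Stub for illustration: finds "app_a" once, then fails.
    return [] if "app_a" in attrs else ["app_a"]


final_attributes, found_clients = iterate_tags(["music", "youth"], fake_extract)
```

Each feature client found in one round becomes an additional attribute in the next round, which is how further user labels can be produced from the same user data.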

10. The apparatus of claim 9, wherein the user data of each client comprises an operating frequency of the client, and wherein the determining unit is configured to:

11. The apparatus of claim 9, wherein the user data of each client comprises an operating frequency and an operating time period of the client, and wherein the determining unit is configured to:

12. The apparatus according to claim 10 or 11, wherein the clustering unit is configured to:

when the user equipment of the same type comprises m user equipment, generating an m × p-dimensional feature matrix according to the weight information of the n feature attributes, wherein p = n when the user data does not comprise the operating time period of each client, and p = n × q when the user data comprises the operating time period of each client and the number of preset time periods is q;

13. The apparatus according to claim 12, wherein the clustering unit is configured to:

14. The apparatus of claim 12, wherein the extraction unit is configured to:

15. The apparatus according to claim 10 or 11, wherein the obtaining unit is configured to:

16. The apparatus of claim 9, further comprising:

a counting unit, configured to, after all the clients that record user data in the user equipment of the same type are clustered by using a preset clustering index k and the weight information of each feature attribute to obtain k categories, count, for each of the k categories, the number of user equipment to which each client in the category belongs;

the determining unit is further configured to determine each client for which the number of user equipment is greater than a fourth preset threshold as a client to be recommended;

and a recommending unit, configured to recommend the client to be recommended to the user equipment in the category on which the client to be recommended is not installed.

17. A server, characterized in that the server comprises:

a processor and a wireless transceiver connected to the processor;

the wireless transceiver is configured to be controlled by the processor for implementing the method of any of claims 1-8.