CN110874609B

CN110874609B - User clustering method, storage medium, device and system based on user behaviors

Info

Publication number: CN110874609B
Application number: CN201811026024.9A
Authority: CN
Inventors: 肖源
Original assignee: Wuhan Douyu Network Technology Co Ltd
Current assignee: Wuhan Douyu Network Technology Co Ltd
Priority date: 2018-09-04
Filing date: 2018-09-04
Publication date: 2022-08-16
Anticipated expiration: 2038-09-04
Also published as: CN110874609A

Abstract

The invention discloses a user clustering method, a storage medium, a device and a system based on user behaviors, and relates to the field of Internet live broadcast. Each behavior parameter is taken as a coordinate value in an M-dimensional space, and the Euclidean distance between any two users is calculated as a behavior gap value. If the behavior gap value between N users is smaller than a preset threshold value and N is larger than a preset density number, the N users are classified into a cluster, wherein N is an integer larger than 1. The invention can better group users with smaller differences into one group/cluster.

Description

User clustering method, storage medium, device and system based on user behaviors

Technical Field

The invention relates to the field of internet live broadcast, in particular to a user clustering method based on user behavior, a storage medium, equipment and a system.

Background

The live broadcast platform, as a mass entertainment platform, has massive user resources that form a strong user base, and users can exchange and share information in live broadcast rooms, posts and other Internet channels. Supporting anchors and live broadcast categories also adds fun and increases the stickiness between users and the platform. Such social networking gathers people with common goals or interests, greatly facilitates interpersonal communication, and is a brand-new social mode outside real-world society.

However, social networking on current live broadcast platforms relies on users spontaneously organizing themselves and on offline notification of activities, and it cannot reasonably match users who share interests but differ in actions and speech. Some users are online for a long time while others are rarely online; some like to send barrages; some watch a wide range of live broadcast rooms while others stay in fixed rooms. If such users are classified into one type, a good classification effect cannot be achieved: their behaviors differ greatly and are inconsistent, and if they are placed directly into the same team, the team's collective activities suffer.

Therefore, there is a need for an interest matching algorithm that finds groups of users who match closely, forms each group into a team, and gives the team some reward to stimulate the group's activity.

Disclosure of Invention

In view of the defects in the prior art, the invention aims to provide a user clustering method, a storage medium, a device and a system based on user behaviors, which can calculate the differences in behavior characteristics among users with the same interest according to the users' behavior parameters on a live broadcast platform, so that users with smaller differences are grouped into a team more reasonably.

To achieve the above object, in a first aspect, an embodiment of the present invention provides a user clustering method based on user behaviors, including:

acquiring M behavior parameters of each user under the same behavior, wherein M is an integer larger than 0;

taking each behavior parameter as an M-dimensional space coordinate value, and calculating the Euclidean distance between any two users as a behavior gap value;

if the behavior gap value between N users is smaller than a preset threshold value and N is larger than a preset density number, the N users are classified into a cluster, wherein N is an integer larger than 1.

Preferably, the behavior parameters comprise viewing parameters and barrage content, and the viewing parameters comprise user viewing duration, viewing time period, number of concerned anchor and/or amount of sending virtual gifts;

calculating a deviation value between users in viewing habits as a viewing deviation value according to the viewing parameters, and calculating a deviation value between users in bullet-screen wording as a language deviation value according to the barrage content;

and calculating a user behavior deviation value according to the viewing deviation value and the language deviation value.

Preferably, calculating the deviation value between users in viewing habits as the viewing deviation value according to the viewing parameters specifically includes:

the behavior parameters comprise A viewing parameters, the values of the A viewing parameters of each user are used as coordinate values in an A-dimensional space, and the Euclidean distance between two users is calculated as the viewing deviation value, wherein A is an integer larger than 0.

Preferably, calculating the deviation value between users in bullet-screen wording as the language deviation value according to the barrage content specifically includes:

converting the barrage content into B-dimensional word vectors by using a word2vec model, and calculating the Euclidean distance between the word vectors corresponding to two users as the language deviation value.

Preferably, calculating the user behavior deviation value according to the viewing deviation value and the language deviation value specifically includes:

the viewing deviation and the language deviation are each assigned a corresponding weight, and the behavior gap is equal to the sum of the viewing deviation multiplied by its weight and the language deviation multiplied by its weight.

Preferably, the specific formula for calculating the behavior deviation is as follows:

$$D_{XY} = W_1 \sqrt{\sum_{i=1}^{m} \left(M_{Xi} - M_{Yi}\right)^2} + W_2 \sqrt{\sum_{i=1}^{n} \left(L_{Xi} - L_{Yi}\right)^2}$$

wherein $D_{XY}$ is the behavior deviation of user X and user Y, $m$ is the total number of viewing parameters, $n$ is the total number of word vector parameters, $M_{Xi}$ is the i-th viewing parameter of user X, $M_{Yi}$ is the i-th viewing parameter of user Y, $L_{Xi}$ is the i-th word vector parameter of user X, $L_{Yi}$ is the i-th word vector parameter of user Y, $W_1$ is the weight of the viewing deviation, and $W_2$ is the weight of the language deviation.

Preferably, if the behavior gap value between the N users is smaller than a preset threshold and N is greater than a preset density number, grouping the N users into a cluster includes the following steps:

A, for a user who has not been clustered, setting all users whose behavior gap with that user is smaller than the preset threshold as users to be confirmed, and selecting one user to be confirmed;

B, counting the number of users whose behavior gap with the selected user to be confirmed is smaller than the preset threshold; if the number is larger than the preset density number, classifying the users whose behavior gap with the user to be confirmed is smaller than the preset threshold, together with the users whose behavior gap with the non-clustered user is smaller than the preset threshold, into the same cluster, and setting the users of the cluster as users to be confirmed;

C, executing step B for each user to be confirmed until all users to be confirmed have been confirmed.

In a second aspect, an embodiment of the present invention further provides a user clustering system based on user behaviors, which includes:

the statistical module is used for acquiring M behavior parameters of each user under the same behavior, wherein M is an integer greater than 0;

the computing module is used for taking each behavior parameter as an M-dimensional space coordinate value and computing the Euclidean distance between any two users as a behavior gap value;

and the clustering module is used for classifying the N users into a cluster if the behavior gap value between the N users is smaller than a preset threshold value and N is larger than a preset density number, wherein N is an integer larger than 1.

In a third aspect, an embodiment of the present invention further provides a storage medium, where a computer program is stored on the storage medium, and when being executed by a processor, the computer program implements the method in the embodiment of the first aspect.

In a fourth aspect, an embodiment of the present invention further provides an electronic device, which includes a memory and a processor, where the memory stores a computer program running on the processor, and the processor executes the computer program to implement the method in the first aspect.

Compared with the prior art, the invention has the advantages that:

the invention clusters users with the same interest: it collects a plurality of behavior characteristics of the users, calculates a value for the behavior gap between any two users through an algorithm, and, by delimiting a range on that value, finds regions where many users have close data, thereby forming user groups with higher behavior similarity. Grouping users with smaller behavior gaps in this way improves the user experience and increases user stickiness.

Drawings

In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings corresponding to the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without creative efforts.

FIG. 1 is a flowchart of a user clustering method based on user behavior according to an embodiment of the present invention;

FIG. 2 is another flow chart of a user clustering method based on user behavior according to the present invention;

FIG. 3 is a schematic structural diagram of a user clustering system based on user behaviors according to the present invention.

In the figure: 1-a statistic module, 2-a calculation module and 3-a clustering module.

Detailed Description

Embodiments of the present invention will be described in further detail below with reference to the accompanying drawings.

Referring to FIG. 1, embodiments of the present invention provide a user clustering method, a storage medium, a device, and a system based on user behavior, which calculate differences between users from the behavior parameters of users with the same interest, define the gap and density between users, and perform user clustering, thereby helping users find closely matching user groups on the platform, forming each group into a team, giving the team a certain reward, and stimulating the activity of the group.

In order to achieve the technical effects, the general idea of the application is as follows:

In summary, the present invention obtains the behavior parameters generated when the user watches live broadcast, measures the behavior parameters by using a specific formula to generate a total parameter, and then calculates the behavior gap between every two users. If multiple users in a particular area behave closely and are relatively dense, the users may be grouped into a cluster, i.e., a group.

The above-mentioned specific formula may be set directly on the basis of multiple experiments, or the user behavior parameters may be used directly as vector coordinates in a multidimensional space and the vector distance calculated by the formula, so long as the behavior gap between users can be measured uniformly by a specific value.

In addition, the vector distance herein is a euclidean distance obtained by taking the behavior parameter of the user as a coordinate parameter of a vector of a multidimensional space and calculating a coordinate distance between two users.

Further, the specific area may be set through multiple trials so that clusters are formed properly. For example, if the set area is too wide, several actual clusters may be merged into one cluster; if the set area is too narrow, one cluster may be scattered, or clusters may not form at all. Because factors such as the parameters and the platform differ, the form of the collected data also differs, so the area needs to be adjusted and set through multiple tests according to the actual conditions.
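The effect of the area setting can be illustrated with a deliberately simplified sketch. It assumes the "area" acts like a distance threshold, and it uses made-up one-dimensional values with a simple gap-based grouping rather than the full clustering method; `group_by_gap` and the sample values are hypothetical.

```python
def group_by_gap(values, threshold):
    """Split sorted 1-D values into groups wherever neighbours are >= threshold apart."""
    if not values:
        return []
    ordered = sorted(values)
    groups = [[ordered[0]]]
    for v in ordered[1:]:
        if v - groups[-1][-1] < threshold:
            groups[-1].append(v)   # close enough: same group
        else:
            groups.append([v])     # gap too large: start a new group
    return groups

# Made-up one-dimensional behavior values with two visible groups.
values = [1.0, 1.5, 2.0, 8.0, 8.5, 9.0]

too_wide = group_by_gap(values, threshold=10.0)   # both groups merge into one
balanced = group_by_gap(values, threshold=1.0)    # the two groups are separated
too_narrow = group_by_gap(values, threshold=0.1)  # every value ends up alone
```

With these values, the wide setting yields one group, the middle setting yields two, and the narrow setting yields six single-value groups, which mirrors the merging and scattering described above.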

Meanwhile, in the number of other users of which the behavior difference with the non-clustered user is smaller than a preset threshold value, the other users

In order to better understand the technical solution, the following detailed description is made with reference to specific embodiments.

Example one

The embodiment of the invention provides a user clustering method based on user behaviors, which comprises the following steps:

S1: obtaining M behavior parameters of each user under the same behavior, wherein M is an integer larger than 0.

Specifically, the behavior parameters of each user are parameters generated when the user views and interacts with the network platform. Some of the users' habits and preferences can be learned from these characteristic parameters, so that users with similar habits and preferences can be found.

For example, for a live broadcast platform, the behavior parameters may be the amount spent on rewards on the platform, the barrage data sent, the number of anchors followed within the category of interest, the duration of live broadcast viewing within the last 3 months, the earliest daily start time within the last 3 months, and the latest daily end time within the last 3 months.

Further, the bullet screens sent by a user on the platform also reflect the user's habits, education level, preferences and the like, so that further information about the user can be obtained from the words in the user's bullet screens.

For example, the M behavior parameters include A viewing parameters and B barrages sent by the user, where A and B are both integers greater than 0, and the viewing parameters are parameters generated when the user views live broadcasts on the live broadcast platform.

S2: taking each behavior parameter as an M-dimensional space coordinate value, and calculating the Euclidean distance between any two users as a behavior gap value.

Specifically, each behavior parameter of a user is treated as one dimension, and the plurality of behavior parameters form a multidimensional space vector that expresses the coordinates of the user's behavior features in the multidimensional space. The actual behavior gap between two users can then be obtained by calculating the vector distance between them from the actual numerical values.
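Step S2 can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the user identifiers, the choice of three parameters and their values are all made up, and `behavior_gap` is a hypothetical helper name.

```python
import math
from itertools import combinations

# Hypothetical users; each tuple holds M = 3 behavior parameters
# (for instance viewing hours, followed anchors, gifts sent). Values are made up.
users = {
    "u1": (10.0, 3.0, 5.0),
    "u2": (13.0, 7.0, 5.0),
    "u3": (40.0, 1.0, 0.0),
}

def behavior_gap(a, b):
    """Euclidean distance between two M-dimensional behavior vectors."""
    if len(a) != len(b):
        raise ValueError("both users need the same number of parameters")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Behavior gap value for every unordered pair of users.
gaps = {
    (p, q): behavior_gap(users[p], users[q])
    for p, q in combinations(users, 2)
}
```

Here `gaps[("u1", "u2")]` is 5.0, since the two vectors differ by (3, 4, 0). Note that a parameter with a large numeric range dominates a plain Euclidean distance; the text does not say whether values are rescaled beforehand, so no rescaling is shown.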

As another optional embodiment, the behavior parameters of the user are divided into viewing parameters and barrages:

S201, acquiring the viewing parameters of two users, calculating a viewing value for the two users according to their viewing parameters, and multiplying the viewing value by the viewing weight to obtain the viewing deviation;

S202, acquiring the barrages of the two users, calculating a language value for the two users according to their barrages, and multiplying the language value by the language weight to obtain the language deviation;

S203, adding the viewing deviation and the language deviation to obtain the behavior deviation.

Specifically, in addition to the barrage sent by the user, parameters of various aspects of interaction between the user and the live platform are used as viewing parameters, and differences among the users are specifically calculated through the viewing parameters. And for the bullet screen sent by the user, carrying out independent calculation to obtain the language deviation of the user.

For example, according to the viewing parameters of the two users, the viewing values of the users are calculated, specifically: and taking the value of the A viewing parameters of each user as a coordinate value of the A-dimensional space, and calculating the Euclidean distance between the two users as a viewing numerical value.

The viewing parameters of the users are used as the coordinates of the space vectors, and the difference of the behaviors of the two users on the interaction with the platform can be visually embodied in numerical value by calculating the vector distance, namely the Euclidean distance.

For example, the language numerical value of the user is calculated according to the bullet screens of the two users, specifically: dividing words of B barrages of two users to obtain a word group of each user, converting the word group into word vectors by using a word2vec model, calculating the vector distance of the two users according to the word vectors of the users, and taking the vector distance as a language numerical value.

It should be noted that the user's bullet screens may be segmented by using NLPIR, a Chinese word segmentation system that can extract text information and separate the words in it. The live broadcast platform may also build a list of common words from users' bullet-screen language and look words up in it to perform segmentation, so that the segmentation is specific to bullet-screen language.
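The common-word approach can be sketched with forward maximum matching, a classic dictionary-based segmentation technique. This is an assumption made for illustration: the text does not state which matching strategy the platform uses, and the sketch does not reproduce how NLPIR works. The function `segment` and the word list are hypothetical.

```python
def segment(text, vocabulary, max_len=4):
    """Split text into words by forward maximum matching against a vocabulary."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until the vocabulary matches;
        # a single character is always accepted as a fallback.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocabulary:
                words.append(piece)
                i += size
                break
    return words

# Hypothetical common-word list built from bullet-screen text.
common_words = {"主播", "厉害", "666", "哈哈哈"}
tokens = segment("主播厉害666", common_words)
```

For this input the sketch yields the three words 主播, 厉害 and 666. Characters that match no entry fall through as single-character tokens.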

It should be noted that word2vec is a group of related models used to generate word vectors. These models are shallow, two-layer neural networks trained to reconstruct the linguistic contexts of words; they map each word to a vector, and the vectors can be used to represent word-to-word relationships.

The user's bullet-screen language is converted into word vectors, and the vector distance between the word vectors is calculated to reflect the language difference between users. The calculation is similar to that of the viewing difference and better reflects how users deviate from one another in language.
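The language value can be sketched as below. Two things here are assumptions: a small hand-written lookup table stands in for a trained word2vec model, and a user's words are combined into one vector by averaging, which the text does not specify. The names `toy_vectors`, `user_vector` and `language_deviation` are hypothetical.

```python
import math

# Toy 3-dimensional word vectors standing in for a trained word2vec model.
# The words and numbers are made up for illustration.
toy_vectors = {
    "nice": (1.0, 0.0, 0.0),
    "play": (0.0, 1.0, 0.0),
    "lol":  (0.0, 0.0, 1.0),
}

def user_vector(words, vectors):
    """Average the vectors of a user's known words (averaging is an assumption)."""
    known = [vectors[w] for w in words if w in vectors]
    if not known:
        raise ValueError("no known words for this user")
    dims = len(known[0])
    return tuple(sum(v[d] for v in known) / len(known) for d in range(dims))

def language_deviation(words_x, words_y, vectors):
    """Euclidean distance between the two users' aggregated word vectors."""
    return math.dist(user_vector(words_x, vectors), user_vector(words_y, vectors))

dev_same = language_deviation(["nice", "play"], ["play", "nice"], toy_vectors)
dev_diff = language_deviation(["nice"], ["lol"], toy_vectors)
```

Users with the same vocabulary get a deviation of zero, while users whose words map to distant vectors get a larger value. With a real model, the lookup table would be replaced by the model's word-to-vector mapping.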

As a preferable scheme, when the viewing deviation and the language deviation are added to obtain the behavior deviation, different weights are set for the two. The differences that users show in their bullet screens and the differences in their interaction with the website platform cannot be treated equally, so different weights are needed; that is, the viewing deviation and the language deviation are each multiplied by their respective weights and then added to obtain the behavior deviation.

Further, since the language value is greatly influenced by the website display content and the like, and has a large uncertainty, the language deviation weight is set to be smaller than the viewing deviation weight.

Specifically, the formula for calculating the inter-user variability is:

$$D_{XY} = W_1 \sqrt{\sum_{i=1}^{m} \left(M_{Xi} - M_{Yi}\right)^2} + W_2 \sqrt{\sum_{i=1}^{n} \left(L_{Xi} - L_{Yi}\right)^2}$$

wherein $D_{XY}$ is the behavior deviation of user X and user Y, $m$ is the total number of viewing parameters, $n$ is the total number of word vector parameters, $M_{Xi}$ is the i-th viewing parameter of user X, $M_{Yi}$ is the i-th viewing parameter of user Y, $L_{Xi}$ is the i-th word vector parameter of user X, $L_{Yi}$ is the i-th word vector parameter of user Y, $W_1$ is the weight of the viewing deviation, and $W_2$ is the weight of the language deviation.
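The weighted combination described above (a Euclidean distance over the viewing parameters and one over the word vector parameters, each multiplied by its weight and then summed) can be sketched as follows. The function name, the sample vectors and the weights 0.7 and 0.3 are hypothetical; the text only states that the language weight is set smaller than the viewing weight.

```python
import math

def behavior_deviation(view_x, view_y, lang_x, lang_y, w_view=0.7, w_lang=0.3):
    """Weighted sum of the viewing deviation and the language deviation."""
    viewing = math.dist(view_x, view_y)    # Euclidean distance over viewing parameters
    language = math.dist(lang_x, lang_y)   # Euclidean distance over word vector parameters
    return w_view * viewing + w_lang * language

# Made-up example: viewing distance is 5, language distance is 1.
d_xy = behavior_deviation(
    view_x=(3.0, 4.0), view_y=(0.0, 0.0),
    lang_x=(1.0, 0.0), lang_y=(0.0, 0.0),
)
```

With these inputs the result is 0.7 × 5 + 0.3 × 1 = 3.8, up to floating-point rounding.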

S3: if the behavior gap value between N users is smaller than a preset threshold value and N is larger than a preset density number, the N users are classified into a cluster, wherein N is an integer larger than 1.

After calculating the differences among all users and expressing the differences through numerical values, the convergence condition of all users in the multidimensional space can be reflected, and the users with close distances and high convergence degree can be classified into one type.

Specifically, as shown in FIG. 2, grouping the non-clustered user and the other users whose behavior gap is smaller than the preset threshold into a cluster further includes the following steps:

In a multidimensional space, the shape formed by gathered users may not be completely covered by a region centered on a single user, and therefore a further search is required starting from the part that has already been gathered.

For example, in a three-dimensional space, a cluster of users may gather into an L shape. Gathering around a single user may capture only the horizontal or the vertical stroke of the L, even though all of these users belong to the same cluster. After the horizontal or vertical stroke has been gathered, the degree of gathering around each user in it is confirmed one by one, so that the other parts of the L are checked and added, thereby completing the gathering of the users.
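The expansion described above can be sketched as follows. This is one possible reading of the procedure, not a definitive implementation: it additionally assumes that the starting user must itself have more neighbors than the density number, and the names `cluster_users`, `threshold` and `density` as well as the sample coordinates are hypothetical.

```python
import math
from collections import deque

def cluster_users(points, threshold, density):
    """Group users by density-based expansion; returns a list of clusters (sets of ids)."""
    def neighbors(u):
        return {v for v in points
                if v != u and math.dist(points[u], points[v]) < threshold}

    assigned = set()
    clusters = []
    for seed in points:
        if seed in assigned:
            continue
        first = neighbors(seed) - assigned
        if len(first) <= density:
            continue                      # not dense enough to start a cluster here
        cluster = {seed} | first
        pending = deque(first)            # users to be confirmed
        while pending:
            user = pending.popleft()
            near = neighbors(user)
            if len(near) > density:       # dense enough: pull in this user's neighbors
                new = near - cluster - assigned
                cluster |= new
                pending.extend(new)
        assigned |= cluster
        clusters.append(cluster)
    return clusters

# Made-up L-shaped group in three dimensions plus one distant user.
arm_points = {
    "a0": (0, 0, 0), "a1": (1, 0, 0), "a2": (2, 0, 0), "a3": (3, 0, 0),
    "b1": (0, 1, 0), "b2": (0, 2, 0), "b3": (0, 3, 0),
    "far": (10, 10, 10),
}
# Users within the threshold of the corner user alone: only part of the L.
around_corner = {u for u in arm_points
                 if u != "a0" and math.dist(arm_points[u], arm_points["a0"]) < 1.5}
clusters = cluster_users(arm_points, threshold=1.5, density=1)
```

The region around the corner user `a0` alone reaches only `a1` and `b1`, whereas confirming each gathered user in turn extends the cluster along both strokes of the L until all seven users are included. The distant user stays outside any cluster.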

Based on the same inventive concept, the present application provides the second embodiment, which is as follows.

Example two

As shown in fig. 3, an embodiment of the present invention provides a user clustering system based on user behaviors, which includes:

and the clustering module is used for classifying the N users into a cluster if the behavior gap value among the N users is smaller than a preset threshold value and N is larger than a preset density number, wherein N is an integer larger than 1.

Various modifications and specific examples in the foregoing method embodiments are also applicable to the system of the present embodiment, and the detailed description of the method is clear to those skilled in the art, so that the detailed description is omitted here for the sake of brevity.

Based on the same inventive concept, the present application provides the third embodiment.

Example three

A third embodiment of the invention provides a computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out all or part of the method steps of the first embodiment.

The present invention can implement all or part of the flow in the method of the first embodiment, and can also be implemented by using a computer program to instruct related hardware, where the computer program can be stored in a computer-readable storage medium, and when the computer program is executed by a processor, the steps of the method embodiments can be implemented. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form, etc. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content of the computer-readable medium may be appropriately increased or decreased as required by legislation and patent practice in a jurisdiction; for example, in some jurisdictions, in accordance with legislation and patent practice, computer-readable media do not include electrical carrier signals and telecommunications signals.

Based on the same inventive concept, the present application provides the fourth embodiment.

Example four

The fourth embodiment of the present invention further provides an electronic device, which includes a memory and a processor, where the memory stores a computer program running on the processor, and the processor executes the computer program to implement all or part of the method steps in the first embodiment.

The processor may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, etc. The general-purpose processor may be a microprocessor or any conventional processor; the processor is the control center of the computer device and connects the various parts of the overall computer device using various interfaces and lines.

The memory may be used to store the computer programs and/or modules, and the processor may implement various functions of the computer device by running or executing the computer programs and/or modules stored in the memory and invoking data stored in the memory. The memory may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the storage data area may store data (such as audio data, video data, etc.) created according to the use of the cellular phone, etc. In addition, the memory may include high speed random access memory, and may also include non-volatile memory, such as a hard disk, a memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), at least one magnetic disk storage device, a Flash memory device, or other volatile solid state storage device.

Generally speaking, the user clustering method, storage medium, device and system based on user behavior provided by the embodiments of the invention express the behavior parameters of users in a multidimensional space and calculate the vector distance between users, so as to find user groups that are closely gathered and dense in the multidimensional space and classify each such group into one class, thereby better providing users with user groups/clusters sharing the same interest.

As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims

1. A user clustering method based on user behaviors is characterized by comprising the following steps:

if the behavior difference value between N users is smaller than a preset threshold value and N is larger than a preset density number, classifying the N users into a cluster, wherein N is an integer larger than 1;

the behavior parameters comprise viewing parameters and barrage content, and the viewing parameters comprise user viewing duration, viewing time period, attention anchor number and/or virtual gift sending amount;

calculating a user behavior deviation value according to the viewing deviation value and the expression deviation value;

the specific formula for calculating the user behavior deviation value is as follows:

2. The user clustering method based on user behaviors as claimed in claim 1, wherein the calculating of the deviation value between users in viewing habits as the viewing deviation value is based on the viewing parameters:

3. The user clustering method based on user behaviors as claimed in claim 1, wherein the deviation value between users on the barrage term is calculated as the term deviation value according to the barrage content:

and converting the bullet screen content into B-dimensional word vectors by using a word2vec model, and calculating the Euclidean distance between the word vectors corresponding to the two users as a language deviation value.

4. The method according to claim 1, wherein if the behavior gap value between N users is smaller than a preset threshold and N is greater than a preset density number, the grouping N users into a cluster comprises the following steps:

S1, for a user who has not been clustered, setting all users whose behavior gap with that user is smaller than the preset threshold as users to be confirmed, and selecting one user to be confirmed;

S2, counting the number of users whose behavior gap with the selected user to be confirmed is smaller than the preset threshold; if the number is larger than the preset density number, classifying the users whose behavior gap with the user to be confirmed is smaller than the preset threshold, together with the users whose behavior gap with the non-clustered user is smaller than the preset threshold, into the same cluster, and setting the users of the cluster as users to be confirmed;

S3, for each user to be confirmed, executing step S2 until all users to be confirmed have been confirmed.

5. A storage medium having a computer program stored thereon, characterized in that: the computer program, when executed by a processor, implements the method of any of claims 1 to 4.

6. An electronic device comprising a memory and a processor, the memory having stored thereon a computer program for execution on the processor, characterized in that: the processor, when executing the computer program, implements the method of any of claims 1 to 4.

7. A user clustering system based on user behavior, comprising:

the clustering module is used for classifying N users into a cluster if the behavior difference value between the N users is smaller than a preset threshold value and N is larger than a preset density number, wherein N is an integer larger than 1;