CN111667018B

CN111667018B - Object clustering method and device, computer readable medium and electronic equipment

Info

Publication number: CN111667018B
Application number: CN202010554265.1A
Authority: CN
Inventors: 王敏; 孔魏建; 许冲
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2020-06-17
Filing date: 2020-06-17
Publication date: 2023-12-15
Anticipated expiration: 2040-06-17
Also published as: CN111667018A

Abstract

The embodiments of the application provide an object clustering method and apparatus, a computer-readable medium, and an electronic device. The object clustering method comprises: obtaining interest tags of each object; generating a tag sequence corresponding to each object based on the interest tags; clustering the objects according to their tag sequences to obtain object groups, each corresponding to the same cluster tag; and, when the number of objects in an object group is smaller than an object-number threshold, merging the objects in that group into an object group associated with its cluster tag, so as to obtain target groups in which the number of objects is greater than or equal to the threshold. With this technical solution, each target group has a cluster tag that accurately represents the preference information of its objects, and the target groups are balanced in size, so that each clustered group can be processed according to its cluster tag. This improves the accuracy and balance of object clustering and, in turn, the accuracy of processing the clustered groups.

Description

Object clustering method and device, computer readable medium and electronic equipment

Technical Field

The present application relates to the field of computer technologies, and in particular, to a method and apparatus for object clustering, a computer readable medium, and an electronic device.

Background

When objects are clustered, they are generally divided into different categories according to their various characteristics, so that each category can be processed accordingly. However, clustering methods in the related art cannot handle objects that take many forms. In particular, when the objects carry a large amount of information of widely varying types and have no association relations with one another, the resulting clusters do not accurately represent the objects they contain and are unbalanced in size, which in turn affects the operations performed on the clustering result.

Disclosure of Invention

The embodiments of the application provide an object clustering method, an object clustering apparatus, a computer-readable medium, and an electronic device which, at least to some extent, produce clusters whose cluster tags accurately represent object preferences and whose sizes are balanced, so that each clustered group can be processed according to its cluster tag. This improves the accuracy and balance of object clustering and, in turn, the accuracy of processing the clustered groups.

Other features and advantages of the application will be apparent from the following detailed description, or may be learned by the practice of the application.

According to one aspect of the embodiment of the application, a method for clustering objects is provided, which comprises the steps of obtaining interest labels of objects; generating a tag sequence corresponding to each object based on the interest tag; clustering the objects according to the label sequences corresponding to the objects to obtain an object group corresponding to the same clustering label; if the number of the objects in the object group is smaller than the threshold value of the number of the objects, merging the objects in the object group into the object group associated with the cluster label to obtain a target group with the number of the objects being greater than or equal to the threshold value of the number of the objects.

According to an aspect of an embodiment of the present application, there is provided an apparatus for object clustering, including: the acquisition unit is used for acquiring interest labels of all objects; the generating unit is used for generating a label sequence corresponding to each object based on the interest labels; the clustering unit is used for clustering the objects according to the label sequences corresponding to the objects to obtain an object group corresponding to the same clustering label; and the merging unit is used for merging the objects in the object group into the object group associated with the cluster label if the number of the objects in the object group is smaller than the threshold value of the number of the objects, so as to obtain a target group with the number of the objects being greater than or equal to the threshold value of the number of the objects.

In some embodiments of the present application, based on the foregoing scheme, the merging unit includes: the first identification unit is used for identifying the cluster labels of the object group as a label sequence to be processed if the number of the objects in the object group is smaller than the threshold value of the number of the objects; the first extraction unit is used for extracting a sub-tag sequence from the tag sequence to be processed; and the first merging unit is used for merging the objects in the object group into the object group associated with the sub-tag sequence, and repeating the steps until a target group with the number of the objects being greater than or equal to the threshold value of the number of the objects is obtained.

In some embodiments of the present application, based on the foregoing scheme, the tag sequence to be processed includes at least two tag features and weights corresponding to the tag features; the first extraction unit includes: the second extraction unit is used for extracting a first preset number of feature tags from the tag sequence to be processed based on the weights corresponding to the tag features in the tag sequence to be processed; and the sequence composing unit is used for composing the first preset number of characteristic labels into the sub-label sequence.

In some embodiments of the present application, based on the foregoing scheme, the tag sequence to be processed is composed of the tag features in a descending order according to the weights corresponding to the tag features; the second extraction unit includes: the third extraction unit is used for extracting the first n-a characteristic labels from n label characteristics in the label sequence to be processed, wherein n is more than or equal to 2; a is more than or equal to 1 and less than n.

In some embodiments of the present application, based on the foregoing scheme, the first merging unit includes: the second identification unit is used for identifying target cluster labels similar to the sub-label sequences from the cluster label set; and the second merging unit is used for merging the object group corresponding to the label sequence to be processed into the object group corresponding to the target cluster label.

In some embodiments of the application, based on the foregoing, the sub-tag sequence includes at least two tag features; the first merging unit includes: the first matching unit is used for carrying out similarity matching on the tag characteristics in each sub-tag sequence to obtain mutually matched sub-tag sequences; and the third merging unit is used for merging the object groups corresponding to the mutually matched sub-tag sequences.

In some embodiments of the application, based on the foregoing scheme, the generating unit includes: a fourth extracting unit, configured to extract a second preset number of interest features from the interest tags of the object; the arrangement unit is used for arranging the second preset number of interest features corresponding to the object to obtain a tag sequence corresponding to the object.

In some embodiments of the present application, based on the foregoing scheme, the interest feature includes an interest point representing an individual interest type, or an interest vector representing an object interest migration condition; the fourth extraction unit includes: a portrait unit for generating an interest portrait of the object based on the interest tag of the object; and a fifth extraction unit, configured to extract the second preset number of interest points or the second preset number of interest vectors from the interest portrait.

In some embodiments of the application, based on the foregoing scheme, the clustering unit includes: a third identifying unit, configured to identify, based on the tag sequence, a target cluster tag contained by the tag sequence from a cluster tag set; and the first adding unit is used for adding the object to the object group corresponding to the target cluster label.

In some embodiments of the present application, based on the foregoing solution, the apparatus for clustering objects further includes: a creating unit, configured to create a new cluster tag corresponding to the tag sequence if the cluster tag set does not include the cluster tag included in the tag sequence; and the second adding unit is used for adding the object to the object group corresponding to the newly-built cluster tag.

In some embodiments of the present application, based on the foregoing solution, the apparatus for clustering objects further includes: the content acquisition unit is used for acquiring the content preferred by the target group according to the clustering label corresponding to the target group; and the content pushing unit is used for pushing the content preferred by the target group to the terminal corresponding to the target group.

In some embodiments of the present application, based on the foregoing solution, the cluster label includes at least two label features and weights corresponding to the label features; the content pushing unit includes: a first determining unit, configured to determine, based on each of the tag features in the cluster tags, content corresponding to each of the tag features; the second determining unit is used for determining the pushing sequence of the content corresponding to each tag feature based on the weight corresponding to each tag feature; and the sorting pushing unit is used for pushing the content preferred by the target group to the terminal corresponding to the target group according to the pushing sequence of the content corresponding to each tag characteristic.
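The weight-based push order described for the content pushing unit can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions, not the implementation of the embodiment: the cluster tag is assumed to be a mapping from tag feature to weight, and the `catalog` mapping from tag feature to content items, the function name `push_order`, and the sample data are all hypothetical.

```python
from typing import Dict, List


def push_order(cluster_tag: Dict[str, float], catalog: Dict[str, List[str]]) -> List[str]:
    """Order content so that items for higher-weighted tag features come first.

    `cluster_tag` maps each tag feature to its weight (assumed representation).
    `catalog` maps each tag feature to candidate content (hypothetical data source).
    """
    ordered: List[str] = []
    # Visit tag features from the largest weight to the smallest.
    for feature, _weight in sorted(cluster_tag.items(), key=lambda kv: kv[1], reverse=True):
        # Features without any content are simply skipped.
        ordered.extend(catalog.get(feature, []))
    return ordered


cluster_tag = {"military": 0.3, "sports": 0.6, "travel": 0.1}
catalog = {
    "sports": ["match report"],
    "military": ["defence review"],
    "travel": ["city guide"],
}
print(push_order(cluster_tag, catalog))
```

Here the content for "sports" is pushed before the content for "military" and "travel" because its tag feature carries the largest weight.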

According to an aspect of an embodiment of the present application, there is provided a computer readable medium having stored thereon a computer program which, when executed by a processor, implements a method of object clustering as described in the above embodiments.

According to an aspect of an embodiment of the present application, there is provided an electronic apparatus including: one or more processors; and a storage means for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method of object clustering as described in the above embodiments.

In the technical scheme provided by some embodiments of the present application, interest tags of objects are obtained, tag sequences corresponding to the objects are generated based on the interest tags, so that the objects are clustered according to the tag sequences corresponding to the objects to obtain object groups corresponding to the same cluster tag, and under the condition that the number of the objects in the object groups is smaller than the number of the objects threshold, the objects in the object groups are combined into the object groups associated with the cluster tags to obtain target groups with the number of the objects greater than or equal to the number of the objects threshold, so that the finally obtained target groups have cluster tags capable of accurately representing preference information of the target groups, and the objects in the target groups have balanced scale sizes, so that the cluster groups are correspondingly processed according to the cluster tags of the cluster groups, the accuracy and the balance of the object clusters are improved, and the accuracy of processing the cluster groups is further improved.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application as claimed.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application. It is evident that the drawings in the following description are only some embodiments of the present application and that other drawings may be obtained from these drawings without inventive effort for a person of ordinary skill in the art. In the drawings:

FIG. 1 shows a schematic diagram of an exemplary system architecture to which the technical solution of an embodiment of the application may be applied;

FIG. 2 shows a schematic diagram of an exemplary system architecture to which the technical solution of an embodiment of the application may be applied;

FIG. 3 schematically illustrates a flow chart of a method of object clustering in accordance with one embodiment of the application;

FIG. 4 schematically illustrates a schematic diagram of acquiring interest tags for an object according to one embodiment of the application;

FIG. 5 schematically illustrates a diagram of generating a tag sequence corresponding to each object, according to one embodiment of the application;

FIG. 6 schematically illustrates a schematic diagram of determining target cluster labels according to one embodiment of the application;

FIG. 7 schematically illustrates a schematic diagram of an object group according to one embodiment of the application;

FIG. 8 schematically illustrates a schematic diagram of generating a target population according to one embodiment of the application;

FIG. 9 schematically illustrates a rollback schematic of a tag sequence according to one embodiment of the application;

FIG. 10 schematically illustrates a schematic diagram of object cluster merging, according to one embodiment of the application;

FIG. 11 schematically illustrates a schematic diagram of object cluster merging, according to one embodiment of the application;

FIG. 12 schematically illustrates a schematic diagram based on clustering users according to one embodiment of the application;

FIG. 13 schematically illustrates a diagram of mass news recall, according to one embodiment of the application;

FIG. 14 schematically illustrates a diagram of community-based mass news recall, in accordance with one embodiment of the present application;

FIG. 15 shows a block diagram of an apparatus for object clustering in accordance with one embodiment of the application;

FIG. 16 shows a schematic diagram of a computer system suitable for use in implementing an embodiment of the application.

Detailed Description

Example embodiments will now be described more fully with reference to the accompanying drawings. However, the exemplary embodiments may be embodied in many forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the example embodiments to those skilled in the art.

Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the application. One skilled in the relevant art will recognize, however, that the application may be practiced without one or more of the specific details, or with other methods, components, devices, steps, etc. In other instances, well-known methods, devices, implementations, or operations are not shown or described in detail to avoid obscuring aspects of the application.

The block diagrams depicted in the figures are merely functional entities and do not necessarily correspond to physically separate entities. That is, the functional entities may be implemented in software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.

The flow diagrams depicted in the figures are exemplary only, and do not necessarily include all of the elements and operations/steps, nor must they be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the order of actual execution may be changed according to actual situations.

Fig. 1 shows a schematic diagram of an exemplary system architecture to which the technical solution of an embodiment of the present application may be applied.

As shown in fig. 1, the system architecture may include a terminal device (such as one or more of the smartphone 101, tablet 102, and portable computer 103 shown in fig. 1, or of course a desktop computer, etc.), a network 104, and a server 105. The network 104 is the medium used to provide communication links between the terminal devices and the server 105. The network 104 may include various connection types, such as wired communication links, wireless communication links, and the like.

It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation. For example, the server 105 may be a server cluster formed by a plurality of servers.

A user may use a terminal device to interact with the server 105 via the network 104, for example to receive or send messages. The server 105 may be a server providing various services. For example, the user uploads the interest tags of objects to the server 105 by using the terminal device 103 (or the terminal device 101 or 102). The server 105 obtains the interest tags of each object, generates a tag sequence corresponding to each object based on the interest tags, and clusters the objects according to their tag sequences to obtain object groups that each correspond to the same cluster tag. When the number of objects in an object group is smaller than the object-number threshold, the server 105 merges the objects in that group into an object group associated with its cluster tag, so as to obtain target groups in which the number of objects is greater than or equal to the threshold. As a result, each target group has a cluster tag that accurately represents its preference information, and the target groups are balanced in size, so that each clustered group can be processed according to its cluster tag. This improves the accuracy and balance of object clustering and, in turn, the accuracy of processing the clustered groups.

It should be noted that, the method for object clustering provided in the embodiment of the present application is generally executed by the server 105, and accordingly, the device for object clustering is generally disposed in the server 105. However, in other embodiments of the present application, the terminal device may also have a similar function as the server, so as to execute the object clustering scheme provided by the embodiments of the present application.

Fig. 2 shows a schematic diagram of an exemplary system architecture to which the technical solution of an embodiment of the present application may be applied.

As shown in fig. 2, the system architecture may include a user group consisting of at least two terminal devices (e.g., one or more of the smartphone 201, tablet 202, and portable computer 203 shown in fig. 2, or of course a desktop computer, etc.), a network 204, and a computer device 205 corresponding to a management node. The network 204 is the medium used to provide communication links between the terminal devices and the computer device 205. The network 204 may include various connection types, such as wired communication links, wireless communication links, and the like.

As shown in fig. 2, in one embodiment of the present application, the method of object clustering is applied to a scenario with a large number of objects, such as the object group 206 in fig. 2. The computer device corresponding to the management node acquires the interest tags of each object, generates a tag sequence corresponding to each object based on the interest tags, and clusters the objects according to their tag sequences to obtain object groups that each correspond to the same cluster tag. When the number of objects in an object group is smaller than the object-number threshold, the objects in that group are merged into an object group associated with its cluster tag, so as to obtain target groups in which the number of objects is greater than or equal to the threshold. As a result, each target group has a cluster tag that accurately represents its preference information, and the target groups are balanced in size.

The implementation details of the technical scheme of the embodiment of the application are described in detail below:

Fig. 3 shows a flow chart of a method of object clustering according to an embodiment of the present application. The method may be performed by a server, which may be the server shown in fig. 1. Referring to fig. 3, the method for clustering objects includes at least steps S310 to S340, which are described in detail as follows:

In step S310, interest tags of the respective objects are acquired.

In one embodiment of the present application, the interest tags of each object may be acquired by collecting application data from the user terminal. In this embodiment, the object may be a human user, or may be a robot, a virtual device, or the like. The interest tags of an object represent the preference information of the object, such as its points of interest.

Fig. 4 is a schematic diagram of acquiring interest tags of an object according to an embodiment of the present application.

As shown in fig. 4, in one embodiment of the present application, when the object is a user, after the user registers an account with an application, the application needs to determine the type corresponding to the user according to the user's preferences. Specifically, at least two interest tags 420 are displayed in the interface of the terminal device 410, such as sports, military, entertainment, cute pets, travel, finance, science and technology, and the like. By presenting these interest tags 420 on the user interface, the user is prompted to select the interest tags that he or she prefers from among them.

In addition, how the user views or uses content in the application can be detected while the user is using the application. For example, the user's clicks and the links triggered in the application are detected, or the dwell time of the user terminal on a given page while the application is running is monitored, so as to determine the content that the user is interested in. The interest tags of the object are then extracted based on the type, form, and other attributes of that content.

In one embodiment of the application, a user portrait may also be generated based on user preference information, which may include tags, topics, media accounts, and the like. A preset number of tags are then extracted from the user portrait as the interest tags of the object.

In addition, if the object is a robot or a Virtual device, for example, a Virtual Reality (VR) device, the corresponding tag information may be determined based on the configuration information in the VR device and the corresponding user audience.

In step S320, a tag sequence corresponding to each object is generated based on the interest tags.

In one embodiment of the present application, after the interest tags of the objects are acquired, it is taken into account that an object prefers its interest tags to different degrees. In this embodiment, the tag sequence corresponding to each object is therefore generated based on the degree of preference corresponding to each interest tag. The tag sequence comprises interest tags or tag features that carry weight information.

In one embodiment of the present application, as shown in fig. 5, the process of generating a tag sequence corresponding to each object based on the interest tag in step S320 includes the following steps S510 to S520, which are described in detail below:

In step S510, a second preset number of interest features are extracted from the interest tags of the object.

In one embodiment of the present application, in order to facilitate matching of interest tags, a second preset number is set to control the number of interest features corresponding to one object, and that number of interest features is extracted from the interest tags of the object. Every object then has the same number of interest features, which facilitates cluster management.

For example, if the interest tags of the object include sports, military, entertainment, cute pets, travel, finance and accounting, and science and technology, and the second preset number is set to 3, then 3 of the interest tags are extracted as interest features, namely sports, military, and entertainment.

Further, each interest tag in this embodiment has a corresponding weight, which represents the degree to which the object prefers that interest tag. Therefore, when interest features are extracted from the interest tags, the second preset number of interest features with the largest weights may be extracted in descending order of weight. Interest features with larger weights are more representative of the preferences of the object, which makes the clustering result more accurate.

In one embodiment of the application, the interest feature comprises an interest point representing an individual interest type, or an interest vector representing an object interest migration situation; the process of extracting the second preset number of interest features from the interest tags of the object in step S510 specifically includes the following steps:

generating an interest portrait of the object based on the interest tag of the object;

and extracting a second preset number of interest points or a second preset number of interest vectors from the interest portrait.

In one embodiment of the application, an interest portrait of an object can be generated based on its interest tags, so that the preferences of the object are characterized by the interest portrait, and the interest portrait of each object can be stored and otherwise handled individually. After the interest portrait is generated, a second preset number of interest points or interest vectors are extracted from it.

In one embodiment of the application, the interest points are used to represent the characteristics of individual interest types or interest tags, and the interest vectors are used to represent the interest characteristic transition condition when the interests of the object are changed. In the embodiment, the preference condition of the object is represented based on the interest points and the interest vectors, so that quantitative calculation and management of preference data are realized.

In step S520, a second preset number of interest features corresponding to the object are arranged, so as to obtain a tag sequence corresponding to the object.

In one embodiment of the present application, after the interest features are obtained, the second preset number of interest features are arranged based on the weights corresponding to the interest features, so as to obtain the tag sequence of the object.

The tag sequence in this embodiment includes information of interest features of the object, and further implies weights corresponding to the interest features based on the arrangement sequence of the interest features, so that the object can be clustered based on the interest features and the weights thereof.
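Steps S510 and S520 can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions rather than the implementation of the embodiment: the interest tags are assumed to be a mapping from tag name to weight, `k` stands for the second preset number, and the function name `build_tag_sequence` and the sample weights are hypothetical.

```python
from typing import Dict, List


def build_tag_sequence(interest_tags: Dict[str, float], k: int) -> List[str]:
    """Return the k highest-weighted interest tags, ordered by descending weight.

    The position of a tag in the returned list implies its relative weight,
    so the weights themselves do not need to be stored in the sequence.
    """
    # sorted() is stable, so tags with equal weights keep their input order.
    ranked = sorted(interest_tags.items(), key=lambda item: item[1], reverse=True)
    return [tag for tag, _weight in ranked[:k]]


# Hypothetical weights for one object.
tags = {"travel": 0.2, "sports": 0.9, "entertainment": 0.6, "military": 0.7, "pets": 0.3}
sequence = build_tag_sequence(tags, k=3)
print("-".join(sequence))  # sports-military-entertainment
```

If an object has fewer than `k` interest tags, this sketch simply returns a shorter sequence; how such objects are handled is not specified in the text above.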

In step S330, each object is clustered according to the tag sequence corresponding to each object, so as to obtain an object group corresponding to the same clustered tag.

In one embodiment of the present application, after generating the tag sequences corresponding to the respective objects, the respective objects are clustered based on the tag sequences, so as to aggregate the objects corresponding to the similar tag sequences into one object group, and obtain the object group corresponding to the same clustered tag.

It should be noted that, because the number of objects is large, many different tag sequences are obtained. A tag sequence in this embodiment may therefore either be identical to a cluster tag or contain a cluster tag, which helps keep the object groups balanced in size while ensuring that the objects in each object group share a uniform cluster tag.

In one embodiment of the present application, the process of clustering each object according to the tag sequence corresponding to each object in step S330 to obtain the object group corresponding to the same cluster tag specifically includes the following steps:

identifying target cluster labels contained by the label sequences from the cluster label set based on the label sequences;

and adding the object to the object group corresponding to the target cluster label.

In one embodiment of the present application, the cluster tag set is a set of predetermined cluster tags, and after determining the tag sequence, the target cluster tag contained by the tag sequence is searched for from the cluster tag set, so as to add the object to the object group corresponding to the target cluster tag.

Fig. 6 is a schematic diagram of determining a target cluster label according to an embodiment of the present application.

As shown in fig. 6, the tag sequences of 3 users are a-b-c-h (610), a-b-d (620), and a-b-f (630). The corresponding cluster tag (640) found in the cluster tag set is a-b, so the 3 users are added to the object group whose cluster tag is a-b, and the number of objects in that object group is counted.

It should be noted that, in the tag sequence of an object in this embodiment, the first tag feature has the largest weight. When the target cluster tag is identified, the first tag feature is therefore taken as the starting reference, and the features of the tag sequence are compared in order with the features of the cluster tag, beginning with the first feature of each, so as to ensure the accuracy of the target cluster tag that is found.
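The lookup illustrated in fig. 6 can be sketched as a prefix match. The sketch below is an assumption about how "contained by the tag sequence" might be evaluated: it compares features in order starting from the first one and, as a further assumption not stated in the text, prefers the longest matching cluster tag. The function name `find_target_cluster_tag` is hypothetical.

```python
from typing import Optional, Sequence, Set, Tuple


def find_target_cluster_tag(
    tag_sequence: Sequence[str],
    cluster_tags: Set[Tuple[str, ...]],
) -> Optional[Tuple[str, ...]]:
    """Return the cluster tag that is a leading prefix of the tag sequence, if any."""
    # Try the longest prefix first so that the most specific cluster tag wins
    # (an assumption of this sketch).
    for length in range(len(tag_sequence), 0, -1):
        prefix = tuple(tag_sequence[:length])
        if prefix in cluster_tags:
            return prefix
    return None  # no existing cluster tag is contained in the sequence


cluster_tag_set = {("a", "b"), ("c", "d")}
for seq in (["a", "b", "c", "h"], ["a", "b", "d"], ["a", "b", "f"]):
    print("-".join(seq), "->", find_target_cluster_tag(seq, cluster_tag_set))
```

All three sequences from fig. 6 resolve to the cluster tag a-b, while a sequence such as b-a would not, because the comparison starts from the first, highest-weighted feature.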

In an embodiment of the present application, the object clustering method in this embodiment further includes: if the clustering label set does not contain the clustering label contained by the label sequence, creating a new clustering label corresponding to the label sequence; and adding the object to the object group corresponding to the newly built cluster tag.

In one embodiment of the present application, if the original cluster tag set contains no cluster tag contained by the tag sequence, a new cluster tag is created for the tag sequence of the object, and the object is then added to the object group of the new cluster tag. If a tag sequence containing the new cluster tag is encountered later, the object corresponding to that tag sequence is added to the object group corresponding to the new cluster tag.

Specifically, when creating a new cluster label corresponding to the label sequence, the label sequence can be directly used as the new cluster label, or the label characteristics of a preset number can be extracted from the label sequence, and the new cluster label is obtained by sequencing and combining according to the weight corresponding to each label characteristic.
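The two options for building a new cluster label can be illustrated with a short sketch. The function name `new_cluster_label`, the `keep` parameter and the tie-breaking by feature name are assumptions introduced here for illustration only.

```python
from typing import Dict, Optional, Tuple


def new_cluster_label(
    weighted_features: Dict[str, float], keep: Optional[int] = None
) -> Tuple[str, ...]:
    """Build a cluster label from tag features and their weights.

    Features are ordered by descending weight (ties broken by name so the
    result is deterministic). With keep=None the whole sequence becomes the
    label; otherwise only the `keep` highest-weight features are used.
    """
    ordered = sorted(weighted_features.items(), key=lambda kv: (-kv[1], kv[0]))
    names = tuple(name for name, _ in ordered)
    return names if keep is None else names[:keep]


full_label = new_cluster_label({"b": 0.3, "a": 0.5, "c": 0.2})
short_label = new_cluster_label({"b": 0.3, "a": 0.5, "c": 0.2}, keep=2)
```

Here `full_label` corresponds to using the tag sequence directly, and `short_label` to extracting a preset number of features.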

For example, when clustering starts, it is determined whether the cluster label C1-C2 … Ck to which the label sequence of an object belongs already exists. If so, the object is added to that class and the member count of the class is increased by 1; otherwise, a new cluster label C1-C2 … Ck is created and the member count of the class is initialized to 1. After all users have been traversed once, the number Cnum of class members under each class label is counted. If the number Cnum of members under the label Cm-Cn … Ck is larger than a threshold value N, the cluster center Cm-Cn … Ck is retained, and all objects under that cluster label stop clustering. The common feature of all users under the cluster label Cm-Cn … Ck is that they are interested in articles of categories Cm, Cn, Ck, with the degree of interest decreasing in the order Cm, Cn, Ck.
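A minimal sketch of this first pass is given below. It assumes that each object's full tag sequence serves as its initial cluster label, and that a class is retained when its member count is greater than or equal to the threshold (the text above says "larger than", while step S340 and the claims use "greater than or equal to"; the latter is used here). The name `first_pass` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Label = Tuple[str, ...]


def first_pass(
    tag_sequences: Dict[str, Sequence[str]], threshold: int
) -> Tuple[Dict[Label, List[str]], Dict[Label, List[str]]]:
    """Group objects by tag sequence, then split classes by size.

    Returns (retained, pending): retained classes have reached the
    threshold and stop clustering; pending classes need rule rollback.
    """
    groups: Dict[Label, List[str]] = defaultdict(list)
    for obj, sequence in tag_sequences.items():
        # A class is created on first sight; its member count is len(list).
        groups[tuple(sequence)].append(obj)

    retained = {l: m for l, m in groups.items() if len(m) >= threshold}
    pending = {l: m for l, m in groups.items() if len(m) < threshold}
    return retained, pending


retained, pending = first_pass(
    {
        "u1": ["a", "b", "c"],
        "u2": ["a", "b", "c"],
        "u3": ["a", "b", "c"],
        "u4": ["a", "b", "d"],
    },
    threshold=2,
)
```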

In step S340, if the number of objects in the object group is smaller than the threshold number of objects, the objects in the object group are merged into the object group associated with the cluster tag, so as to obtain a target group with the number of objects greater than or equal to the threshold number of objects.

Fig. 7 is a schematic diagram of an object group according to an embodiment of the present application.

In one embodiment of the application, as shown in fig. 7, when there are a large number of objects, different objects correspond to different tag features, and as the preferences of the objects become more dispersed, the resulting tag features become more irregular. Thus, an object group obtained based on the tag features may contain a large number of objects, for example, the object group 710 in fig. 7, or, in an extreme case, a very small number of objects, for example, the object group 720 in fig. 7. When the number of objects is small, performing specialized processing for the object group 720 consumes considerable resources with little effect. Therefore, in this embodiment, object groups with a small number of objects are merged to obtain object groups with a larger number of objects, so as to balance the number of objects across object groups, improve the stability of the object groups, and further improve the benefit of group processing.

For example, if the number Cnum of members under the label Cm-Cn … Ck is smaller than the threshold value N after clustering, the cluster center is invalid; the labels of the users belonging to that cluster label must first undergo rule rollback, and the users are then re-clustered according to the rolled-back labels.

In one embodiment of the present application, if the number of objects in the object group is less than the threshold number of objects in step S340, the process of merging the objects in the object group into the object group associated with the cluster tag to obtain the target group with the number of objects greater than or equal to the threshold number of objects includes the following steps S810 to S830, which are described in detail below:

in step S810, if the number of objects in the object group is smaller than the threshold number of objects, the cluster labels of the object group are identified as a label sequence to be processed.

In one embodiment of the present application, after all the objects are classified to obtain object groups, the number of objects in each object group is detected. If the number of objects in the object group is smaller than the threshold value of the number of objects, it indicates that the object group is smaller, which is not consistent with the purpose of the present embodiment. Therefore, the cluster labels of the object groups are identified as a label sequence to be processed, so that the object groups are combined with other object groups to obtain a larger object group.

In this embodiment, the purpose of identifying the cluster tag corresponding to the smaller number of object groups as the tag sequence to be processed is to perform rule rollback on the tag sequence to be processed, so as to obtain a tag with a larger inclusion range.

In step S820, a sub-tag sequence is extracted from the tag sequence to be processed.

In one embodiment of the application, the tag sequence to be processed includes at least two tag features and weights corresponding to the tag features; the process of extracting the sub-tag sequence from the tag sequence to be processed in step S820 specifically includes the following steps:

extracting a first preset number of feature tags from the tag sequence to be processed based on weights corresponding to tag features in the tag sequence to be processed;

and forming a sub-tag sequence by the first preset number of characteristic tags.

In one embodiment of the application, the tag sequence to be processed comprises at least two tag features, each tag feature has a corresponding weight, and the tag features are ordered according to their weights. A first preset number of feature tags are extracted from the tag sequence to be processed according to the weight of each tag feature in the tag sequence to be processed, and these feature tags are combined into a sub-tag sequence.

Illustratively, the (k-1) tag features of greatest interest to the user are extracted from the tag sequence C1-C2 … C(k-1)-Ck in order of tag feature weight from high to low, forming a new tag sequence C1-C2 … C(k-1).

In one embodiment of the application, the tag sequence to be processed consists of tag features ordered from big to small according to the weights corresponding to the tag features; the process of extracting the first preset number of feature tags from the tag sequence to be processed based on the weights corresponding to the tag features in the tag sequence to be processed in the above steps specifically includes the following steps: extracting the first n-a feature tags from n tag features in a tag sequence to be processed, wherein n is more than or equal to 2; a is more than or equal to 1 and less than n.

Fig. 9 is a schematic diagram of a rollback of a tag sequence according to an embodiment of the present application.

As shown in fig. 9, in generating the sub-tag sequence, a rule rollback is performed based on the tag features a-b-c-h in the tag sequence to be processed 910. The rollback rule is as follows: the first n-a feature tags are extracted from the n tag features in the tag sequence to be processed, wherein n is greater than or equal to 2, and a is greater than or equal to 1 and less than n. Illustratively, if a is 1, the tag sequence to be processed 910 is rolled back by one step to obtain the sub-tag sequence 920, i.e., a-b-c; if a is 2, the tag sequence to be processed 910 is rolled back by two steps to obtain the sub-tag sequence 930, i.e., a-b.

Similarly, if the tag of interest before rollback is C1-C2 … C(k-1)-Ck, then the tag of interest after one step of rollback is C1-C2 … C(k-1), and the tag of interest after x steps of rollback is C1-C2 … C(k-x), where x > 0, k > x, and (k-x) >= 1.
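The rollback rule can be written as a small function. This is only a sketch: the name `roll_back` is hypothetical, and the sequence is assumed to be already ordered by weight from large to small, so that dropping the tail removes the lowest-weight features.

```python
from typing import Sequence, Tuple


def roll_back(tag_sequence: Sequence[str], a: int = 1) -> Tuple[str, ...]:
    """Keep the first n - a tag features of a weight-ordered sequence.

    Enforces the constraints stated above: n >= 2 and 1 <= a < n.
    """
    n = len(tag_sequence)
    if n < 2:
        raise ValueError("tag sequence needs at least two tag features")
    if not 1 <= a < n:
        raise ValueError("a must satisfy 1 <= a < n")
    return tuple(tag_sequence[: n - a])


one_step = roll_back(("a", "b", "c", "h"), a=1)   # sub-tag sequence 920
two_steps = roll_back(("a", "b", "c", "h"), a=2)  # sub-tag sequence 930
```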

In step S830, the objects in the object group are merged into the object group associated with the sub-tag sequence, and the above steps are repeated until a target group with the number of objects greater than or equal to the threshold of the number of objects is obtained.

In one embodiment of the application, after the sub-tag sequence is obtained, the object group associated with the sub-tag sequence is determined, and the objects in the object group are merged into the associated object group; this is repeated until a target group whose number of objects is greater than or equal to the object number threshold is obtained. After a number of iterations, most users are gathered into classes with more members, and in the extreme case a few users are classified on their own.
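The repeated merging of steps S810 to S830 can be sketched as a loop. The sketch makes several assumptions that the text does not fix: rollback proceeds one step at a time (a = 1), the "associated" group is the one whose cluster label equals the sub-tag sequence (created if absent), longer labels are processed first, and a group whose label has a single feature is left as is. The name `merge_small_groups` is hypothetical.

```python
from typing import Dict, List, Tuple

Label = Tuple[str, ...]


def merge_small_groups(
    groups: Dict[Label, List[str]], threshold: int
) -> Dict[Label, List[str]]:
    """Roll back undersized groups until none can be rolled back further."""
    groups = {label: list(members) for label, members in groups.items()}
    while True:
        # S810: labels of undersized groups become sequences to be processed.
        small = [
            label
            for label, members in groups.items()
            if len(members) < threshold and len(label) >= 2
        ]
        if not small:
            return groups
        for label in sorted(small, key=len, reverse=True):
            # An earlier merge in this pass may already have filled the group.
            if len(groups[label]) >= threshold:
                continue
            members = groups.pop(label)
            sub_sequence = label[:-1]  # S820: one-step rollback
            # S830: merge into the group associated with the sub-sequence.
            groups.setdefault(sub_sequence, []).extend(members)


result = merge_small_groups(
    {
        ("a", "b", "c", "h"): ["u1"],
        ("a", "b", "c"): ["u2", "u3"],
        ("x", "y"): ["u4"],
    },
    threshold=3,
)
```

In this example the a-b-c-h group folds into a-b-c, which then reaches the threshold, while the lone user under x-y ends up in a single-feature class of its own, mirroring the extreme case mentioned above.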

In one embodiment of the present application, the process of merging the objects in the object group into the object group associated with the sub-tag sequence in step S830 includes the steps of:

identifying target cluster labels similar to the sub-label sequences from the cluster label set;

and merging the object group corresponding to the label sequence to be processed into the object group corresponding to the target clustering label.

In one embodiment of the application, after the sub-tag sequence is generated, target cluster tags that are similar to the sub-tag sequence are identified from the set of cluster tags, so as to merge the object groups with fewer objects into the object group of the target cluster tag. A cluster tag is regarded as similar to the sub-tag sequence when, for example, it is the same as the sub-tag sequence or it contains the sub-tag sequence.

Fig. 10 is a schematic diagram of object merging according to an embodiment of the present application.

As shown in fig. 10, if the tag sequence 1010 to be processed is a-b-c-h and the obtained sub tag sequence 1020 is a-b-c, searching the cluster tag set for the target cluster tag 1030 identical to a-b-c; if the clustering label which is the same as the clustering label of the a-b-c exists, merging the object group 1040 corresponding to the label sequence to be processed into the object group 1050 with the clustering label of the a-b-c to obtain a target group 1060; if the cluster label which is the same as the cluster label of the a-b-c does not exist, the cluster label of the a-b-c is newly built, and then the object is placed into the object group corresponding to the cluster label.

In one embodiment of the application, the sub-tag sequence includes at least two tag features; in step S340, the process of merging the objects in the object group into the object group associated with the cluster tag to obtain the target group with the number of objects greater than or equal to the threshold of the number of objects specifically includes the following steps:

Merging objects in the object group into the object group associated with the sub-tag sequence, comprising:

performing similarity matching on the tag characteristics in each sub-tag sequence to obtain sub-tag sequences matched with each other;

and merging object groups corresponding to the mutually matched sub-tag sequences.

In an embodiment of the present application, a plurality of sub-tag sequences may be obtained from the tag sequences to be processed. In order to merge the object groups with fewer objects, the object groups corresponding to sub-tag sequences that are similar to each other may be further merged, so that smaller object groups are combined into a larger object group.

Fig. 11 is a schematic diagram of object merging according to an embodiment of the present application.

As shown in FIG. 11, if the tag sequence 1110 to be processed is a-b-c-h, the resulting sub-tag sequence 1120 is a-b-c; the tag sequence 1130 to be processed is a-b-c-g, and the resulting sub-tag sequence 1140 is also a-b-c; then the object groups 1150 and 1160 corresponding to the sub-tag sequences that match each other are merged to obtain the target group 1170.
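The Fig. 11 case can be sketched as a group-by on the rolled-back label. As an assumption for illustration, "similarity matching" is reduced here to exact equality of the sub-tag sequences; the embodiment may use a looser similarity measure. The name `merge_matching` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Label = Tuple[str, ...]


def merge_matching(
    pending: Dict[Label, List[str]], a: int = 1
) -> Dict[Label, List[str]]:
    """Merge undersized groups whose sub-tag sequences match each other."""
    merged: Dict[Label, List[str]] = defaultdict(list)
    for label, members in pending.items():
        if not 1 <= a < len(label):
            raise ValueError("a must satisfy 1 <= a < n for every label")
        sub_sequence = label[: len(label) - a]
        merged[sub_sequence].extend(members)
    return dict(merged)


target_groups = merge_matching(
    {
        ("a", "b", "c", "h"): ["u1", "u2"],  # object group 1150
        ("a", "b", "c", "g"): ["u3"],        # object group 1160
    }
)
```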

Fig. 12 is a schematic diagram of clustering users according to an embodiment of the present application.

As shown in fig. 12, in step S1210, interest information of a user is acquired; in step S1220, the interest information is processed by the tag rule to obtain a user interest tag 1230; in step S1240, it is detected whether a similar tag exists; if a similar tag exists, the user is added to the class corresponding to the similar tag in step S1260; if no similar tag exists, in step S1250, a new cluster label is generated and the user is added to the class corresponding to that cluster label; in step S1270, the number of users under each cluster label is counted; in step S1280, it is detected whether the number of users is greater than a threshold; if the number is smaller than the threshold, in step S1211, rule rollback is performed on the cluster label to obtain a new label, and clustering is performed again; if the number is greater than the threshold, in step S1210, the class is retained, and the final target user group is obtained. In this way, the finally obtained target groups have cluster labels that accurately represent their preference information, and the target groups are balanced in scale.

In one embodiment of the present application, if the number of objects in the object group is smaller than the threshold number of objects, the objects in the object group are merged into the object group associated with the clustering label, and after the target group with the number of objects greater than or equal to the threshold number of objects is obtained, the object clustering method in this embodiment further includes: acquiring the content of the preference of the target group according to the cluster label corresponding to the target group; pushing the content preferred by the target group to the terminal corresponding to the target group.

In one embodiment of the application, after each target group is obtained, the content preferred by the target group is recalled according to the clustering label of each target group object, so that after the content preferred by the target group is obtained, the content preferred by the target group is pushed to the terminal corresponding to the target group.

Fig. 13 is a schematic diagram of massive news recall according to an embodiment of the present application.

As shown in fig. 13, in the application scenario of personalized news recommendation, the recommendation is generally divided into two phases, recall and ordering, each of which completes a different task. In recall 1320, based on the massive user portraits 1340, the massive news 1310 is filtered for important content, resulting in the content preferred by the target group. The content is then ordered 1330 so as to push the content preferred by the target group to the terminal corresponding to the target group.

In one embodiment of the application, the cluster tag comprises at least two tag features and weights corresponding to the tag features; in the above step, the process of pushing the content preferred by the target group to the terminal corresponding to the target group specifically includes: determining the content corresponding to each label feature based on each label feature in the clustered labels; based on the weight corresponding to each tag feature, determining the pushing sequence of the content corresponding to each tag feature according to the sequence from high to low of the weight; and pushing the content preferred by the target group to the terminal corresponding to the target group according to the pushing sequence of the content corresponding to each tag characteristic. By the method, the user side can browse the corresponding content in sequence according to the preference degree of the user side, and the content acceptance rate and the content conversion rate are improved.
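The weight-ordered push sequence can be illustrated as follows. The function name `push_order`, the feature names and the content identifiers are hypothetical; how content is looked up for each tag feature is not specified in the text and is represented here by a plain dictionary.

```python
from typing import Dict, List


def push_order(
    feature_weights: Dict[str, float], content_by_feature: Dict[str, List[str]]
) -> List[str]:
    """Order content by the weight of the tag feature it belongs to.

    Features are visited from highest to lowest weight; content under the
    same feature keeps its original order.
    """
    ordered_features = sorted(
        feature_weights, key=feature_weights.get, reverse=True
    )
    ordered_content: List[str] = []
    for feature in ordered_features:
        ordered_content.extend(content_by_feature.get(feature, []))
    return ordered_content


queue = push_order(
    {"sports": 0.6, "finance": 0.3, "tech": 0.1},
    {"finance": ["f1"], "sports": ["s1", "s2"], "tech": ["t1"]},
)
```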

FIG. 14 is a schematic diagram of a community-based mass news recall provided by an embodiment of the present application.

As shown in fig. 14, in the application scenario of personalized news recommendation, in personalized news recommendation recall, news that best meets the interest preference of each user needs to be recalled, but in the scenario of large user quantity, the consumption of computing resources and storage resources is large, so that it is not practical to completely realize 1-to-1 recall. In this embodiment, the users are clustered 1420 through the massive user portraits 1410 by grouping recall to obtain a plurality of object groups, so as to perform group processing based on each object group, and convert 1 to 1 into n to 1, thereby reducing the computational complexity.

Specifically, 1-to-1 is converted into n-to-1 through clustering, that is, users with the same or similar interests are gathered into one class, and the same news is then recommended to that class of users. The present embodiment first groups users into classes or groups according to their interest preferences. Classes whose size is smaller than a certain threshold are then merged into the class most similar to them, and the clustering process is completed after multiple iterations. A group portrait 1430 is generated from each resulting group, so as to perform group portrait recall 1440 based on the group portrait 1430; alternatively, the group click history 1450 is obtained and collaborative recall 1460 of news is performed. In this way, news data is processed uniformly on a per-group basis, and the news pushing efficiency is improved.
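The n-to-1 conversion can be sketched as running recall once per group and sharing the result among the group's members. The names `group_recall` and `fake_recall` are hypothetical, and `fake_recall` merely stands in for whatever recall service is actually used.

```python
from typing import Callable, Dict, List, Tuple

Label = Tuple[str, ...]


def group_recall(
    groups: Dict[Label, List[str]],
    recall_fn: Callable[[Label], List[str]],
) -> Dict[str, List[str]]:
    """Run recall once per group and fan the result out to its members."""
    per_user: Dict[str, List[str]] = {}
    for label, members in groups.items():
        items = recall_fn(label)  # one recall per group, not per user
        for user in members:
            per_user[user] = items
    return per_user


recall_calls: List[Label] = []


def fake_recall(label: Label) -> List[str]:
    """Placeholder recall that records how often it is invoked."""
    recall_calls.append(label)
    return ["news-" + "-".join(label)]


recommendations = group_recall(
    {("a", "b"): ["u1", "u2", "u3"], ("x",): ["u4"]}, fake_recall
)
```

With four users in two groups, the placeholder recall runs twice rather than four times, which is the saving the grouping is meant to achieve.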

The following describes an embodiment of the apparatus of the present application, which may be used to perform the method of object clustering in the above embodiment of the present application. It will be appreciated that the apparatus may be a computer program (including program code) running in a computer device, for example the apparatus being an application software; the device can be used for executing corresponding steps in the method provided by the embodiment of the application. For details not disclosed in the embodiment of the apparatus of the present application, please refer to the embodiment of the method for object clustering described above.

FIG. 15 shows a block diagram of an apparatus for object clustering in accordance with one embodiment of the application.

Referring to fig. 15, an apparatus 1500 of object clustering according to an embodiment of the present application includes: an obtaining unit 1510, configured to obtain interest tags of objects; a generating unit 1520, configured to generate a tag sequence corresponding to each object based on the interest tags; a clustering unit 1530, configured to cluster each object according to a tag sequence corresponding to each object, to obtain an object group corresponding to the same cluster tag; the merging unit 1540 is configured to merge the objects in the object group into the object group associated with the cluster tag if the number of objects in the object group is less than the threshold number of objects, so as to obtain a target group with the number of objects greater than or equal to the threshold number of objects.

In some embodiments of the present application, based on the foregoing scheme, the merging unit 1540 includes: the first identification unit is used for identifying the cluster labels of the object group as a label sequence to be processed if the number of the objects in the object group is smaller than the threshold value of the number of the objects; the first extraction unit is used for extracting a sub-tag sequence from the tag sequence to be processed; the first merging unit is used for merging the objects in the object group into the object group associated with the sub-tag sequence, and repeating the steps until a target group with the number of the objects being greater than or equal to the threshold value of the number of the objects is obtained.

In some embodiments of the present application, based on the foregoing scheme, the tag sequence to be processed includes at least two tag features and weights corresponding to the tag features; the first extraction unit includes: the second extraction unit is used for extracting a first preset number of feature tags from the tag sequence to be processed based on weights corresponding to the tag features in the tag sequence to be processed; the sequence composing unit is used for composing the first preset number of characteristic labels into a sub-label sequence.

In some embodiments of the present application, based on the foregoing scheme, the tag sequence to be processed is composed of tag features ordered from big to small according to weights corresponding to the tag features; the second extraction unit includes: the third extraction unit is used for extracting the first n-a characteristic labels from n label characteristics in the label sequence to be processed, wherein n is more than or equal to 2; a is more than or equal to 1 and less than n.

In some embodiments of the present application, based on the foregoing scheme, the generating unit 1520 includes: a fourth extracting unit, configured to extract a second preset number of interest features from the interest tags of the object; the arrangement unit is used for arranging the second preset number of interest features corresponding to the objects to obtain a tag sequence corresponding to the objects.

In some embodiments of the application, based on the foregoing, the interest feature includes interest points representing individual interest types, or interest vectors representing object interest migration conditions; the fourth extraction unit includes: the portrait unit is used for generating an interest portrait of the object based on the interest tag of the object; and a fifth extraction unit, configured to extract a second preset number of interest points or a second preset number of interest vectors from the interest image.

In some embodiments of the present application, based on the foregoing scheme, the clustering unit 1530 includes: a third identifying unit, configured to identify, based on the tag sequence, a target cluster tag contained in the tag sequence from the cluster tag set; and the first adding unit is used for adding the object to the object group corresponding to the target cluster label.

In some embodiments of the present application, based on the foregoing solution, the apparatus 1500 for object clustering further includes: the creating unit is used for creating a new cluster tag corresponding to the tag sequence if the cluster tag contained by the tag sequence does not exist in the cluster tag set; and the second adding unit is used for adding the object to the object group corresponding to the newly-built cluster tag.

In some embodiments of the present application, based on the foregoing solution, the apparatus 1500 for object clustering further includes: the content acquisition unit is used for acquiring the content preferred by the target group according to the clustering label corresponding to the target group; and the content pushing unit is used for pushing the content preferred by the target group to the terminal corresponding to the target group.

In some embodiments of the present application, based on the foregoing scheme, the cluster labels include at least two label features and weights corresponding to the label features; the content pushing unit includes: the first determining unit is used for determining contents corresponding to each label characteristic based on each label characteristic in the clustering labels; the second determining unit is used for determining the pushing sequence of the content corresponding to each tag feature based on the weight corresponding to each tag feature; the ordering pushing unit is used for pushing the content preferred by the target group to the terminal corresponding to the target group according to the pushing sequence of the content corresponding to each tag characteristic.

It should be noted that, the computer system 1600 of the electronic device shown in fig. 16 is only an example, and should not impose any limitation on the functions and the application scope of the embodiments of the present application.

As shown in fig. 16, the computer system 1600 includes a Central Processing Unit (CPU) 1601 that can perform various appropriate actions and processes, such as performing the method in the above-described embodiment, according to a program stored in a Read-Only Memory (ROM) 1602 or a program loaded from a storage section 1608 into a Random Access Memory (RAM) 1603. In the RAM 1603, various programs and data required for system operation are also stored. The CPU 1601, ROM 1602, and RAM 1603 are connected to each other by a bus 1604. An Input/Output (I/O) interface 1605 is also connected to the bus 1604.

The following components are connected to the I/O interface 1605: an input portion 1606 including a keyboard, a mouse, and the like; an output portion 1607 including a Cathode Ray Tube (CRT), a liquid crystal display (Liquid Crystal Display, LCD), and the like, a speaker, and the like; a storage section 1608 including a hard disk or the like; and a communication section 1609 including a network interface card such as a LAN (Local Area Network ) card, a modem, or the like. The communication section 1609 performs communication processing via a network such as the internet. The drive 1610 is also connected to the I/O interface 1605 as needed. A removable medium 1611 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is installed as needed on the drive 1610 so that a computer program read out therefrom is installed into the storage section 1608 as needed.

In particular, according to embodiments of the present application, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present application include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flowchart. In such embodiments, the computer program may be downloaded and installed from a network via the communication section 1609, and/or installed from the removable medium 1611. When executed by the Central Processing Unit (CPU) 1601, the computer program performs the various functions defined in the system of the present application.

It should be noted that, the computer readable medium shown in the embodiments of the present application may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-Only Memory (ROM), an erasable programmable read-Only Memory (Erasable Programmable Read Only Memory, EPROM), flash Memory, an optical fiber, a portable compact disc read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present application, however, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with a computer-readable computer program embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. 
A computer program embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wired, etc., or any suitable combination of the foregoing.

The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. Each block in the flowcharts or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustrations, and combinations of blocks in the block diagrams or flowchart illustrations, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or by combinations of special purpose hardware and computer instructions.

The units involved in the embodiments of the present application may be implemented by software, or may be implemented by hardware, and the described units may also be provided in a processor. Wherein the names of the units do not constitute a limitation of the units themselves in some cases.

As another aspect, the present application also provides a computer-readable medium that may be contained in the electronic device described in the above embodiment; or may exist alone without being incorporated into the electronic device. The computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to implement the methods described in the above embodiments.

It should be noted that although in the above detailed description several modules or units of a device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functions of two or more modules or units described above may be embodied in one module or unit in accordance with embodiments of the application. Conversely, the features and functions of one module or unit described above may be further divided into a plurality of modules or units to be embodied.

From the above description of embodiments, those skilled in the art will readily appreciate that the example embodiments described herein may be implemented in software, or may be implemented in software in combination with the necessary hardware. Thus, the technical solution according to the embodiments of the present application may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (may be a CD-ROM, a U-disk, a mobile hard disk, etc.) or on a network, and includes several instructions to cause a computing device (may be a personal computer, a server, a touch terminal, or a network device, etc.) to perform the method according to the embodiments of the present application.

Other embodiments of the application will be apparent to those skilled in the art from consideration of the specification and practice of the embodiments disclosed herein. This application is intended to cover any variations, uses, or adaptations of the application following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the application pertains.

It is to be understood that the application is not limited to the precise arrangements and instrumentalities shown in the drawings, which have been described above, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims

1. A method of clustering objects, comprising:

obtaining interest tags of each object;

generating a tag sequence corresponding to each object based on the interest tags;

clustering the objects according to the tag sequences corresponding to the objects to obtain an object group corresponding to the same cluster label, wherein the tag sequence is the same as the cluster label or the tag sequence contains the cluster label;

if the number of objects in the object group is smaller than an object-number threshold, identifying the cluster label of the object group as a tag sequence to be processed, wherein the tag sequence to be processed consists of tag features arranged in descending order of the weights corresponding to the tag features;

extracting the first n-a tag features from the n tag features in the tag sequence to be processed as a first preset number of feature tags, wherein n is greater than or equal to 2, and a is greater than or equal to 1 and less than n;

forming a sub-tag sequence from the first preset number of feature tags;

merging objects in the object group into an object group associated with the sub-tag sequence;

repeating the above steps until a target group in which the number of objects is greater than or equal to the object-number threshold is obtained.
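For illustration only, the fallback merge recited in claim 1 might be sketched in Python as follows. The sketch assumes that each cluster label is a tuple of tag features already sorted by descending weight, and that the object group "associated with" a sub-tag sequence is the group whose label equals that truncated prefix. The function name `merge_small_groups` and the parameters `threshold` and `a` are hypothetical and do not appear in the application.

```python
from collections import defaultdict


def merge_small_groups(groups, threshold, a=1):
    """Merge undersized groups into the group of their truncated label.

    groups: dict mapping a cluster label (tuple of tag features, assumed to be
            ordered by descending weight) to a list of object ids.
    threshold: minimum number of objects a target group should contain.
    a: number of trailing tag features dropped per merge step (assumed default 1).
    """
    merged = defaultdict(list)
    for label, members in groups.items():
        merged[label].extend(members)

    changed = True
    while changed:
        changed = False
        # Visit longer labels first so objects flow toward shorter prefixes.
        for label in sorted(merged, key=len, reverse=True):
            n = len(label)
            # Skip groups that are large enough or cannot be truncated (n < 2).
            if len(merged[label]) >= threshold or n < 2:
                continue
            step = min(a, n - 1)          # enforce 1 <= a < n
            sub_label = label[: n - step]  # first n - a tag features
            merged[sub_label].extend(merged[label])
            del merged[label]
            changed = True
            break  # restart, because the set of labels has changed
    return dict(merged)
```

In this sketch a group whose label has a single tag feature is left unchanged even if it is still below the threshold, because the claim requires n to be at least 2 before truncation; how such residual groups are handled is not specified here.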

2. The method of claim 1, wherein merging objects in the object group into the object group associated with the sub-tag sequence comprises:

identifying, from a cluster label set, a target cluster label similar to the sub-tag sequence;

3. The method of claim 1, wherein the sub-tag sequence comprises at least two tag features;

merging objects in the object group into an object group associated with the sub-tag sequence comprises:

performing similarity matching on the tag features in each sub-tag sequence to obtain mutually matched sub-tag sequences;

and merging the object groups corresponding to the mutually matched sub-tag sequences.

4. The method of claim 1, wherein generating a tag sequence corresponding to each object based on the interest tags comprises:

extracting a second preset number of interest features from the interest tags of the object;

and arranging the second preset number of interest features corresponding to the object to obtain a tag sequence corresponding to the object.
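As a minimal sketch of the extraction and arrangement steps of claim 4, the Python snippet below assumes that each object's interest tags are available as a mapping from tag to a numeric weight, and that "arranging" means sorting by descending weight. The weight representation, the alphabetical tie-break, and the name `build_tag_sequence` are assumptions made for illustration.

```python
def build_tag_sequence(interest_tags, k):
    """Return the k highest-weighted interest features as an ordered tuple.

    interest_tags: dict mapping an interest tag to its weight (assumed form).
    k: the "second preset number" of interest features to keep.
    """
    # Sort by descending weight; break ties alphabetically for determinism.
    ranked = sorted(interest_tags.items(), key=lambda item: (-item[1], item[0]))
    return tuple(tag for tag, _ in ranked[:k])
```

For example, `build_tag_sequence({"music": 0.5, "sports": 0.9, "travel": 0.2}, 2)` would keep `sports` and `music`, in that order.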

5. The method of claim 4, wherein the interest feature comprises an interest point representing a single interest type, or an interest vector representing how the interest of the object migrates;

extracting a second preset number of interest features from the interest tags of the object comprises:

generating an interest portrait of the object based on the interest tag of the object;

and extracting the second preset number of interest points or the second preset number of interest vectors from the interest portrait.

6. The method according to claim 1, wherein clustering the objects according to the tag sequences corresponding to the objects to obtain the object group corresponding to the same cluster tag comprises:

identifying, from a cluster label set and based on the tag sequence, a target cluster label contained in the tag sequence;

and adding the object to an object group corresponding to the target cluster label.

7. The method of claim 6, wherein the method further comprises:

if the cluster label set contains no cluster label that is contained in the tag sequence, creating a new cluster label corresponding to the tag sequence;

and adding the object to the object group corresponding to the newly created cluster label.
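The lookup-or-create behaviour of claims 6 and 7 could be sketched as below. This is an illustration under stated assumptions: a tag sequence is taken to "contain" a cluster label when every tag feature of the label appears in the sequence, and when several labels qualify the longest one is chosen. Neither rule is stated in the claims, and the name `assign_to_clusters` is hypothetical.

```python
def assign_to_clusters(tag_sequences, cluster_labels=()):
    """Assign each object to an existing or newly created cluster label.

    tag_sequences: dict mapping an object id to its tag sequence (tuple).
    cluster_labels: iterable of existing, non-empty cluster labels (tuples).
    Returns a dict mapping each cluster label to a list of object ids.
    """
    groups = {tuple(label): [] for label in cluster_labels}
    for obj_id, seq in tag_sequences.items():
        tags = set(seq)
        # Assumed containment rule: all features of the label occur in seq.
        candidates = [label for label in groups if set(label) <= tags]
        if candidates:
            target = max(candidates, key=len)  # assumed: prefer longest match
        else:
            target = tuple(seq)                # create a new cluster label
            groups[target] = []
        groups[target].append(obj_id)
    return groups
```

Groups produced this way may be smaller than the object-number threshold, which is the situation the merging steps of claim 1 address.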

8. The method of claim 1, further comprising, after obtaining the target group in which the number of objects is greater than or equal to the object-number threshold:

acquiring content preferred by the target group according to the cluster label corresponding to the target group;

and pushing the content preferred by the target group to a terminal corresponding to the target group.

9. The method of claim 8, wherein the cluster label comprises at least two tag features and weights corresponding to the tag features;

pushing the content preferred by the target group to the terminal corresponding to the target group comprises:

determining content corresponding to each tag feature in the cluster label;

determining a push order of the content corresponding to each tag feature based on the weight corresponding to that tag feature;

and pushing the content preferred by the target group to the terminal corresponding to the target group according to the push order of the content corresponding to the tag features.
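One way to read the weight-based push order of claim 9 is sketched below in Python. It assumes the cluster label is given as (tag feature, weight) pairs, that content for a higher-weighted tag feature is pushed first, and that a lookup table from tag feature to content items is available. The name `push_order` and the `content_by_tag` table are hypothetical.

```python
def push_order(cluster_label, content_by_tag):
    """Order content items by the weight of the tag feature they belong to.

    cluster_label: list of (tag_feature, weight) pairs.
    content_by_tag: dict mapping a tag feature to a list of content items
                    (a hypothetical lookup standing in for a content store).
    """
    # Higher weight first (assumed interpretation of the push order).
    ordered = sorted(cluster_label, key=lambda pair: pair[1], reverse=True)
    queue = []
    for feature, _weight in ordered:
        queue.extend(content_by_tag.get(feature, []))
    return queue
```

The returned list is the sequence in which content would be sent to the terminals of the target group; tag features with no matching content are simply skipped.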

10. An apparatus for clustering objects, comprising:

an acquisition unit, configured to acquire interest tags of each object;

a generating unit, configured to generate a tag sequence corresponding to each object based on the interest tags;

a clustering unit, configured to cluster the objects according to the tag sequences corresponding to the objects to obtain an object group corresponding to the same cluster label, wherein the tag sequence is the same as the cluster label or the tag sequence contains the cluster label;

a merging unit, configured to: if the number of objects in the object group is smaller than an object-number threshold, identify the cluster label of the object group as a tag sequence to be processed, wherein the tag sequence to be processed consists of tag features arranged in descending order of the weights corresponding to the tag features; extract the first n-a tag features from the n tag features in the tag sequence to be processed as a first preset number of feature tags, wherein n is greater than or equal to 2, and a is greater than or equal to 1 and less than or equal to n; form a sub-tag sequence from the first preset number of feature tags; merge objects in the object group into an object group associated with the sub-tag sequence; and repeat the above steps until a target group in which the number of objects is greater than or equal to the object-number threshold is obtained.

11. A computer-readable medium storing a computer program which, when executed by a processor, implements the object clustering method according to any one of claims 1 to 9.

12. An electronic device, comprising:

one or more processors;

storage means for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the object clustering method of any one of claims 1 to 9.