CN111667018A - Object clustering method and device, computer readable medium and electronic equipment - Google Patents


Info

Publication number
CN111667018A
Authority
CN
China
Prior art keywords
objects
label
tag
clustering
interest
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010554265.1A
Other languages
Chinese (zh)
Other versions
CN111667018B (en
Inventor
王敏
孔魏建
许冲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202010554265.1A priority Critical patent/CN111667018B/en
Publication of CN111667018A publication Critical patent/CN111667018A/en
Application granted granted Critical
Publication of CN111667018B publication Critical patent/CN111667018B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/55 Push-based network services

Abstract

The embodiments of the present application provide an object clustering method, an object clustering device, a computer-readable medium and an electronic device. The object clustering method comprises: obtaining the interest labels of the objects; generating a label sequence corresponding to each object based on the interest labels; clustering the objects according to their label sequences to obtain object groups, each corresponding to one cluster label; and, when the number of objects in an object group is less than an object number threshold, merging the objects in that object group into an object group associated with its cluster label, so as to obtain target groups in which the number of objects is greater than or equal to the object number threshold. With this technical scheme, each target group has a cluster label that accurately represents the preference information of its objects, and the target groups are balanced in size, so that each group can be processed according to its cluster label. This improves the accuracy and balance of object clustering and, in turn, the accuracy of processing the clustered groups.

Description

Object clustering method and device, computer readable medium and electronic equipment
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method and an apparatus for clustering objects, a computer-readable medium, and an electronic device.
Background
When clustering objects, it is common to classify the objects into different categories according to their various characteristics and to process each category accordingly. However, clustering methods in the related art often cannot handle objects that take many forms. In particular, when the objects carry a large amount of information, are of many varieties, and have no association relationship with one another, the resulting clusters cannot accurately represent each variety of object and are uneven in size, which in turn affects the operations performed after clustering.
Disclosure of Invention
Embodiments of the present application provide a method, an apparatus, a computer-readable medium, and an electronic device for clustering objects. At least to a certain extent, clustering yields groups whose clustering labels accurately represent the preferences of their objects and whose sizes are balanced, so that each clustering group can be processed according to its clustering label. This improves the accuracy and balance of object clustering and, in turn, the accuracy of processing the clustering groups.
Other features and advantages of the present application will be apparent from the following detailed description, or may be learned by practice of the application.
According to an aspect of an embodiment of the present application, a method for clustering objects is provided, including: obtaining the interest tags of each object; generating a label sequence corresponding to each object based on the interest tags; clustering the objects according to the label sequences corresponding to the objects to obtain object groups corresponding to the same clustering label; and, if the number of objects in an object group is smaller than an object number threshold, merging the objects in the object group into the object group associated with the clustering label, to obtain a target group in which the number of objects is greater than or equal to the object number threshold.
According to an aspect of an embodiment of the present application, there is provided an apparatus for clustering objects, including: the acquisition unit is used for acquiring the interest tags of all the objects; a generating unit, configured to generate a tag sequence corresponding to each object based on the interest tag; the clustering unit is used for clustering the objects according to the label sequences corresponding to the objects to obtain object groups corresponding to the same clustering label; and the merging unit is used for merging the objects in the object group into the object group associated with the clustering label if the number of the objects in the object group is less than an object number threshold value, so as to obtain a target group with the number of the objects being greater than or equal to the object number threshold value.
In some embodiments of the present application, based on the foregoing solution, the merging unit includes: a first identification unit, configured to identify a cluster tag of the object group as a tag sequence to be processed if the number of objects in the object group is smaller than the threshold value of the number of objects; the first extraction unit is used for extracting a sub-tag sequence from the tag sequence to be processed; and the first merging unit is used for merging the objects in the object group into the object group associated with the sub-tag sequence, and repeating the steps until a target group with the number of the objects larger than or equal to the threshold value of the number of the objects is obtained.
In some embodiments of the present application, based on the foregoing scheme, the tag sequence to be processed includes at least two tag features and weights corresponding to the tag features; the first extraction unit includes: a second extraction unit, configured to extract a first preset number of feature tags from the to-be-processed tag sequence based on a weight corresponding to each of the tag features in the to-be-processed tag sequence; and the sequence composition unit is used for composing the first preset number of characteristic labels into the sub-label sequence.
In some embodiments of the present application, based on the foregoing scheme, the to-be-processed tag sequence is formed by sorting the tag features according to weights corresponding to the tag features from large to small; the second extraction unit includes: a third extraction unit, configured to extract first n-a feature tags from n tag features in the to-be-processed tag sequence, where n is greater than or equal to 2; a is more than or equal to 1 and less than n.
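As a rough illustration of the fallback described in the preceding embodiments, the following Python sketch truncates the weight-ordered tag sequence of an undersized group to its first n-a features and merges the group into the group keyed by that shorter sequence, repeating until no undersized group can fall back any further. The names `merge_small_groups`, `groups` and `threshold`, the tuple representation of a tag sequence, and the default a = 1 are assumptions made for illustration and do not come from the application.

```python
from typing import Dict, List, Tuple

TagSeq = Tuple[str, ...]  # tag features ordered by weight, largest first


def merge_small_groups(
    groups: Dict[TagSeq, List[str]], threshold: int, a: int = 1
) -> Dict[TagSeq, List[str]]:
    """Merge undersized groups into the group keyed by a shorter tag prefix."""
    # Work on a copy so the caller's dictionary is left untouched.
    merged = {seq: list(members) for seq, members in groups.items()}
    changed = True
    while changed:
        changed = False
        # Visit longer sequences first so members fall back step by step.
        for seq in sorted(merged, key=len, reverse=True):
            members = merged[seq]
            n = len(seq)
            if len(members) >= threshold or n - a < 1:
                continue  # large enough, or no feature left to drop
            sub_seq = seq[: n - a]  # keep the first n-a features
            merged.setdefault(sub_seq, []).extend(members)
            del merged[seq]
            changed = True
            break  # the dictionary changed, so rescan from the start
    return merged
```

In this sketch a group whose sequence has already shrunk to a single feature is left as it is even when it is below the threshold; how such residual groups are handled is not specified here.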
In some embodiments of the present application, based on the foregoing solution, the first merging unit includes: a second identification unit, configured to identify a target cluster label similar to the sub-label sequence from the cluster label set; and the second merging unit is used for merging the object group corresponding to the label sequence to be processed into the object group corresponding to the target clustering label.
In some embodiments of the present application, based on the foregoing scheme, the sub-tag sequence comprises at least two tag features; the first merging unit includes: the first matching unit is used for carrying out similarity matching on the tag characteristics in each sub-tag sequence to obtain mutually matched sub-tag sequences; and a third merging unit, configured to merge the object groups corresponding to the mutually matched sub-tag sequences.
In some embodiments of the present application, based on the foregoing scheme, the generating unit includes: the fourth extraction unit is used for extracting a second preset number of interest features from the interest tags of the objects; and the arrangement unit is used for arranging a second preset number of interest features corresponding to the object to obtain a tag sequence corresponding to the object.
In some embodiments of the present application, based on the foregoing scheme, the interest feature includes an interest point representing an individual interest type or an interest vector representing an interest migration condition of an object; the fourth extraction unit includes: a portrait unit, configured to generate an interest portrait of the object based on the interest tag of the object; and a fifth extraction unit, configured to extract the second preset number of interest points or the second preset number of interest vectors from the interest portrait.
In some embodiments of the present application, based on the foregoing scheme, the clustering unit includes: a third identification unit, configured to identify, based on the tag sequence, a target cluster tag included in the tag sequence from a cluster tag set; and the first adding unit is used for adding the object to the object group corresponding to the target clustering label.
In some embodiments of the present application, based on the foregoing solution, the apparatus for clustering objects further includes: a creating unit, configured to create a new cluster label corresponding to the label sequence if the cluster label set contains no cluster label that is included in the label sequence; and a second adding unit, configured to add the object to the object group corresponding to the newly created cluster label.
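The create-if-absent behaviour of the two preceding embodiments can be sketched as follows. This is a minimal sketch under stated assumptions: the function name `assign_object`, the dictionary that maps cluster labels to member lists, and the reading of "included in the label sequence" as "is a leading prefix of the label sequence" are all illustrative choices rather than details given in the application.

```python
from typing import Dict, List, Tuple

TagSeq = Tuple[str, ...]


def assign_object(
    groups: Dict[TagSeq, List[str]], object_id: str, tag_seq: TagSeq
) -> TagSeq:
    """Add the object to a matching group, creating a new cluster label if needed."""
    # Assumed matching rule: a cluster label matches if it is a prefix of tag_seq.
    target = next(
        (label for label in groups if tag_seq[: len(label)] == label), None
    )
    if target is None:
        # No existing cluster label is contained in the sequence: create one.
        target = tag_seq
        groups[target] = []
    groups[target].append(object_id)
    return target
```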
In some embodiments of the present application, based on the foregoing solution, the apparatus for clustering objects further includes: the content acquisition unit is used for acquiring the content preferred by the target group according to the clustering label corresponding to the target group; and the content pushing unit is used for pushing the content preferred by the target group to the terminal corresponding to the target group.
In some embodiments of the present application, based on the foregoing scheme, the cluster label includes at least two label features and weights corresponding to the label features; the content push unit includes: a first determining unit, configured to determine, based on each of the tag features in the clustering tag, content corresponding to each of the tag features; a second determining unit, configured to determine, based on a weight corresponding to each tag feature, a push order of content corresponding to each tag feature; and the sequencing pushing unit is used for pushing the content preferred by the target group to the terminal corresponding to the target group according to the pushing sequence of the content corresponding to each label characteristic.
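One way to picture the weight-based push order is the short Python sketch below. It assumes, purely for illustration, that a cluster label is a mapping from tag feature to weight and that candidate content has already been looked up per tag feature; the names `order_content_for_push`, `cluster_label` and `content_by_feature` are hypothetical.

```python
from typing import Dict, List


def order_content_for_push(
    cluster_label: Dict[str, float], content_by_feature: Dict[str, List[str]]
) -> List[str]:
    """Return content items ordered by the weight of their tag feature."""
    # Features with larger weights are pushed first.
    ordered_features = sorted(cluster_label, key=cluster_label.get, reverse=True)
    push_list: List[str] = []
    for feature in ordered_features:
        # A feature without any content simply contributes nothing.
        push_list.extend(content_by_feature.get(feature, []))
    return push_list
```

With hypothetical weights of 0.5 for sports, 0.3 for military and 0.2 for finance, sports content would be pushed first and finance content last.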
According to an aspect of embodiments of the present application, there is provided a computer-readable medium on which a computer program is stored, which, when being executed by a processor, implements the method of object clustering as described in the above embodiments.
According to an aspect of an embodiment of the present application, there is provided an electronic device including: one or more processors; a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method of object clustering as described in the above embodiments.
In the technical solutions provided by some embodiments of the present application, the interest tags of each object are obtained, and a tag sequence corresponding to each object is generated based on the interest tags. The objects are then clustered according to their tag sequences to obtain object groups corresponding to the same clustering label. When the number of objects in an object group is less than the object number threshold, the objects in that object group are merged into the object group associated with the clustering label, yielding a target group in which the number of objects is greater than or equal to the object number threshold. The target groups finally obtained therefore have clustering labels that accurately represent their preference information, and the target groups are balanced in size, so that each clustering group can be processed according to its clustering label. This improves the accuracy and balance of object clustering and, in turn, the accuracy of processing the clustering groups.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application. It is obvious that the drawings in the following description are only some embodiments of the application, and that for a person skilled in the art, other drawings can be derived from them without inventive effort. In the drawings:
FIG. 1 shows a schematic diagram of an exemplary system architecture to which aspects of embodiments of the present application may be applied;
FIG. 2 shows a schematic diagram of an exemplary system architecture to which aspects of embodiments of the present application may be applied;
FIG. 3 schematically shows a flow diagram of a method of object clustering according to an embodiment of the present application;
FIG. 4 schematically illustrates a diagram for obtaining an interest tag of an object according to an embodiment of the present application;
FIG. 5 schematically illustrates a schematic diagram of generating a tag sequence corresponding to each object according to an embodiment of the present application;
FIG. 6 schematically illustrates a schematic diagram of determining a target cluster label according to an embodiment of the present application;
FIG. 7 schematically shows a schematic diagram of a group of objects according to an embodiment of the present application;
FIG. 8 schematically shows a schematic diagram of generating a target population according to an embodiment of the present application;
FIG. 9 schematically shows a fallback diagram for a tag sequence according to an embodiment of the present application;
FIG. 10 schematically illustrates a schematic diagram of object group merging according to an embodiment of the present application;
FIG. 11 schematically illustrates a schematic diagram of object group merging according to an embodiment of the present application;
FIG. 12 schematically illustrates a diagram based on clustering users according to an embodiment of the present application;
FIG. 13 schematically illustrates a diagram of mass news recalls, according to one embodiment of the present application;
FIG. 14 schematically illustrates a diagram of a community-based mass news recall according to one embodiment of the present application;
FIG. 15 shows a block diagram of an apparatus for object clustering according to an embodiment of the present application;
FIG. 16 illustrates a schematic structural diagram of a computer system suitable for use in implementing the electronic device of an embodiment of the present application.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the application. One skilled in the relevant art will recognize, however, that the subject matter of the present application can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and so forth. In other instances, well-known methods, devices, implementations, or operations have not been shown or described in detail to avoid obscuring aspects of the application.
The block diagrams shown in the figures are functional entities only and do not necessarily correspond to physically separate entities. That is, these functional entities may be implemented in software, in one or more hardware modules or integrated circuits, or in different networks, processor devices and/or microcontroller devices.
The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the contents and operations/steps, nor do they necessarily have to be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
Fig. 1 shows a schematic diagram of an exemplary system architecture to which the technical solution of the embodiments of the present application can be applied.
As shown in fig. 1, the system architecture may include a terminal device (e.g., one or more of a smartphone 101, a tablet computer 102, and a portable computer 103 shown in fig. 1, but may also be a desktop computer, etc.), a network 104, and a server 105. The network 104 serves as a medium for providing communication links between terminal devices and the server 105. Network 104 may include various connection types, such as wired communication links, wireless communication links, and so forth.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation. For example, server 105 may be a server cluster comprised of multiple servers, or the like.
A user may use a terminal device to interact with the server 105 over the network 104 to receive or send messages or the like. The server 105 may be a server that provides various services. For example, a user uploads the interest tags of an object to the server 105 by using the terminal device 103 (or the terminal device 101 or 102). The server 105 obtains the interest tags of each object, generates a tag sequence corresponding to each object based on the interest tags, and clusters the objects according to their tag sequences to obtain object groups corresponding to the same cluster tag. When the number of objects in an object group is less than an object number threshold, the server merges the objects in that object group into the object group associated with the cluster tag, obtaining a target group in which the number of objects is greater than or equal to the object number threshold. The target groups finally obtained therefore have cluster tags that accurately represent their preference information and are balanced in size, so that each cluster group can be processed according to its cluster tag. This improves the accuracy and balance of object clustering and, in turn, the accuracy of processing the cluster groups.
It should be noted that the method for clustering objects provided in the embodiments of the present application is generally performed by the server 105, and accordingly, the device for clustering objects is generally disposed in the server 105. However, in other embodiments of the present application, the terminal device may also have a similar function as the server, so as to execute the scheme of object clustering provided in the embodiments of the present application.
Fig. 2 shows a schematic diagram of an exemplary system architecture to which the technical solution of the embodiments of the present application can be applied.
As shown in fig. 2, the system architecture may include a user group consisting of at least two terminal devices (e.g., one or more of a smartphone 201, a tablet computer 202, and a portable computer 203 shown in fig. 2, but may also be a desktop computer, etc.), a network 204, and a computer device 205 corresponding to the management node. Network 204 is the medium used to provide communication links between terminal devices and computer device 205. Network 204 may include various connection types, such as wired communication links, wireless communication links, and so forth.
As shown in fig. 2, in an embodiment of the present application, the method for object clustering is applied to a scene with a large number of objects, such as the object group 206 in fig. 2. The computer device corresponding to the management node obtains the interest labels of all objects, generates a label sequence corresponding to each object based on the interest labels, and clusters the objects according to their label sequences to obtain object groups corresponding to the same clustering label. When the number of objects in an object group is smaller than an object number threshold, the objects in that object group are merged into the object group associated with the clustering label, yielding a target group in which the number of objects is greater than or equal to the object number threshold. The target groups finally obtained therefore have clustering labels that accurately represent their preference information and are balanced in size.
The implementation details of the technical solution of the embodiment of the present application are set forth in detail below:
Fig. 3 shows a flow diagram of a method of object clustering, which may be performed by a server, which may be the server shown in fig. 1, according to an embodiment of the application. Referring to fig. 3, the method for clustering objects at least includes steps S310 to S340, which are described in detail as follows:
In step S310, interest tags of the respective objects are acquired.
In an embodiment of the present application, the manner of obtaining the interest tag of each object may be a manner of collecting application data of the user terminal. In this embodiment, the object may include a user represented by a human being, or may include a robot, a virtual device, or the like. The interest tag of the object is used to represent preference information of the object, such as a point of interest, and the like.
Fig. 4 is a schematic diagram of obtaining an interest tag of an object according to an embodiment of the present disclosure.
As shown in fig. 4, in an embodiment of the present application, when the object is a user, after the user registers an account of an application program, the application program needs to determine a type corresponding to the user according to the preference of the user. Specifically, at least two interest tags 420 are displayed in the interface of the terminal device 410, such as sports, military, entertainment, lovely pets, travel, finance, science and fashion, and so on. By presenting these interest tags 420 on the user interface, the user is instructed to select his preferred interest tags according to these interest tags 420.
In addition, how the user views or uses content in the application program can be detected while the user is using the application program. For example, the content that interests the user can be determined by capturing the user's clicks, by detecting that the user has triggered a link in an application program, or by monitoring how long the user terminal stays on a certain page while the application program is running. The interest tags of the object are then extracted based on the type, form and the like of that content.
In one embodiment of the present application, a user portrait may also be generated based on user preference information, which may include tags, themes, media numbers, and the like. A preset number of tags are then extracted from the user portrait as the interest tags of the object.
In addition, if the object is a robot or a virtual device, such as a Virtual Reality (VR) device, the tag information corresponding to the object may be determined based on configuration information in the VR device and a corresponding user audience.
In step S320, based on the interest tag, a tag sequence corresponding to each object is generated.
In an embodiment of the present application, after the interest tags of the objects are obtained, and considering that an object prefers each of its interest tags to a different degree, a tag sequence corresponding to each object is generated based on the preference degree corresponding to each interest tag. The tag sequence comprises interest tags or tag features that carry weight information.
In an embodiment of the present application, as shown in fig. 5, the process of generating the tag sequence corresponding to each object based on the interest tag in step S320 includes the following steps S510 to S520, which are described in detail as follows:
In step S510, a second preset number of interest features are extracted from the interest tags of the object.
In an embodiment of the application, in order to facilitate matching of the interest tags, a second preset number is set to control the number of interest features corresponding to one object, and that number of interest features is extracted from the interest tags of the object. Each object thus has the same number of interest features, which facilitates clustering management.
Illustratively, if the interest tags of the subject include sports, military, entertainment, lovely pet, tourism, finance, science and technology, and the second preset number is set to be 3, 3 are extracted from the interest tags as interest features, namely sports, military, entertainment.
Further, each interest tag in this embodiment has a weight that represents the degree of the object's preference for that interest tag. Therefore, when interest features are extracted from the interest tags, the second preset number of interest features with the largest weights may be extracted first, in descending order of weight. Interest features with larger weights reflect the preference of the object more typically, which makes the clustering result more accurate.
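The weight-based extraction in step S510 can be pictured with the short sketch below. The function name `top_interest_features`, the dictionary of tag weights, and the alphabetical tie-break are assumptions introduced for illustration; the application does not state how weights are stored or how ties are resolved.

```python
from typing import Dict, List, Tuple


def top_interest_features(
    weighted_tags: Dict[str, float], k: int
) -> List[Tuple[str, float]]:
    """Return the k interest tags with the largest weights, largest first."""
    # Sort by descending weight; break ties by tag name so the result is stable.
    ranked = sorted(weighted_tags.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]
```

For instance, with hypothetical weights in which sports, military and entertainment rank highest and the second preset number set to 3, the function returns exactly those three tags together with their weights.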
In one embodiment of the present application, the interest features include interest points representing individual interest types, or interest vectors representing object interest migration situations; the process of extracting a second preset number of interest features from the interest tags of the object in step S510 specifically includes the following steps:
generating an interest portrait of the object based on the interest tag of the object;
and extracting a second preset number of interest points or a second preset number of interest vectors from the interest portrait.
In one embodiment of the application, an interest portrait of an object can be generated based on its interest tags, so that the preference of the object is characterized by the interest portrait and the interest portrait of each object can be managed in a targeted manner, for example stored. After the interest portrait is generated, a second preset number of interest points or interest vectors are extracted from it.
In one embodiment of the present application, the interest points are used to represent individual interest types or features of interest tags, and the interest vectors are used to represent interest feature transition situations when the interest of the object changes. In the embodiment, the preference condition of the object is represented based on the interest point and the interest vector, so that quantitative calculation and management of preference data are realized.
In step S520, a second preset number of interest features corresponding to the object are arranged to obtain a tag sequence corresponding to the object.
In an embodiment of the application, after the interest features are obtained, based on the weights corresponding to the interest features, a second preset number of interest features are arranged to obtain a tag sequence of the object.
The tag sequence in this embodiment includes information of interest features of the object, and the weights corresponding to the interest features are implied based on the arrangement sequence of the interest features, so that the objects can be clustered based on the interest features and the weights thereof.
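To make the point about implied weights concrete, the following sketch arranges weighted interest features into a tag sequence in which position alone encodes relative weight. The name `build_tag_sequence` and the representation of a tag sequence as a tuple of feature names are assumptions for illustration only.

```python
from typing import List, Tuple


def build_tag_sequence(features: List[Tuple[str, float]]) -> Tuple[str, ...]:
    """Arrange interest features by descending weight and drop the weights."""
    ordered = sorted(features, key=lambda feature: feature[1], reverse=True)
    # Only the order is kept; an earlier position implies a larger weight.
    return tuple(name for name, _ in ordered)
```

Two objects that share the same interest features but weight them differently therefore receive different tag sequences, for example a-b-c for one and b-a-c for the other.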
In step S330, each object is clustered according to the tag sequence corresponding to each object, so as to obtain an object group corresponding to the same clustering tag.
In an embodiment of the present application, after generating a tag sequence corresponding to each object, clustering is performed on each object based on the tag sequence, so as to aggregate objects corresponding to similar tag sequences in one object group, and obtain an object group corresponding to the same clustering tag.
It should be noted that, because the number of objects is large, many different tag sequences are obtained. A tag sequence in this embodiment may therefore either be identical to a clustering tag or merely contain the clustering tag, which keeps the number of objects in each object group balanced and gives the objects in each object group a uniform clustering tag.
In an embodiment of the present application, the process of clustering each object according to the tag sequence corresponding to each object in step S330 to obtain an object group corresponding to the same clustering tag specifically includes the following steps:
identifying a target cluster label contained by the label sequence from the cluster label set based on the label sequence;
and adding the object to the object group corresponding to the target clustering label.
In an embodiment of the present application, the cluster label set is a set formed by predetermined cluster labels, and after the label sequence is determined, a target cluster label included in the label sequence is searched from the cluster label set, so as to add an object to an object group corresponding to the target cluster label.
Fig. 6 is a schematic diagram of determining a target cluster label according to an embodiment of the present application.
As shown in fig. 6, the interest tags of 3 users are a-b-c-h (610), a-b-d (620) and a-b-f (630), respectively, and if a corresponding cluster tag (640) is found to be a-b in the cluster tag set, the 3 users are added to the object group with the cluster tag of a-b, and the number of objects in the object group is counted.
It should be noted that, in the tag sequence of an object in this embodiment, the first tag feature has the largest weight. When the target cluster tag is identified, the first tag feature is therefore used as the initial reference, and the features of the object's tag sequence are compared in order with the features of the cluster tag, starting from the first feature, to determine the target cluster tag. This ensures the accuracy of the target cluster tag obtained by the search.
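The lookup illustrated in fig. 6 can be sketched as a prefix comparison, assuming that "contained by the tag sequence" means that the cluster tag equals the leading features of the tag sequence. Returning the longest matching cluster tag when several match is an additional assumption, and the name `find_target_cluster_label` is hypothetical.

```python
from typing import Iterable, Optional, Tuple

TagSequence = Tuple[str, ...]


def find_target_cluster_label(
    tag_sequence: TagSequence, cluster_labels: Iterable[TagSequence]
) -> Optional[TagSequence]:
    """Return the longest cluster label that is a prefix of tag_sequence, or None."""
    best: Optional[TagSequence] = None
    for label in cluster_labels:
        # Compare feature by feature from the first (highest-weight) feature on.
        is_prefix = len(label) <= len(tag_sequence) and tag_sequence[: len(label)] == label
        if is_prefix and (best is None or len(label) > len(best)):
            best = label
    return best


# The three tag sequences of fig. 6 and a cluster label set containing a-b.
cluster_label_set = {("a", "b"), ("a", "c")}
for sequence in [("a", "b", "c", "h"), ("a", "b", "d"), ("a", "b", "f")]:
    print("-".join(sequence), "->", find_target_cluster_label(sequence, cluster_label_set))
```

All three sequences resolve to the cluster label a-b, so the three users land in the same object group.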
In an embodiment of the present application, the object clustering method in this embodiment further includes: if the clustering label set does not have the clustering label contained by the label sequence, creating a new clustering label corresponding to the label sequence; and adding the object to the object group corresponding to the newly-built clustering label.
In an embodiment of the present application, if the original cluster tag set has no cluster tag contained in the tag sequence, a new cluster tag corresponding to the tag sequence is created, and the object is added to the object group of the new cluster tag. Subsequently, if another tag sequence contains the new cluster tag, the object corresponding to that tag sequence is also added to the object group corresponding to the new cluster tag.
Specifically, when a new clustering label corresponding to the label sequence is created, the label sequence can be directly used as the new clustering label, or a preset number of label features can be extracted from the label sequence, and the new clustering label is obtained by sequencing and combining according to the weight corresponding to each label feature.
For example, at the beginning of clustering, it is first determined whether the clustering label C1-C2 … Ck to which the tag sequence of an object belongs already exists. If it exists, the object is added to that class and the number of members of the class is increased by 1; otherwise, a new clustering label C1-C2 … Ck is created and the number of members of the class is initialized to 1. After all users have been traversed, the number of members Cnum under each class label is counted. If the number of members Cnum under the label Cm-Cn … Ck is greater than a threshold value N, the cluster center Cm-Cn … Ck is retained and the objects under that clustering label are not clustered further. The common characteristic of all users under the clustering label Cm-Cn … Ck is that they are interested in items of the categories Cm, Cn and Ck, with the degree of interest decreasing in the order Cm, Cn, Ck.
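The first pass described above can be sketched as follows, assuming each object is identified by a string and its tag sequence is a tuple of feature names. The names `initial_clustering` and `split_by_size` are hypothetical. The sketch retains a class when its member count is greater than or equal to the threshold, following the wording of step S340; treating the boundary case this way is an assumption.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

TagSequence = Tuple[str, ...]
Groups = Dict[TagSequence, List[str]]


def initial_clustering(tag_sequences: Dict[str, TagSequence]) -> Groups:
    """Group object ids by tag sequence, creating a cluster label on first use."""
    groups: Groups = defaultdict(list)
    for object_id, sequence in tag_sequences.items():
        # An existing label gains one member; a missing label is created here.
        groups[sequence].append(object_id)
    return dict(groups)


def split_by_size(groups: Groups, threshold: int) -> Tuple[Groups, Groups]:
    """Separate the retained classes from those that still need rollback."""
    retained = {label: m for label, m in groups.items() if len(m) >= threshold}
    pending = {label: m for label, m in groups.items() if len(m) < threshold}
    return retained, pending


# Hypothetical users and tag sequences.
users = {
    "u1": ("c1", "c2", "c3"),
    "u2": ("c1", "c2", "c3"),
    "u3": ("c1", "c4", "c5"),
}
retained, pending = split_by_size(initial_clustering(users), threshold=2)
print("retained:", retained)
print("pending:", pending)
```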
In step S340, if the number of objects in the object group is less than the threshold value of the number of objects, the objects in the object group are merged into the object group associated with the cluster tag, so as to obtain a target group with the number of objects greater than or equal to the threshold value of the number of objects.
Fig. 7 is a schematic diagram of an object group according to an embodiment of the present disclosure.
As shown in fig. 7, in an embodiment of the present application, in a case where there are a large number of objects, different objects correspond to different tag features, and as the preferences of the objects are more and more dispersed, the obtained tag features are more uneven. Thus, the number of objects in the object group derived based on the tag characteristics may be subject to a sudden increase, such as object group 710 in FIG. 7, or an extreme case where the number of objects is very small, such as object group 720 in FIG. 7. In addition, when the number of objects is small, the special processing based on the object group 720 with small number of objects consumes more resources and has little effect. Therefore, in the present embodiment, the object groups with a small number of objects are merged to obtain the object groups with a large number of objects, so as to balance the number of objects in each object group, improve the stability of the object groups, and further improve the benefit of group processing.
Illustratively, if the number of members Cnum under the label Cm-Cn … Ck obtained after clustering is less than the threshold N, the cluster center is invalid; the tags of the users belonging to that clustering label must first be rolled back according to the rollback rule, and the users are then re-clustered according to the rolled-back tags.
In an embodiment of the application, if the number of objects in the object group is smaller than the threshold value of the number of objects in step S340, the process of merging the objects in the object group into the object group associated with the cluster tag to obtain the target group with the number of objects greater than or equal to the threshold value of the number of objects includes the following steps S810 to S830, which are described in detail as follows:
in step S810, if the number of objects in the object group is less than the object number threshold, the cluster tag of the object group is identified as the to-be-processed tag sequence.
In an embodiment of the present application, after all the objects are classified to obtain object groups, the number of the objects in each object group is detected. If the number of objects in the object group is smaller than the threshold value of the number of objects, it means that the object group is smaller, which is not in accordance with the original purpose of the embodiment. Therefore, the cluster label of the object group is identified as the label sequence to be processed, so that the object group is merged with other object groups to obtain a larger object group.
In this embodiment, the purpose of identifying the corresponding cluster tag as the tag sequence to be processed when the number of objects in the object group is small is to perform rule-based rollback on the tag sequence to be processed, so as to obtain a tag with broader coverage.
In step S820, a sub-tag sequence is extracted from the tag sequence to be processed.
In one embodiment of the present application, a tag sequence to be processed includes at least two tag features and weights corresponding to the tag features; the process of extracting the sub-tag sequence from the tag sequence to be processed in step S820 specifically includes the following steps:
extracting a first preset number of feature tags from the tag sequence to be processed based on the weight corresponding to each tag feature in the tag sequence to be processed;
a first predetermined number of feature tags are grouped into a sequence of sub-tags.
In one embodiment of the present application, the tag sequence to be processed includes at least two tag features, each tag feature has a corresponding weight, and the tag features are sorted according to their weights. According to the weight corresponding to each tag feature in the tag sequence to be processed, a first preset number of feature tags are extracted from the tag sequence to be processed, and these feature tags are grouped into a sub-tag sequence.
Illustratively, the (k-1) tag features of most interest to the user are extracted from the tag sequence C1-C2 … C(k-1)-Ck in order of tag feature weight from high to low, forming a new tag sequence C1-C2 … C(k-1).
In one embodiment of the application, the tag sequence to be processed is composed of tag features which are sorted from big to small according to the corresponding weights of the tag features; in the above step, based on the weight corresponding to each tag feature in the to-be-processed tag sequence, a process of extracting a first preset number of feature tags from the to-be-processed tag sequence specifically includes the following steps: extracting the first n-a characteristic tags from n tag characteristics in a tag sequence to be processed, wherein n is more than or equal to 2; a is more than or equal to 1 and less than n.
Fig. 9 is a schematic diagram of a fallback sequence of a tag according to an embodiment of the present disclosure.
As shown in fig. 9, in the process of generating the sub-tag sequence, rule-based rollback is performed on the tag features a-b-c-h in the to-be-processed tag sequence 910. The rollback rule is as follows: the first n-a feature tags are extracted from the n tag features in the tag sequence to be processed, where n is greater than or equal to 2, and a is greater than or equal to 1 and less than n. Illustratively, if a is 1, the to-be-processed tag sequence 910 is rolled back by one step to obtain a sub-tag sequence 920, i.e. a-b-c; if a is 2, the to-be-processed tag sequence 910 is rolled back by 2 steps to obtain a sub-tag sequence 930, i.e. a-b.
Similarly, if the interest tag before rollback is C1-C2 … C(k-1)-Ck, then the interest tag after rolling back one step is C1-C2 … C(k-1), and the interest tag after rolling back x steps is C1-C2 … C(k-x), where x > 0, k > x and (k-x) ≥ 1.
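The rollback rule can be written as a short sketch that keeps the first n-a features of a weight-ordered tag sequence. The function name `roll_back` and the choice to raise `ValueError` on out-of-range arguments are assumptions made for illustration.

```python
from typing import Tuple

TagSequence = Tuple[str, ...]


def roll_back(tag_sequence: TagSequence, steps: int = 1) -> TagSequence:
    """Drop the `steps` lowest-weight features from the end of the sequence.

    Requires n >= 2 and 1 <= steps < n, so at least one feature remains.
    """
    n = len(tag_sequence)
    if n < 2:
        raise ValueError("a tag sequence needs at least two features to roll back")
    if not 1 <= steps < n:
        raise ValueError("steps must satisfy 1 <= steps < len(tag_sequence)")
    return tag_sequence[: n - steps]


pending = ("a", "b", "c", "h")          # tag sequence 910 in fig. 9
print("-".join(roll_back(pending, 1)))  # one step back
print("-".join(roll_back(pending, 2)))  # two steps back
```

Since the features are sorted by descending weight, truncating from the end always discards the features the object cares about least.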
In step S830, the objects in the object group are merged into the object group associated with the sub-tag sequence, and the above steps are repeated until a target group with the number of objects greater than or equal to the threshold number of objects is obtained.
In one embodiment of the present application, after the sub-tag sequence is obtained, the object group associated with the sub-tag sequence is determined, and the objects in the object group are merged into the associated object group; this is repeated until a target group whose number of objects is greater than or equal to the object number threshold is obtained. After several iterations, most users are gathered into classes with a large number of members; in extreme cases, a few users may form a class on their own.
In an embodiment of the present application, the process of merging the objects in the object group into the object group associated with the sub-tag sequence in step S830 includes the following steps:
identifying target cluster labels similar to the sub-label sequences from the cluster label set;
and merging the object group corresponding to the label sequence to be processed into the object group corresponding to the target clustering label.
In one embodiment of the present application, after the sub-tag sequence is generated, a target cluster tag similar to the sub-tag sequence is identified from the cluster tag set, so that the smaller object groups are merged into the object group of the target cluster tag. A cluster tag is considered similar to the sub-tag sequence when, for example, it is identical to the sub-tag sequence or contains the sub-tag sequence.
Fig. 10 is a schematic diagram of object group merging according to an embodiment of the present application.
As shown in fig. 10, if the to-be-processed tag sequence 1010 is a-b-c-h and the obtained sub-tag sequence 1020 is a-b-c, first searching for a target cluster tag 1030 that is the same as a-b-c in the cluster tag set; if the clustering label same as the a-b-c exists, merging the object group 1040 corresponding to the label sequence to be processed into the object group 1050 with the clustering label of the a-b-c to obtain a target group 1060; if the clustering label same as the clustering label a-b-c does not exist, the clustering label a-b-c is newly built, and then the object is placed into the object group corresponding to the clustering label.
In one embodiment of the present application, the sub-tag sequence comprises at least two tag features; in the process of step S340 of merging the objects in the object group into the object group associated with the cluster label to obtain a target group with the number of objects greater than or equal to the threshold of the number of objects, merging the objects in the object group into the object group associated with the sub-tag sequence specifically includes the following steps:
carrying out similarity matching on the tag characteristics in each sub-tag sequence to obtain mutually matched sub-tag sequences;
and merging the object groups corresponding to the matched sub-label sequences.
In an embodiment of the present application, a plurality of sub-tag sequences may be obtained from the tag sequences to be processed. To merge the smaller object groups, the object groups whose sub-tag sequences match one another may also be merged, so that several smaller object groups are combined into a larger object group.
Fig. 11 is a schematic diagram of object group merging according to an embodiment of the present disclosure.
As shown in fig. 11, if the tag sequence to be processed 1110 is a-b-c-h, the obtained sub-tag sequence 1120 is a-b-c; the tag sequence 1130 to be processed is a-b-c-g, and the obtained sub-tag sequence 1140 is also a-b-c; the object groups 1150 and 1160 corresponding to the matched sub-tag sequences are merged to obtain the target group 1170.
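The merge of fig. 11 can be sketched by rolling every small group's label back and pooling the groups whose sub-tag sequences coincide. Treating "mutually matched" as exact equality of the sub-tag sequences is an assumption, since the application does not fix a similarity measure; the name `merge_by_sub_tag_sequence` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

TagSequence = Tuple[str, ...]
Groups = Dict[TagSequence, List[str]]


def merge_by_sub_tag_sequence(small_groups: Groups, steps: int = 1) -> Groups:
    """Roll each label back by `steps` and merge groups with equal sub-tag sequences."""
    merged: Groups = defaultdict(list)
    for label, members in small_groups.items():
        # Labels too short to roll back by `steps` are left unchanged.
        sub_label = label[: len(label) - steps] if len(label) > steps else label
        merged[sub_label].extend(members)
    return dict(merged)


# Object groups 1150 and 1160 of fig. 11, with hypothetical member ids.
small = {
    ("a", "b", "c", "h"): ["u1", "u2"],
    ("a", "b", "c", "g"): ["u3"],
}
print(merge_by_sub_tag_sequence(small))
```

Both labels roll back to a-b-c, so the two groups are pooled into a single group under that sub-tag sequence.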
Fig. 12 is a schematic diagram illustrating a user clustering according to an embodiment of the present application.
As shown in fig. 12, in step S1210, interest information of a user is acquired; in step S1220, the interest information is processed according to the tag rule to obtain a user interest tag 1230; in step S1240, it is detected whether a similar tag exists; if a similar tag exists, the user is added to the class corresponding to the similar tag in step S1260; if no similar tag exists, in step S1250, a new clustering label is generated and the user is added to the class corresponding to that clustering label; in step S1270, the number of users under each clustering label is counted; in step S1280, it is detected whether the number of users is greater than a threshold value; if the number is smaller than the threshold value, in step S1211, rule-based rollback is performed on the clustering label to obtain a new label, and clustering is then performed again; if the number is greater than the threshold value, in step S1210, the class is retained to obtain the final target user group. In this way, each finally obtained target group has a clustering label that accurately represents its preference information, and the target groups are balanced in size.
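The overall flow of fig. 12 can be approximated by the following sketch. It makes several simplifying assumptions that are not stated in the application: a "similar" label is an identical one, rollback proceeds one step at a time, the longest undersized label is processed first, and a group whose label has shrunk to a single feature is kept even if it is still below the threshold. The name `cluster_objects` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

TagSequence = Tuple[str, ...]


def cluster_objects(
    tag_sequences: Dict[str, TagSequence], threshold: int
) -> Dict[TagSequence, List[str]]:
    """Cluster by tag sequence, then roll undersized groups back until stable."""
    groups: Dict[TagSequence, List[str]] = defaultdict(list)
    for object_id, sequence in tag_sequences.items():
        groups[sequence].append(object_id)

    while True:
        # Undersized groups that can still be rolled back (more than one feature).
        small = sorted(
            (label for label, members in groups.items()
             if len(members) < threshold and len(label) > 1),
            key=lambda label: (-len(label), label),
        )
        if not small:
            break
        label = small[0]
        members = groups.pop(label)
        # One rollback step: drop the lowest-weight feature, then merge into the
        # group of the resulting sub-tag sequence (created if it does not exist).
        groups[label[:-1]].extend(members)
    return dict(groups)


# Hypothetical users and tag sequences.
users = {
    "u1": ("a", "b", "c", "h"), "u2": ("a", "b", "c", "h"), "u3": ("a", "b", "c", "h"),
    "u4": ("a", "b", "c", "g"), "u5": ("a", "b", "d"), "u6": ("a", "b", "f"),
    "u7": ("x", "y"),
}
for label, members in cluster_objects(users, threshold=3).items():
    print("-".join(label), sorted(members))
```

In this sample, the three a-b-c-h users already meet the threshold and are retained as they are, u4 to u6 converge on a-b after rollback, and u7 ends up alone, which corresponds to the extreme case in which a few users form a class on their own.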
In an embodiment of the present application, if the number of objects in the object group is smaller than the threshold value of the number of objects, merging the objects in the object group into the object group associated with the clustering label, and after obtaining a target group whose number of objects is greater than or equal to the threshold value of the number of objects, the object clustering method in this embodiment further includes: acquiring the content of the target group preference according to the clustering label corresponding to the target group; and pushing the content preferred by the target group to the terminal corresponding to the target group.
In an embodiment of the application, after each target group is obtained, the content preferred by the target group is recalled according to the clustering label corresponding to that target group, and the content preferred by the target group is then pushed to the terminal corresponding to the target group.
Fig. 13 is a schematic diagram of a mass news recall provided in an embodiment of the present application.
As shown in fig. 13, in the application scenario of personalized news recommendation, the recommendation generally includes two stages, a recall stage and a sorting stage, each of which performs its own role and completes a different task. In the recall 1320 stage, based on the mass user portraits 1340, the mass news 1310 is filtered to obtain the content preferred by the target group. The content is then sorted 1330, so that the content preferred by the target group is pushed to the terminals corresponding to the target group.
In one embodiment of the present application, a cluster label includes at least two label features and weights corresponding to the label features; in the above step, the process of pushing the content preferred by the target group to the terminal corresponding to the target group specifically includes: determining the content corresponding to each label characteristic based on each label characteristic in the clustering labels; determining a pushing sequence of the content corresponding to each label feature according to the sequence from high to low of the weight based on the weight corresponding to each label feature; and pushing the contents preferred by the target group to the terminal corresponding to the target group according to the pushing sequence of the contents corresponding to the label characteristics. By the method, the user side can browse the corresponding contents in sequence according to the preference degree of the user side, and the acceptance rate and the conversion rate of the contents are further improved.
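The push ordering described above can be sketched as follows, assuming the cluster label is given as a mapping from label feature to weight and the recalled content is already indexed by label feature. The names `order_content_for_push`, `label_weights` and `content_by_feature`, as well as the sample data, are hypothetical.

```python
from typing import Dict, List


def order_content_for_push(
    label_weights: Dict[str, float], content_by_feature: Dict[str, List[str]]
) -> List[str]:
    """Concatenate the content of each label feature, highest weight first."""
    # sorted() is stable, so features with equal weight keep their given order.
    ranked_features = sorted(label_weights, key=lambda feature: -label_weights[feature])
    ordered: List[str] = []
    for feature in ranked_features:
        # A feature without recalled content simply contributes nothing.
        ordered.extend(content_by_feature.get(feature, []))
    return ordered


label_weights = {"a": 0.6, "b": 0.3, "c": 0.1}
content_by_feature = {"b": ["b-news-1"], "a": ["a-news-1", "a-news-2"], "c": ["c-news-1"]}
print(order_content_for_push(label_weights, content_by_feature))
```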
Fig. 14 is a schematic diagram of mass news recalls based on groups according to an embodiment of the present application.
As shown in fig. 14, in the application scenario of personalized news recommendation, the recall stage would ideally retrieve, for every user, the news that best matches that user's interests and preferences. In a scenario with a very large number of users, however, this consumes a large amount of computing and storage resources, so a fully 1-to-1 recall is not practical. In this embodiment, a clustering recall is performed instead: the users are clustered 1420 based on the mass user portraits 1410 to obtain a plurality of object groups, and processing is then performed per object group, so that 1-to-1 is converted into n-to-1 and the computational complexity is reduced.
Specifically, 1-to-1 is converted into n-to-1 through clustering; that is, users with the same or similar interests are clustered into one class, and the same news is recommended to all users of that class. The present embodiment first groups users into several classes or groups according to their interest preferences. Then, according to the class size, each class smaller than a certain threshold value is merged into its most similar class, and the clustering process is completed after multiple iterations. A group portrait 1430 is generated from each obtained group, and a group portrait recall 1440 of the news is performed based on the group portrait 1430; alternatively, a group click history 1450 is obtained and a collaborative recall 1460 of the news is performed. In this way, news data is processed uniformly per user group, and the efficiency of news pushing is improved.
Embodiments of the apparatus of the present application are described below, which may be used to perform the method for object clustering in the above-described embodiments of the present application. It will be appreciated that the apparatus may be a computer program (comprising program code) running on a computer device, for example, application software; the apparatus may be used to perform the corresponding steps in the methods provided by the embodiments of the present application. For details that are not disclosed in the embodiments of the apparatus of the present application, please refer to the embodiments of the method for object clustering described above in the present application.
FIG. 15 shows a block diagram of an apparatus for object clustering according to an embodiment of the present application.
Referring to fig. 15, an apparatus 1500 for object clustering according to an embodiment of the present application includes: an obtaining unit 1510 configured to obtain interest tags of each object; a generating unit 1520, configured to generate a tag sequence corresponding to each object based on the interest tag; the clustering unit 1530 is configured to cluster each object according to the tag sequence corresponding to each object to obtain an object group corresponding to the same clustering tag; a merging unit 1540, configured to merge the objects in the object group into the object group associated with the cluster label if the number of the objects in the object group is less than the threshold of the number of the objects, so as to obtain a target group with the number of the objects being greater than or equal to the threshold of the number of the objects.
In some embodiments of the present application, based on the foregoing scheme, the merge unit 1540 includes: the first identification unit is used for identifying the clustering label of the object group as a label sequence to be processed if the number of the objects in the object group is less than the threshold value of the number of the objects; the first extraction unit is used for extracting a sub-tag sequence from the tag sequence to be processed; and the first merging unit is used for merging the objects in the object group into the object group associated with the sub-label sequence, and repeating the steps until a target group with the number of the objects larger than or equal to the threshold value of the number of the objects is obtained.
In some embodiments of the present application, based on the foregoing scheme, the tag sequence to be processed includes at least two tag features and weights corresponding to the tag features; the first extraction unit includes: the second extraction unit is used for extracting a first preset number of feature tags from the tag sequence to be processed based on the weight corresponding to each tag feature in the tag sequence to be processed; and the sequence composition unit is used for composing the first preset number of characteristic labels into a sub-label sequence.
In some embodiments of the present application, based on the foregoing scheme, the tag sequence to be processed is composed of tag features that are sorted from large to small according to weights corresponding to the tag features; the second extraction unit includes: the third extraction unit is used for extracting the first n-a characteristic labels from n label characteristics in the label sequence to be processed, wherein n is more than or equal to 2; a is more than or equal to 1 and less than n.
In some embodiments of the present application, based on the foregoing solution, the first merging unit includes: the second identification unit is used for identifying a target clustering label similar to the sub-label sequence from the clustering label set; and the second merging unit is used for merging the object group corresponding to the label sequence to be processed into the object group corresponding to the target clustering label.
In some embodiments of the present application, based on the foregoing scheme, the sub-tag sequence comprises at least two tag features; the first merging unit includes: the first matching unit is used for carrying out similarity matching on the tag characteristics in each sub-tag sequence to obtain mutually matched sub-tag sequences; and the third merging unit is used for merging the object groups corresponding to the matched sub-label sequences.
In some embodiments of the present application, based on the foregoing scheme, the generating unit 1520 includes: the fourth extraction unit is used for extracting a second preset number of interest features from the interest tags of the objects; and the arrangement unit is used for arranging a second preset number of interest features corresponding to the object to obtain a tag sequence corresponding to the object.
In some embodiments of the present application, based on the foregoing scheme, the interest feature includes an interest point representing an individual interest type or an interest vector representing an interest migration situation of the object; the fourth extraction unit includes: the portrait unit is used for generating an interest portrait of the object based on the interest label of the object; and the fifth extraction unit is used for extracting a second preset number of interest points or a second preset number of interest vectors from the interest portrait.
In some embodiments of the present application, based on the foregoing scheme, the clustering unit 1530 includes: a third identification unit, configured to identify, based on the tag sequence, a target cluster tag included in the tag sequence from the cluster tag set; and the first adding unit is used for adding the object to the object group corresponding to the target clustering label.
In some embodiments of the present application, based on the foregoing scheme, the apparatus 1500 for clustering objects further includes: the creating unit is used for creating a new clustering label corresponding to the label sequence if the clustering label set does not have the clustering label contained in the label sequence; and the second adding unit is used for adding the object to the object group corresponding to the newly-built clustering label.
In some embodiments of the present application, based on the foregoing scheme, the apparatus 1500 for clustering objects further includes: the content acquisition unit is used for acquiring the content preferred by the target group according to the clustering label corresponding to the target group; and the content pushing unit is used for pushing the content preferred by the target group to the terminal corresponding to the target group.
In some embodiments of the present application, based on the foregoing scheme, the clustering label includes at least two label features and weights corresponding to the label features; the content push unit includes: the first determining unit is used for determining the content corresponding to each label characteristic based on each label characteristic in the clustering labels; the second determining unit is used for determining the pushing sequence of the content corresponding to each label characteristic based on the weight corresponding to each label characteristic; and the sequencing pushing unit is used for pushing the contents preferred by the target group to the terminal corresponding to the target group according to the pushing sequence of the contents corresponding to the label characteristics.
FIG. 16 illustrates a schematic structural diagram of a computer system suitable for use in implementing the electronic device of an embodiment of the present application.
It should be noted that the computer system 1600 of the electronic device shown in fig. 16 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.
As shown in fig. 16, the computer system 1600 includes a Central Processing Unit (CPU) 1601 which can perform various appropriate actions and processes, such as executing the method in the above-described embodiment, according to a program stored in a Read-Only Memory (ROM) 1602 or a program loaded from a storage portion 1608 into a Random Access Memory (RAM) 1603. In the RAM 1603, various programs and data necessary for system operation are also stored. The CPU 1601, ROM 1602, and RAM 1603 are connected to each other via a bus 1604. An Input/Output (I/O) interface 1605 is also connected to the bus 1604.
The following components are connected to the I/O interface 1605: an input portion 1606 including a keyboard, a mouse, and the like; an output section 1607 including a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, a speaker, and the like; a storage portion 1608 including a hard disk and the like; and a communication section 1609 including a network interface card such as a LAN (Local Area Network) card, a modem, or the like. The communication section 1609 performs communication processing via a network such as the internet. A drive 1610 is also connected to the I/O interface 1605 as needed. A removable medium 1611 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 1610 as necessary, so that a computer program read out therefrom is installed in the storage portion 1608 as necessary.
In particular, according to embodiments of the application, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present application include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated by the flowchart. In such embodiments, the computer program may be downloaded and installed from a network via the communication section 1609, and/or installed from the removable medium 1611. When executed by the Central Processing Unit (CPU) 1601, the computer program performs the various functions defined in the system of the present application.
It should be noted that the computer readable medium shown in the embodiments of the present application may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM), a flash Memory, an optical fiber, a portable Compact Disc Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In this application, however, a computer readable signal medium may include a propagated data signal with a computer program embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. The computer program embodied on the computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wired, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. Each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present application may be implemented in software or in hardware, and the described units may also be disposed in a processor. In some cases, the names of these units do not constitute a limitation on the units themselves.
As another aspect, the present application also provides a computer-readable medium, which may be contained in the electronic device described in the above embodiments; or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by an electronic device, cause the electronic device to implement the method described in the above embodiments.
It should be noted that although several modules or units of the device for performing actions are mentioned in the above detailed description, this division is not mandatory. Indeed, according to embodiments of the present application, the features and functions of two or more modules or units described above may be embodied in one module or unit. Conversely, the features and functions of one module or unit described above may be further divided so as to be embodied by a plurality of modules or units.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with the necessary hardware. Therefore, the technical solution according to the embodiments of the present application can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (such as a CD-ROM, a USB flash drive, or a removable hard disk) or on a network, and which includes several instructions that enable a computing device (such as a personal computer, a server, a touch terminal, or a network device) to execute the method according to the embodiments of the present application.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the embodiments disclosed herein. This application is intended to cover any variations, uses, or adaptations of the application that follow its general principles and include such departures from the present disclosure as come within known or customary practice in the art to which the application pertains.
It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims (15)

1. An object clustering method, comprising:
obtaining interest tags of each object;
generating a tag sequence corresponding to each object based on the interest tags;
clustering the objects according to the tag sequences corresponding to the objects to obtain object groups, each corresponding to a same cluster tag; and
if the number of objects in an object group is less than an object number threshold, merging the objects in the object group into an object group associated with the cluster tag to obtain a target group whose number of objects is greater than or equal to the object number threshold.
2. The method of claim 1, wherein, if the number of objects in the object group is less than the object number threshold, merging the objects in the object group into the object group associated with the cluster tag to obtain the target group whose number of objects is greater than or equal to the object number threshold comprises:
if the number of objects in the object group is less than the object number threshold, identifying the cluster tag of the object group as a tag sequence to be processed;
extracting a sub-tag sequence from the tag sequence to be processed;
merging the objects in the object group into an object group associated with the sub-tag sequence; and
repeating the above steps until a target group whose number of objects is greater than or equal to the object number threshold is obtained.
3. The method of claim 2, wherein the tag sequence to be processed comprises at least two tag features and a weight corresponding to each tag feature;
extracting the sub-tag sequence from the tag sequence to be processed comprises:
extracting a first preset number of tag features from the tag sequence to be processed based on the weight corresponding to each tag feature in the tag sequence to be processed; and
forming the sub-tag sequence from the first preset number of tag features.
4. The method of claim 3, wherein the tag sequence to be processed consists of the tag features sorted in descending order of their corresponding weights;
extracting the first preset number of tag features from the tag sequence to be processed based on the weight corresponding to each tag feature in the tag sequence to be processed comprises:
extracting the first n-a tag features from the n tag features in the tag sequence to be processed, wherein n ≥ 2 and 1 ≤ a < n.
5. The method of claim 2, wherein merging the objects in the object group into the object group associated with the sub-tag sequence comprises:
identifying, from a cluster tag set, a target cluster tag that is similar to the sub-tag sequence; and
merging the object group corresponding to the tag sequence to be processed into the object group corresponding to the target cluster tag.
6. The method of claim 2, wherein the sub-tag sequence comprises at least two tag features;
merging the objects in the object group into the object group associated with the sub-tag sequence comprises:
performing similarity matching on the tag features in the sub-tag sequences to obtain mutually matching sub-tag sequences; and
merging the object groups corresponding to the mutually matching sub-tag sequences.
7. The method of claim 1, wherein generating the tag sequence corresponding to each object based on the interest tags comprises:
extracting a second preset number of interest features from the interest tags of the object; and
arranging the second preset number of interest features corresponding to the object to obtain the tag sequence corresponding to the object.
8. The method of claim 7, wherein the interest features comprise interest points representing individual interest types, or interest vectors representing interest migration of the object;
extracting the second preset number of interest features from the interest tags of the object comprises:
generating an interest portrait of the object based on the interest tags of the object; and
extracting the second preset number of interest points or the second preset number of interest vectors from the interest portrait.
9. The method of claim 1, wherein clustering the objects according to the tag sequences corresponding to the objects to obtain the object groups corresponding to the same cluster tag comprises:
identifying, from a cluster tag set and based on the tag sequence, a target cluster tag contained in the tag sequence; and
adding the object to the object group corresponding to the target cluster tag.
10. The method of claim 9, further comprising:
if the cluster tag set has no cluster tag contained in the tag sequence, creating a new cluster tag corresponding to the tag sequence; and
adding the object to the object group corresponding to the new cluster tag.
11. The method of claim 1, wherein, after merging the objects in the object group into the object group associated with the cluster tag if the number of objects in the object group is less than the object number threshold, to obtain the target group whose number of objects is greater than or equal to the object number threshold, the method further comprises:
acquiring content preferred by the target group according to the cluster tag corresponding to the target group; and
pushing the content preferred by the target group to a terminal corresponding to the target group.
12. The method of claim 11, wherein the cluster tag comprises at least two tag features and a weight corresponding to each tag feature;
pushing the content preferred by the target group to the terminal corresponding to the target group comprises:
determining content corresponding to each tag feature based on each tag feature in the cluster tag;
determining a push order of the content corresponding to each tag feature based on the weight corresponding to each tag feature; and
pushing the content preferred by the target group to the terminal corresponding to the target group according to the push order of the content corresponding to each tag feature.
13. An object clustering apparatus, comprising:
an acquisition unit, configured to obtain interest tags of each object;
a generating unit, configured to generate a tag sequence corresponding to each object based on the interest tags;
a clustering unit, configured to cluster the objects according to the tag sequences corresponding to the objects to obtain object groups, each corresponding to a same cluster tag; and
a merging unit, configured to merge, if the number of objects in an object group is less than an object number threshold, the objects in the object group into an object group associated with the cluster tag, so as to obtain a target group whose number of objects is greater than or equal to the object number threshold.
14. A computer-readable medium storing a computer program which, when executed by a processor, implements the object clustering method according to any one of claims 1 to 12.
15. An electronic device, comprising:
one or more processors; and
a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the object clustering method according to any one of claims 1 to 12.
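The clustering and merging steps recited in claims 1 to 4 can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names (`tag_sequence`, `cluster_objects`, `merge_small_groups`), the parameters `k`, `min_size`, and `drop`, and the weights in the usage note are hypothetical. As a simplifying assumption, the group "associated with" a sub-tag sequence is taken to be the group whose cluster tag equals that sub-tag sequence exactly, whereas claims 5 and 6 allow similarity matching instead. The sketch also stops shortening a cluster tag once only one tag feature remains, so a group may stay below the threshold.

```python
from collections import defaultdict


def tag_sequence(interest_tags, k=3):
    """Return the k highest-weighted tags, in descending order of weight.

    `interest_tags` maps a tag name to a weight (hypothetical data model).
    Ties are broken alphabetically so the result is deterministic.
    """
    ranked = sorted(interest_tags.items(), key=lambda kv: (-kv[1], kv[0]))
    return tuple(tag for tag, _ in ranked[:k])


def cluster_objects(objects, k=3):
    """Group object ids by tag sequence; the sequence serves as the cluster tag."""
    groups = defaultdict(list)
    for obj_id, interest_tags in objects.items():
        groups[tag_sequence(interest_tags, k)].append(obj_id)
    return dict(groups)


def merge_small_groups(groups, min_size, drop=1):
    """Merge undersized groups into the group of their shortened cluster tag.

    A group with fewer than `min_size` members has its cluster tag (n tag
    features) shortened to the first n - drop features, and its members are
    moved to the group carrying that shorter tag. The step repeats until no
    undersized group can be shortened further.
    """
    if drop < 1:
        raise ValueError("drop must be at least 1")
    groups = {label: list(members) for label, members in groups.items()}
    while True:
        small = [
            label for label, members in groups.items()
            if len(members) < min_size and len(label) > drop
        ]
        if not small:
            return groups
        label = max(small, key=len)        # handle the longest cluster tag first
        members = groups.pop(label)
        sub_label = label[:len(label) - drop]  # first n - a tag features
        groups.setdefault(sub_label, []).extend(members)
```

For example, with the made-up objects `u1` and `u2` weighted toward sports, music, and travel, `u3` toward sports, music, and food, and `u4` toward games and music, calling `merge_small_groups(cluster_objects(objects), min_size=3)` merges the first three objects into a single group tagged `("sports", "music")`, while `u4` ends in a group tagged `("games",)` that cannot be shortened further.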
CN202010554265.1A 2020-06-17 2020-06-17 Object clustering method and device, computer readable medium and electronic equipment Active CN111667018B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010554265.1A CN111667018B (en) 2020-06-17 2020-06-17 Object clustering method and device, computer readable medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010554265.1A CN111667018B (en) 2020-06-17 2020-06-17 Object clustering method and device, computer readable medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN111667018A (en) 2020-09-15
CN111667018B (en) 2023-12-15

Family

ID=72388343

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010554265.1A Active CN111667018B (en) 2020-06-17 2020-06-17 Object clustering method and device, computer readable medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN111667018B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112966176A (en) * 2021-02-01 2021-06-15 北京三快在线科技有限公司 Object display method and device, electronic equipment and readable storage medium
CN114610921A (en) * 2021-11-30 2022-06-10 腾讯科技(深圳)有限公司 Object cluster portrait determination method and device, computer equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104050258A (en) * 2014-06-15 2014-09-17 中国传媒大学 Group recommendation method based on interest groups
CN107122805A (en) * 2017-05-15 2017-09-01 腾讯科技(深圳)有限公司 A kind of user clustering method and apparatus
CN108287864A (en) * 2017-12-06 2018-07-17 深圳市腾讯计算机系统有限公司 A kind of interest group division methods, device, medium and computing device
CN110555164A (en) * 2019-07-23 2019-12-10 平安科技(深圳)有限公司 generation method and device of group interest tag, computer equipment and storage medium
US10521824B1 (en) * 2014-01-02 2019-12-31 Outbrain Inc. System and method for personalized content recommendations

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112966176A (en) * 2021-02-01 2021-06-15 北京三快在线科技有限公司 Object display method and device, electronic equipment and readable storage medium
CN112966176B (en) * 2021-02-01 2022-08-26 北京三快在线科技有限公司 Object display method and device, electronic equipment and readable storage medium
CN114610921A (en) * 2021-11-30 2022-06-10 腾讯科技(深圳)有限公司 Object cluster portrait determination method and device, computer equipment and storage medium
CN114610921B (en) * 2021-11-30 2023-02-28 腾讯科技(深圳)有限公司 Object cluster portrait determination method, device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN111667018B (en) 2023-12-15

Similar Documents

Publication Publication Date Title
CN108287864B (en) Interest group dividing method, device, medium and computing equipment
WO2022057658A1 (en) Method and apparatus for training recommendation model, and computer device and storage medium
CN108021708B (en) Content recommendation method and device and computer readable storage medium
CN109471978B (en) Electronic resource recommendation method and device
CN110399487B (en) Text classification method and device, electronic equipment and storage medium
CN112380859A (en) Public opinion information recommendation method and device, electronic equipment and computer storage medium
WO2021155691A1 (en) User portrait generating method and apparatus, storage medium, and device
CN113688310B (en) Content recommendation method, device, equipment and storage medium
CN113051480A (en) Resource pushing method and device, electronic equipment and storage medium
CN111667018B (en) Object clustering method and device, computer readable medium and electronic equipment
CN112231555A (en) Recall method, apparatus, device and storage medium based on user portrait label
CN115222443A (en) Client group division method, device, equipment and storage medium
CN110390014A (en) A kind of Topics Crawling method, apparatus and storage medium
CN113962401A (en) Federal learning system, and feature selection method and device in federal learning system
CN113569162A (en) Data processing method, device, equipment and storage medium
CN108830302B (en) Image classification method, training method, classification prediction method and related device
CN116401602A (en) Event detection method, device, equipment and computer readable medium
CN110852078A (en) Method and device for generating title
CN113591881B (en) Intention recognition method and device based on model fusion, electronic equipment and medium
CN112801053B (en) Video data processing method and device
CN115439192A (en) Medical commodity information pushing method and device, storage medium and computer equipment
CN114461822A (en) Resource processing method, device, equipment and storage medium
CN114610758A (en) Data processing method and device based on data warehouse, readable medium and equipment
CN113822112A (en) Method and apparatus for determining label weights
CN109885504B (en) Recommendation system test method, device, medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant