CN116955772A - Data processing method and related device - Google Patents

Data processing method and related device

Info

Publication number
CN116955772A
CN116955772A (Application No. CN202211457492.8A)
Authority
CN
China
Prior art keywords
group
feature
features
cluster
portrait
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211457492.8A
Other languages
Chinese (zh)
Inventor
苏鑫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202211457492.8A
Publication of CN116955772A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/90 Details of database functions independent of the retrieved data types
    • G06F 16/95 Retrieval from the web
    • G06F 16/953 Querying, e.g. by the use of web search engines
    • G06F 16/9535 Search customisation based on user profiles and personalisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a data processing method and a related device, applicable to scenarios such as artificial intelligence and blockchain technology. For each object group among a plurality of object groups, group portrait features are acquired and mapped to vectors, yielding a feature vector representation of each group portrait feature. Because a feature vector representation reflects the features of its object group, clustering the plurality of object groups according to these representations gathers object groups with similar features into clusters. The group portrait features of the object groups included in a cluster are then integrated to obtain the cluster portrait features of that cluster. Since cluster portrait features cover more interaction data and a wider range of interest content, interest generalization is improved, allowing higher-quality services to be provided to users and improving the user experience.

Description

Data processing method and related device
Technical Field
The present application relates to the field of the internet, and in particular, to a data processing method and related apparatus.
Background
In scenarios such as recommendation, search, and advertising, interest features of a user are generally extracted from the user's related data, so that corresponding services can be provided according to the extracted features. Extracting interest features from the data of a single user alone often covers a limited number of users; in particular, for new and low-activity users, interest features are difficult to build.
In this case, the interest features of a user can be mined through the groups the user belongs to, and new interests can be explored for the user based on the historical interaction data of other users in those groups. That is, group portrait features are constructed from the related data of the groups associated with the user; these features can reflect both the user's known interests and hidden interests that have not yet been mined.
Currently, this approach is based mainly on groups that actually exist, e.g., a user's interest groups or the fan group of a particular author. However, the effective interaction data and the number of users covered by such real groups are often limited, so the historical interaction data covered by the group portrait features is sparse, interest generalization is poor, and service quality and user experience suffer.
Disclosure of Invention
In order to solve the above technical problems, the application provides a data processing method and a related device. The constructed cluster portrait features can cover more interaction data and a wider range of interest content, improving interest generalization and thereby improving service quality and user experience.
The embodiment of the application discloses the following technical scheme:
in one aspect, an embodiment of the present application provides a data processing method, including:
Acquiring a plurality of object groups;
for each object group in the plurality of object groups, acquiring group portrait features of the object group;
respectively carrying out vector mapping on the group portrait features of each object group to obtain a feature vector representation of each group portrait feature;
clustering the plurality of object groups according to the feature vector representation of the group portrait features to obtain clusters;
and integrating the group portrait features of the object groups included in a cluster to obtain the cluster portrait features of the cluster.
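Purely as an illustration (and not part of the disclosed embodiments), the five claimed steps can be sketched in Python. The vocabulary-based vector mapping and the greedy threshold clustering below stand in for whichever mapping and clustering algorithms an actual implementation uses; all names and data are hypothetical:

```python
def to_vector(portrait, vocab):
    # S203: project a group portrait (tag -> weight) onto a shared vocabulary order.
    return [portrait.get(tag, 0.0) for tag in vocab]

def cluster_groups(portraits, threshold=0.5):
    # S201/S202: the portraits of the acquired object groups come in as dicts.
    vocab = sorted({tag for p in portraits for tag in p})
    vectors = [to_vector(p, vocab) for p in portraits]
    # S204: greedy clustering - join the first cluster whose seed vector lies
    # within `threshold`, otherwise start a new cluster (a simplification:
    # cluster seeds are never re-centred).
    seeds, labels = [], []
    for v in vectors:
        label = None
        for i, s in enumerate(seeds):
            if sum((a - b) ** 2 for a, b in zip(v, s)) ** 0.5 <= threshold:
                label = i
                break
        if label is None:
            seeds.append(v)
            label = len(seeds) - 1
        labels.append(label)
    # S205: integrate member portraits into the cluster portrait features.
    clusters = {}
    for label, p in zip(labels, portraits):
        merged = clusters.setdefault(label, {})
        for tag, w in p.items():
            merged[tag] = merged.get(tag, 0.0) + w
    return labels, clusters
```

For example, two "ski" groups and two "cook" groups would fall into two clusters, each cluster portrait accumulating the tag weights of its member groups.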
In yet another aspect, an embodiment of the present application provides a data processing apparatus, where the apparatus includes an obtaining unit, a mapping unit, a clustering unit, and an integrating unit:
the acquisition unit is used for acquiring a plurality of object groups;
the acquisition unit is further used for acquiring group image features of each of the plurality of object groups;
the mapping unit is used for respectively carrying out vector mapping on the group portrait features of each object group to obtain a feature vector representation of each group portrait feature;
the clustering unit is used for clustering the plurality of object groups according to the characteristic vector representation of the group portrait characteristic to obtain a cluster;
the integration unit is used for integrating the group portrait features of the object groups included in a cluster to obtain the cluster portrait features of the cluster.
In another aspect, an embodiment of the present application provides a computer device including a processor and a memory:
the memory is used for storing a computer program and transmitting the computer program to the processor;
the processor is configured to perform the method of any of the preceding aspects according to instructions in the computer program.
In another aspect, embodiments of the present application provide a computer-readable storage medium storing a computer program for performing the method of any one of the preceding aspects.
In another aspect, embodiments of the present application provide a computer program product comprising a computer program which, when run on a computer device, causes the computer device to perform the method of any of the preceding aspects.
According to the technical scheme, for each of the acquired object groups, the group portrait features of the object group are acquired and mapped to vectors, yielding a feature vector representation of each group portrait feature. Because a feature vector representation reflects the features of its object group, clustering the plurality of object groups according to these representations gathers object groups with similar features into clusters; that is, the object groups included in a cluster are similar in their features. Further, the group portrait features of the object groups included in a cluster can be integrated to obtain the cluster portrait features of the cluster. Compared with the group portrait features of a single object group, the cluster portrait features cover more interaction data and a wider range of interest content, improving interest generalization. Furthermore, when providing a service to a user, the cluster portrait features of the cluster the user belongs to can be used directly to determine content the user may be interested in; compared with content determined from the group portrait features of the single object group the user belongs to, this content is richer, so higher-quality services can be provided and the user experience is improved.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the related art, the drawings required by the embodiments or the related descriptions are briefly introduced below. It is obvious that the drawings described below show only some embodiments of the present application, and a person of ordinary skill in the art can obtain other drawings from them without inventive effort.
Fig. 1 is a schematic diagram of an application scenario of a data processing method according to an embodiment of the present application;
FIG. 2 is a flowchart of a data processing method according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a framework of a group-relationship-based second-degree aggregation method for interest groups according to an embodiment of the present application;
FIG. 4 is a schematic structural diagram of a second-degree aggregation system according to an embodiment of the present application;
FIG. 5 is a flowchart of a method for constructing group portrait features according to an embodiment of the present application;
FIG. 6 is a schematic diagram of a multi-source feature fusion method based on object groups according to an embodiment of the present application;
FIG. 7 is a schematic diagram of a multi-source feature fusion system according to an embodiment of the present application;
FIG. 8 is a block diagram of a data processing apparatus according to an embodiment of the present application;
fig. 9 is a block diagram of a terminal device according to an embodiment of the present application;
fig. 10 is a block diagram of a server according to an embodiment of the present application.
Detailed Description
Embodiments of the present application are described below with reference to the accompanying drawings.
In scenarios such as recommendation, search, and advertising, the interest features of an object are usually mined using the groups related to the object; specifically, the group portrait features of a group reflect the object's real interests as well as interests that have not yet been mined. Further, when providing services related to recommendation, search, advertising, and the like to an object, content that may interest the object can be determined based on the group portrait features of the single group in which the object is located. The object may refer to a user.
In the related art, the group portrait features of a group are mainly constructed from the actual users included in the group and their effective interaction data, for example a user's interest groups or the fan group of a particular author. However, the effective interaction data and the number of users covered by a single real group are often limited, so the interaction data covered by its group portrait features is sparse and the interest content covered is narrow. This results in poor interest generalization, so the quality of services provided to users based on such group portrait features in related applications is poor; in particular, the amount of potentially interesting content determined for a user is small and the interests covered are limited, leading to a poor user experience.
Therefore, the embodiment of the application provides a data processing method and a related device. A plurality of object groups are clustered using their group portrait features; because the group portrait features reflect the features of the object groups, object groups with similar features can be gathered into clusters. Further, the cluster portrait features of a cluster are obtained by integrating the group portrait features of the object groups included in it, and the clusters and cluster portrait features are then used to provide services. Compared with the group portrait features of a single object group, the cluster portrait features cover more interaction data and a wider range of interest content, improving interest generalization. Furthermore, when providing a service to a user, the cluster portrait features of the cluster the user belongs to can be used directly to determine content the user may be interested in; compared with content determined from the group portrait features of a single object group, this content is richer, so higher-quality services can be provided and the user experience is improved.
The data processing method provided by the embodiment of the application can be implemented through computer equipment, wherein the computer equipment can be terminal equipment or a server, and the server can be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server for providing cloud computing service. The terminal equipment comprises, but is not limited to, mobile phones, computers, intelligent voice interaction equipment, intelligent household appliances, vehicle-mounted terminal equipment and the like. The terminal device and the server may be directly or indirectly connected through wired or wireless communication, and the present application is not limited herein. The embodiment of the application can be applied to various scenes, including but not limited to cloud technology, artificial intelligence, intelligent transportation, auxiliary driving and the like.
The embodiment of the application can be applied to various scenarios in which services need to be provided for users, such as recommendation scenarios (e.g., in-domain recommendation, out-of-domain recommendation, popularity and weight boosting of newly released content, etc.), search scenarios, advertising scenarios, and the like.
It should be noted that, in the specific embodiment of the present application, relevant data such as user information may be involved in the process of data processing, and when the above embodiment of the present application is applied to specific products or technologies, user permission or consent needs to be obtained, and the collection, use and processing of relevant data need to comply with relevant laws and regulations and standards of relevant countries and regions.
The present application may be applied in the field of artificial intelligence (Artificial Intelligence, AI). Artificial intelligence is a theory, method, technique, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, sense the environment, acquire knowledge, and use knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce new intelligent machines that can react in a way similar to human intelligence. Artificial intelligence research covers the design principles and implementation methods of various intelligent machines, enabling machines to perceive, reason, and make decisions.
With the research and progress of artificial intelligence technology, it is being studied and applied in many fields, such as smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned driving, autonomous driving, drones, robots, smart medical care, smart customer service, the Internet of Vehicles, and smart transportation. It is believed that with the development of technology, artificial intelligence will be applied in more fields and become increasingly important.
Artificial intelligence technology is a comprehensive discipline that involves a wide range of fields, including both hardware-level and software-level technologies. Artificial intelligence infrastructure technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technology mainly includes computer vision, speech processing, natural language processing, machine learning/deep learning, automatic driving, intelligent transportation, and other directions.
The embodiment of the application mainly relates to directions such as natural language processing. Natural Language Processing (NLP) is an important direction in the fields of computer science and artificial intelligence. It studies theories and methods that enable effective communication between humans and computers in natural language. Natural language processing is a science that integrates linguistics, computer science, and mathematics; research in this field therefore involves natural language, the language people use daily, and is closely related to linguistics. Natural language processing techniques typically include text processing, semantic understanding, machine translation, question answering, knowledge graph techniques, and the like.
The application can be applied to the technical field of blockchains, and the data processing method disclosed by the application can be used for storing data such as object groups, clusters, group portrait features, cluster portrait features and the like on the blockchain.
Fig. 1 shows an application scenario of a data processing method according to an embodiment of the present application, where in the scenario shown in fig. 1, a server 100 is described as an example of the foregoing computer device:
the server 100 may first obtain a plurality of object groups and, for each of the plurality of object groups, obtain a group representation feature of the object group. The group portrait characteristic of the object group can be a description mode of the characteristic of the object group, and can reflect the characteristic of the object group.
Then, the server 100 may perform vector mapping on the group portrait features of each object group to obtain a feature vector representation of each group portrait feature. The feature vector representation reflects the features of the object group, enabling subsequent processing such as clustering.
Then, the server 100 may cluster the plurality of object groups according to the feature vector representations of the group portrait features, gathering object groups whose features are similar into clusters; that is, the features of the object groups included in a cluster obtained by the clustering are similar. A cluster is a collection formed by the aggregation of object groups.
Further, the server 100 may integrate the group portrait features of the object groups included in a cluster to obtain the cluster portrait features of the cluster. Compared with the group portrait features of a single object group, the cluster portrait features cover more interaction data and a wider range of interest content, improving interest generalization.
The clusters and cluster portrait features can be used in various downstream applications that provide services to users. For example, when providing services to a user, the server 100 may directly use the cluster portrait features of the cluster the user belongs to in order to determine content that may interest the user; compared with content determined from the group portrait features of the single object group the user belongs to, this content is richer, thereby providing higher-quality services and improving the user experience.
For example, in a recommendation scenario, when a recommendation service needs to be provided for user A, the server 100 may first determine the cluster to which user A belongs and then determine recommended content for user A according to the cluster portrait features of that cluster. In practice, the recommended content may be displayed to user A through the corresponding terminal device 200, which may be a mobile phone, a computer, or the like, connected to the server 100 over a network. In a specific implementation, user A may log in to an application program through the terminal device 200; the application program provides a user page through which the recommended content is displayed, and user A can then view and interact with the recommended content.
It should be noted that the embodiment of the present application does not limit how the data processing method is carried out. In one possible implementation, the method is performed online: when a service (such as a recommendation service) is provided for user A, the object groups related to user A are clustered online to obtain clusters, the cluster to which user A belongs is determined, and recommended content is determined accordingly. In another possible implementation, the method is performed offline: clusters are computed in advance, and when a service is provided for user A, the cluster to which user A belongs is looked up directly among the precomputed clusters to determine recommended content. In this way, recommended content can be determined quickly from the precomputed clusters, improving recommendation efficiency.
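A minimal sketch of the offline variant, with all names and data invented for illustration: the clusters and their integrated portrait features are computed ahead of time, so serving a recommendation request reduces to dictionary lookups plus a ranking over the cluster portrait:

```python
# Precomputed offline (hypothetical data): which group each user belongs to,
# which cluster each group fell into, and the integrated cluster portraits.
group_of_user = {"user_a": "group_1", "user_b": "group_2"}
cluster_of_group = {"group_1": 0, "group_2": 0}
cluster_portraits = {0: {"ski": 1.9, "world_cup": 0.7}}

def recommend(user, top_k=1):
    # Look up the user's cluster and rank its portrait tags by weight.
    portrait = cluster_portraits[cluster_of_group[group_of_user[user]]]
    return sorted(portrait, key=portrait.get, reverse=True)[:top_k]
```

Note that users of different groups ("user_a" and "user_b" here) share the same cluster portrait once their groups land in the same cluster, which is exactly what widens the interest content beyond any single group.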
Fig. 2 is a flowchart of a data processing method according to an embodiment of the present application, where a server is used as an example of the foregoing computer device, and the method includes S201 to S205:
s201: a plurality of object populations is acquired.
In practical applications, the server may first obtain a plurality of object groups. An object group refers to a set of objects, which may be an explicit set or an implicit set: an explicit set is a group that actually exists, and an implicit set is a virtual group. When the objects are users, an actually existing group may be an interest group joined by multiple users (for example, an instant messaging group, a friend-making group, a fan group, or a discussion forum), and a virtual group may be, for example, the set of users who joined a certain activity or the set of users sharing certain interest tags.
It should be noted that the present application does not limit how the plurality of object groups are obtained. In one possible implementation, all available object groups are acquired and clustered at one time to obtain the corresponding clusters. In another possible implementation, object groups of a specific type are obtained according to the application scenario; for example, in a recommendation scenario for sports videos, only object groups on sports topics (such as a "ski group" or "world cup group") may be acquired, so that clusters are obtained separately for different application scenarios.
S202: for each of a plurality of object groups, a group representation feature of the object group is obtained.
After the plurality of object groups are acquired, the server may obtain, for each object group, the group portrait features of that group. The group portrait features of an object group describe the features of the group, such as group interest tags, inter-group interest association tags, and inter-group basic attribute distribution tags. Since each object exists as an individual within an object group, object features can also embody the features of the group; an object feature is a feature of a single object, for example its interactions, attributes, and interests. Therefore, in practice the features of an object group may include two dimensions: the group features, which reflect the object group from an overall perspective, and the object features of the objects it includes, which reflect the object group from an individual perspective and may include interaction features, attribute features, interest features, and the like. The group portrait features of an object group can thus reflect both the group features of the object group and the object features of the objects it includes.
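One possible (purely illustrative) data layout for group portrait features, mirroring the two dimensions described above, namely group features from the overall perspective and object features from the individual perspective; every field name and value below is a hypothetical example, not part of the disclosure:

```python
# Hypothetical group portrait structure for one object group.
group_portrait = {
    "group_features": {
        "name": "ski group",                        # group-level description
        "interest_tags": ["skiing", "winter sports"],
    },
    "object_features": [                            # per-object features
        {"object_id": "u1", "interests": ["skiing"], "region": "north", "interactions": 12},
        {"object_id": "u2", "interests": ["skating"], "region": "south", "interactions": 3},
    ],
}
```

A structure like this keeps both dimensions available for the vector mapping step that follows.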
S203: and respectively carrying out vector mapping on the group portrait features of each object group to obtain the feature vector representation of each group portrait feature.
To facilitate comparing the feature differences between object groups for clustering, the server may perform vector mapping on the group portrait features of each object group to obtain a feature vector representation of each group portrait feature. Vector mapping standardizes the group portrait features, so that the feature vector representation reflects both the group features of the object group and the object features of the objects it includes.
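As one hypothetical standardization (the patent does not specify the mapping), interest tags can be projected onto a shared vocabulary so that every group portrait becomes a comparable fixed-length vector; the vocabulary here is invented for illustration:

```python
# Illustrative shared vocabulary; a real system would derive it from the data.
VOCAB = ["skiing", "cooking", "football", "reading"]

def map_to_vector(tags):
    """One-hot encode a group's interest tags over the shared vocabulary."""
    return [1.0 if tag in tags else 0.0 for tag in VOCAB]
```

Because all groups are mapped against the same vocabulary, distances between the resulting vectors directly reflect feature differences between groups.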
In practical applications, the features of an object group may be described by the content-side features and the attribute-side features of the group. Content-side features relate to the core interest content associated with the group, such as the interests commonly shared by the group's users; attribute-side features relate to the attribute distribution of the group, such as the regional distribution of the users it includes.
In a specific implementation, content-side features may be defined as group interest features, such as the group name, group announcements, and the distribution of interest keywords, and attribute-side features as group relationship features, such as the user list, user region distribution, user activity levels, ongoing content, and user consumption behavior. In practice, both kinds of features can serve as the group portrait features of the object group; in that case, the group portrait features include both group relationship features and group interest features, and the features of the object group are comprehensively described along these two dimensions, improving the richness of the group portrait features. Alternatively, only one kind may be selected according to actual requirements; the group portrait features then include either group relationship features or group interest features, and lean toward either the group relationship dimension or the group interest dimension.
Correspondingly, when performing vector mapping, for the group relationship features and group interest features of each object group, the server may map the group relationship features to a first feature vector representation and the group interest features to a second feature vector representation. The server may then splice the first and second feature vector representations to obtain the feature vector representation of the group portrait features. This representation fuses the group relationship features and the group interest features of the object group and therefore has richer feature expression capability.
It should be noted that the present application does not limit how the first and second feature vector representations are spliced. In different application scenarios and under different clustering requirements, the emphasis between the two dimensions of group relationship features and group interest features may change. To allow the clustering to be adjusted according to such requirements and to obtain a clustering result that meets expectations, in one possible implementation the two representations are spliced by weighted concatenation: the server first obtains a first weight for the first feature vector representation and a second weight for the second feature vector representation, and then uses the two weights to concatenate the weighted representations into the feature vector representation of the group portrait features for subsequent processing such as clustering. By setting the weights, the clustering tendency along the two dimensions of group relationship and group interest can be flexibly adjusted according to the clustering requirement.
Meanwhile, it should be noted that the present application does not limit how the first weight and the second weight are set. For example, in an application scenario in which the first feature vector representation and the second feature vector representation of the two dimensions can be regarded as equally important, the first weight and the second weight can be set to the same value; on this basis, the feature vector representation obtained by weighted splicing evenly embodies the features of the object group in the two dimensions of group relationship and group interest. As another example, in an application scenario in which the feature representation of the object group in a certain dimension needs to be emphasized, the first weight and the second weight can be set to different values, with the difference expressing the emphasis. Correspondingly, the feature vector representation obtained by weighted splicing highlights the dimension with the larger weight and weakens the dimension with the smaller weight.
In specific implementation, the magnitudes of the first weight and the second weight can be determined according to the actual clustering requirement. For example, if the clustering requirement emphasizes the interest concentration of the group, the first weight may be set smaller than the second weight; if the clustering requirement emphasizes the relationship concentration of the group, the first weight may be set greater than the second weight; and if the clustering requirement is to balance the group relationship with the group interest, the first weight may be set equal to the second weight. In this way, a clustering result meeting expectations is achieved by setting different weights.
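The weighted splicing described above can be sketched as follows. This is a hypothetical minimal illustration, not the patent's implementation: the function name, the sample vectors, and the weight values are all assumptions; each representation is scaled by its weight before concatenation.

```python
def weighted_splice(relation_vec, interest_vec, w1, w2):
    """Scale the first (group relationship) and second (group interest)
    feature vector representations by their weights, then concatenate
    ("splice") them into the feature vector of the group portrait feature."""
    scaled_relation = [w1 * x for x in relation_vec]
    scaled_interest = [w2 * x for x in interest_vec]
    return scaled_relation + scaled_interest

# Emphasizing group interest: first weight set smaller than the second weight.
group_vec = weighted_splice([0.2, 0.5], [0.9, 0.1, 0.4], w1=0.3, w2=0.7)
```

The resulting vector keeps both dimensions' information while letting the larger weight dominate distance computations in the subsequent clustering step.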
S204: clustering the plurality of object groups according to the feature vector representation of the group portrait features to obtain clusters.
For each object group, after determining the feature vector representation of its group portrait feature, the server may further cluster the plurality of object groups according to the feature vector representations to obtain clusters. Since the feature vector representation reflects both the group features of an object group and the object features of the objects it includes, clustering according to the feature vector representations of the group portrait features gathers object groups whose group features and object features are similar into one cluster; that is, the object groups included in a cluster obtained by clustering are similar in terms of both group features and object features.
It should be noted that the present application does not limit how the clustering is performed. In one possible implementation, K-means clustering may be selected. Specifically, the server may first select K object groups from the plurality of object groups as initial cluster centers, that is, the cluster centers referred to during the first round of clustering. In implementation, for example, the plurality of object groups may first be randomly divided into K groups, and one object group may then be randomly selected from each of the K groups as an initial cluster center, so as to obtain K initial cluster centers. Further, the server can calculate the distance between each object group and each cluster center according to the feature vectors of the group portrait features. The distance between an object group and a cluster center reflects their degree of similarity in terms of features; in practical application, the closer the distance, the more similar the features. Thus, each object group can be assigned to the cluster represented by the cluster center closest to it, so that object groups whose group features and object features are similar are gathered in one cluster.
It can be understood that the composition of a cluster changes after each assignment. Therefore, in practical application, each time an object group is assigned to a cluster, the server can recalculate the cluster center from the object groups currently in the cluster, and then re-execute the step of calculating the distance between each object group and each cluster center according to the feature vectors of the group portrait features, until the clustering termination condition is met, thereby obtaining K clusters. In this way, the cluster centers are dynamically updated during assignment, so that they gradually approach the final cluster centers.
It should be noted that the present application does not limit how the clustering termination condition is set. For example, a threshold on the number of clustering iterations may be set as the termination condition according to the number of object groups; the condition is considered met when the number of iterations actually performed reaches the threshold. As another example, the termination condition may be that the cluster centers no longer change for a preset number of consecutive iterations; the condition is considered met when the recalculated cluster centers remain unchanged for that many consecutive iterations.
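The K-means procedure of S204 can be sketched in a few dozen lines. This is a minimal illustration under stated assumptions: Euclidean distance is assumed as the metric, initial centers are sampled at random rather than via the group-then-pick scheme described above, and "centers no longer change" is used as the termination condition.

```python
import random

def kmeans(groups, k, max_iters=100, seed=0):
    """Toy K-means over group portrait feature vectors (a sketch, not the
    patent's implementation). `groups` is a list of equal-length vectors."""
    rng = random.Random(seed)
    centers = [list(v) for v in rng.sample(groups, k)]

    def dist2(a, b):  # squared Euclidean distance
        return sum((x - y) ** 2 for x, y in zip(a, b))

    for _ in range(max_iters):
        # Assign each object group to the nearest cluster center.
        clusters = [[] for _ in range(k)]
        for v in groups:
            idx = min(range(k), key=lambda i: dist2(v, centers[i]))
            clusters[idx].append(v)
        # Recalculate each center as the mean of its members.
        new_centers = []
        for i, members in enumerate(clusters):
            if members:
                dim = len(members[0])
                new_centers.append(
                    [sum(m[d] for m in members) / len(members) for d in range(dim)])
            else:
                new_centers.append(centers[i])
        if new_centers == centers:  # termination: centers no longer change
            break
        centers = new_centers
    return clusters, centers
```

Two well-separated pairs of feature vectors will end up in two clusters of two, regardless of which groups are sampled as the initial centers.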
When K-means clustering is selected, the setting of K affects the scale of the clusters obtained, among other things. K can be set according to the actual clustering requirement, and the present application does not limit this in any way. In practical application, K may be determined according to at least one of the intra-cluster object coverage, the amount of intra-cluster object interaction data, and the intra-cluster object relationship concentration. The intra-cluster object coverage may refer to the total number of objects across the object groups included in the cluster; the amount of intra-cluster object interaction data may refer to how much object interaction data the cluster covers, for example the number of times a certain piece of content has been forwarded; and the intra-cluster object relationship concentration may be used to represent how close the object relationships within the cluster are.
In implementation, the relationship between K and each of the intra-cluster object coverage, the amount of intra-cluster object interaction data, and the intra-cluster object relationship concentration can be set as a negative correlation. For example, if the intra-cluster object coverage is small, the coverage of the clusters can be considered too small, and K can be reduced to decrease the number of clusters obtained, thereby increasing the intra-cluster object coverage; if the intra-cluster object coverage is large, the coverage of the clusters can be considered too large, and K can be increased to increase the number of clusters obtained, thereby reducing the intra-cluster object coverage.
It should be noted that, for a specific process of determining K according to the number of object interaction data in the cluster and the concentration of object relationships in the cluster, reference may be made to a process of determining K according to the coverage amount of objects in the cluster, which is not described herein.
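The negative-correlation rule for tuning K by intra-cluster object coverage can be sketched as a simple feedback step. The thresholds and function name below are illustrative assumptions; the patent does not specify concrete values.

```python
def adjust_k(k, coverage_per_cluster, low=50, high=500):
    """Shrink K when average per-cluster object coverage is too small
    (fewer, larger clusters) and grow K when it is too large (more,
    smaller clusters). `low`/`high` are assumed threshold values."""
    if coverage_per_cluster < low:
        return max(1, k - 1)
    if coverage_per_cluster > high:
        return k + 1
    return k
```

The same feedback shape would apply when driving K from the amount of intra-cluster interaction data or the intra-cluster relationship concentration instead of coverage.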
S205: and integrating the group portrait features of the object groups included in the clusters to obtain the cluster portrait features of the clusters.
After the clustering is completed and the clusters are obtained, the server can integrate the group portrait features of the object groups included in each cluster, thereby obtaining the cluster portrait feature of the cluster. In this way, upward integration of the group portrait features is realized: the portrait features are integrated from group granularity to cluster granularity, and the interaction data and interest content that can be covered are enlarged, thereby improving interest promotion.
In practical applications, for example in scenarios such as recommendation, search, and advertising, the clusters obtained by clustering and their cluster portrait features can be used to provide relevant services for users. Specifically, the server may first obtain an application service request for an object to be served, where the application service request may include an object identifier used to uniquely identify the object to be served. The object to be served may refer to a user for whom an application service needs to be provided, and correspondingly the object identifier may be that user's user identifier. Further, the server may determine the cluster to which the object to be served belongs according to the object identifier, and return a request result based on the cluster portrait feature of that cluster, where the request result may refer to the application service provided for the object to be served. In practical applications, the request result may also be presented to the user. For example, in a recommendation scenario, the application service request may be a content recommendation request and the request result may be the recommended content.
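The serving flow above can be sketched as a lookup from object identifier to cluster, followed by a result built from the cluster portrait feature. Everything here is a hypothetical illustration: the identifiers, the in-memory dictionaries standing in for real storage, and the portrait field names are all assumptions.

```python
# Assumed in-memory stand-ins for the mapping and portrait stores.
object_to_cluster = {"user_42": "cluster_A"}
cluster_portraits = {
    "cluster_A": {"interest_keywords": ["basketball", "running"],
                  "candidate_content": ["video_1", "article_7"]},
}

def handle_service_request(object_id):
    """Determine the cluster of the object to be served by its identifier,
    then return a request result based on the cluster portrait feature
    (here, the cluster's candidate content, as in a recommendation scenario)."""
    cluster_id = object_to_cluster.get(object_id)
    if cluster_id is None:
        return []  # the object belongs to no cluster; fall back to empty result
    return cluster_portraits[cluster_id]["candidate_content"]
```

In a search or advertising scenario the same lookup would apply, with the request result built from different fields of the cluster portrait feature.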
Compared with the group portrait features of the single object group, the group portrait features can provide richer interest content for users in the single object group by supplementing the users in the single object group with information in other similar object groups, so that the group portrait features can provide higher-quality service for the users in downstream application, and user experience is improved.
The present application does not limit how the group portrait features of the object groups included in a cluster are integrated. In implementation, the integration basis can be determined by selecting some of the features from the group interest features and the group relationship features according to the clustering requirement, and the group portrait features of the object groups included in the cluster can then be counted according to the integration basis, so as to obtain the cluster portrait feature. For example, the interest keyword distribution may be selected from the group interest features, and the user list, user region distribution, and user consumption behavior may be selected from the group relationship features, so that the integration basis includes these four items. Then the interest keyword distributions of the object groups included in the cluster may be counted to obtain the cluster interest keyword distribution, the user lists may be counted to obtain the cluster user list, the user region distributions may be counted to obtain the cluster user region distribution, and the user consumption behaviors may be counted to obtain the cluster user consumption behavior. In practical applications, when counting over the object groups included in a cluster, the counting can be performed in a weighted manner.
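The weighted counting of S205 can be sketched for two of the integration items (interest keyword distribution and user list). The portrait field names and default weights are illustrative assumptions.

```python
from collections import Counter

def integrate_cluster_portrait(group_portraits, weights=None):
    """Merge per-group interest keyword distributions into a cluster-level
    distribution by (optionally weighted) counting, and union the user lists.
    Each portrait is assumed to look like
    {"interest_keywords": {kw: count}, "users": [user_id, ...]}."""
    weights = weights or [1.0] * len(group_portraits)
    keyword_dist = Counter()
    user_list = []
    for portrait, w in zip(group_portraits, weights):
        for kw, count in portrait["interest_keywords"].items():
            keyword_dist[kw] += w * count  # weighted counting
        user_list.extend(portrait["users"])
    return {"interest_keywords": dict(keyword_dist),
            "users": sorted(set(user_list))}
```

Region distributions and consumption behaviors would be accumulated the same way, each with its own counting rule.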
It can be understood that other information can be added to the integration basis according to actual requirements. For example, content that sees regular activity within the cluster may be added to the integration basis, where such content may refer to content with recurring user behaviors such as likes, forwarding, and follows. Integrating such content as candidate content that can be promoted and diffused within the cluster can improve the match between the candidate content and the interests of the users in the cluster, and better exploit the value of recurring liking, forwarding, and following behaviors within the cluster. Correspondingly, in downstream applications, the candidate content can be promoted and diffused among all users in the cluster, thereby improving the promotion and diffusion effect of the downstream application.
For example, in a recommendation scenario, the downstream application may be a recommendation application, and the candidate content may refer to content consumed by other users within the cluster. Explicit or implicit recommendation may be selected when recommending consumed content to users within the cluster. In the explicit recommendation mode, prompt slogans such as "similar friends are also following this", "running lovers in Guangdong are all watching", or "frisbee lovers in Shanghai Pudong are all watching" can be attached to the consumed content in the product corresponding to the recommendation application, so as to prompt users to view the consumed content and improve the stickiness of the recommendation application. In the implicit recommendation mode, the consumed content can be recommended to the user imperceptibly, for example by directly displaying it to the users in the cluster without attaching a prompt slogan, which can improve recommendation diversity.
As another example, in a search scenario, the downstream application may be a search application and the candidate content may refer to search results. During a search, content within the cluster can be preferentially recommended to the user as search results.
According to the technical scheme, for each object group in the plurality of acquired object groups, the group portrait features of the object group are acquired, and vector mapping is carried out on the group portrait features of each object group, so that the feature vector representation of each group portrait feature is obtained. The feature vector representation of the group portrait features of the object group can reflect the group features of the object group and the object features of the objects included in the object group, so that the object groups are clustered according to the feature vector representation of the group portrait features, and the object groups with the group features and the object features having similarity can be clustered together to obtain a group, namely, the object groups included in the group have similarity in the aspects of group features and the object features. Further, the group portrait features of the object groups included in the group can be integrated to obtain the group portrait features of the group. Compared with the group portrait features of a single object group, the group portrait features can cover more interactive data and wider interest content, so that interest popularization is improved. Furthermore, when the service is provided for the user, the cluster portrait characteristic of the cluster to which the user belongs can be directly utilized to determine the content possibly interested by the user, and compared with the content possibly interested determined based on the cluster portrait characteristic of the single object group to which the user belongs, the content possibly interested is richer, so that the service with higher quality is provided for the user, and the user experience is improved.
The data processing method provided by the embodiment of the application establishes an aggregation relationship from user to object group to cluster. In practical application, this aggregation relationship can be aggregated further upwards, so as to construct an object group aggregation relationship at a larger level and enlarge the coverage. In specific implementation, a plurality of clusters with similar interest classes may be aggregated upwards; for example, a basketball interest cluster 1, a football interest cluster 2, and a tennis interest cluster 3 may be aggregated upwards to obtain a sports interest cluster, where the obtained sports interest cluster has larger coverage than any single one of cluster 1, cluster 2, or cluster 3.
In the embodiment described above, the first feature vector representation may be obtained by vector mapping the group relationship features, for reflecting the features of the object group in the group relationship dimension. The second feature vector representation may be obtained by vector mapping the group interest features to reflect the features of the subject group in the group interest dimension. It should be noted that, the present application is not limited in any way as to how to perform vector mapping on the group relationship features to obtain the first feature vector representation and how to perform vector mapping on the group interest features to obtain the second feature vector representation. For ease of understanding, the following description will be given.
First, a description will be given of how to vector-map the population relation features to obtain a first feature vector representation, which is specifically as follows:
in one possible implementation manner, for each object group, the server may directly obtain the group relationship feature of each object group, and further obtain, through vector mapping processing, a first feature vector representation of each object group, based on which the first feature vector representation can comprehensively reflect the group relationship feature of the object group.
In yet another possible implementation manner, for each object group, the first feature vector representation may be determined by integrating the group relationship features of other object groups on the basis of the object group's own group relationship features. For this case, the embodiment of the present application provides a manner of determining the first feature vector representation in two stages: in the first stage, a similarity metric value between each object group and the other object groups is determined; in the second stage, the first feature vector representation of each object group is determined in combination with the similarity metric values. In implementation, the server may take each object group in turn as the first object group, perform similarity measurement according to the group relationship features of the first object group and of a second object group, and thereby obtain the similarity metric value between the first object group and the second object group. Here, the second object group may refer to each object group in the plurality of object groups other than the first object group, and the similarity metric value reflects the similarity between the group relationship features of the first object group and those of the second object group. Finally, the server may convert the similarity metric values between the first object group and the second object groups into the first feature vector representation of the first object group.
Based on this, the first feature vector representation of each object population is determined after fusing the similarity of population relationship features with other object populations, such that it is convenient to aggregate object populations having similar population relationship features in one cluster at the time of subsequent clustering.
In the embodiment in which the first feature vector representation is determined by means of a similarity measure, it should also be noted that the application is not limited in any way as to how the similarity measure is performed and how the similarity measure between the first object population and the second object population is converted into the first feature vector representation of the first object population. For ease of understanding, the following description will be given.
First, a description is given of how the similarity measurement is performed. In practical application, a similarity metric basis may first be determined, where the basis may be a predetermined index reflecting the group relationship features of the object groups; the difference between the first object group and the second object group in terms of this basis can then be compared, so as to determine the similarity metric value between the first object group and the second object group. The present application does not limit how the similarity metric basis is determined. For ease of understanding, the embodiments of the present application provide the following examples:
in general, the group relationship features of an object group may be derived from a variety of relationship features, such as geographic distribution information, educational background distribution information, and friend relationships among the objects included in the object group. Therefore, in one possible implementation manner, for each object group, the relationship features from different sources may first be weighted and integrated to obtain the group relationship feature of the object group, and the group relationship feature may then be used as the similarity metric basis.
In practical applications, the number of common objects included in different object groups and the overlap ratio of the covered object interaction data can reflect the similarity between different object groups, so in another possible implementation manner, the number of common objects included in the object groups and the overlap ratio of the covered object interaction data can also be used as a similarity measurement basis.
It will be appreciated that in some possible implementations, the group relationship features, the number of common objects included, and the overlap ratio of the covered object interaction data may also be combined, and the combined result is used as a similarity measure basis.
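A combined similarity metric of the kind just described can be sketched as follows. The Jaccard formulation and the blend weight `alpha` are assumptions chosen for illustration; the patent does not prescribe a specific formula.

```python
def similarity_measure(group_a, group_b, alpha=0.5):
    """Blend the overlap of member objects with the overlap of covered
    object interaction data into one similarity metric value. Each group is
    assumed to look like {"members": {...}, "interactions": {...}}."""
    def jaccard(s1, s2):
        union = s1 | s2
        return len(s1 & s2) / len(union) if union else 0.0

    member_sim = jaccard(set(group_a["members"]), set(group_b["members"]))
    data_sim = jaccard(set(group_a["interactions"]), set(group_b["interactions"]))
    return alpha * member_sim + (1 - alpha) * data_sim
```

Setting `alpha` closer to 1 emphasizes shared members; closer to 0 emphasizes shared interaction data. A weighted-and-integrated group relationship feature could be folded in as a third term in the same way.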
Next, a description will be given of how to convert the similarity metric value between the first object group and the second object group into the first feature vector representation of the first object group. For example, in one possible implementation, the conversion may be performed by means of probabilistic random walk. Specifically, the server may first construct an object group relationship graph with each object group as a node, the relationships between object groups as edges, and the similarity metric values between object groups as edge weights; the object group relationship graph constructed in this way reflects the similarity relationships among the plurality of object groups. Further, the server may perform a random walk on the object group relationship graph with the node corresponding to the first object group as the starting point and the edge weights as walk probabilities, to obtain the first feature vector representation of the first object group. Since an edge weight is the similarity metric value between two object groups, a higher value indicates higher similarity between the first object group and the second object group; therefore, using the edge weights as walk probabilities steers the walk toward the second object groups that are more similar to the first object group, so that the first feature vector representation of the first object group can be determined rapidly.
In practical application, the DeepWalk algorithm may be used to perform the random walk. Specifically, with the node corresponding to the first object group as the starting point and the edge weights as walk probabilities, the DeepWalk algorithm performs random walks on the object group relationship graph, and when the walks finish it outputs a graph embedding vector representation based on the object group relationship graph, which can be used as the first feature vector representation of the first object group.
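The walk step of this DeepWalk-style scheme can be sketched as below. This is only the corpus-generation half: it hops from node to node with probability proportional to edge weight (the similarity metric value); training the skip-gram model on the resulting walk sequences, which is what produces the actual embedding vectors, is omitted. The graph structure and node names are illustrative assumptions.

```python
import random

def weighted_walk(graph, start, length, seed=0):
    """One weighted random walk over the object group relationship graph.
    `graph` maps each node to {neighbour: similarity weight}; the walk hops
    to a neighbour with probability proportional to the edge weight."""
    rng = random.Random(seed)
    walk = [start]
    for _ in range(length - 1):
        neighbours = graph.get(walk[-1], {})
        if not neighbours:
            break  # dead end: stop the walk early
        nodes = list(neighbours)
        weights = [neighbours[n] for n in nodes]
        walk.append(rng.choices(nodes, weights=weights, k=1)[0])
    return walk

# Assumed toy graph: g1 is much more similar to g2 than to g3.
graph = {"g1": {"g2": 0.9, "g3": 0.1},
         "g2": {"g1": 0.9},
         "g3": {"g1": 0.1}}
```

Walks started from `g1` will mostly alternate with `g2`, so a skip-gram model trained on many such walks would place `g1` and `g2` close in the embedding space.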
Next, a description will be given of how to vector-map the group interest feature to obtain a second feature vector representation. The method comprises the following steps:
the group interest feature may refer to a feature related to the core interest content associated with the object group, and may specifically take the form of group interest keywords; for example, a group interest keyword may be an interest keyword commonly followed by the users included in the object group, or a keyword for a particular author commonly followed by a fan group, and the like. Correspondingly, vector mapping the group interest feature to obtain the second feature vector representation may refer to mapping the group interest keywords into a semantic vector via an embedding vector representation; the semantic vector can then be used as the second feature vector representation, so that object groups with similar interests or similar group purposes can be clustered together into one cluster during clustering.
In one possible implementation manner, a typical way of mapping the group interest keywords into a semantic vector is to construct, based on the central word corresponding to the group interest keywords, the embedding vector representation of that central word as the semantic vector using a word2vec word-vector mapping method. In specific implementation, keywords that can represent the group's interest may first be extracted from the group interest keywords as central words; taking the keywords of a particular author commonly followed by a fan group as an example, the author's name and the author's field may be extracted as central words. Then, the central word can be mapped into a high-dimensional space through word2vec to obtain its embedding vector representation, which is used as the second feature vector representation of the object group and as the metric for subsequent processing such as clustering.
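The interface of this central-word mapping can be sketched as follows. Note the heavy caveat: the real method uses a trained word2vec model, while the stand-in below derives a deterministic vector from the word's hash purely so the example is self-contained; the dimension, the lookup function, and the "first keyword is the central word" rule are all illustrative assumptions.

```python
import hashlib

def pseudo_word_vector(word, dim=8):
    """Stand-in for a word2vec lookup: derive a deterministic vector from
    the word's SHA-256 hash. A real implementation would query a trained
    word2vec model instead; this toy only illustrates the interface."""
    digest = hashlib.sha256(word.encode("utf-8")).digest()
    return [digest[i] / 255.0 for i in range(dim)]

def second_feature_vector(interest_keywords):
    """Extract a central word (here simply the first keyword, an assumed
    extraction rule) and map it to the embedding used as the second
    feature vector representation of the object group."""
    central_word = interest_keywords[0]
    return pseudo_word_vector(central_word)
```

Because the lookup is deterministic, the same central word always yields the same vector, which is the property the clustering step relies on.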
For example, in a video recommendation scenario, the group interest keywords may also be extracted from multimodal features of the video, where the multimodal features may be the video's title text, picture frames, audio, and so on. Correspondingly, different word-vector mapping methods can also be used.
It should be further noted that the present application does not limit the number of group interest keywords or the number of central words; accordingly, neither need be limited to one. In yet another possible implementation manner, vector mapping may be performed by weighting and summing the embeddings of the several head interest keywords of the object group with different weights, and the weighted sum is used as the second feature vector representation of the group interest keywords.
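The weighted-sum variant can be sketched directly. The inputs are illustrative: each head keyword is assumed to have already been mapped to an embedding of equal dimension, and the weights are assumed to be supplied (e.g. by keyword frequency rank).

```python
def weighted_keyword_vector(keyword_vecs, weights):
    """Combine the embeddings of several head interest keywords into one
    second feature vector representation by weighted summation."""
    dim = len(keyword_vecs[0])
    total = [0.0] * dim
    for vec, w in zip(keyword_vecs, weights):
        for d in range(dim):
            total[d] += w * vec[d]
    return total
```

Normalizing the weights so they sum to 1 keeps the combined vector on the same scale as the individual keyword embeddings.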
Based on the description of the above embodiments, the clusters are obtained by re-aggregating the object groups, and when the group portrait features of the object groups include the group relationship features and the group interest features, the aggregation over the object groups is performed on the basis of both, so the data processing method provided by the embodiment of the application can be considered a secondary aggregation method for interest groups based on group relationships, as can be seen in fig. 3. Fig. 3 shows a schematic framework diagram of this secondary aggregation method, which can be applied to a secondary aggregation system 400; a schematic structural diagram of the secondary aggregation system 400 is shown in fig. 4, and the system 400 may include a group portrait feature construction module 401, a group relationship embedding representation module 402, a group interest embedding representation module 403, a cluster aggregation module 404, and a cluster portrait feature integration module 405.
Referring to fig. 3, in the actual secondary aggregation process, for each of the acquired plurality of object groups, the group portrait feature of the object group may first be constructed by the group portrait feature construction module 401; in this process, the group portrait feature constructed by the module 401 may include two parts, namely the group relationship feature and the group interest feature. Further, the group relationship feature may be vector-mapped by the group relationship embedding representation module 402 to obtain the group relationship embedding vector representation (i.e. the aforementioned first feature vector representation), and the group interest feature may be vector-mapped by the group interest embedding representation module 403 to obtain the group interest word embedding vector representation (i.e. the aforementioned second feature vector representation), which may also be referred to as the group keyword embedding vector representation. On this basis, the group relationship feature and the group interest feature of the object group are each represented in embedding-vector form for the subsequent clustering process. Further, the group relationship embedding vector representation and the group interest word embedding vector representation may be input together, as the aggregation metric for object groups, to the cluster aggregation module 404. Then, the cluster aggregation module 404 may secondarily aggregate the plurality of object groups based on the two embedding vector representations to obtain clusters.
In a specific implementation, the group relationship embedding vector representation and the group interest word embedding vector representation serve as upstream inputs of the cluster aggregation module 404, which splices the two; the spliced vector is the feature vector representation of the group portrait feature, and the cluster aggregation module 404 performs the secondary aggregation of the plurality of object groups according to this feature vector representation. After the secondary aggregation is completed, the aggregation result may be output by the cluster aggregation module 404, where the aggregation result indicates the resulting clusters. It is understood that the aggregation result may aggregate the plurality of object groups into a single cluster, or into a plurality of clusters. Finally, the cluster portrait feature integration module 405 can integrate the group portrait features of the object groups included in each cluster upwards, integrating the portrait features from group granularity to cluster granularity to obtain the cluster portrait features.
Based on the above, an interest-cluster secondary aggregation method based on group relations is provided: object groups with compact group relations and similar interests can be aggregated into a cluster, and the cluster exhibits both interest uniformity and relation uniformity. In subsequent applications, the cluster portrait features of a cluster can directly serve downstream applications. Compared with the group portrait features of a single object group, the cluster portrait features cover more interaction data and wider interest content, which improves interest promotion and thereby alleviates problems such as sparse interaction data, fewer users, and a smaller user promotion range in a single object group. Compared with a single object group, a cluster supplements the single object group with other object groups having similar interest features and similar relation features, so that these features can be effectively promoted across all object groups included in the cluster, widening the promotion range. Meanwhile, a cluster aggregates more interaction data, which improves the promotion of user interest behaviors; for example, the consumption behavior of users in one object group in the cluster can be promoted as recommended content to more potential users (for example, users in other object groups in the cluster), improving the promotion range.
In the above-described embodiment, the group portrait feature is an important basis for implementing the clustering process over the plurality of object groups. Accordingly, in S202, the group portrait feature may be acquired for each of the plurality of object groups. The present application does not limit how the group portrait feature of an object group is acquired. For ease of understanding, the embodiments of the present application provide the following as examples:
In one possible implementation, the server may construct the group portrait feature from the group feature information of the object group and the object feature information of the objects included in the object group, to obtain the group portrait feature of the object group. The group feature information is information capable of reflecting the features of the object group from an overall perspective, and the object feature information is information capable of reflecting the features of the object group from an individual perspective. On this basis, the features of the object group can be described from both the overall and individual angles, and the group portrait feature constructed accordingly can reflect both the group features of the object group and the object features of the objects it includes.
In practical application, the group feature information may include the group name, group announcement, number of users in the group, group interest tags, and other group feature information of the object group; in a specific implementation, the group feature information of the object group may first be extracted and then integrated. The object feature information may refer to object interaction feature information, object attribute feature information, and the like, of the objects included in the object group. The object interaction feature information represents interactions between an object and an interaction object, where the interaction object may be content such as a video or an article, and the interactions may be, for example, browsing, collecting, liking, posting, and commenting on the content. The object attribute feature information may refer to attribute-level feature information such as region. In implementation, the object interaction feature information and object attribute feature information of each user included in the object group can be aggregated and recombined, and the resulting object feature information is often presented in the form of a frequency or a distribution. During aggregation and recombination, the presentation form can be set according to the type of the object feature information.
For example, the object attribute feature information may be set to a distribution presentation form, which may specifically be a formal representation of the distribution of user region attributes; the object interaction feature information may be set to a frequency presentation form, specifically by summing and sorting frequencies with the interaction object as the basic dimension, or by extracting keywords from the interaction objects and then counting the hit frequency of each keyword.
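The two presentation forms just described can be sketched minimally as follows, assuming each user record carries a `region` attribute and a list of interaction `keywords` (both field names are hypothetical placeholders for the patent's attribute and interaction feature information):

```python
from collections import Counter

def aggregate_object_features(users):
    # Attribute features -> distribution form: share of each region.
    regions = Counter(u["region"] for u in users)
    total = sum(regions.values())
    region_distribution = {r: n / total for r, n in regions.items()}
    # Interaction features -> frequency form: keyword hit counts,
    # summed over users and sorted high-to-low.
    keyword_freq = Counter(k for u in users for k in u["keywords"])
    return region_distribution, keyword_freq.most_common()
```

The distribution form answers "what share of the group has this attribute", while the frequency form answers "how often does this interaction keyword hit", matching the two presentation modes above.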
However, in practical applications, because the group feature information and object feature information of an object group generally cover many aspects, problems such as a large number of individual users, a wide time range, and uncertain variability often exist, so the information contains more noise, is more diffusely distributed, and has poor concentration. Correspondingly, in the related art, constructing group portrait features directly from the acquired group feature information and object feature information yields group portrait features that suffer from poor concentration, low confidence, and similar problems, which in turn affects the effect of downstream applications and the user experience.
In order to construct higher-quality group portrait features and avoid the problems of poor concentration and low confidence, in one possible implementation, vertical class identification can first be performed on the group feature information and the object feature information to achieve concentration screening, and the group portrait features are then constructed according to the vertical class identification results, so as to improve the quality of the group portrait features and resolve the problems of poor concentration and low confidence. Specifically, referring to fig. 5, fig. 5 shows a flowchart of a method for constructing a group portrait feature, which may include S501-S504:
S501: and performing vertical class identification on the group characteristic information of the object group to obtain a first vertical class identification result.
S502: and performing the vertical class identification on the object characteristic information of the objects included in the object group to obtain a second vertical class identification result.
In general, the feature information of an object group shows characteristics such as large quantity and dispersion, so in practical application, feature vertical classes with higher concentration can be determined from the numerous pieces of feature information of the object group; compared with the more dispersed raw feature information, these feature vertical classes can represent the features of the object group more accurately. Specifically, the server may first perform vertical class identification on the group feature information of the object group to obtain a first vertical class identification result, where the feature information included in the first vertical class identification result may be considered a group feature vertical class conforming to the object group (such as a vertical class in the direction of interest); on this basis, concentration screening of the group feature information is achieved, and the obtained first vertical class identification result has higher concentration than the group feature information before identification. Similarly, the server may perform vertical class identification on the object feature information of the objects included in the object group to obtain a second vertical class identification result, where the feature information included in the second vertical class identification result may be considered an object feature vertical class conforming to those objects; on this basis, concentration screening of the object feature information is achieved, and the obtained second vertical class identification result has higher concentration than the object feature information before identification.
Based on the above, the feature vertical classes corresponding to the object group (the group feature vertical classes and the object feature vertical classes) can be identified, through vertical class identification, from the numerous pieces of feature information of the object group (the group feature information and the object feature information), so that the features of the object group are reflected more accurately and the quality of the group portrait features is improved.
S503: and determining a common feature vertice between the first vertice identification result and the second vertice identification result.
S504: and determining the common feature verticals as the group portrait features of the object group.
In order to improve the quality of the group portrait features, after obtaining the first and second vertical class identification results, the server can further determine the common feature vertical class between them, and then determine that common feature vertical class as the group portrait feature of the object group. The common feature vertical class is consistent with both the group feature vertical classes and the object feature vertical classes, and has higher concentration and confidence in both the group feature dimension and the object feature dimension; it can therefore be considered closer to the real common feature vertical class of the object group, reflects the effective group portrait feature of the object group more accurately, and can accordingly be determined as the group portrait feature. On this basis, the group portrait features are constructed by finding, from the numerous pieces of feature information of the object group (group feature information and object feature information), the common feature vertical class surrounding the object group, which improves the quality of the group portrait features and correspondingly improves the effect of downstream applications and the user experience.
In practical applications, the group feature information may include the group name, group announcement, number of users in the group, group interest tags, and other group feature information of the object group. Considering that the confidence of group feature information may change over time, in order to improve the accuracy of the first vertical class identification result, in one possible implementation the server may determine the weight of a piece of group feature information according to its release time, where the recency of the release time is positively related to the weight. Specifically, the earlier the release time, the smaller the weight of the group feature information, and the more recent the release time, the larger the weight; for example, group feature information released on 05 October 2022 is weighted more than group feature information released on 05 August 2022. Further, the server can weight and integrate the group feature information according to these weights to obtain candidate group feature information; on this basis, the influence of release time on the confidence of the group feature information is comprehensively considered, so that the candidate group feature information better matches the group features at the current time than the original group feature information. Finally, the server can aggregate the candidate group feature information to obtain the first vertical class identification result, thereby improving its accuracy.
For ease of understanding, take the group announcement as an example of group feature information: the group announcement may change over time. In implementation, the keywords included in each group announcement may be weighted according to how recent its release time is (the more recent the release time, the greater the weight); the corresponding candidate group announcements are then obtained through weighted integration, and the first vertical class identification result is obtained by aggregating the candidate group announcements (in this case, the first vertical class identification result represents the group announcement).
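One concrete way to realize "more recent release time, larger weight" is an exponential time decay over announcement keywords. The half-life decay below is an assumption for illustration; the text only requires the weight to grow with recency, not this particular decay curve.

```python
from collections import Counter
from datetime import date

def weighted_announcement_keywords(announcements, today, half_life_days=90):
    # announcements: list of (publish_date, keywords) pairs.
    # Each keyword occurrence is weighted by 0.5 ** (age / half_life),
    # so keywords from recent announcements dominate the aggregate.
    scores = Counter()
    for pub_date, keywords in announcements:
        age_days = (today - pub_date).days
        weight = 0.5 ** (age_days / half_life_days)
        for kw in keywords:
            scores[kw] += weight
    return scores.most_common()
```

With the dates from the example above, a keyword published on 05 October 2022 outweighs one published on 05 August 2022 when scored on 05 October 2022.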
Similarly, the object feature information may specifically include the object interaction feature information and object attribute feature information of the objects included in the object group. Considering that these may be personalized features, which generally represent only a few individuals, using personalized features to represent the whole object group easily causes deviations. In practical applications, the distribution frequency of a piece of object feature information reflects whether it is more likely a personalized feature or a commonality feature. Therefore, in order to improve the accuracy of the second vertical class identification result, the server may first eliminate, from the object feature information of the objects included in the object group, the object feature information whose distribution frequency is lower than a preset frequency threshold, to obtain candidate object feature information, and may then aggregate the candidate object feature information to obtain the second vertical class identification result. The preset frequency threshold may be set in advance, for example according to the amount of object feature information.
In practical application, if the distribution frequency is lower than the preset frequency threshold, the object feature information can be considered a personalized feature and can be eliminated directly; the mid- and long-tail object feature information with low distribution frequency is thereby filtered out, so the candidate object feature information is closer to the real commonality features than the initial object feature information. Further, the server determines the second vertical class identification result using the candidate object feature information, which eliminates the influence of low-frequency mid- and long-tail object feature information on the second vertical class identification result and improves its accuracy.
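The frequency-threshold screening above amounts to a simple long-tail filter over feature occurrence counts; the threshold value in the usage below is an arbitrary illustration.

```python
from collections import Counter

def filter_low_frequency_features(feature_occurrences, freq_threshold):
    # Count the distribution frequency of each piece of object feature
    # information and drop entries below the preset frequency threshold;
    # what survives is the candidate (commonality) feature information.
    counts = Counter(feature_occurrences)
    return {feat: n for feat, n in counts.items() if n >= freq_threshold}
```

Features observed only once or twice across the group (likely personalized features) are discarded, while repeatedly observed features survive as candidates.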
It should be noted that the present application does not limit how the candidate group feature information is aggregated to obtain the first vertical class identification result, nor how the candidate object feature information is aggregated to obtain the second vertical class identification result. For ease of understanding, these are described below.
First, how to aggregate the candidate group feature information to obtain a first vertical class identification result is described.
In practical application, the server can aggregate the candidate group feature information to obtain group feature vertical classes; the aggregation performs concentration screening on the candidate group feature information, so the obtained group feature vertical classes have higher concentration than the candidate group feature information and reflect the group features of the object group more accurately. Further, the server may determine the first vertical class identification result based on the group feature vertical classes. For example, the first vertical class identification result may be determined according to the number of group feature vertical classes: in a specific implementation, if there is one group feature vertical class, it may be used directly as the first vertical class identification result; if there are multiple, all or part of them may be used as the first vertical class identification result.
For the case where there are multiple group feature vertical classes, the embodiments of the present application correspondingly provide a way of determining the first vertical class identification result. Specifically, the server may aggregate the candidate group feature information to obtain multiple group feature vertical classes, then sort them according to group vertical class strength to obtain a group vertical class sequence, and finally determine the first k group feature vertical classes in the sequence as the first vertical class identification result. The greater the group vertical class strength, the higher the concentration of the group feature vertical class, and the more accurately it reflects the group features of the object group. Therefore, in practical application, the group feature vertical classes can be sorted in descending order of group vertical class strength to obtain the group vertical class sequence, and the top-k group feature vertical classes are selected as the first vertical class identification result, improving its concentration. It can be understood that the present application does not limit the setting of k; for example, k can be set by combining the actual clustering requirements and the number of group feature vertical classes.
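The descending sort and top-k selection described above can be sketched directly; the strength values in the usage example are invented for illustration.

```python
def top_k_by_strength(vertical_strengths, k):
    # vertical_strengths: group feature vertical class -> strength.
    # Sort in descending order of group vertical class strength and keep
    # the first k as the first vertical class identification result.
    ranked = sorted(vertical_strengths, key=vertical_strengths.get,
                    reverse=True)
    return ranked[:k]
```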
Next, how to aggregate the candidate object feature information to obtain the second vertical class identification result is described.
In practical application, the server can aggregate the candidate object feature information to obtain object feature vertical classes; the aggregation performs concentration screening on the candidate object feature information, so the obtained object feature vertical classes have higher concentration than the candidate object feature information and reflect the object features of the object group more accurately. Further, the server may determine the second vertical class identification result based on the object feature vertical classes. For example, the second vertical class identification result may be determined according to the number of object feature vertical classes: in a specific implementation, if there is one object feature vertical class, it may be used directly as the second vertical class identification result; if there are multiple, all or part of them may be used as the second vertical class identification result.
For the case where there are multiple object feature vertical classes, the present application does not limit the way of selecting part of them to determine the second vertical class identification result. For ease of understanding, the embodiments of the present application provide the following two ways as examples:
In one possible implementation, the multiple object feature vertical classes may be screened based on the representative word of each object feature vertical class, in combination with means such as associating a posterior effect and an interest-word whitelist, and the screened object feature vertical classes are finally determined as the second vertical class identification result.
In yet another possible implementation, the object feature vertical classes whose object vertical class strengths rank in the first m may be selected as the second vertical class identification result. Specifically, the server may aggregate the candidate object feature information to obtain multiple object feature vertical classes, then sort them according to object vertical class strength to obtain an object vertical class sequence, and finally determine the first m object feature vertical classes in the sequence as the second vertical class identification result. The greater the object vertical class strength, the higher the concentration of the object feature vertical class, and the more accurately it reflects the object features of the object group. Therefore, in practical application, the object feature vertical classes can be sorted in descending order of object vertical class strength to obtain the object vertical class sequence, and the top-m object feature vertical classes are selected as the second vertical class identification result, improving its concentration. It can be understood that the present application does not limit the setting of m; for example, m can be set by combining the actual clustering requirements and the number of object feature vertical classes.
In practical applications, the object vertical class strength of an object feature vertical class may be determined according to the number of object feature keywords associated with that vertical class or the hit frequency of those keywords; specifically, the greater the number of associated object feature keywords, or the higher their hit frequency, the greater the object vertical class strength of the object feature vertical class.
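One simple realization of the strength measure above sums the hit frequencies of a vertical class's associated keywords, which grows with both the number of keywords and their frequencies; summing is an assumed combination rule, since the text does not fix one.

```python
def object_vertical_strength(keyword_hits):
    # keyword_hits: keyword -> hit frequency for one object feature
    # vertical class. Total hits grow with both the number of associated
    # keywords and how often each of them hits.
    return sum(keyword_hits.values())

def top_m_object_verticals(verticals, m):
    # verticals: object feature vertical class -> {keyword: hit frequency}.
    # Rank by strength, descending, and keep the first m.
    ranked = sorted(verticals,
                    key=lambda v: object_vertical_strength(verticals[v]),
                    reverse=True)
    return ranked[:m]
```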
In general, the candidate object feature information may take the form of candidate object feature labels. Accordingly, in the process of aggregating the candidate object feature information to obtain the object feature vertical classes, the server may first perform a measurable transformation of the candidate object feature labels through a basic representation, uniformly transforming them into a measurable space, and may then aggregate the candidate object feature labels in that measurable space to obtain the object feature vertical classes. The basic representation may be, for example, label embedding or a label graph relation. In actual aggregation, the candidate object feature labels can be clustered by a clustering method such as K-means to obtain multiple content clusters and their corresponding centers; a cluster label is then selected for each center, serving as the representative of the candidate object feature labels included in the content cluster where that center is located. Finally, the cluster labels corresponding to the multiple content clusters can be used as the object feature vertical classes, thereby providing a way of representing object feature vertical classes in the form of multiple content clusters and achieving concentration aggregation within each single content cluster.
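The K-means route above can be sketched as follows. The toy embeddings in the usage are invented, and picking the label nearest each cluster center as the cluster label is one plausible reading of "selecting the cluster label according to the center"; in practice the embeddings would come from a trained label-embedding model.

```python
import random

def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_cluster_labels(label_vecs, k, iters=10, seed=0):
    # label_vecs: candidate object feature label -> embedding vector.
    # Plain K-means over the label embeddings; each non-empty content
    # cluster is then represented by the label nearest its center.
    rng = random.Random(seed)
    tags = list(label_vecs)
    centers = [list(label_vecs[t]) for t in rng.sample(tags, k)]
    buckets = [[] for _ in range(k)]
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for t in tags:  # assignment step
            i = min(range(k), key=lambda i: dist2(label_vecs[t], centers[i]))
            buckets[i].append(t)
        for i, bucket in enumerate(buckets):  # center update step
            if bucket:
                dim = len(centers[i])
                centers[i] = [sum(label_vecs[t][d] for t in bucket) / len(bucket)
                              for d in range(dim)]
    # Cluster label = label closest to each center.
    return [min(bucket, key=lambda t: dist2(label_vecs[t], centers[i]))
            for i, bucket in enumerate(buckets) if bucket]
```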
In practical application, the aggregation can also be completed by center screening. In a specific implementation, the candidate object feature label with the highest concentration among all candidate object feature labels can be selected as the center label, which serves as the reference basic representation in the center screening process; the candidate object feature labels closest to the center label are then retained around it as associated feature labels, and finally the center label and the associated feature labels together constitute the object feature vertical class.
In practical application, the candidate group feature information is similar to the candidate object feature information and may take the form of candidate group feature labels; accordingly, for the process of aggregating the candidate group feature information to obtain the group feature vertical classes, reference may be made to the above embodiment of aggregating the candidate object feature information to obtain the object feature vertical classes. In addition, considering that the group features of an object group are generally stable, the center information of the candidate group feature information can usually be selected as the group feature vertical class during aggregation, and different long-term and short-term strategies can be adopted to retain the center information of multiple pieces of candidate group feature information to serve jointly as the group feature vertical classes.
As can be seen from the above, the embodiments of the present application provide a method for constructing group portrait features, which specifically finds, from the numerous pieces of feature information of the object group (group feature information and object feature information), the common feature vertical class surrounding the object group, and constructs the group portrait features according to that common feature vertical class, thereby improving the quality of the group portrait features and correspondingly improving the effect of downstream applications and the user experience.
It can be seen that how to find the common feature vertical class surrounding the object group from its numerous feature information is important. It should be noted that the present application does not limit the manner of finding the common feature vertical class surrounding the object group from the numerous feature information of the object group. For ease of understanding, the embodiments of the present application provide the following two ways as examples:
In one possible implementation, a way of determining the common feature vertical class based on the distribution frequency ordering of multiple feature vertical classes is provided. Specifically, the server may first determine, as the valid content class, a target feature vertical class that can represent both the first vertical class identification result and the second vertical class identification result; for example, the target feature vertical class may be the feature vertical class that is common to the two identification results and has the highest distribution frequency. That is, based on distribution frequency, the target feature vertical class that is common to the first and second identification results and has the highest frequency is first selected from the multiple feature vertical classes; this target feature vertical class has the highest concentration, can serve as the representative feature vertical class of the object group, and in practical application can be used as the valid content class. Then, the server may compute the degree of correlation between each feature vertical class in the first and second identification results and the valid content class, where a larger degree of correlation means the feature vertical class is more correlated with the valid content class. Finally, the server may determine the feature vertical classes whose degree of correlation reaches a first preset threshold as the common feature vertical class. The first preset threshold is used to judge the degree of correlation between a feature vertical class and the valid content class, and may be set according to the actual clustering requirements, which the present application does not limit.
In a specific implementation, if the degree of correlation reaches the first preset threshold, the feature vertical class is considered related to the valid content class and can serve as a representative feature vertical class of the object group, so the feature vertical classes whose degree of correlation reaches the first preset threshold can be determined as the common feature vertical class. On this basis, the common feature vertical classes are selected from the multiple feature vertical classes, based on their distribution frequency ordering, as the representative feature vertical classes of the object group.
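The frequency-ordering route can be sketched as below; the `correlation` callable and the table in the usage are hypothetical stand-ins for whatever correlation measure an implementation would use.

```python
def common_verticals_by_frequency(first_result, second_result,
                                  distribution_freq, correlation,
                                  threshold):
    # Valid content class: the feature vertical class shared by both
    # identification results with the highest distribution frequency.
    shared = set(first_result) & set(second_result)
    valid = max(shared, key=lambda v: distribution_freq[v])
    # Keep every feature vertical class whose degree of correlation with
    # the valid content class reaches the first preset threshold.
    candidates = set(first_result) | set(second_result)
    return {v for v in candidates if correlation(v, valid) >= threshold}
```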
In yet another possible implementation, considering that the group feature information of an object group can reflect the features of the object group from an overall perspective and generally has higher confidence (such as group announcements), a way of determining the common feature vertical class based on group feature information matching is also provided. Specifically, the server may first determine, from the first vertical class identification result, the target group feature vertical classes whose confidence is higher than a second preset threshold. The second preset threshold is used to judge whether a feature vertical class in the first vertical class identification result is a high-confidence feature vertical class; it may be set, for example, according to the type of the object group, which the present application does not limit. In a specific implementation, if the confidence is higher than the second preset threshold, the feature vertical class can be considered a high-confidence feature vertical class, can serve as a representative feature vertical class of the object group, and can be used as the alignment criterion. Further, the server may determine the object feature vertical classes in the second vertical class identification result that match the target group feature vertical classes as the common feature vertical class. In practical applications, an object feature vertical class matching a target group feature vertical class can be considered a representative feature vertical class that corresponds to, and has a certain association with, that group feature vertical class. On this basis, the common feature vertical class, serving as the representative feature vertical class of the object group, is determined by matching and screening against the group feature information.
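The matching route can be sketched as below; the `matches` predicate (here a toy substring check in the usage) is an assumption, since the text does not specify how an object feature vertical class is matched against a group feature vertical class.

```python
def common_verticals_by_matching(first_result, confidence,
                                 second_result, matches,
                                 conf_threshold):
    # Alignment criterion: group feature vertical classes from the first
    # result whose confidence exceeds the second preset threshold.
    anchors = [v for v in first_result if confidence[v] > conf_threshold]
    # Common feature vertical classes: object feature vertical classes
    # from the second result that match any anchor.
    return [ov for ov in second_result
            if any(matches(ov, gv) for gv in anchors)]
```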
It should be noted that, in practical application, there may be feature vertical classes that cannot be judged (such as group feature vertical classes whose confidence cannot be determined); such feature vertical classes can be filtered out directly, so that inaccurate judgments do not affect the accuracy of the first vertical class identification result.
It should be further noted that the present application does not limit the choice between the way of determining the common feature vertical class based on the distribution frequency ordering of multiple feature vertical classes and the way based on group feature information matching; the choice can be made according to the actual application scenario. For example, when the object group is a sports interest group (e.g., a "ski group"), the confidence of the group feature vertical class derived from the group name is high, so the matching-based way may be selected, and the object feature vertical class corresponding to the group name may be retained as the common feature vertical class. For another example, when the object group is a family group, a casual chat group, or the like, the group name often carries no valid semantics; in this case, the way based on the distribution frequency ordering of multiple feature vertical classes may be selected, aggregating the common feature vertical classes among the object feature vertical classes and the group feature vertical classes (such as interest preferences). In addition, the two ways can also be combined.
Based on the above description of the embodiments for constructing the group portrait features of an object group, since the construction combines feature information from two different sources, namely group feature information and object feature information, the construction manner provided by the application can be regarded as an object-group-based multi-source feature fusion method, as shown in fig. 6. Fig. 6 shows a schematic framework diagram of the object-group-based multi-source feature fusion method, which can be applied to a multi-source feature fusion system 700. A schematic structural diagram of the multi-source feature fusion system 700 is shown in fig. 7; the system 700 may include an information extraction and aggregation module 701, a vertical class identification module 702, and an information matching and filtering module 703.
Referring to fig. 6, in practical applications, for an obtained object group, the information extraction and aggregation module 701 may perform information extraction and aggregation on the object group to obtain group feature information of the object group and object feature information of the objects included in the object group, where an object may be a user. Further, vertical class identification may be performed on the group feature information and the object feature information by the vertical class identification module 702, and the identification process may include concentration screening and output. Specifically, the module 702 may perform concentration screening on the group feature information and the object feature information to obtain the corresponding group feature verticals and object feature verticals. After the concentration screening is completed, the module 702 may output two sets of feature vertical results according to the group feature verticals and the object feature verticals, namely the vertical result for the group feature information of the object group (i.e., the first vertical class identification result) and the vertical result for the object feature information of the object group (i.e., the second vertical class identification result). After identification by the module 702, the group feature information and the object feature information are divided into representations composed of several feature verticals. For example, after vertical class identification is performed on the group feature information and the object feature information of an object group S, the group feature information vertical result may include the four feature verticals "skateboarding / skiing / sports / hiking", and the object feature information vertical result may include the six feature verticals "music / dance / sports / basketball / skiing / internet".
Although each feature vertical already corresponds to a certain specific direction, when an object group has many corresponding feature verticals, in order to accurately determine the source of the shared association within the object group, the group feature information vertical result and the object feature information vertical result of the object group may be further screened by the information matching and filtering module 703, so as to screen out the common feature verticals around which the object group gathers (for example, one or several main feature verticals or entities of interest). After the information matching and filtering is completed, the common feature verticals can be determined as the group portrait features of the object group; in practical applications, an output result indicating the group portrait features of the object group can also be produced by the information matching and filtering module 703.
In practical applications, the vertical class identification module 702 may include an object-information vertical class identification module and a group-information vertical class identification module. In a specific implementation, the object-information vertical class identification module may be used to perform concentration screening on the object feature information of the object group and output the object feature information vertical result of the object group. The group-information vertical class identification module may be used to perform concentration screening on the group feature information of the object group and output the group feature information vertical result of the object group.
It should be noted that, when constructing the group portrait features of an object group, other information may be introduced as a basis for the construction in addition to the foregoing related information of the object group such as the group feature information and the object feature information, for example, the behavior of the objects included in the object group in other domains (such as their related behavior in a music player, online video software, or a game application). On this basis, the construction basis of the group portrait features is enriched, and the richness of the group portrait features is improved.
According to the above technical solution, the object-group-based multi-source feature fusion method can find, from the abundant feature information of an object group (group feature information and object feature information), the common feature verticals around which the object group gathers. These common feature verticals are usable features with high concentration and high confidence and can further be determined as the group portrait features of the object group, so that high-quality group portrait features can be constructed on their basis. This addresses the problems in the related art of disordered interests, poor concentration, and insufficient confidence when group portrait features are constructed directly from the abundant feature information of an object group, as well as the resulting poor downstream effects such as deviations in interests or social circles. With the object-group-based multi-source feature fusion method, the constructed group portrait features are application-ready and can be applied to scenarios such as recommendation, search, and advertising, thereby optimizing downstream application effects and user experience.
In practical applications, the object-group-based multi-source feature fusion method has been integrated into the group portrait feature production pipeline and brought online in several recommendation scenarios such as video numbers. Compared with the situation before going online, the accuracy of capturing out-of-domain interests and the consumption-behavior gain within video numbers are significantly improved, so that out-of-domain recommendation efficiency can be effectively improved while the returns generated by user-published content are also improved.
It should be noted that "out of domain" may refer to a different user scenario. For example, for a user watching a live broadcast through a video number, continuing to recommend related video content within the video number may be regarded as an in-domain recommendation, while recommending related video content to that user through other downstream applications may be regarded as an out-of-domain recommendation.
It should be noted that, based on the implementation manner provided in the above aspects, further combinations may be further performed to provide further implementation manners.
Based on the data processing method provided in the corresponding embodiment of fig. 2, the embodiment of the present application further provides a data processing apparatus 800, where the data processing apparatus 800 includes an obtaining unit 801, a mapping unit 802, a clustering unit 803, and an integrating unit 804:
The acquiring unit 801 is configured to acquire a plurality of object groups;
the obtaining unit 801 is further configured to obtain, for each of the plurality of object groups, a group image feature of the object group;
the mapping unit 802 is configured to perform vector mapping on the group portrait features of each of the object groups, so as to obtain a feature vector representation of each of the group portrait features;
the clustering unit 803 is configured to cluster the plurality of object groups according to the feature vector representation of the group portrait feature to obtain a cluster;
the integrating unit 804 is configured to integrate group portrait features of the object group included in the cluster to obtain a cluster portrait feature of the cluster.
In a possible implementation, the group portrait features include a group relationship feature and a group interest feature, and the mapping unit is further configured to:
for the group relationship features and the group interest features of each object group, performing vector mapping on the group relationship features to obtain a first feature vector representation of the group relationship features, and performing vector mapping on the group interest features to obtain a second feature vector representation of the group interest features;
And splicing the first feature vector representation and the second feature vector representation to obtain the feature vector representation of the group portrait features.
In a possible implementation, the mapping unit is further configured to:
acquiring a first weight represented by the first feature vector and a second weight represented by the second feature vector;
and carrying out weighted stitching on the first feature vector representation and the second feature vector representation by using the first weight and the second weight to obtain the feature vector representation of the group portrait feature.
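The weighted stitching described above can be sketched as follows, assuming the two representations are plain Python lists; the weight values and function name are illustrative assumptions.

```python
# Minimal sketch of weighted stitching (concatenation) of the two
# representations; weights and dimensions are illustrative.
def weighted_concat(relation_vec, interest_vec, w_relation=0.6, w_interest=0.4):
    """Scale each representation by its weight, then concatenate them into the
    feature vector representation of the group portrait features."""
    return [w_relation * x for x in relation_vec] + [w_interest * x for x in interest_vec]

vec = weighted_concat([1.0, 2.0], [3.0, 4.0])
```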
In a possible implementation, the mapping unit is further configured to:
taking each object group as a first object group, and carrying out similarity measurement according to the group relation characteristics of the first object group and the group relation characteristics of a second object group, so as to obtain a similarity measurement value between the first object group and the second object group, wherein the second object group is each object group except the first object group in the plurality of object groups;
and converting the similarity measurement value between the first object group and the second object group into a first feature vector representation of the first object group.
In a possible implementation, the mapping unit is further configured to:
taking each object group as a node, taking the relation among the object groups as an edge, and taking the similarity measurement value among the object groups as the weight of the edge to construct an object group relation diagram;
and carrying out random walk on the object group relation diagram by taking a node corresponding to the first object group as a starting point and taking the weight of the edge as the walk probability to obtain a first feature vector representation of the first object group.
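The graph construction and weighted random walk above can be sketched as follows. The graph contents and weights are illustrative assumptions; in practice the resulting walks would typically be fed to an embedding model (for example, a skip-gram model) to produce the first feature vector representation, a step omitted here.

```python
import random

# Sketch of a weighted random walk over the object-group relation graph:
# nodes are object groups, edge weights are similarity measurement values,
# and the next hop is chosen with probability proportional to edge weight.
def random_walk(graph, start, length, rng):
    """graph: node -> list of (neighbor, edge_weight) pairs."""
    walk = [start]
    for _ in range(length - 1):
        neighbors = graph.get(walk[-1])
        if not neighbors:
            break  # dead end: stop the walk early
        nodes, weights = zip(*neighbors)
        walk.append(rng.choices(nodes, weights=weights, k=1)[0])
    return walk

graph = {
    "G1": [("G2", 0.9), ("G3", 0.1)],
    "G2": [("G1", 0.9)],
    "G3": [("G1", 0.1)],
}
walk = random_walk(graph, "G1", 5, random.Random(0))
```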
In a possible implementation, the clustering unit is further configured to:
selecting K object groups from the plurality of object groups as initial clustering centers;
calculating the distance between each object group and each cluster center according to the feature vector of the group portrait feature;
and for each object group, assigning the object group to the cluster represented by the cluster center closest to it; each time an object group is assigned to a cluster, recalculating that cluster's center according to the object groups currently in the cluster, and recalculating the distance between each object group and each cluster center according to the feature vector representation of the group portrait features, until a clustering termination condition is met, thereby obtaining K clusters.
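The clustering steps above follow the familiar K-means pattern and can be sketched as follows. Note one simplification: the passage above recalculates a cluster center after each single assignment, while this sketch uses the common batch variant that recalculates centers once per pass; the distance measure (squared Euclidean) and termination condition (unchanged centers) are illustrative choices.

```python
# Batch K-means sketch of the clustering steps; initial centers are the first
# K feature vectors, matching "selecting K object groups as initial centers".
def kmeans(vectors, k, iterations=100):
    centers = vectors[:k]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            # assign each object group's vector to the nearest cluster center
            i = min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centers[c])))
            clusters[i].append(v)
        # recalculate each center from the object groups currently in the cluster
        new_centers = [
            [sum(col) / len(c) for col in zip(*c)] if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:  # termination condition: centers unchanged
            break
        centers = new_centers
    return clusters, centers

clusters, centers = kmeans([[0.0], [0.2], [9.8], [10.0]], 2)
```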
In one possible implementation manner, the K is determined according to at least one of an object coverage amount in the cluster, an amount of object interaction data in the cluster, and an object relationship concentration degree in the cluster.
In one possible implementation, the group portrait features include group relationship features or group interest features.
In a possible implementation manner, the obtaining unit is further configured to:
and constructing group portrait features according to the group feature information of the object group and the object feature information of the objects included in the object group to obtain the group portrait features of the object group.
In a possible implementation manner, the obtaining unit is further configured to:
performing vertical class identification on the group characteristic information of the object group to obtain a first vertical class identification result;
performing vertical class identification on object characteristic information of objects included in the object group to obtain a second vertical class identification result;
determining common feature verticals between the first vertical class identification result and the second vertical class identification result;
determining the common feature verticals as the group portrait features of the object group.
In a possible implementation manner, the obtaining unit is further configured to:
Determining the weight of the group characteristic information according to the release time of the group characteristic information, wherein the release time of the group characteristic information is positively correlated with the weight of the group characteristic information;
weighting and integrating the group characteristic information according to the weight of the group characteristic information to obtain candidate group characteristic information;
aggregating the candidate group characteristic information to obtain the first vertical class identification result;
eliminating object feature information with distribution frequency lower than a preset frequency threshold value from object feature information of objects included in the object group to obtain candidate object feature information;
and aggregating the candidate object characteristic information to obtain the second vertical class identification result.
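The two screening steps above, recency weighting of group feature information and frequency filtering of object feature information, can be sketched as follows; the exponential decay with a half-life, the threshold semantics, and all names are illustrative assumptions.

```python
from collections import Counter

# Sketch of the two screening steps described above.
def weight_by_recency(items, now, half_life=30.0):
    """items: list of (feature, publish_day). A later publish time yields a
    larger weight (positive correlation, as stated above); the exponential
    half-life decay is an assumption."""
    weights = Counter()
    for feature, day in items:
        weights[feature] += 0.5 ** ((now - day) / half_life)
    return dict(weights)

def filter_by_frequency(features, min_freq):
    """Eliminate object features whose distribution frequency is below the
    preset frequency threshold."""
    counts = Counter(features)
    return {f: n for f, n in counts.items() if n >= min_freq}

recency = weight_by_recency([("announcement", 0), ("announcement", 30)], now=30)
kept = filter_by_frequency(["skiing", "skiing", "music", "skiing", "music", "chess"], 2)
```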
In a possible implementation manner, the obtaining unit is further configured to:
aggregating the candidate group characteristic information to obtain a plurality of group feature verticals;
sorting the plurality of group feature verticals according to group vertical strength to obtain a group vertical sequence;
determining the first k group feature verticals in the group vertical sequence as the first vertical class identification result;
aggregating the candidate object feature information to obtain a plurality of object feature verticals;
sorting the plurality of object feature verticals according to object vertical strength to obtain an object vertical sequence;
and determining the first m object feature verticals in the object vertical sequence as the second vertical class identification result.
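The ordering-and-truncation step above reduces, for each side, to sorting verticals by descending strength and keeping a prefix; a minimal sketch, with illustrative strength values:

```python
# Sketch of sorting feature verticals by strength and keeping the first k.
def top_verticals(vertical_strengths, k):
    """vertical_strengths: vertical -> strength (e.g., aggregate frequency).
    Returns the first k verticals of the descending-strength sequence."""
    ordered = sorted(vertical_strengths, key=vertical_strengths.get, reverse=True)
    return ordered[:k]

result = top_verticals({"skiing": 12, "sports": 9, "music": 3, "chess": 1}, 2)
```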
In a possible implementation manner, the obtaining unit is further configured to:
determining a target feature vertical in the first vertical class identification result and the second vertical class identification result as an effective content vertical, wherein the target feature vertical is the feature vertical that is shared by the first vertical class identification result and the second vertical class identification result and has the highest distribution frequency;
comparing the feature verticals in the first vertical class identification result and the second vertical class identification result with the effective content vertical in terms of correlation;
and determining the feature verticals whose correlation reaches a first preset threshold as the common feature verticals.
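The effective-content-vertical route above can be sketched as follows. The correlation measure used here (Jaccard similarity over slash-separated tokens) is purely an assumption standing in for whatever relevance metric an implementation would use; the data and the threshold are likewise illustrative.

```python
# Sketch: pick the shared vertical with the highest distribution frequency as
# the effective content vertical, then keep verticals sufficiently correlated
# with it as the common feature verticals.
def token_correlation(a, b):
    """Illustrative correlation: Jaccard similarity of slash-separated tokens."""
    ta, tb = set(a.split("/")), set(b.split("/"))
    return len(ta & tb) / len(ta | tb)

def common_verticals(first_result, second_result, freq, threshold):
    shared = set(first_result) & set(second_result)
    effective = max(shared, key=freq.get)  # highest distribution frequency
    candidates = set(first_result) | set(second_result)
    return sorted(v for v in candidates if token_correlation(v, effective) >= threshold)

common = common_verticals(
    ["sports/skiing", "hiking"],
    ["sports/skiing", "music", "sports/basketball"],
    {"sports/skiing": 10, "hiking": 2, "music": 5, "sports/basketball": 4},
    0.3,
)
```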
In a possible implementation manner, the obtaining unit is further configured to:
determining target group feature verticals with confidence higher than a second preset threshold from the first vertical class identification result;
and determining the object feature verticals in the second vertical class identification result that match the target group feature verticals as the common feature verticals.
In a possible implementation manner, the apparatus further includes a determining unit and a returning unit:
the acquisition unit is further used for acquiring an application service request aiming at the object to be served, wherein the application service request comprises an object identifier;
the determining unit is used for determining a cluster to which the object to be served belongs according to the object identifier;
the return unit is used for returning a request result based on the cluster portrait characteristics of the cluster to which the object to be served belongs.
According to the above technical solution, for each of the plurality of acquired object groups, the group portrait features of the object group are acquired, and vector mapping is performed on the group portrait features of each object group to obtain a feature vector representation of each set of group portrait features. The feature vector representation of the group portrait features of an object group can reflect both the group features of the object group and the object features of the objects it includes, so that clustering the object groups according to these feature vector representations gathers object groups whose group features and object features are similar into a cluster; that is, the object groups included in a cluster are similar in terms of both group features and object features. Further, the group portrait features of the object groups included in a cluster can be integrated to obtain the cluster portrait features of the cluster. Compared with the group portrait features of a single object group, the cluster portrait features can cover more interaction data and broader interest content, so that interests generalize better. Furthermore, when a service is provided to a user, the cluster portrait features of the cluster to which the user belongs can be used directly to determine content the user may be interested in; compared with content determined based on the group portrait features of the single object group to which the user belongs, this content is richer, so that a higher-quality service is provided to the user and user experience is improved.
The embodiment of the application also provides a computer device, which can be a terminal device, taking the terminal device as a smart phone as an example:
fig. 9 is a block diagram showing part of the structure of a smart phone according to an embodiment of the present application. Referring to fig. 9, the smart phone includes: a Radio Frequency (RF) circuit 910, a memory 920, an input unit 930, a display unit 940, a sensor 950, an audio circuit 960, a wireless fidelity (WiFi) module 970, a processor 980, and a power supply 990. The input unit 930 may include a touch panel 931 and other input devices 932, the display unit 940 may include a display panel 941, and the audio circuit 960 may include a speaker 961 and a microphone 962. Those skilled in the art will appreciate that the smartphone structure shown in fig. 9 does not limit the smartphone, which may include more or fewer components than shown, combine certain components, or arrange the components differently.
The memory 920 may be used to store software programs and modules, and the processor 980 performs various functional applications and data processing by running the software programs and modules stored in the memory 920. The memory 920 may mainly include a program storage area and a data storage area, where the program storage area may store an operating system, an application program required for at least one function (such as a sound playing function, an image playing function, etc.), and the like; the data storage area may store data (such as audio data, a phonebook, etc.) created according to the use of the smart phone, and the like. In addition, the memory 920 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
Processor 980 is a control center for the smartphone, connecting various portions of the entire smartphone using various interfaces and lines, performing various functions and processing data for the smartphone by running or executing software programs and/or modules stored in memory 920, and invoking data stored in memory 920. Optionally, processor 980 may include one or more processing units; preferably, the processor 980 may integrate an application processor with a modem processor, wherein the application processor primarily handles operating systems, user interfaces, applications programs, etc., and the modem processor primarily handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 980.
In this embodiment, the steps performed by the processor 980 in the smartphone may be implemented based on the architecture shown in fig. 9.
The computer device provided in the embodiment of the present application may also be a server. As shown in fig. 10, fig. 10 is a block diagram of a server 1000 provided in an embodiment of the present application. The server 1000 may vary considerably depending on configuration or performance, and may include one or more processors, such as central processing units (Central Processing Units, abbreviated as CPU) 1022, a memory 1032, and one or more storage media 1030 (such as one or more mass storage devices) storing application programs 1042 or data 1044. The memory 1032 and the storage medium 1030 may be transitory or persistent. The program stored on the storage medium 1030 may include one or more modules (not shown), and each module may include a series of instruction operations on the server. Further, the central processor 1022 may be configured to communicate with the storage medium 1030 to perform, on the server 1000, the series of instruction operations in the storage medium 1030.
The server 1000 may also include one or more power supplies 1026, one or more wired or wireless network interfaces 1050, one or more input/output interfaces 1058, and/or one or more operating systems 1041, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, etc.
In this embodiment, the central processor 1022 in the server 1000 may perform the following steps:
acquiring a plurality of object groups;
for each object group in the plurality of object groups, acquiring group portrait features of the object group;
respectively carrying out vector mapping on the group portrait features of each object group to obtain a feature vector representation of each group portrait feature;
clustering the plurality of object groups according to the feature vector representation of the group portrait features to obtain clusters;
and integrating the group portrait features of the object groups included in the group to obtain the group portrait features of the group.
According to an aspect of the present application, there is provided a computer-readable storage medium for storing a computer program for implementing the data processing method according to the foregoing embodiments.
According to one aspect of the present application, there is provided a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device performs the methods provided in the various alternative implementations of the above embodiments.
The description of each process or structure corresponding to the drawings has its own emphasis; for a part of a certain process or structure that is not described in detail, reference may be made to the descriptions of the other processes or structures.
The terms "first," "second," "third," "fourth," and the like in the description of the application and in the above figures, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the application described herein may be implemented, for example, in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
In the several embodiments provided in the present application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application may be essentially or a part contributing to the related art or all or part of the technical solution may be embodied in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a computer, a server, or a network device, etc.) to perform all or part of the steps of the method described in the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
The above embodiments are only for illustrating the technical solution of the present application, and not for limiting the same; although the application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims (19)

1. A method of data processing, the method comprising:
acquiring a plurality of object groups;
for each object group in the plurality of object groups, acquiring group portrait features of the object group;
respectively carrying out vector mapping on the group portrait features of each object group to obtain a feature vector representation of each group portrait feature;
clustering the plurality of object groups according to the feature vector representation of the group portrait features to obtain clusters;
and integrating the group portrait features of the object groups included in the group to obtain the group portrait features of the group.
2. The method of claim 1, wherein the group portrait features include group relationship features and group interest features, and the vector mapping the group portrait features of each of the object groups, respectively, to obtain a feature vector representation of each of the group portrait features comprises:
for the group relationship features and the group interest features of each object group, performing vector mapping on the group relationship features to obtain a first feature vector representation of the group relationship features, and performing vector mapping on the group interest features to obtain a second feature vector representation of the group interest features;
And splicing the first feature vector representation and the second feature vector representation to obtain the feature vector representation of the group portrait features.
3. The method of claim 2, wherein the stitching the first feature vector representation and the second feature vector representation to obtain a feature vector representation of the group portrait features comprises:
acquiring a first weight represented by the first feature vector and a second weight represented by the second feature vector;
and carrying out weighted stitching on the first feature vector representation and the second feature vector representation by using the first weight and the second weight to obtain the feature vector representation of the group portrait feature.
4. The method of claim 2, wherein vector mapping the group relationship features to obtain a first feature vector representation of the group relationship features comprises:
taking each object group as a first object group, and carrying out similarity measurement according to the group relationship features of the first object group and the group relationship features of a second object group, so as to obtain a similarity measurement value between the first object group and the second object group, wherein the second object group is each object group except the first object group in the plurality of object groups;
converting the similarity measurement value between the first object group and the second object group into a first feature vector representation of the first object group.
5. The method of claim 4, wherein the converting the similarity measurement value between the first object group and the second object group into the first feature vector representation of the first object group comprises:
taking each object group as a node, taking the relation among the object groups as an edge, and taking the similarity measurement value among the object groups as the weight of the edge to construct an object group relation diagram;
and carrying out random walk on the object group relation diagram by taking a node corresponding to the first object group as a starting point and taking the weight of the edge as the walk probability to obtain a first feature vector representation of the first object group.
6. The method of any of claims 1-5, wherein the clustering the plurality of object groups according to the feature vector representations of the group portrait features to obtain clusters comprises:
selecting K object groups from the plurality of object groups as initial cluster centers;
calculating the distance between each object group and each cluster center according to the feature vector representations of the group portrait features;
and for each object group, assigning the object group to the cluster represented by the cluster center closest to it; each time an object group is assigned to a cluster, recalculating that cluster's center from the object groups currently in the cluster, and recalculating the distance between each object group and each cluster center according to the feature vector representations of the group portrait features, until a clustering termination condition is met, thereby obtaining K clusters.
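The clustering loop of claim 6 is essentially K-means over the group portrait feature vectors. A minimal batch-style sketch follows; the Euclidean distance, the random initialization, and the convergence-based termination condition are assumptions, since the claim leaves the distance and the termination condition open (and describes recomputing centers after each assignment, which the batch variant below approximates per iteration).

```python
import numpy as np

def cluster_groups(vectors, k, max_iters=100, seed=0):
    """K-means-style clustering of object groups by their group
    portrait feature vectors. Returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    vectors = np.asarray(vectors, dtype=float)
    # select K object groups as initial cluster centers
    centers = vectors[rng.choice(len(vectors), size=k, replace=False)]
    for _ in range(max_iters):
        # assign each object group to the nearest cluster center
        dists = np.linalg.norm(vectors[:, None] - centers[None], axis=2)
        labels = np.argmin(dists, axis=1)
        # recalculate each cluster center from its assigned groups
        new_centers = np.array([
            vectors[labels == i].mean(axis=0) if np.any(labels == i)
            else centers[i]
            for i in range(k)])
        if np.allclose(new_centers, centers):  # termination condition
            break
        centers = new_centers
    return labels, centers
```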
7. The method of claim 6, wherein K is determined based on at least one of the number of objects covered within a cluster, the amount of object interaction data within a cluster, and the concentration of object relationships within a cluster.
8. The method of claim 1, wherein the group portrait features comprise group relationship features or group interest features.
9. The method of any one of claims 1-5, wherein the acquiring the group portrait features of the object group comprises:
constructing group portrait features according to the group feature information of the object group and the object feature information of the objects included in the object group, to obtain the group portrait features of the object group.
10. The method according to claim 9, wherein the constructing group portrait features according to the group feature information of the object group and the object feature information of the objects included in the object group, to obtain the group portrait features of the object group, comprises:
performing vertical class identification on the group feature information of the object group to obtain a first vertical class identification result;
performing vertical class identification on the object feature information of the objects included in the object group to obtain a second vertical class identification result;
determining a common feature vertical class between the first vertical class identification result and the second vertical class identification result;
and determining the common feature vertical class as a group portrait feature of the object group.
11. The method of claim 10, wherein the performing vertical class identification on the group feature information of the object group to obtain a first vertical class identification result comprises:
determining the weight of each item of group feature information according to its release time, wherein the release time of the group feature information is positively correlated with its weight;
weighting and integrating the group feature information according to these weights to obtain candidate group feature information;
and aggregating the candidate group feature information to obtain the first vertical class identification result;
and wherein the performing vertical class identification on the object feature information of the objects included in the object group to obtain a second vertical class identification result comprises:
eliminating, from the object feature information of the objects included in the object group, object feature information whose distribution frequency is lower than a preset frequency threshold, to obtain candidate object feature information;
and aggregating the candidate object feature information to obtain the second vertical class identification result.
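The two filtering steps of claim 11 can be sketched as follows: recency weighting on the group side (later release time, larger weight) and frequency-threshold filtering on the object side. The linear 30-day decay window is an assumption; the claim only requires that weight be positively correlated with release time.

```python
from collections import Counter

def weight_by_recency(items, now, window_days=30.0):
    """Weight each (text, release_time) item by recency: later release
    time -> larger weight. Linear decay over `window_days` is an
    assumed scheme; timestamps are in seconds."""
    weighted = []
    for text, release_time in items:
        age_days = (now - release_time) / 86400.0
        weight = max(0.0, 1.0 - age_days / window_days)
        weighted.append((text, weight))
    return weighted

def filter_by_frequency(items, threshold):
    """Eliminate object feature items whose distribution frequency is
    below the preset frequency threshold."""
    counts = Counter(items)
    return [x for x in items if counts[x] >= threshold]
```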
12. The method of claim 11, wherein the aggregating the candidate group feature information to obtain the first vertical class identification result comprises:
aggregating the candidate group feature information to obtain a plurality of group feature vertical classes;
sorting the plurality of group feature vertical classes by group vertical class strength to obtain a group vertical class sequence;
and determining the first k group feature vertical classes in the group vertical class sequence as the first vertical class identification result;
and wherein the aggregating the candidate object feature information to obtain the second vertical class identification result comprises:
aggregating the candidate object feature information to obtain a plurality of object feature vertical classes;
sorting the plurality of object feature vertical classes by object vertical class strength to obtain an object vertical class sequence;
and determining the first m object feature vertical classes in the object vertical class sequence as the second vertical class identification result.
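The sort-and-truncate step of claim 12 reduces to a top-k selection over vertical-class strength scores. How the strength score is computed is left open by the claim; the mapping from vertical-class name to score below is an assumed input shape.

```python
def top_verticals(strengths, k):
    """Sort feature vertical classes by descending strength and keep
    the first k. `strengths` maps vertical-class name -> strength."""
    ranked = sorted(strengths, key=strengths.get, reverse=True)
    return ranked[:k]
```

The same helper serves both sides of the claim: the first k group feature vertical classes and the first m object feature vertical classes.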
13. The method of claim 10, wherein the determining a common feature vertical class between the first vertical class identification result and the second vertical class identification result comprises:
determining a target feature vertical class in the first and second vertical class identification results as an effective content vertical class, wherein the target feature vertical class is the feature vertical class that is shared by the first and second vertical class identification results and has the highest distribution frequency;
computing the correlation of the first vertical class identification result and the second vertical class identification result, respectively, with the effective content vertical class;
and determining the feature vertical classes whose correlation reaches a first preset threshold as the common feature vertical class.
14. The method of claim 10, wherein the determining a common feature vertical class between the first vertical class identification result and the second vertical class identification result comprises:
determining, from the first vertical class identification result, a target group feature vertical class whose confidence is higher than a second preset threshold;
and determining the object feature vertical class in the second vertical class identification result that matches the target group feature vertical class as the common feature vertical class.
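The confidence-gated matching of claim 14 can be sketched as a threshold filter followed by a set intersection; representing the first result as a name-to-confidence mapping and "matching" as exact name equality are simplifying assumptions (in practice matching could be fuzzier).

```python
def common_verticals(group_result, object_result, conf_threshold):
    """Keep group feature vertical classes whose confidence exceeds the
    threshold, then intersect with the object-side identification
    result to obtain the common feature vertical classes."""
    targets = {v for v, conf in group_result.items() if conf > conf_threshold}
    return targets & set(object_result)
```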
15. The method according to any one of claims 1-5, further comprising:
acquiring an application service request for an object to be served, wherein the application service request comprises an object identifier;
determining the cluster to which the object to be served belongs according to the object identifier;
and returning a request result based on the cluster portrait features of the cluster to which the object to be served belongs.
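The serving path of claim 15 amounts to two lookups: object identifier to cluster, then cluster to cluster portrait features. The dict-based storage and the shape of the returned payload are assumptions for illustration; a production system would back these with an index or key-value store.

```python
def serve_request(object_id, object_to_cluster, cluster_portraits):
    """Resolve the cluster for the object to be served and return a
    request result based on that cluster's portrait features.
    Returns None when the object identifier is unknown."""
    cluster = object_to_cluster.get(object_id)
    if cluster is None:
        return None
    return cluster_portraits.get(cluster)
```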
16. A data processing apparatus, characterized in that the apparatus comprises an acquisition unit, a mapping unit, a clustering unit and an integration unit:
the acquisition unit is configured to acquire a plurality of object groups;
the acquisition unit is further configured to acquire the group portrait features of each of the plurality of object groups;
the mapping unit is configured to perform vector mapping on the group portrait features of each object group respectively, to obtain a feature vector representation of each group portrait feature;
the clustering unit is configured to cluster the plurality of object groups according to the feature vector representations of the group portrait features, to obtain clusters;
and the integration unit is configured to integrate the group portrait features of the object groups included in a cluster, to obtain the cluster portrait features of the cluster.
17. A computer device, the computer device comprising a processor and a memory:
the memory is used for storing a computer program and transmitting the computer program to the processor;
the processor is configured to perform the method of any of claims 1-15 according to instructions in the computer program.
18. A computer readable storage medium, characterized in that the computer readable storage medium is configured to store a computer program, the computer program being for implementing the method of any one of claims 1-15.
19. A computer program product comprising a computer program which, when run on a computer device, causes the computer device to perform the method of any of claims 1-15.
CN202211457492.8A 2022-11-21 2022-11-21 Data processing method and related device Pending CN116955772A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211457492.8A CN116955772A (en) 2022-11-21 2022-11-21 Data processing method and related device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211457492.8A CN116955772A (en) 2022-11-21 2022-11-21 Data processing method and related device

Publications (1)

Publication Number Publication Date
CN116955772A true CN116955772A (en) 2023-10-27

Family

ID=88444984

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211457492.8A Pending CN116955772A (en) 2022-11-21 2022-11-21 Data processing method and related device

Country Status (1)

Country Link
CN (1) CN116955772A (en)

Similar Documents

Publication Publication Date Title
CN111008332B (en) Content item recommendation method, device, server and storage medium
Cufoglu User profiling-a short review
US20180144367A1 (en) Method and system for creating user based summaries for content distribution
US20140282493A1 (en) System for replicating apps from an existing device to a new device
US20160188661A1 (en) Multilingual business intelligence for actions
CN108028962A (en) Video service condition information is handled to launch advertisement
KR102626275B1 (en) Dynamic application content analysis
CN113254711B (en) Interactive image display method and device, computer equipment and storage medium
CN111611436A (en) Label data processing method and device and computer readable storage medium
CN106776701B (en) Problem determination method and device for item recommendation
CN112765387A (en) Image retrieval method, image retrieval device and electronic equipment
CN115470344A (en) Video barrage and comment theme fusion method based on text clustering
CN111954017B (en) Live broadcast room searching method and device, server and storage medium
CN116051192A (en) Method and device for processing data
CN116823410B (en) Data processing method, object processing method, recommending method and computing device
CN111970525B (en) Live broadcast room searching method and device, server and storage medium
CN108205551B (en) Song recommendation method and song recommendation system
CN109241202B (en) Stranger social user matching method and system based on clustering
CN116955772A (en) Data processing method and related device
CN117272056A (en) Object feature construction method, device and computer readable storage medium
CN114969493A (en) Content recommendation method and related device
CN113420209A (en) Recommendation method, device and equipment based on weather search and storage medium
Khanwalkar et al. A method of designing museum ubiquitous visitor model
US10776438B2 (en) Information providing system, information providing server, information providing method, and program for information providing system
CN116468486B (en) Popularization optimizing management system based on Internet platform

Legal Events

Date Code Title Description
PB01 Publication