CN116109332A - Method and device for constructing user portrait information and electronic equipment - Google Patents

Method and device for constructing user portrait information and electronic equipment

Info

Publication number
CN116109332A
CN116109332A CN202310070737.XA
Authority
CN
China
Prior art keywords
user
feature
features
clustering
users
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310070737.XA
Other languages
Chinese (zh)
Inventor
刘正夫
周振华
陈浩
李楚桐
张孝丹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
4Paradigm Beijing Technology Co Ltd
Original Assignee
4Paradigm Beijing Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 4Paradigm Beijing Technology Co Ltd filed Critical 4Paradigm Beijing Technology Co Ltd
Priority to CN202310070737.XA
Publication of CN116109332A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]
    • G06Q30/0631 Item recommendations

Landscapes

  • Business, Economics & Management (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Strategic Management (AREA)
  • Development Economics (AREA)
  • Engineering & Computer Science (AREA)
  • General Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Marketing (AREA)
  • General Physics & Mathematics (AREA)
  • Economics (AREA)
  • Theoretical Computer Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Data Mining & Analysis (AREA)
  • Game Theory and Decision Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a method, a device and electronic equipment for constructing user portrait information, wherein the method comprises the following steps: acquiring a user data set corresponding to a specific user set, wherein each piece of data in the user data set uniquely corresponds to one user in the specific user set, and each piece of data comprises a plurality of characteristic values respectively corresponding to a plurality of characteristics; selecting part of the features from the plurality of features as clustering features; clustering is carried out based on the selected clustering features, and the specific user set is divided into a plurality of user groups; selecting a portrait discovery feature from a plurality of features; for each portrait discovery feature, calculating the score of each user group on the portrait discovery feature, and determining portrait information corresponding to each user group according to the score; and outputting the portrait information corresponding to each user group.

Description

Method and device for constructing user portrait information and electronic equipment
This application is a divisional application of the patent application filed on July 16, 2019, with application number 201910641681.2 and entitled 'A method and device for constructing user portrait information, and an electronic device'.
Technical Field
The present invention relates to the field of internet technology, and more particularly, to a method of constructing user portrait information, an apparatus for constructing user portrait information, an electronic device, and a readable storage medium.
Background
In the internet age, a large amount of user data is generated every day, containing the characteristics of different users. Analyzing this data with an appropriate method makes it possible to construct accurate user portrait information, so that a personalized product recommendation strategy can be formulated for each user according to the portrait information.
A common current method of constructing user portrait information is to build it from expert rules. The advantage of this approach is that the rules can be adjusted at any time based on human judgment. However, it also has several drawbacks: designing reasonable expert rules requires abundant industry experience, and because humans cannot analyze high-dimensional big data all at once, much useful information may be missed, so the accuracy of the constructed user portrait information is low.
Disclosure of Invention
According to a first aspect of the present invention, there is provided a method of constructing user portrait information, comprising:
acquiring a user data set corresponding to a specific user set, wherein each piece of data in the user data set uniquely corresponds to one user in the specific user set, and each piece of data comprises a plurality of characteristic values respectively corresponding to a plurality of characteristics;
selecting part of the features from the plurality of features as clustering features;
clustering is carried out based on the selected clustering features, and the specific user set is divided into a plurality of user groups;
selecting portrait discovery features from the plurality of features;
calculating, for each portrait discovery feature, the score of each user group on that feature, and determining the portrait information corresponding to each user group according to the scores;
and outputting the portrait information corresponding to each user group.
Optionally, the clustering based on the selected clustering features includes:
constructing, for each user in the specific user set, a sample consisting of that user's feature values on the clustering features, to obtain a user clustering sample set;
and clustering the users in the specific user set according to the user clustering sample set by using a clustering algorithm, to obtain the plurality of user groups.
Optionally, the acquiring the user data set corresponding to the specific user set includes:
acquiring a plurality of data tables for the particular set of users;
splicing the plurality of data tables into a wide table by taking the user identification as the primary key; wherein each row of the wide table corresponds to a user, and each column except the user identification corresponds to a feature; alternatively, each column of the wide table corresponds to a user, and each row except the user identification corresponds to a feature.
Optionally, the method of selecting part of the features from the plurality of features as the clustering features is any one or more of the following:
selecting features with a null rate less than or equal to a set value from the plurality of features as the clustering features;
selecting part of the features meeting the service requirement from the plurality of features as the clustering features, wherein the service is one that applies the portrait information to make decisions;
selecting at least one group of repeated features from the plurality of features, merging each group of repeated features into a new feature, and using the new features as the clustering features;
and selecting at least one group of repeated features from the plurality of features, choosing one representative feature from each group, and using the representative features as the clustering features.
Optionally, the user data set is a wide table with the user identification as the primary key;
when features with a null rate less than or equal to a set value are selected from the plurality of features as the clustering features, the method further includes calculating the null rate of each feature as follows:
obtaining the total number of users as a first number according to the wide table;
obtaining, for each feature, the number of users whose value on that feature is null, as the feature's corresponding second number;
and determining the ratio of each feature's second number to the first number as that feature's null rate.
Optionally, the step of clustering the specific user set into a plurality of user groups based on the selected clustering feature includes:
determining the Euclidean distance between every two users according to their feature values on the clustering features;
and clustering the users in the specific user set according to the Euclidean distances between every two users, to obtain the plurality of user groups.
Optionally, the number of the user groups is a set number;
the step of clustering the users in the specific user set according to the Euclidean distance between every two users to obtain a plurality of user groups comprises the following steps:
selecting the set number of users whose mutual Euclidean distances are the largest as the initial centers of the user groups;
and assigning each of the other users to the user group whose initial center is nearest to that user in Euclidean distance; wherein the other users are the users in the specific user set other than the set number of users.
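A toy sketch of this initialization and assignment (two user groups and made-up 2-D clustering-feature vectors; the exhaustive farthest-pair search stands in for picking "the set number of users with the largest Euclidean distance"):

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Made-up 2-D clustering-feature vectors for five users.
samples = {"u1": (0.0, 0.0), "u2": (0.1, 0.2), "u3": (5.0, 5.0),
           "u4": (5.1, 4.9), "u5": (0.2, 0.1)}

# Step 1: take the two users farthest apart as the initial group centers.
c1, c2 = max(
    ((a, b) for a in samples for b in samples if a < b),
    key=lambda pair: euclidean(samples[pair[0]], samples[pair[1]]),
)

# Step 2: assign every other user to the group whose initial center is nearest.
groups = {c1: [c1], c2: [c2]}
for uid, vec in samples.items():
    if uid not in (c1, c2):
        nearest = min((c1, c2), key=lambda c: euclidean(samples[c], vec))
        groups[nearest].append(uid)
```

With these numbers the two farthest users seed the groups and the remaining users fall into the group whose seed they are closest to.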
Optionally, the step of selecting the portrait discovery feature from the plurality of features includes:
and using all the plurality of features as the portrait discovery feature.
Optionally, the step of calculating, for each portrait discovery feature, the score of each user group on the portrait discovery feature includes:
for each portrait discovery feature, determining the average feature value of the users in each user group on that feature as the first average of each user group on the feature;
for each portrait discovery feature, calculating the average of the user groups' first averages on the feature as the second average of the feature;
for each portrait discovery feature, calculating the standard deviation of the user groups' first averages on the feature as the standard deviation of the feature;
for each user group, obtaining the group's score on each portrait discovery feature from the group's first average on the feature together with the feature's second average and standard deviation.
Optionally, the step of obtaining, for each user group, the group's score on each portrait discovery feature from the group's first average and the corresponding feature's second average and standard deviation includes:
for each user group, determining the difference between the group's first average on each portrait discovery feature and the feature's second average as the group's difference on that feature;
for each user group, determining the ratio of the group's difference on each portrait discovery feature to the feature's standard deviation as the group's score on that feature.
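Numerically, this scoring is a z-score of each group's first average against the distribution of all groups' first averages. A minimal sketch (the group means are invented, and the population standard deviation is an assumption, since the text does not specify which variant is used):

```python
from statistics import mean, pstdev

# Hypothetical first averages: each user group's mean value on one portrait
# discovery feature (e.g. average deposit balance per group).
first_averages = {"group_a": 100.0, "group_b": 200.0, "group_c": 600.0}

# Second average: the mean of the group means on this feature.
second_average = mean(first_averages.values())

# Standard deviation of the group means (population form is an assumption).
std = pstdev(first_averages.values())

# Score of each group on the feature: (first average - second average) / std.
scores = {g: (m - second_average) / std for g, m in first_averages.items()}
```

A strongly positive score marks a feature on which a group sits well above the other groups, which is what makes the feature a candidate portrait tag for that group.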
Optionally, the step of determining portrait information corresponding to each user group according to the score includes:
for each user group, determining the portrait discovery features whose scores exceed the set range as the target portrait features of that user group, and obtaining the portrait information corresponding to the user group according to its target portrait features and their scores.
Optionally, the method further comprises:
acquiring a clustering center of each user group;
acquiring new data of a new user, wherein the new data comprises a new feature value corresponding to each clustering feature;
and determining the user group corresponding to the new user according to the new user's feature values on the clustering features and the cluster center of each user group.
Optionally, the step of acquiring the cluster center of each user group includes:
determining, for each user group, the average of its users' feature values on each clustering feature as the cluster center of that user group.
Optionally, the step of determining the user group corresponding to the new user according to the new user's feature values on the clustering features and the cluster center of each user group includes:
determining the new Euclidean distance between the new user and each cluster center according to the new user's feature values on the clustering features;
and assigning the new user to the user group whose cluster center is at the minimum new Euclidean distance.
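The cluster-center computation and nearest-center assignment above can be sketched as follows (group memberships and feature vectors are hypothetical):

```python
import math
from statistics import mean

# Hypothetical groups with their members' clustering-feature vectors.
group_members = {
    "group_a": [(1.0, 2.0), (3.0, 4.0)],
    "group_b": [(10.0, 10.0), (12.0, 14.0)],
}

# Cluster center of each group: the per-feature mean over the group's users.
centers = {
    g: tuple(mean(col) for col in zip(*members))
    for g, members in group_members.items()
}

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# A new user's feature values on the same clustering features; the user joins
# the group whose cluster center is nearest in Euclidean distance.
new_user = (2.0, 2.0)
assigned = min(centers, key=lambda g: euclidean(centers[g], new_user))
```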
Optionally, the method further comprises:
acquiring products to be recommended;
selecting the products to be recommended that match each user group according to the portrait information corresponding to each user group;
and recommending to the users in each user group the products matched with that group.
Optionally, the step of selecting the product to be recommended matched with each user group according to the portrait information corresponding to each user group includes:
acquiring the product labels of the products to be recommended;
determining the matching degree between each user group and each product to be recommended according to the portrait information corresponding to each user group and the products' labels;
and selecting the products to be recommended that match each user group according to the matching degrees.
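A minimal sketch of this matching step (the text does not fix a matching-degree formula, so simple tag overlap between hypothetical portrait tags and product labels is used here):

```python
# Hypothetical portrait tags per user group and labels per candidate product.
group_portraits = {
    "group_a": {"high_deposit", "frequent_trader"},
    "group_b": {"young", "low_deposit"},
}
product_labels = {
    "wealth_product": {"high_deposit", "long_term"},
    "starter_savings": {"young", "low_deposit"},
}

# Matching degree = number of shared tags; pick the best product per group.
best_match = {
    g: max(product_labels, key=lambda p: len(tags & product_labels[p]))
    for g, tags in group_portraits.items()
}
```

Any scoring function over portrait information and labels could replace the raw overlap count; the structure of the selection stays the same.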
Optionally, the method further comprises:
acquiring an application scene to which the portrait information is applicable;
and displaying the portrait information corresponding to each user group based on the application scene.
According to a second aspect of the present invention, there is provided an apparatus for constructing user portrait information, comprising:
the data acquisition module is used for acquiring a user data set corresponding to a specific user set, wherein each piece of data in the user data set uniquely corresponds to one user in the specific user set, and each piece of data comprises a plurality of characteristic values corresponding to a plurality of characteristics respectively;
the clustering feature selection module is used for selecting part of features from the plurality of features to serve as clustering features;
the clustering module is used for carrying out clustering processing based on the selected clustering characteristics and dividing the specific user set into a plurality of user groups;
the discovery feature selection module is used for selecting portrait discovery features from the plurality of features;
the portrait information determining module is used for calculating, for each portrait discovery feature, the score of each user group on that feature, and determining the portrait information corresponding to each user group according to the scores;
and the portrait information output module is used for outputting portrait information corresponding to each user group.
According to a third aspect of the present invention, there is provided an electronic device comprising:
the apparatus according to the second aspect of the invention; or,
a processor and a memory, wherein the memory stores instructions for controlling the processor to perform the method according to the first aspect of the invention.
According to a fourth aspect of the present invention there is provided a computer readable storage medium having stored thereon a computer program which when executed by a processor implements a method according to the first aspect of the present invention.
According to the embodiment of the invention, part of the features contained in each piece of data are selected as clustering features, and the users in the specific user set are clustered based on the selected clustering features, so that the specific user set is divided into a plurality of user groups; portrait discovery features are then selected from the plurality of features, the score of each user group on each portrait discovery feature is calculated, and the portrait information corresponding to each user group is determined from the scores and output.
Because the embodiment of the invention clusters on a small number of selected clustering features, unimportant features are prevented from unduly influencing the clustering result; at the same time, the portrait information of each user group is characterized by a large number of features, which widens the analysis dimensions of the user portrait.
Other features of the present invention and its advantages will become apparent from the following detailed description of exemplary embodiments of the invention, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description, serve to explain the principles of the invention.
FIG. 1 is a block diagram showing a hardware configuration of an electronic device that may be used to implement an embodiment of the invention;
FIG. 2 is a flow chart showing a method of constructing user portrait information according to a first embodiment of the present invention;
FIG. 3 is a flow chart showing a method of constructing user portrait information according to a second embodiment of the present invention;
FIG. 4 is a flow chart showing a method of constructing user portrait information according to a third embodiment of the present invention;
FIG. 5 is a flowchart showing an example of a method of constructing user portrait information according to an embodiment of the present invention;
FIG. 6 is a flowchart showing an example of a method of constructing user portrait information according to an embodiment of the present invention;
FIG. 7 is a flowchart showing an example of a method of constructing user portrait information according to an embodiment of the present invention;
FIG. 8 shows a block diagram of an apparatus for constructing user portrait information according to an embodiment of the present invention;
Fig. 9 shows a block diagram of an electronic device of an embodiment of the invention.
Detailed Description
Various exemplary embodiments of the present invention will now be described in detail with reference to the accompanying drawings. It should be noted that: the relative arrangement of the components and steps, numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present invention unless it is specifically stated otherwise.
The following description of at least one exemplary embodiment is merely exemplary in nature and is in no way intended to limit the invention, its application, or uses.
Techniques, methods, and apparatus known to one of ordinary skill in the relevant art may not be discussed in detail, but are intended to be part of the specification where appropriate.
In all examples shown and discussed herein, any specific values should be construed as merely illustrative, and not a limitation. Thus, other examples of exemplary embodiments may have different values.
It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further discussion thereof is necessary in subsequent figures.
Various embodiments and examples according to embodiments of the present invention are described below with reference to the accompanying drawings.
< hardware configuration >
Fig. 1 is a block diagram showing a hardware configuration of an electronic device that can be used to implement an embodiment of the present invention.
The electronic device 1000 may be a laptop, a desktop computer, a mobile phone, a tablet, etc. As shown in fig. 1, the electronic device 1000 may include a processor 1100, a memory 1200, an interface device 1300, a communication device 1400, a display device 1500, an input device 1600, a speaker 1700, a microphone 1800, and the like. The processor 1100 may be a central processing unit (CPU), a microcontroller (MCU), or the like. The memory 1200 includes, for example, ROM (read-only memory), RAM (random access memory), and nonvolatile memory such as a hard disk. The interface device 1300 includes, for example, a USB interface, a headphone interface, and the like. The communication device 1400 is capable of wired or wireless communication, including Wi-Fi, Bluetooth, and 2G/3G/4G/5G communication. The display device 1500 is, for example, a liquid crystal display or a touch display. The input device 1600 may include, for example, a touch screen, a keyboard, somatosensory input, and the like. A user may input and output voice information through the speaker 1700 and the microphone 1800.
The electronic device shown in fig. 1 is merely illustrative and is in no way meant to limit the invention, its application, or uses. In an embodiment of the present invention, the memory 1200 of the electronic device 1000 is configured to store instructions for controlling the processor 1100 to perform any of the methods of constructing user portrait information provided in the embodiments of the present invention. It will be appreciated by those skilled in the art that although fig. 1 shows a plurality of devices in the electronic device 1000, the present invention may involve only some of them, for example only the processor 1100 and the memory 1200. The skilled person can design instructions according to the disclosed solution; how instructions control the processor to operate is well known in the art and will not be described in detail here.
< method example >
In this embodiment, a method of constructing user portrait information is provided. The method of constructing user portrayal information may be implemented by an electronic device. The electronic device may be an electronic device 1000 as shown in fig. 1.
As shown in fig. 2, the method for constructing user portrait information according to this embodiment may include steps S2100 to S2600 as follows:
In step S2100, a user data set corresponding to a specific user set is acquired.
Each piece of data in the user data set uniquely corresponds to one user in the specific user set, and each piece of data comprises a plurality of characteristic values respectively corresponding to the characteristics.
A particular set of users may include a plurality of users, each of which may have a unique corresponding user identification. The user identification may be a character string made up of letters, symbols, and/or numbers.
In this embodiment, each piece of data in the user data set is obtained by integrating all source data of the corresponding user, and the plurality of features may be extracted from the source data of all users in the specific user set. If the value of a feature cannot be acquired from the source data of a given user, that feature's value in the piece of data corresponding to the user can be set to zero or null.
Each piece of data comprises feature values corresponding to a plurality of features. For example, suppose the source data for user 1 includes an age of 33 and a deposit balance of 10000 yuan, while the source data for user 2 includes an age of 25, a deposit balance of 1000 yuan, and a total consumption this month of 200 yuan. Then, in the piece of data corresponding to user 1, the value of the feature "age" is 33, the value of "deposit balance" is 10000, and the value of "total consumption this month" may be zero or null; in the piece of data corresponding to user 2, the value of "age" is 25, the value of "deposit balance" is 1000, and the value of "total consumption this month" is 200.
In one example, the step of acquiring the user data set corresponding to the specific user set may include steps S2110 to S2120 as follows:
step S2110, a plurality of data tables for a specific user set are acquired.
In particular, the plurality of data tables for the specific user set may include: a basic information table representing the basic information of all users in the specific user set, a user transaction flow table representing their transaction flow information, a user behavior record table representing their behavior information, and the like. Each data table integrates the corresponding type of source data for the users in the specific user set and may include at least one feature extracted from the source data together with the corresponding feature values. For example, the basic information table may include features 1 and 2 and their feature values, the user transaction flow table may include features 3 and 4 and their feature values, and the user behavior record table may include features 5 through n and their feature values.
Step S2120, the plurality of data tables are spliced into a wide table by taking the user identification as the primary key; wherein each row of the wide table corresponds to a user, and each column except the user identification corresponds to a feature; alternatively, each column of the wide table corresponds to a user, and each row except the user identification corresponds to a feature.
If the data table corresponding to a user does not contain the feature 1 and the feature value thereof, the feature value of the user on the feature 1 in the wide table can be zero or null.
In one example, the resulting wide table may be as shown in Table 1 below:
TABLE 1
User ID  | Feature 1 | Feature 2 | Feature 3 | ... | Feature n-1 | Feature n
User 1   | D1,1      |           |           |     |             | D1,n
User 2   | D2,1      | D2,2      |           |     |             | D2,n
User 3   | D3,1      |           |           |     | D3,n-1      |
...      |           |           |           |     |             |
User m-1 |           |           | Dm-1,3    |     |             | Dm-1,n
User m   |           | Dm,2      |           |     | Dm,n-1      |
(blank cells denote null feature values)
In the wide table shown in Table 1, the feature value of user 1 on feature 1 is D1,1 and on feature n is D1,n; the feature value of user 2 on feature 1 is D2,1, on feature 2 is D2,2, and on feature n is D2,n; the feature value of user 3 on feature 1 is D3,1 and on feature n-1 is D3,n-1; the feature value of user m-1 on feature 3 is Dm-1,3 and on feature n is Dm-1,n; and the feature value of user m on feature 2 is Dm,2 and on feature n-1 is Dm,n-1. All other feature values (user 1 on features 2, 3, and n-1; user 2 on features 3 and n-1; user 3 on features 2, 3, and n; user m-1 on features 1, 2, and n-1; user m on features 3 and n) may be null.
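The table-splicing in step S2120 can be sketched with pandas (a minimal, hypothetical example; table and column names are made up, and `how="outer"` keeps users that are missing from some tables, leaving NaN where the wide table has nulls):

```python
import pandas as pd

# Three hypothetical per-user data tables, keyed by user ID.
basic_info = pd.DataFrame({"user_id": [1, 2], "age": [33, 25]})
transactions = pd.DataFrame({"user_id": [1, 2], "deposit_balance": [10000, 1000]})
behavior = pd.DataFrame({"user_id": [2], "monthly_spend": [200]})

# Splice the tables into one wide table with user_id as the primary key;
# outer joins keep users that are absent from some tables.
wide = (basic_info
        .merge(transactions, on="user_id", how="outer")
        .merge(behavior, on="user_id", how="outer"))
```

Here user 1 has no behavior record, so the `monthly_spend` cell of user 1's row comes out null, matching the zero-or-null convention described above.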
Step S2200, selecting part of the features from the plurality of features as clustering features.
In one embodiment of the present invention, the method of selecting part of the plurality of features as the clustering features is any one or more of the following:
selecting features with a null rate less than or equal to a set value from the plurality of features as the clustering features;
selecting part of the features meeting the service requirement from the plurality of features as the clustering features, wherein the service is one that applies the portrait information to make decisions;
selecting at least one group of repeated features from the plurality of features, merging each group of repeated features into a new feature, and using the new features as the clustering features;
and selecting at least one group of repeated features from the plurality of features, choosing one representative feature from each group, and using the representative features as the clustering features.
In the embodiment where features with a null rate less than or equal to the set value are selected from the plurality of features as the clustering features, a feature value that cannot be acquired for a user may be set to null (rather than zero) in advance in step S2100.
The setting value in this embodiment may be set in advance according to the application scenario or specific requirements, and the setting value may be a value greater than or equal to 0 and less than or equal to 1. For example, the set value may be 0.2, and then a method of selecting a part of the plurality of features as the clustering feature may be to select a feature having a null rate of 0.2 or less from the plurality of features as the clustering feature.
In one example, on the basis of the wide table obtained through steps S2110 to S2120 above, the step of calculating the feature null rates may include:
obtaining the total number of users as a first number according to the wide table;
according to the wide table, the number of users with the empty characteristic value of each characteristic is respectively obtained and used as a second number corresponding to each characteristic;
and respectively determining the ratio of the second quantity to the first quantity corresponding to each feature as the null rate of each feature.
Specifically, the first number may be the total number of users contained in the specific user set, e.g. the first number may be m. For each feature, the number of users whose feature value on that feature is null is determined as the second number corresponding to the feature. For example, if the second numbers corresponding to features 1 to n are A1 to An respectively, the null rate of feature k may be Ak/m, where k is an integer and k ∈ [1, n].
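The null-rate computation and threshold selection described above can be sketched in a few lines of Python. This is a minimal sketch only: the wide table is assumed to be a list of per-user dicts with `None` standing for a null feature value, and the function names are illustrative, not from the patent.

```python
def null_rates(wide_table, feature_names):
    """Null rate of each feature: (# users whose value is null) / (# users)."""
    first_number = len(wide_table)  # total number of users (m)
    rates = {}
    for feat in feature_names:
        second_number = sum(1 for row in wide_table if row.get(feat) is None)
        rates[feat] = second_number / first_number
    return rates

def select_cluster_features(wide_table, feature_names, set_value=0.2):
    """Keep only the features whose null rate is <= the set value."""
    rates = null_rates(wide_table, feature_names)
    return [f for f in feature_names if rates[f] <= set_value]
```

For example, with four users where feature "a" is null once and "b" three times, the null rates are 0.25 and 0.75, and a set value of 0.5 keeps only "a".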
In the embodiment where the method of selecting part of the features from the plurality of features as the clustering features is to select the features that meet a service requirement, that is, the features the service considers important, the service may be the service to which the portrait information is finally applied: the service that makes decisions by applying the portrait information.
For example, when a bank wishes to increase its AUM (Assets Under Management), features related to AUM should be selected as the clustering features (e.g. deposit balance, treasury-bond balance, wealth-management product balance, stock balance, etc.). When a bank wishes to increase the purchase rate of a certain financial product, features related to the users' purchase behavior should be selected as the clustering features (e.g. the amount consumed, the number of purchases, the number of page views, and the number of clicks within one month).
In the embodiment where the method of selecting part of the features from the plurality of features as the clustering features is to select at least one group of type-repeated features, merge each group into a new feature, and use the new features as the clustering features, the purpose of merging each group of type-repeated features is to prevent features of a certain type from having an excessive influence on the clustering result. For example, if the wide table contains the two features "average deposit in the current year" and "deposit balance", service analysis may conclude that the two belong to one group of type-repeated features; the group can then be merged into a new feature used as a clustering feature, and the average of the group's feature values can be taken as the feature value of the corresponding clustering feature.
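Merging one group of type-repeated features into a single clustering feature, with the group's mean as the new feature value, can be sketched as follows. The per-user row is assumed to be a dict, and the function and feature names are illustrative.

```python
def merge_repeated_features(row, group, new_name):
    """Replace a group of type-repeated features in one user's row with a
    single new feature whose value is the mean of the group (nulls skipped)."""
    values = [row[f] for f in group if row.get(f) is not None]
    merged = sum(values) / len(values) if values else None
    new_row = {k: v for k, v in row.items() if k not in group}
    new_row[new_name] = merged
    return new_row
```

For example, merging "average deposit in the current year" = 100 and "deposit balance" = 200 yields one feature with value 150.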
In the embodiment where the method of selecting part of the features from the plurality of features as the clustering features is to select at least one group of type-repeated features, select one representative feature from each group, and use the representative features as the clustering features: for example, if the wide table contains the two features "average deposit in the current year" and "deposit balance", service analysis may conclude that the two belong to one group of type-repeated features; one representative feature can then be selected from the group as a clustering feature, and its feature value can be used as the feature value of the corresponding clustering feature.
Step S2300, clustering is performed based on the selected clustering features to divide the specific user set into a plurality of user groups.
Specifically, all users included in the specific user set may be clustered based on the selected cluster feature to obtain a plurality of user groups, where each user group includes at least one user, and each user included in the user group is included in the specific user set.
In one example, clustering based on the selected cluster features, the grouping of the particular set of users into a plurality of user groups may include:
Constructing a sample consisting of characteristic values corresponding to the clustering characteristics for each user in a specific user set to obtain a user clustering sample set; and clustering the users in the specific user set according to the user clustering sample set by adopting a clustering algorithm to obtain a plurality of user groups.
In this embodiment, only the feature value of each sample in the user data set on each cluster feature may be extracted to obtain the user cluster sample set. Each sample in the set of user-clustered samples uniquely corresponds to a user, and each sample may contain only the feature value of each clustered feature.
The clustering algorithm may be, for example, the k-means algorithm, the mean-shift algorithm, the DBSCAN algorithm (Density-Based Spatial Clustering of Applications with Noise), or the BIRCH algorithm (Balanced Iterative Reducing and Clustering using Hierarchies).
In another example, a k-means algorithm may be used to perform clustering based on the selected clustering features, and the specific user set is divided into a plurality of user groups, which specifically may include steps S2310 to S2320 as follows:
In step S2310, the euclidean distance between every two users is determined according to the feature value of the cluster feature.
For example, suppose the total number of users is m, the number of clustering features is l, and the feature values of user i on clustering features 1 to l are D_{i,1} to D_{i,l} (so user 1 has feature values D_{1,1} to D_{1,l}, user 2 has D_{2,1} to D_{2,l}, and so on up to user m with D_{m,1} to D_{m,l}). The Euclidean distance between user i and user j may then be

ρ_{i,j} = sqrt( Σ_{k=1}^{l} (D_{i,k} - D_{j,k})^2 )

where i, j are integers and i, j ∈ [1, m].
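The pairwise distance above is the standard Euclidean distance over the clustering-feature vectors; a minimal sketch (function name illustrative):

```python
import math

def euclidean(u, v):
    """Euclidean distance between two users' clustering-feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
```

For two users with feature vectors (0, 0) and (3, 4) the distance is 5.0.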
Step S2320, the users in the specific user set are clustered according to Euclidean distance between every two users, so as to obtain a plurality of user groups.
When clustering with the k-means algorithm, the number of user groups needs to be set in advance. Therefore, in this embodiment, the number of user groups may be a set number chosen according to the application scenario or specific requirements; for example, if the set number is 10, 10 user groups are obtained.
On the basis of the present embodiment, the step of clustering users in the specific user set according to the euclidean distance between every two users to obtain a plurality of user groups includes steps S2321 to S2322 as follows:
In step S2321, the set number of users with the largest euclidean distance is selected as the initial center of each user group.
Specifically, selecting the set number of users with the largest Euclidean distances may be done by sorting the pairwise Euclidean distances in descending order and selecting the users of the corresponding pairs in that order. For example, suppose that, in descending order of Euclidean distance, the user pairs are (user 1, user 2), (user 3, user 1), (user 12, user 5), (user 4, user 5), ...; then, in the case where the set number is 5, the set number of users with the largest Euclidean distances may be selected as user 1, user 2, user 3, user 12, and user 5.
In step S2322, other users are divided into user groups with the closest euclidean distance to the corresponding initial center.
Wherein the other users are users in the specific user set except for the set number of users which are used as the initial center of each user group.
For example, in the case where the users selected as the initial centers of the user groups are user 1, user 2, user 3, user 12, and user 5, consider another user, user 6: compare the Euclidean distances ρ_{6,1}, ρ_{6,2}, ρ_{6,3}, ρ_{6,12}, and ρ_{6,5} between user 6 and each of user 1, user 2, user 3, user 12, and user 5 respectively. If ρ_{6,1} is the smallest, user 6 may be divided into the user group whose initial center is user 1.
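Steps S2321 to S2322 (pick the set number of users from the largest pairwise distances as initial centers, then assign every other user to the nearest center) can be sketched as below. This is a sketch under the assumption that users are represented by their index into a list of feature vectors; the k-means refinement iterations that would normally follow are omitted.

```python
import itertools
import math

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def initial_centers(samples, set_number):
    """Step S2321: scan user pairs in descending distance order and collect
    users until `set_number` initial centers have been chosen."""
    pairs = sorted(itertools.combinations(range(len(samples)), 2),
                   key=lambda p: euclidean(samples[p[0]], samples[p[1]]),
                   reverse=True)
    centers = []
    for i, j in pairs:
        for u in (i, j):
            if u not in centers and len(centers) < set_number:
                centers.append(u)
        if len(centers) == set_number:
            break
    return centers

def assign_to_groups(samples, centers):
    """Step S2322: divide each remaining user into the group whose initial
    center is nearest in Euclidean distance."""
    groups = {c: [c] for c in centers}
    for u in range(len(samples)):
        if u in centers:
            continue
        nearest = min(centers, key=lambda c: euclidean(samples[u], samples[c]))
        groups[nearest].append(u)
    return groups
```

With two tight clusters around (0, 0) and (10, 10) and a set number of 2, the two farthest-apart users become the centers and the others join their nearest center.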
Step S2400 selects an image discovery feature from among a plurality of features.
In general, the number of image discovery features is large.
In one example, a feature of interest may be selected from a plurality of features as the portrait discovery feature.
In another example, a plurality of features may be used as the image discovery feature. Specifically, all the features in the broad table may be regarded as image discovery features.
Step S2500, calculating the score of each user group on the image finding feature for each image finding feature, and determining the image information corresponding to each user group according to the score.
In one example, the step of calculating the score of each user group on each image finding feature includes steps S2510 to S2540 as follows:
Step S2510, for each image finding feature, determines an average feature value of the image finding feature for the user in each user group as a first average value of the image finding feature for each user group.
For example, suppose the number of image finding features is x and user group 1 contains y users; the average of the y users' feature values on image finding feature 1 is the first average avg_F_{1,1} of user group 1 on image finding feature 1. If the number of user groups obtained in step S2300 is z, the first average of user group v on image finding feature w may be denoted avg_F_{v,w}, where v, w are integers, v ∈ [1, z], and w ∈ [1, x].
Step S2520, for each image finding feature, calculates an average value of the first average values of the image finding features for each user group as a second average value of the image finding features.
In the case where the number of image finding features is x, the number of user groups is z, and the first average of user group v on image finding feature w is avg_F_{v,w} (where v, w are integers, v ∈ [1, z], w ∈ [1, x]), the average of the first averages of all user groups on image finding feature w, i.e. the second average of image finding feature w, may be

mean_w = (1/z) Σ_{v=1}^{z} avg_F_{v,w}.
Step S2530, for each image finding feature, calculates a standard deviation of each first average value of each user group on the image finding feature as a standard deviation of the image finding feature.
In the case where the number of image finding features is x, the number of user groups is z, and the first average of user group v on image finding feature w is avg_F_{v,w} (where v, w are integers, v ∈ [1, z], w ∈ [1, x]), the standard deviation of the first averages of all user groups on image finding feature w, i.e. the standard deviation of image finding feature w, may be

stddev_w = sqrt( (1/z) Σ_{v=1}^{z} (avg_F_{v,w} - mean_w)^2 ).
Step S2540, for each user group, obtaining a score of the user group on each image discovery feature according to the first average value of the user group on each image discovery feature and the second average value and standard deviation of the corresponding image discovery feature.
In one example, for each user group, the step of obtaining the score of the user group on each image discovery feature according to the first average value of the user group on each image discovery feature and the second average value and standard deviation of the corresponding image discovery feature includes steps S2541 to S2542 as follows:
step S2541, for each user group, determining a difference between the first average value of the user group on each image finding feature and the second average value of the corresponding image finding feature as a difference of the user group on each image finding feature.
In the case where the number of image finding features is x, the number of user groups is z, the first average of user group v on image finding feature w is avg_F_{v,w}, and the second average of image finding feature w is mean_w (where v, w are integers, v ∈ [1, z], w ∈ [1, x]), the difference between the first average avg_F_{v,w} and the second average mean_w, i.e. the difference of user group v on image finding feature w, is diff_{v,w} = avg_F_{v,w} - mean_w.
Step S2542, for each user group, determining a ratio between the difference value of the user group on each image finding feature and the standard deviation of the corresponding image finding feature as the score of the user group on the corresponding image finding feature.
In the case where the number of image finding features is x, the number of user groups is z, the difference of user group v on image finding feature w is diff_{v,w}, and the standard deviation of image finding feature w is stddev_w (where v, w are integers, v ∈ [1, z], w ∈ [1, x]), the ratio of the difference to the standard deviation, i.e. the score of user group v on image finding feature w, may be score_{v,w} = diff_{v,w} / stddev_w.
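Steps S2510 to S2542 amount to computing, for one discovery feature, the z-score of each group's mean against the distribution of all group means. A minimal sketch (groups as lists of user ids, `feature_values` as a per-user mapping; names illustrative):

```python
import math

def group_scores(groups, feature_values):
    """Score of each user group on one portrait discovery feature:
    (first average - second average) / standard deviation of first averages."""
    # first average: mean feature value of the users in each group
    first_avgs = [sum(feature_values[u] for u in g) / len(g) for g in groups]
    z = len(first_avgs)
    # second average: mean of the first averages across all groups
    mean = sum(first_avgs) / z
    # population standard deviation of the first averages
    stddev = math.sqrt(sum((a - mean) ** 2 for a in first_avgs) / z)
    return [(a - mean) / stddev for a in first_avgs]
```

With two groups whose first averages are 2 and 6, the second average is 4, the standard deviation is 2, and the scores are -1 and +1.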
In one embodiment of the present invention, the step of determining portrait information corresponding to each user group according to the score may include:
for each user group, determining the image discovery features whose scores exceed a set range as the target portrait features of the user group, and obtaining the portrait information corresponding to the user group according to the target portrait features of the user group and the corresponding scores.
The setting range can be preset according to application scenes or specific requirements. Specifically, the setting ranges corresponding to different image finding features may be the same or different.
Next, taking age as an example of a portrait discovery feature, how to determine the portrait information of the corresponding user groups is described.
For example, suppose step S2300 obtains 10 user groups, and the first averages avg_age_v of the 10 user groups on the portrait discovery feature of age (where v is an integer and v ∈ [1, 10]) are 50, 55, 80, 10, 60, 58, 48, 52, 30, and 55, respectively. The second average of the age feature may be obtained from these first averages:

mean = (50 + 55 + 80 + 10 + 60 + 58 + 48 + 52 + 30 + 55) / 10 = 49.8.

The standard deviation of the age feature may be

stddev = sqrt( (1/10) Σ_{v=1}^{10} (avg_age_v - 49.8)^2 ) ≈ 17.67.

According to the formula score_v = (avg_age_v - mean) / stddev, the scores of the 10 user groups on the age feature are approximately: 0.011, 0.294, 1.709, -2.252, 0.577, 0.464, -0.101, 0.124, -1.120, 0.294.
Suppose the set range corresponding to the age feature is [-1.5, 1.5]. Since the score of user group 3 on the age feature is score_3 = 1.709 and the score of user group 4 is score_4 = -2.252, both exceeding the set range [-1.5, 1.5], the age feature can be used as the target portrait feature of user groups 3 and 4.
Since the score of user group 3 on the target portrait feature of age, score_3 = 1.709, is larger than the maximum threshold 1.5 of the set range, "older" can be taken as one piece of portrait information of user group 3. Since the score of user group 4, score_4 = -2.252, is smaller than the minimum threshold -1.5 of the set range, "younger" can be taken as one piece of portrait information of user group 4.
For the other 8 user groups, the scores on the age feature fall within the set range [-1.5, 1.5], so the age feature cannot be used as their target portrait feature, and neither "older" nor "younger" can be used as their portrait information.
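The arithmetic of this age example can be reproduced with a short script. This is a sketch only: the scores here are rounded with Python's `round`, so the last digit of a few values may differ slightly from the truncated figures quoted above.

```python
import math

avg_age = [50, 55, 80, 10, 60, 58, 48, 52, 30, 55]   # first averages of the 10 groups
mean = sum(avg_age) / len(avg_age)                    # second average
stddev = math.sqrt(sum((a - mean) ** 2 for a in avg_age) / len(avg_age))
scores = [round((a - mean) / stddev, 3) for a in avg_age]

# groups whose score falls outside the set range [-1.5, 1.5]
outliers = [i + 1 for i, s in enumerate(scores) if abs(s) > 1.5]
```

Running this gives mean = 49.8, stddev ≈ 17.67, and exactly user groups 3 and 4 outside the set range.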
Step S2600, outputting image information corresponding to each user group.
In one embodiment, the method for outputting the portrait information corresponding to each user group may include:
acquiring an application scene to which the portrait information is applicable; and displaying the portrait information corresponding to each user group based on the application scene.
In another embodiment, the method for outputting the portrait information corresponding to each user group may further include: and generating a user portrait report in a corresponding file format according to the service requirement so as to be used for guiding the actual service. Wherein, the portrait information corresponding to each user group is described in the portrait report of the user.
According to the embodiment of the invention, partial characteristics are selected from the characteristics contained in each piece of data to serve as clustering characteristics, and the users contained in the specific user set are clustered based on the selected clustering characteristics, so that the specific user set is divided into a plurality of user groups; and selecting image finding features from the plurality of features, calculating a score of each user group on the image finding features for each image finding feature, and determining and outputting image information corresponding to each user group according to the score.
The embodiment of the invention clusters on a small number of clustering features, so unimportant features can be prevented from excessively influencing the clustering result, while the portrait information of each user group is characterized by a large number of features, widening the analysis dimensions of the user portrait.
In one embodiment, the method may further include steps S3100 to S3300 as shown in fig. 3:
step S3100, a cluster center of each user group is obtained.
Specifically, the step S3100 may be performed after the step S2300 is performed, or may be performed after the step S2500 is performed, and in this embodiment, the order of performing the step S3100 is not limited.
In one embodiment, the step of obtaining a cluster center for each user population may include:
and respectively determining the average value of the characteristic values of each cluster characteristic of the user corresponding to each user group as a cluster center of each user group.
In this embodiment, the cluster center may be the average of the feature values of the users in the corresponding user group on each clustering feature. For example, suppose user group 1 contains users 1 to t, the number of clustering features is l, and the feature value of user i on clustering feature j is f_{i,j}, where i, j are integers, i ∈ [1, t], j ∈ [1, l]. The cluster center of user group 1 may then be

C_1 = ( (1/t) Σ_{i=1}^{t} f_{i,1}, (1/t) Σ_{i=1}^{t} f_{i,2}, ..., (1/t) Σ_{i=1}^{t} f_{i,l} ).
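The per-feature mean that defines a cluster center can be sketched as follows (a sketch; the group is assumed to be a list of equal-length clustering-feature vectors, and the function name is illustrative):

```python
def cluster_center(group_samples):
    """Cluster center of one user group: the per-feature mean of the group's
    users over the clustering features (step S3100)."""
    t = len(group_samples)      # number of users in the group
    l = len(group_samples[0])   # number of clustering features
    return [sum(s[j] for s in group_samples) / t for j in range(l)]
```

For two users with feature vectors (1, 2) and (3, 4), the cluster center is (2, 3).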
Step S3200, new data of a new user is acquired, wherein the new data comprises new feature values corresponding to each cluster feature.
The new user in this embodiment may be any user who generates new data. Specifically, the user may belong to the above specific user set, or may not belong to it.
In an embodiment in which the new user belongs to the specific user set, the new data may contain content that differs from the user's existing data in the user data set; for example, the feature values of the same feature may differ between the new data and the user's existing data.
In one example, the data format of each new piece of data may be the same as the data format of each piece of data contained in the user data set.
In step S3300, the user group corresponding to the new user is determined according to the new feature value corresponding to each cluster feature of the new user and the cluster center of each user group.
In one embodiment, the step of determining the user group corresponding to the new user according to the new feature value corresponding to each cluster feature of the new user and the cluster center of each user group may include steps S3310 to S3320 as follows:
Step S3310, determining a new euclidean distance between the new user and each cluster center according to the new feature value corresponding to the cluster feature of the new user.
The method for determining the new euclidean distance between the new user and the cluster center may refer to the foregoing method for determining the euclidean distance between every two users, which is not described herein again.
In step S3320, the new users are divided into user groups corresponding to the cluster centers with the smallest euclidean distance.
For example, suppose the number of user groups is r and the new Euclidean distance between the new user and the cluster center of user group γ is ρ_γ, where γ is an integer and γ ∈ [1, r]. If ρ_3 is the smallest, the distance between the new user and the cluster center of user group 3 is the smallest, and the new user may therefore be divided into user group 3.
If the new user does not belong to the above specific user set, the new user may simply be added to user group 3.
If the new user does belong to the specific user set, for example a user who was divided into user group 1 by the clustering step S2300 according to the user's June data, but whose July data shows the smallest distance to the cluster center of user group 3, the user may be deleted from user group 1 and re-divided into user group 3.
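Steps S3310 to S3320 (assigning a new user to the user group with the nearest cluster center) can be sketched as below, assuming the cluster centers are kept in a mapping from group id to center vector (the names are illustrative):

```python
import math

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def assign_new_user(new_sample, centers):
    """Return the id of the user group whose cluster center is nearest to the
    new user's clustering-feature vector (steps S3310 to S3320)."""
    return min(centers, key=lambda gid: euclidean(new_sample, centers[gid]))
```

With centers {1: (0, 0), 3: (10, 10)}, a new user at (9, 9) goes to group 3 and one at (1, 0) to group 1.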
After the new user groups are obtained, the above steps S2400 to S2600 may be performed on them, and the portrait information of each user group may be re-determined and output.
According to the embodiment of the invention, the cluster center of each user group is obtained after the user groups are updated, so when the data distribution changes, i.e. new data of a new user is obtained, a new clustering result can be obtained merely from the feature values of the new user's clustering features and the cluster centers of the user groups. Iterating the cluster centers in this way keeps the user groups stable and saves labor cost.
In one embodiment, after the step S2500 is performed, the method may further include steps S4100 to S4300 shown in fig. 4:
step S4100, obtaining a product to be recommended.
Specifically, the product to be recommended may be one or more products determined according to the application scenario or specific requirements; it may be a virtual product, such as information or a wealth-management product, or a physical product.
Step S4200, selecting the products to be recommended matching each user group according to the image information corresponding to each user group.
In one embodiment, the step of selecting the product to be recommended matching each user group according to the portrait information corresponding to each user group may include steps S4210 to S4230 as follows:
step S4210, obtaining a product label of the product to be recommended.
The product tags may be data embodying relevant attributes of the corresponding product to be recommended.
Step S4220, determining the matching degree between each user group and each product to be recommended according to the portrait information corresponding to each user group and the product label of the product to be recommended.
Specifically, the portrait information corresponding to each user group and the product labels of the products to be recommended may be input into a pre-trained recommendation model to obtain the matching degree between each user group and each product to be recommended.
Step S4230, selecting the products to be recommended matching each user group according to the matching degree.
If the number of the products to be recommended is one, the products to be recommended can be matched with the first number of user groups with the highest matching degree.
If the number of the products to be recommended is plural, each product to be recommended may be matched with the second number of user groups with the highest matching degree, or each user group may be matched with the third number of products to be recommended with the highest matching degree.
The first number, the second number and the third number can be set according to corresponding application scenes or specific requirements respectively.
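Given the matching degrees produced by the recommendation model, the selection in step S4230 reduces to a top-k cut. A minimal sketch of the "each user group matched with the third number of products with the highest matching degree" option, assuming the matching degrees are precomputed into a nested mapping (group id → product id → degree); the function name and data layout are illustrative:

```python
def match_products(match_degree, third_number):
    """For each user group, keep the `third_number` products to be recommended
    with the highest matching degree."""
    matched = {}
    for g, degrees in match_degree.items():
        ranked = sorted(degrees, key=degrees.get, reverse=True)
        matched[g] = ranked[:third_number]
    return matched
```

The symmetric options (matching each product with the first or second number of user groups) can be obtained by transposing the mapping before the same cut.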
In step S4300, recommending the products to be recommended, which are matched with the corresponding user group, to the users in each user group.
For example, in the case that the user group 1 matches with the product to be recommended 1, the product to be recommended 1 may be recommended to all users included in the user group 1.
In this embodiment, the matching degree between each user group and each product to be recommended is obtained according to the portrait information corresponding to each user group and the product labels of the products to be recommended, so the preference of each user group for the candidate products can be obtained accurately and effectively, and accurate product recommendation can be performed for each user group, actually meeting each user group's product needs and improving each user group's product experience.
< example 1>
The method for constructing user portrait information provided in this embodiment will be further described below with reference to fig. 5.
In step S5001, a plurality of data tables for a specific user set are acquired.
In step S5002, a plurality of data tables are spliced into a wide table by using the user identifier as a primary key.
Wherein each row of the broad table corresponds to a user, and each column except for the user identification corresponds to a feature; alternatively, each column of the broad table corresponds to a user and each row, except for the user identification, corresponds to a feature.
In step S5003, the total number of users is obtained as the first number from the wide table.
Step S5004, according to the broad table, the number of users whose feature values are null for each feature is obtained as the second number corresponding to each feature.
In step S5005, a ratio of the second number to the first number corresponding to each feature is determined as a null rate of each feature.
For example, the first number may be m. If the second numbers corresponding to features 1 to n are A1 to An respectively, the null rate of feature k may be

A_k / m

where k is an integer and k ∈ [1, n].
In step S5006, a feature with a null rate smaller than a set value is selected from the plurality of features as a clustering feature.
Step S5007, determining the Euclidean distance between every two users according to the characteristic value of the clustering characteristic.
For example, suppose the total number of users is m, the number of clustering features is L, and the feature values of user i on clustering features 1 to L are D_{i,1} to D_{i,L}, where i, l are integers, i ∈ [1, m], l ∈ [1, L]. Then for user i and user j (where i ≠ j, i, j are integers, and i, j ∈ [1, m]), the Euclidean distance between them may be

ρ_{i,j} = sqrt( Σ_{l=1}^{L} (D_{i,l} - D_{j,l})^2 ).
In step S5008, the set number of users with the largest euclidean distance is selected as the initial center of the set number of user groups.
In step S5009, each other user in the specific user set is respectively divided into user groups with the closest euclidean distance to the corresponding initial center.
Step S5010, selecting an image finding feature from a plurality of features.
Step S5011, for each image finding feature, determines an average feature value of the image finding feature for the user in each user group as a first average value of the image finding feature for each user group.
For example, if the number of image discovery features is x, user group 1 contains y users, and the number of user groups is z, then the first average of user group v on image discovery feature w may be avg_F_{v,w} (where v, w are integers, v ∈ [1, z], w ∈ [1, x]).
Step S5012, for each image finding feature, calculates an average value of the first average values of the image finding features for each user group as a second average value of the image finding features.
For example, the second average of image discovery feature w may be

mean_w = (1/z) Σ_{v=1}^{z} avg_F_{v,w}.
Step S5013, for each image finding feature, calculates a standard deviation of each first average value of each user group on the image finding feature as a standard deviation of the image finding feature.
For example, the standard deviation of image discovery feature w may be

stddev_w = sqrt( (1/z) Σ_{v=1}^{z} (avg_F_{v,w} - mean_w)^2 )

(where v, w are integers, v ∈ [1, z], w ∈ [1, x]).
Step S5014, for each user group, determining a difference between the first average value of the user group on each image finding feature and the second average value of the corresponding image finding feature as a difference of the user group on each image finding feature.
For example, the difference of user group v on image discovery feature w is diff_{v,w} = avg_F_{v,w} - mean_w (where v, w are integers, v ∈ [1, z], w ∈ [1, x]).
Step S5015, for each user group, determines a ratio between a difference value of the user group on each image finding feature and a standard deviation of the corresponding image finding feature as a score of the user group on the corresponding image finding feature.
For example, the score of user group v on image discovery feature w may be score_{v,w} = diff_{v,w} / stddev_w (where v, w are integers, v ∈ [1, z], w ∈ [1, x]).
Step S5016, the image information corresponding to each user group is determined according to the score.
Step S5017, the image information corresponding to each user group is output.
< example 2>
On the basis of the above example 1, after the above step S5009 is performed, the method may further include steps S6001 to S6004 as shown in fig. 6:
in step S6001, a cluster center of each user group is obtained.
The cluster center may be the average of the feature values of the users in the corresponding user group on each clustering feature. For example, suppose user group 1 contains users 1 to t, the number of clustering features is L, and the feature value of user i on clustering feature l is f_{i,l}, where i, l are integers, i ∈ [1, t], l ∈ [1, L]. The cluster center of user group 1 may then be

C_1 = ( (1/t) Σ_{i=1}^{t} f_{i,1}, ..., (1/t) Σ_{i=1}^{t} f_{i,L} ).
Step S6002, obtaining new data of the new user, where the new data includes new feature values corresponding to each cluster feature.
Step S6003, determining a new Euclidean distance between the new user and each cluster center according to the new feature value corresponding to the cluster feature of the new user.
Step S6004, dividing the new user into user groups corresponding to the new cluster center with the smallest Euclidean distance.
After obtaining the new user group, steps S5011 to S5017 in example 1 may be performed on the new user group, and the image information for each user group may be newly determined and outputted.
< example 3>
On the basis of the above examples 1 and 2, after the above step S5016 is performed, the method may further include steps S7001 to S7005 as shown in fig. 7:
s7001, obtaining a product to be recommended.
S7002, obtaining a product label of the product to be recommended.
S7003, determining the matching degree of each user group and each product to be recommended according to the portrait information corresponding to each user group and the product label of the product to be recommended.
S7004, selecting the products to be recommended matched with each user group according to the matching degree.
S7005, recommending the products to be recommended, which are matched with the corresponding user groups, to the users in each user group.
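The patent leaves the matching-degree computation open. One simple assumption, shown purely for illustration, is tag overlap between a group's portrait tags and a product's labels (function names and the threshold are ours, not the patent's):

```python
def match_degree(portrait_tags, product_tags):
    """Fraction of the product's labels that also appear in the group's portrait
    tags. An assumed similarity; the patent does not fix a specific formula."""
    portrait_tags, product_tags = set(portrait_tags), set(product_tags)
    if not product_tags:
        return 0.0
    return len(portrait_tags & product_tags) / len(product_tags)

def recommend(group_portraits, products, threshold=0.5):
    """group_portraits: {group_id: tag list}; products: {product_id: tag list}.
    Returns {group_id: [product_id, ...]} of products whose match degree
    meets the (assumed) threshold."""
    return {
        g: [p for p, tags in products.items()
            if match_degree(ptags, tags) >= threshold]
        for g, ptags in group_portraits.items()
    }
```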
< device >
In this embodiment, an apparatus 8000 for constructing user portrait information is provided, as shown in fig. 8, which includes a data acquisition module 8100, a cluster feature selection module 8200, a cluster module 8300, a discovery feature selection module 8400, a portrait information determination module 8500, and a portrait information output module 8600.
The data obtaining module 8100 is configured to obtain a user data set corresponding to a specific user set, where each piece of data in the user data set uniquely corresponds to one user in the specific user set, and each piece of data includes feature values corresponding to a plurality of features respectively.
The cluster feature selection module 8200 is configured to select a part of features from the plurality of features as the cluster feature.
The clustering module 8300 is configured to perform clustering based on the selected clustering features, and divide the specific user set into a plurality of user groups.
The discovery feature selection module 8400 is configured to select portrait discovery features from the plurality of features.
The portrait information determination module 8500 is configured to calculate, for each portrait discovery feature, the score of each user group on that feature, and to determine the portrait information corresponding to each user group based on the scores.
The portrait information output module 8600 is configured to output the portrait information corresponding to each user group.
In one embodiment, the clustering module 8300 may also be used to:
constructing a sample consisting of characteristic values corresponding to the clustering characteristics for each user in a specific user set to obtain a user clustering sample set;
and clustering the users in the specific user set according to the user clustering sample set by adopting a clustering algorithm to obtain a plurality of user groups.
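The patent only requires *some* clustering algorithm over the per-user samples. A minimal k-means sketch in NumPy (one common choice; any clustering algorithm would satisfy the claim, and the function name is ours):

```python
import numpy as np

def kmeans_groups(samples, k, n_iter=100, seed=0):
    """samples: (n users, L cluster features) user clustering sample set.
    Returns a length-n array of user-group labels in [0, k)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(samples, dtype=float)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # random initial centers
    for _ in range(n_iter):
        # assign each user to the nearest center (Euclidean distance)
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # move each center to the mean of its assigned users
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels
```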
In one embodiment, the data acquisition module 8100 may also be configured to:
acquiring a plurality of data tables for the particular set of users;
splicing the plurality of data tables into a wide table with the user identification as the primary key, where each row of the wide table corresponds to a user and each column other than the user identification corresponds to a feature; alternatively, each column of the wide table corresponds to a user and each row other than the user identification corresponds to a feature.
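Splicing tables on the user identification is an outer join keyed on the user id. A dependency-free sketch with rows as dicts (in practice a database join or `pandas.merge` would do this; names and the uniform-columns assumption are ours):

```python
def splice_wide_table(tables, key="user_id"):
    """tables: list of tables, each a list of row dicts containing the user id.
    Splices them into one wide table (one dict per user; features missing for a
    user become None), mimicking a full outer join on the user-id primary key.
    Assumes every row of a given table has the same columns."""
    wide = {}
    columns = [key]
    for table in tables:
        for row in table:
            wide.setdefault(row[key], {key: row[key]}).update(
                {c: v for c, v in row.items() if c != key})
        for row in table[:1]:  # collect this table's feature columns
            columns += [c for c in row if c != key and c not in columns]
    # one row per user, with absent feature values filled as None (null values)
    return [{c: wide[u].get(c) for c in columns} for u in wide]
```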
In one embodiment, the cluster feature selection module 8200 may be specifically configured to select a cluster feature using any one or more of the following:
selecting, from the plurality of features, features whose null rate is less than or equal to a set value as cluster features;
selecting, from the plurality of features, some features that meet the requirements of a service as cluster features, where the service is the service that applies the portrait information to make decisions;
selecting, from the plurality of features, at least one group of duplicate features of the same type, combining each group of duplicate features into a new feature, and using the new features as cluster features;
selecting, from the plurality of features, at least one group of duplicate features of the same type, selecting one representative feature from each group of duplicate features, and using the representative features as cluster features.
In one embodiment, the user data set is a wide table with the user identification as the primary key;
the cluster feature selection module 8200 is configured to select, from the plurality of features, features whose null rate is less than or equal to a set value as cluster features;
the apparatus 8000 may further include:
a module for obtaining the total number of users from the wide table as a first number;
a module for obtaining, from the wide table, the number of users whose feature value is empty for each feature, as a second number corresponding to each feature;
a module for determining the ratio of the second number corresponding to each feature to the first number as the null rate of each feature.
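The null-rate filter above can be sketched directly over the wide table, with `None` marking an empty feature value (function names are ours):

```python
def null_rates(wide_table, key="user_id"):
    """wide_table: list of per-user row dicts; None marks an empty feature value.
    Returns {feature: null rate}, i.e. second number / first number per feature."""
    total = len(wide_table)  # first number: total number of users
    features = [c for c in wide_table[0] if c != key]
    return {f: sum(row.get(f) is None for row in wide_table) / total
            for f in features}

def select_by_null_rate(wide_table, max_rate, key="user_id"):
    """Keep as cluster features those whose null rate is <= the set value."""
    return [f for f, r in null_rates(wide_table, key).items() if r <= max_rate]
```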
In one embodiment, the clustering module 8300 may also be used to:
determining Euclidean distance between every two users according to the characteristic value of the clustering characteristic;
and clustering the users in the specific user set according to the Euclidean distance between every two users to obtain a plurality of user groups.
In one embodiment, the number of user groups is a set number;
the step of clustering the users in the specific user set according to the Euclidean distance between every two users to obtain a plurality of user groups includes:
selecting the set number of users with the largest mutual Euclidean distances as the initial centers of the user groups;
assigning each other user to the user group whose initial center is nearest to that user in Euclidean distance, where the other users are the users in the specific user set except the set number of selected users.
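One common reading of "the set number of users with the largest Euclidean distance" is greedy farthest-point selection: start from the most distant pair, then repeatedly add the user farthest from all chosen centers. A sketch under that assumption (NumPy assumed; names are ours):

```python
import numpy as np

def farthest_point_init(samples, k):
    """Pick k mutually far-apart users as initial group centers (greedy
    farthest-point reading of the selection step). Returns their indices."""
    X = np.asarray(samples, dtype=float)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)  # pairwise distances
    chosen = list(np.unravel_index(d.argmax(), d.shape))       # most distant pair
    while len(chosen) < k:
        # add the user farthest from every already-chosen center
        chosen.append(int(d[:, chosen].min(axis=1).argmax()))
    return chosen

def assign_to_centers(samples, center_idx):
    """Assign every user to the group whose initial center is nearest."""
    X = np.asarray(samples, dtype=float)
    centers = X[center_idx]
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return d.argmin(axis=1)
```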
In one embodiment, the discovery feature selection module 8400 may also be configured to:
use all of the plurality of features as portrait discovery features.
In one embodiment, calculating, for each portrait discovery feature, the score of each user group on that feature includes:
for each portrait discovery feature, determining the average feature value of the users in each user group on that feature as the first average of each user group on that feature;
for each portrait discovery feature, calculating the average of the first averages of the user groups on that feature as the second average of that feature;
for each portrait discovery feature, calculating the standard deviation of the first averages of the user groups on that feature as the standard deviation of that feature;
for each user group, deriving the score of the user group on each portrait discovery feature from the first average of the user group on that feature and the second average and standard deviation of that feature.
In one embodiment, deriving, for each user group, the score of the user group on each portrait discovery feature from the first average of the user group on that feature and the second average and standard deviation of that feature includes:
for each user group, determining the difference between the first average of the user group on each portrait discovery feature and the second average of that feature as the difference value of the user group on that feature;
for each user group, determining the ratio of the difference value of the user group on each portrait discovery feature to the standard deviation of that feature as the score of the user group on that feature.
In one embodiment, determining the portrait information corresponding to each user group based on the scores includes:
for each user group, determining the portrait discovery features whose scores fall outside a set range as the target portrait features of the user group, and obtaining the portrait information corresponding to the user group from the target portrait features of the user group and the corresponding scores.
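Selecting target portrait features is then a simple threshold on the score magnitude. A sketch (the threshold value and names are ours; the patent only requires "exceeding a set range"):

```python
def target_portrait_features(scores, feature_names, threshold=1.0):
    """scores: per-feature scores of one user group, aligned with feature_names.
    Keeps the portrait discovery features whose |score| exceeds the set range,
    together with their scores, as the group's portrait information."""
    return {name: s for name, s in zip(feature_names, scores)
            if abs(s) > threshold}
```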
In one embodiment, the apparatus 8000 may further include:
a module for obtaining the cluster center of each user group;
a module for obtaining new data of a new user, where the new data includes a new feature value corresponding to each cluster feature;
a module for determining the user group corresponding to the new user according to the new feature values of the new user and the cluster centers of the user groups.
In one embodiment, obtaining a cluster center for each user population comprises:
and respectively determining the average value of the characteristic values of each cluster characteristic of the user corresponding to each user group as a cluster center of each user group.
In one embodiment, determining the user group corresponding to the new user according to the new feature value corresponding to each cluster feature of the new user and the cluster center of each user group includes:
determining a new Euclidean distance between the new user and each clustering center according to a new characteristic value corresponding to the clustering characteristic of the new user;
and dividing the new users into user groups corresponding to the cluster centers with the minimum new Euclidean distance.
In one embodiment, the apparatus 8000 may further include:
a module for obtaining products to be recommended;
a module for selecting the products to be recommended that match each user group according to the portrait information corresponding to each user group;
a module for recommending, to the users in each user group, the products to be recommended that match the corresponding user group.
In one embodiment, selecting the product to be recommended matching each user group according to the portrait information corresponding to each user group includes:
obtaining a product label of a product to be recommended;
determining the matching degree of each user group and each product to be recommended according to the portrait information corresponding to each user group and the product label of the product to be recommended;
and selecting the products to be recommended matched with each user group according to the matching degree.
In one embodiment, the apparatus 8000 may further include:
a module for acquiring an application scene to which the portrait information is applicable;
and a module for displaying portrait information corresponding to each user group based on the application scene.
It will be appreciated by those skilled in the art that the apparatus 8000 for constructing user portrait information can be implemented in a variety of ways. For example, the apparatus 8000 may be implemented by configuring a processor with instructions: the instructions may be stored in ROM and, when the device is booted, read from ROM into a programmable device to implement the apparatus 8000. As another example, the apparatus 8000 may be solidified into a dedicated device (e.g., an ASIC). Its modules may be separate from each other or combined together. The apparatus 8000 may be implemented by one of the above implementations, or by a combination of two or more of them.
In this embodiment, the apparatus 8000 for constructing user portrait information may take various forms. For example, it may be a functional module running in a software product or application that provides a network access service, a peripheral embedded component, plug-in, or patch of such a software product or application, or the software product or application itself.
< electronic device >
In this embodiment, an electronic device 9000 is also provided. The electronic device 9000 may be the electronic device 1000 shown in fig. 1.
In one aspect, the electronic device 9000 may comprise the aforementioned means 8000 for constructing user portrait information for implementing a method for constructing user portrait information according to any embodiment of the present invention.
In another aspect, as shown in fig. 9, the electronic device 9000 may further comprise a processor 9100 and a memory 9200, the memory 9200 being for storing executable instructions; the processor 9100 is configured to control, according to the instructions, the electronic device 9000 to perform a method of constructing user portrait information according to any embodiment of the present invention.
In this embodiment, the electronic device 9000 may be a mobile phone, a tablet computer, a palm computer, a desktop computer, a notebook computer, a workstation, a game console, or the like.
< computer-readable storage Medium >
In this embodiment, there is also provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements a method of constructing user portrait information according to any of the embodiments of the present invention.
The present invention may be an apparatus, method and/or computer program product. The computer program product may include a computer readable storage medium having computer readable program instructions embodied thereon for causing a processor to implement aspects of the present invention.
The computer readable storage medium may be a tangible device that can hold and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium include the following: a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), portable compact disc read-only memory (CD-ROM), digital versatile disc (DVD), memory stick, floppy disk, mechanical encoding device such as a punch card or raised structure in a groove having instructions stored thereon, and any suitable combination of the foregoing. Computer-readable storage media, as used herein, are not to be construed as transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through waveguides or other transmission media (e.g., light pulses through fiber optic cables), or electrical signals transmitted through wires.
The computer readable program instructions described herein may be downloaded from a computer readable storage medium to a respective computing/processing device or to an external computer or external storage device over a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmissions, wireless transmissions, routers, firewalls, switches, gateway computers and/or edge servers. The network interface card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium in the respective computing/processing device.
Computer program instructions for carrying out operations of the present invention may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-related instructions, microcode, firmware instructions, state-setting data, or source or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk or C++ and conventional procedural programming languages such as the "C" programming language or similar programming languages. The computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, aspects of the present invention are implemented by personalizing electronic circuitry, such as programmable logic circuitry, field-programmable gate arrays (FPGAs), or programmable logic arrays (PLAs), with state information of the computer readable program instructions, which electronic circuitry can execute the computer readable program instructions.
Various aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable medium having the instructions stored therein includes an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions. It is well known to those skilled in the art that implementation by hardware, implementation by software, and implementation by a combination of software and hardware are all equivalent.
The foregoing description of embodiments of the invention has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the various embodiments described. The terminology used herein was chosen in order to best explain the principles of the embodiments, the practical application, or the technical improvements in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein. The scope of the invention is defined by the appended claims.

Claims (10)

1. A method of constructing user portrayal information comprising:
acquiring a user data set corresponding to a specific user set, wherein each piece of data in the user data set uniquely corresponds to one user in the specific user set, and each piece of data comprises a plurality of characteristic values respectively corresponding to a plurality of characteristics;
selecting part of the features from the plurality of features as clustering features;
clustering is carried out based on the selected clustering features, and the specific user set is divided into a plurality of user groups;
selecting portrait discovery features from the plurality of features;
calculating, for each portrait discovery feature, the score of each user group on that feature, and determining the portrait information corresponding to each user group according to the scores;
and outputting the portrait information corresponding to each user group.
2. The method of claim 1, wherein the clustering based on the selected cluster features to group the particular set of users into a plurality of user groups comprises:
constructing a sample consisting of characteristic values corresponding to the clustering characteristics for each user in the specific user set to obtain a user clustering sample set;
and clustering the users in the specific user set according to the user clustering sample set by adopting a clustering algorithm to obtain a plurality of user groups.
3. The method of claim 1, wherein the obtaining a set of user data corresponding to a particular set of users comprises:
acquiring a plurality of data tables for the particular set of users;
splicing the plurality of data tables into a wide table by taking the user identification as a primary key; wherein each row of the wide table corresponds to a user, and each column except for the user identification corresponds to a feature; alternatively, each column of the wide table corresponds to a user, and each row other than the user identification corresponds to a feature.
4. The method of claim 1, wherein selecting a part of the plurality of features as cluster features is performed in any one or more of the following ways:
selecting, from the plurality of features, features whose null rate is less than or equal to a set value as the cluster features;
selecting, from the plurality of features, some features that meet the requirements of a service as the cluster features, wherein the service is the service that applies the portrait information to make decisions;
selecting, from the plurality of features, at least one group of duplicate features of the same type, combining each group of duplicate features into a new feature, and using the new features as the cluster features;
and selecting, from the plurality of features, at least one group of duplicate features of the same type, selecting one representative feature from each group of duplicate features, and using the representative features as the cluster features.
5. The method of claim 4, wherein the user data set is a wide table with the user identification as a primary key;
and when features whose null rate is less than or equal to a set value are selected from the plurality of features as the cluster features, the method further comprises:
obtaining, from the wide table, the total number of users as a first number;
obtaining, from the wide table, the number of users whose feature value is empty for each feature, as a second number corresponding to each feature;
and determining the ratio of the second number corresponding to each feature to the first number as the null rate of each feature.
6. The method of claim 1, wherein the step of clustering the particular set of users into a plurality of user groups based on the selected cluster features comprises:
determining the Euclidean distance between every two users according to the characteristic value of the clustering characteristic;
and clustering the users in the specific user set according to the Euclidean distance between every two users to obtain a plurality of user groups.
7. The method of claim 6, wherein the number of user groups is a set number;
and the step of clustering the users in the specific user set according to the Euclidean distance between every two users to obtain a plurality of user groups comprises:
selecting the set number of users with the largest mutual Euclidean distances as the initial centers of the user groups;
and assigning each other user to the user group whose initial center is nearest to that user in Euclidean distance, wherein the other users are the users in the specific user set other than the set number of selected users.
8. An apparatus for constructing user portrayal information, comprising:
the data acquisition module is used for acquiring a user data set corresponding to a specific user set, wherein each piece of data in the user data set uniquely corresponds to one user in the specific user set, and each piece of data comprises a plurality of characteristic values corresponding to a plurality of characteristics respectively;
the clustering feature selection module is used for selecting part of features from the plurality of features to serve as clustering features;
the clustering module is used for carrying out clustering processing based on the selected clustering characteristics and dividing the specific user set into a plurality of user groups;
the discovery feature selection module is used for selecting portrait discovery features from the plurality of features;
the portrait information determination module is used for calculating, for each portrait discovery feature, the score of each user group on that feature, and determining the portrait information corresponding to each user group according to the scores;
And the portrait information output module is used for outputting portrait information corresponding to each user group.
9. An electronic device, comprising:
the apparatus of claim 8; or alternatively, the process may be performed,
a processor and a memory for storing instructions for controlling the processor to perform the method according to any one of claims 1 to 7.
10. A computer readable storage medium, characterized in that a computer program is stored thereon, which, when being executed by a processor, implements the method according to any of claims 1 to 7.
CN202310070737.XA 2019-07-16 2019-07-16 Method and device for constructing user portrait information and electronic equipment Pending CN116109332A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310070737.XA CN116109332A (en) 2019-07-16 2019-07-16 Method and device for constructing user portrait information and electronic equipment

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202310070737.XA CN116109332A (en) 2019-07-16 2019-07-16 Method and device for constructing user portrait information and electronic equipment
CN201910641681.2A CN110472998A (en) 2019-07-16 2019-07-16 A kind of method, apparatus and electronic equipment of building user portrait information

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN201910641681.2A Division CN110472998A (en) 2019-07-16 2019-07-16 A kind of method, apparatus and electronic equipment of building user portrait information

Publications (1)

Publication Number Publication Date
CN116109332A true CN116109332A (en) 2023-05-12

Family

ID=68508804

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202310070737.XA Pending CN116109332A (en) 2019-07-16 2019-07-16 Method and device for constructing user portrait information and electronic equipment
CN201910641681.2A Pending CN110472998A (en) 2019-07-16 2019-07-16 A kind of method, apparatus and electronic equipment of building user portrait information

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN201910641681.2A Pending CN110472998A (en) 2019-07-16 2019-07-16 A kind of method, apparatus and electronic equipment of building user portrait information

Country Status (1)

Country Link
CN (2) CN116109332A (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113723984A (en) * 2021-03-03 2021-11-30 京东城市(北京)数字科技有限公司 Method and device for acquiring crowd consumption portrait information and storage medium

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106056407A (en) * 2016-06-03 2016-10-26 北京网智天元科技股份有限公司 Online banking user portrait drawing method and equipment based on user behavior analysis
CN108776907B (en) * 2018-05-31 2023-07-25 康键信息技术(深圳)有限公司 Intelligent advertisement recommendation method, server and storage medium
CN109241202B (en) * 2018-09-11 2020-10-16 杭州飞弛网络科技有限公司 Stranger social user matching method and system based on clustering
CN109635184A (en) * 2018-11-02 2019-04-16 平安科技(深圳)有限公司 Financial product recommended method, device and computer equipment based on data analysis
CN109543111B (en) * 2018-11-28 2021-09-21 广州虎牙信息科技有限公司 Recommendation information screening method and device, storage medium and server
CN109684330A (en) * 2018-12-17 2019-04-26 深圳市华云中盛科技有限公司 User's portrait base construction method, device, computer equipment and storage medium
CN109815987B (en) * 2018-12-27 2020-12-01 北京卓思天成数据咨询股份有限公司 Crowd classification method and system
CN109685581A (en) * 2019-01-03 2019-04-26 江苏方天电力技术有限公司 A kind of large power customers electricity consumption behavior analysis method based on label clustering technology
CN109493199A (en) * 2019-01-04 2019-03-19 深圳壹账通智能科技有限公司 Products Show method, apparatus, computer equipment and storage medium
CN109711484A (en) * 2019-01-10 2019-05-03 哈步数据科技(上海)有限公司 A kind of classification method and system of customer

Also Published As

Publication number Publication date
CN110472998A (en) 2019-11-19

Similar Documents

Publication Publication Date Title
US11853354B2 (en) Override of automatically shared meta-data of media
EP3288275B1 (en) Methods and systems of providing visual content editing functions
US11243992B2 (en) System and method for information recommendation
JP6261547B2 (en) Determination device, determination method, and determination program
US11556851B2 (en) Establishing a communication session between client terminals of users of a social network selected using a machine learning model
CN111078931B (en) Song list pushing method, device, computer equipment and storage medium
CN103617230A (en) Method and system for advertisement recommendation based on microblog
US10896458B2 (en) Method, system, and computer-readable medium for product and vendor selection
WO2016202214A2 (en) Method and device for displaying keyword
US20210117834A1 (en) Method and device for providing notes by using artificial intelligence-based correlation calculation
US20220350959A1 (en) Cross channel digital forms integration and presentation system
CN111344696B (en) System and method for evaluating advertisements
KR20150112089A (en) Method for servicing recommended goods and apparatus for the same
US10795648B2 (en) Systems and methods of developments, testing, and distribution of applications in a computer network
CN114330752A (en) Ranking model training method and ranking method
US20160042370A1 (en) Providing survey content recommendations
CN116109332A (en) Method and device for constructing user portrait information and electronic equipment
JP5048852B2 (en) Search device, search method, search program, and computer-readable recording medium storing the program
KR20210140838A (en) Apparatus and Method for Providing Advertising Platform Service Enabling Two-Way Selection
US20180330422A1 (en) Recommender systems and methods for pricing and evaluation of fine art works
US20150170035A1 (en) Real time personalization and categorization of entities
KR102335408B1 (en) Method and apparatus for managing movie tribute language
KR20220134084A (en) System for providing personalized video contents
JPWO2014068666A1 (en) Content providing system, content providing method, and content providing program
JP2018067095A (en) Business card information management system, and search result display method and search result display program in business card information management system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination