CN111784069A

CN111784069A - User preference prediction method, device, equipment and storage medium

Info

Publication number: CN111784069A
Application number: CN202010659275.1A
Authority: CN
Inventors: 余玉霞
Original assignee: Ping An International Smart City Technology Co Ltd
Current assignee: Ping An International Smart City Technology Co Ltd
Priority date: 2020-07-09
Filing date: 2020-07-09
Publication date: 2020-10-16
Anticipated expiration: 2040-07-09
Also published as: CN111784069B

Abstract

The application relates to artificial intelligence and discloses a user preference prediction method, device, equipment and storage medium. The method comprises the following steps: acquiring portrait data of a plurality of users, wherein the portrait data comprises index data values; clustering the index data values, dividing them into a plurality of clustering categories, and determining the clustering center point corresponding to each clustering category; performing data regression on each index data value according to the index data value and the clustering center point corresponding to its clustering category, to obtain a standard data value corresponding to the index data value; acquiring a target data value of user portrait data to be predicted; determining the standard data value corresponding to the target data value according to the standard data values corresponding to the index data values; and replacing the target data value with the standard data value and predicting preference data of the corresponding user according to the replaced user portrait data. The index values in the user portrait are thereby discretized and distributed within a certain range, which improves the accuracy of user preference prediction.

Description

User preference prediction method, device, equipment and storage medium

Technical Field

The present application relates to the field of artificial intelligence technologies, and in particular, to a method, an apparatus, a device, and a storage medium for predicting user preference.

Background

Big data technology is an information processing technology that takes all data resources of a system as its object and discovers the correlations expressed between data. It is currently widely applied on the internet to flow optimization, targeted message and advertisement push, personalized user services and their improvement, and the like. The user portrait is an important application of big data technology. Its goal is to establish descriptive tag attributes for users in a plurality of dimensions, so that the tag attributes outline the real personal characteristics of the users in various aspects. The user portrait can then be used to explore user demands and analyze user preferences, and, by matching against the user portrait, to provide users with more efficient and more targeted information delivery and a user experience closer to their personal habits. However, existing user portrait processes are prone to problems such as non-uniform numerical ranges of index values and continuous data values, so that the index values in the user portrait are not intuitive and user preferences cannot be accurately predicted from them.

Therefore, how to improve the accuracy of the user preference prediction becomes a problem to be solved urgently at present.

Disclosure of Invention

The embodiment of the application provides a user preference prediction method, a device, equipment and a storage medium, which can discretize index values in a user portrait into discrete values distributed in a certain range so as to improve the accuracy of user preference prediction.

In a first aspect, the present application provides a method for predicting user preference, the method comprising:

acquiring portrait data of a plurality of users, wherein the portrait data of each user comprises an index data value of a preset index;

clustering the index data values, dividing the index data values into a plurality of clustering categories, and determining clustering center points corresponding to the clustering categories;

performing data regression on each index data value respectively according to each index data value and a clustering central point corresponding to a clustering category of each index data value to obtain a standard data value corresponding to each index data value;

acquiring user portrait data to be predicted, wherein the user portrait data to be predicted comprises a target data value, and the target data value is an index data value of the preset index;

determining a standard data value corresponding to the target data value according to the standard data value corresponding to each index data value;

and replacing the target data value in the user portrait data with the standard data value, and predicting corresponding preference data of the user according to the replaced user portrait data.

In a second aspect, the present application further provides a user preference prediction apparatus, including:

the information acquisition module is used for acquiring portrait data of a plurality of users, wherein the portrait data of each user comprises an index data value of a preset index;

the clustering processing module is used for clustering the index data values, dividing the index data values into a plurality of clustering categories and determining clustering center points corresponding to the clustering categories;

the data regression module is used for respectively performing data regression on each index data value according to each index data value and the clustering central point corresponding to the clustering category of the index data value to obtain a standard data value corresponding to each index data value;

the data acquisition module is used for acquiring user portrait data to be predicted, wherein the user portrait data to be predicted comprises a target data value, and the target data value is an index data value of the preset index;

the numerical value determining module is used for determining a standard data value corresponding to the target data value according to the standard data value corresponding to each index data value;

and the preference prediction module is used for replacing the target data value in the user portrait data with the standard data value and predicting the preference data of the corresponding user according to the replaced user portrait data.

In a third aspect, the present application further provides a computer device comprising a memory and a processor; the memory is used for storing a computer program; the processor is configured to execute the computer program and to implement the user preference prediction method as described above when executing the computer program.

In a fourth aspect, the present application also provides a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to implement the user preference prediction method as described above.

The embodiments of the application disclose a user preference prediction method, device, equipment and storage medium. Portrait data of a plurality of users is acquired, wherein the portrait data of each user comprises an index data value of a preset index. The index data values are clustered and divided into a plurality of clustering categories, and a clustering center point corresponding to each clustering category is determined. Data regression is performed on each index data value according to the index data value and the clustering center point corresponding to its clustering category, to obtain a standard data value corresponding to each index data value. User portrait data to be predicted is then acquired, which comprises a target data value that is an index data value of the preset index, and the standard data value corresponding to the target data value is determined according to the standard data values corresponding to the index data values. Finally, the target data value in the user portrait data is replaced with the standard data value, and preference data of the corresponding user is predicted according to the replaced user portrait data. The embodiments of the application can therefore effectively solve the problems that the numerical ranges of index values in a user portrait are not uniform and that the data values are continuous and not intuitive, by discretizing the index values in the user portrait into discrete values distributed within a certain range, thereby improving the accuracy of user preference prediction.

Drawings

In order to illustrate the technical solutions of the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings in the following description show some embodiments of the present application, and those skilled in the art can obtain other drawings based on these drawings without creative effort.

Fig. 1 is a flowchart illustrating a user preference prediction method according to an embodiment of the present invention;

Fig. 2 is a sub-flow diagram of clustering a plurality of index data values based on the kmeans model in Fig. 1, dividing the plurality of index data values into a plurality of cluster categories, and determining a cluster center point of each cluster category;

Fig. 3 is a sub-flow diagram of performing data regression on the index data value according to the index data value and the cluster center point corresponding to the index data value in Fig. 1 to obtain a standard data value corresponding to the index data value;

Fig. 4 is a sub-flow diagram of performing data regression on the index data value according to a preset data regression formula and a clustering target value corresponding to the index data value in Fig. 3;

Fig. 5 is a schematic block diagram of an apparatus for predicting user preference according to an embodiment of the present invention;

Fig. 6 is a schematic block diagram of another user preference prediction apparatus provided by an embodiment of the present invention;

Fig. 7 is a schematic block diagram of a structure of a computer device according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

The flow diagrams depicted in the figures are merely illustrative and do not necessarily include all of the elements and operations/steps, nor do they necessarily have to be performed in the order depicted. For example, some operations/steps may be decomposed, combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.

The embodiment of the application provides a user preference prediction method, a user preference prediction device, computer equipment and a computer readable storage medium. The user preference prediction method can be applied to terminal equipment or a server, the terminal equipment can be electronic equipment such as a mobile phone, a tablet computer, a notebook computer, a desktop computer, a personal digital assistant and wearable equipment, and the server can be a single server or a server cluster consisting of a plurality of servers. The following explains the application of the user preference prediction method to a server as an example.

Specifically, the user preference prediction method includes: obtaining a plurality of user portrait data, each of which comprises an index data value of an index; clustering the index data values, dividing the index data values into a plurality of clustering categories and determining the clustering center point of each clustering category; performing data regression on the index data value according to the index data value and the clustering center point corresponding to the index data value to obtain a standard data value corresponding to the index data value; acquiring user portrait data to be predicted, wherein the user portrait data comprises the index data value; determining a standard data value corresponding to the index data value in the user portrait data to be predicted; and replacing the index data value in the user portrait data with the standard data value, and predicting preference data of the corresponding user according to the replaced user portrait data. The user preference prediction method can effectively solve the problems that the numerical ranges of index values in the user portrait are not uniform and that the data values are continuous and not intuitive, by discretizing the index values in the user portrait into discrete values distributed within a certain range so as to improve the accuracy of user preference prediction.

Some embodiments of the present application will be described in detail below with reference to the accompanying drawings. The embodiments described below and the features of the embodiments can be combined with each other without conflict.

Referring to fig. 1, fig. 1 is a schematic flowchart illustrating a user preference prediction method according to an embodiment of the present disclosure.

As shown in fig. 1, the user preference prediction method includes steps S101 to S106.

Step S101, obtaining portrait data of a plurality of users, wherein the portrait data of each user comprises an index data value of a preset index.

Illustratively, user portrait data of a plurality of users is obtained from a user information database. The user portrait data is used to construct user portraits, such as user behavior portraits, user health portraits, enterprise credit portraits, personal credit portraits, static product portraits, rotating equipment portraits, social portraits and economic portraits, in order to predict user preferences. The user portrait data comprises index data values of the users for a plurality of indexes, and may comprise demographic attributes, interest features, consumption features, location features, device attributes, behavior data, social data, and the like.

Specifically, each piece of acquired user portrait data includes index data values of preset indexes. For example, the preset indexes include the number of content items browsed by the user, the number of content items collected by the user, the duration of the user's reading and consultation, and the like. The index data values of these indexes are continuous values whose data range cannot be determined. It is understood that the user portrait data may include an index data value of one preset index or index data values of a plurality of preset indexes; the number of preset indexes is not limited herein.

Step S102, clustering processing is carried out on the index data values, the index data values are divided into a plurality of cluster categories, and a cluster center point corresponding to each cluster category is determined.

Clustering is a multivariate statistical analysis method that classifies samples or indexes on the principle that like objects group together. Through a static classification method, clustering divides objects into different groups, or subsets, so that the member objects in the same subset have similar attributes. The index data values are clustered so as to divide them into a plurality of clustering categories and to determine the clustering center point of each clustering category.

For example, the clustering process on the index data values, dividing the index data values into a plurality of cluster categories, and determining a cluster center point of each cluster category may specifically include: and clustering the index data values based on a kmeans model, dividing the index data values into a plurality of clustering categories, and determining a clustering center point corresponding to each clustering category. For example, a plurality of index data values are clustered by using a kmeans model, and the index data values are divided into five cluster categories, and a cluster center point of each cluster category is determined.

The kmeans model is a clustering model that clusters according to distance and is applied to continuous data. The specific algorithm of the kmeans model is as follows: k data objects are selected from the n data objects as initial clustering centers, wherein k can be set by a user and/or determined randomly; the distance from each remaining data object to each clustering center is calculated, and each data object is assigned to the nearest clustering center to form clusters; finally, the clustering central point value of each cluster, namely the average value of its members, is calculated, and the clustering centers are reset according to the clustering central point values. The assignment and update steps are repeated until the clustering centers no longer change.
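The algorithm described above can be sketched for scalar index data values as follows. This is a minimal illustration rather than the implementation of the application: the function name `kmeans_1d`, the `seed` and `max_iter` parameters, and the rule of keeping the previous center when a cluster becomes empty are all assumptions introduced here.

```python
import random


def kmeans_1d(values, k, max_iter=100, seed=0):
    """Cluster scalar values into k groups; returns (centers, labels).

    Hypothetical helper: initial centers are k data objects drawn at random.
    """
    rng = random.Random(seed)
    centers = rng.sample(values, k)

    def nearest(v):
        # Index of the clustering center closest to v.
        return min(range(k), key=lambda j: abs(v - centers[j]))

    for _ in range(max_iter):
        labels = [nearest(v) for v in values]
        updated = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            # Assumption: an empty cluster keeps its previous center.
            updated.append(sum(members) / len(members) if members else centers[j])
        if updated == centers:  # centers stopped moving
            break
        centers = updated
    return centers, [nearest(v) for v in values]
```

Calling `kmeans_1d(values, 5)` would correspond to the five-category example above; in practice the result depends on the initial centers, so a fixed seed is used here only to make the sketch repeatable.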

In an embodiment, the clustering process on the index data values based on the kmeans model specifically includes the following steps: performing exception removal processing on the plurality of index data values; and clustering the index data values subjected to exception removal processing based on the kmeans model.

Illustratively, performing the exception removal processing on the plurality of index data values includes performing binning processing on the plurality of index data values to filter out abnormal data and/or garbage data, so that the abnormal data or garbage data does not affect the clustering result. Specifically, the upper and lower binning bounds of the plurality of index data values are obtained; index data values greater than the upper binning bound are set to the maximum value less than the upper binning bound, and index data values less than the lower binning bound are set to the minimum value greater than the lower binning bound. The upper and lower binning bounds are calculated as follows:

a. The upper quartile, above which only 1/4 of all index data values lie, is set to U.

b. The lower quartile, below which only 1/4 of all index data values lie, is set to L.

c. The difference between the upper quartile and the lower quartile is set to IQR, namely: IQR = U - L.

d. The upper binning bound is set to U + 1.5 IQR, and the lower binning bound is set to L - 1.5 IQR.

e. Binning processing is performed on the index data values according to the upper binning bound and the lower binning bound to remove abnormal data.
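Steps a to e can be sketched as follows. The function names are hypothetical, and the quartile convention (the standard library's "inclusive" method) is an assumption, since the text does not say how U and L are interpolated. Out-of-range values are replaced by the nearest in-range data value, which is one reading of the replacement rule described before the list.

```python
import statistics


def binning_bounds(values):
    """Return (lower bound, upper bound) = (L - 1.5 IQR, U + 1.5 IQR)."""
    # Assumption: quartiles use the "inclusive" interpolation method.
    lower_q, _, upper_q = statistics.quantiles(values, n=4, method="inclusive")
    iqr = upper_q - lower_q
    return lower_q - 1.5 * iqr, upper_q + 1.5 * iqr


def cap_outliers(values):
    """Replace values outside the binning bounds with the nearest in-range value."""
    lower, upper = binning_bounds(values)
    inside = [v for v in values if lower <= v <= upper]
    low_cap, high_cap = min(inside), max(inside)
    return [high_cap if v > upper else low_cap if v < lower else v for v in values]
```

For the values 1 to 8 plus a single 100, the quartiles under this convention are 3 and 7, so the bounds are -3 and 13 and the value 100 is replaced by 8.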

By performing exception removal processing on the index data values, the influence of abnormal data or junk data on the clustering result can be avoided, so that the accuracy of the clustering result is improved. In an embodiment, as shown in fig. 2, the clustering process is performed on the index data values based on the kmeans model, the index data values are divided into a plurality of cluster categories, and a cluster center point of each cluster category is determined, which specifically includes step S1021 and step S1022.

Step S1021, determining a plurality of sample data values from the index data values, clustering the sample data values based on a kmeans model, dividing the sample data values into a plurality of clustering categories, and determining a first central point corresponding to each clustering category.

Illustratively, a plurality of sample data values are determined in the index data values through a preset selection rule, and the sample data values are used for training a kmeans model.

In one embodiment, the user preference prediction method further includes obtaining preference data of a user corresponding to the index data value and determining a plurality of sample data values according to the preference data, wherein the preference data includes an interest value of the user in a specific target. For example, preference data may be annotated manually or generated from the user representation data by a preference prediction model. And determining a plurality of sample data values according to the preference data is favorable for selecting the sample data values which can represent the user better, so that the accuracy of user preference prediction is improved.

For example, according to a preset selection rule, the user portrait data corresponding to preference data in which the user's interest value for the specific target is 0 is removed, and 20% of the remaining user portrait data is then randomly selected as sample data values, which are used for training the kmeans model. Removing the user portrait data whose interest value for the specific target is 0 improves the accuracy of user preference prediction.
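The selection rule in this example can be sketched as below. The record layout, a list of `(index_value, interest_value)` pairs, and the function name `select_samples` are assumptions made for illustration; the text does not specify how portrait data and preference data are stored together.

```python
import random


def select_samples(records, fraction=0.2, seed=0):
    """Pick sample data values from (index_value, interest_value) pairs.

    Records with an interest value of 0 are dropped first, then the given
    fraction of the remaining index values is drawn at random.
    """
    kept = [value for value, interest in records if interest != 0]
    if not kept:
        return []
    count = max(1, round(len(kept) * fraction))  # assumption: at least one sample
    return random.Random(seed).sample(kept, count)
```

With 20 records of which 10 have a non-zero interest value, `select_samples(records)` returns 2 sample data values, all taken from the 10 retained records.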

Specifically, model training is performed on the kmeans model by using the plurality of sample data values. And dividing the plurality of sample data values into a plurality of clustering categories through clustering processing of a kmeans model, and determining a first central point of each clustering category.

For example, the kmeans model is subjected to model training by using the plurality of sample data values, and the plurality of sample data values are divided into five cluster categories through clustering processing of the kmeans model, and a first central point of each cluster category is determined.

Exemplarily, the clustering category and the first center point are stored as model parameters of a trained kmeans model, and the model parameters are used for clustering the index data values. By determining a plurality of sample data values in the index data values and training the kmeans model according to the sample data values, the clustering accuracy of the kmeans model is improved, and a more accurate clustering result is obtained.

Step S1022, based on the kmeans model, performing clustering processing on a plurality of index data values according to first central points corresponding to the plurality of clustering categories, dividing the plurality of index data values into the plurality of clustering categories, determining second central points corresponding to the clustering categories, and determining the second central points as clustering central points.

The plurality of index data values are clustered according to the first central points, which are obtained by training on the sample data values determined from the index data values, and according to the number of clustering categories, to obtain the second central points. This further improves the clustering efficiency of the kmeans model and the accuracy of the clustering result.
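One way to read steps S1021 and S1022 is as a warm start: the centers learned on the sample serve as the initial centers for clustering all index data values. The sketch below follows that reading; the function names `lloyd` and `two_stage_centers` and the `initial_centers` argument are hypothetical, and the text does not state that this is how the first central points are fed into the second pass.

```python
def lloyd(values, centers, max_iter=100):
    """Run k-means updates on scalar values starting from the given centers."""
    centers = list(centers)
    for _ in range(max_iter):
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda j: abs(v - centers[j]))
            groups[nearest].append(v)
        # Assumption: an empty cluster keeps its previous center.
        updated = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
        if updated == centers:
            break
        centers = updated
    return centers


def two_stage_centers(all_values, sample_values, initial_centers):
    """Return (first central points, second central points)."""
    first = lloyd(sample_values, initial_centers)  # S1021: train on the sample
    second = lloyd(all_values, first)              # S1022: refine on all values
    return first, second
```

Because the second pass starts near its final answer, it typically needs fewer iterations than clustering all values from arbitrary initial centers.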

It can be understood that the cluster center point corresponding to the index data value may be a first center point or a second center point.

In one embodiment, the clustering process of the index data values based on the kmeans model in step S102 includes: performing exception removal processing on a plurality of index data values; and clustering the index data values subjected to exception removal processing based on a kmeans model.

In one embodiment, the method for predicting user preference further includes performing a de-exception process on the plurality of index data values, and the determining a plurality of sample data values from the plurality of index data values in step S1021 includes: and determining a plurality of sample data values from the plurality of index data values subjected to the exception removal processing.

Step S103, performing data regression on each index data value according to the index data value and the clustering center point corresponding to the clustering category of the index data value to obtain a standard data value corresponding to each index data value.

Illustratively, data regression is performed on the index data value according to the index data value, the cluster center point corresponding to the index data value and a preset data regression formula, so as to obtain a standard data value corresponding to the index data value.

In an embodiment, as shown in fig. 3, performing data regression on the index data value according to the index data value and a cluster center point corresponding to the index data value to obtain a standard data value corresponding to the index data value specifically includes steps S1031 to S1033.

Step S1031, sorting the clustering center points according to the size of their center point values, and determining the clustering target value of the clustering category corresponding to each clustering center point according to the sorting result.

Illustratively, according to the size of the center point value of the cluster center point, the cluster center points are sorted in an ascending order, and the cluster target value of the cluster category corresponding to the cluster center point is determined according to the ascending order result.

For example, the clustering target value may be determined according to the desired value range after discretization of the index values and the number of clustering categories. For example, if the desired value range after discretization is 0 to 5 and the number of clustering categories is 5, the clustering target values corresponding to the clustering center points of the 5 clustering categories are determined to be 1, 2, 3, 4, and 5, respectively, or 0, 1, 2, 3, and 4, respectively. If the desired value range after discretization is 0 to 10 and the number of clustering categories is 5, the clustering target values corresponding to the clustering center points of the 5 clustering categories are determined to be 2, 4, 6, 8, and 10, respectively, or 0, 2, 4, 6, and 8, respectively, or 1, 3, 5, 7, and 9, respectively. It can be understood that the clustering target value may be the upper limit, the lower limit, or the central value of the standard data values expected to be obtained after discretization of the index data values of the corresponding clustering category.

Specifically, the clustering center points are sorted in ascending order according to the size of their center point values. For example, the kmeans model determines five clustering categories and the five clustering center points of those categories. After the five clustering center points are sorted in ascending order of center point value, the clustering target values of the five clustering categories are 1, 2, 3, 4 and 5 in sequence, wherein the clustering target value corresponding to the clustering center point with the smallest center point value is 1, and the clustering target value corresponding to the clustering center point with the largest center point value is 5, so that a standard data value in the range of 0 to 5 is obtained after data regression is performed on the index data value.
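The assignment of clustering target values by ascending center point value can be sketched as follows. The function name `cluster_targets` and the dictionary return type are illustrative assumptions.

```python
def cluster_targets(centers, targets):
    """Map each cluster index to a target value by ascending center point value.

    centers: center point value of each cluster, in cluster-index order.
    targets: target values in ascending order, e.g. [1, 2, 3, 4, 5].
    """
    order = sorted(range(len(centers)), key=lambda j: centers[j])
    return {cluster: targets[rank] for rank, cluster in enumerate(order)}
```

For example, with center point values `[7.4, 0.9, 3.2, 12.0, 5.1]` and targets `[1, 2, 3, 4, 5]`, cluster 1 (the smallest center) receives target 1 and cluster 3 (the largest center) receives target 5.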

Step S1032, according to the clustering target value of the clustering category, determining the clustering target value corresponding to the index data value.

Illustratively, the cluster type corresponding to the index data value is determined according to the index data value, and the cluster target value corresponding to the index data value is determined according to the cluster target value of the cluster type.

Step S1033, performing data regression on the index data value according to a preset data regression formula and a clustering target value corresponding to the index data value, to obtain a standard data value corresponding to the index data value.

Illustratively, a data regression formula is set in advance according to the type of the user preference prediction, and the data regression formula is used for performing data regression on the index data value so as to improve the accuracy of the user preference prediction.

In an embodiment, as shown in fig. 4, the performing data regression on the index data value according to a preset data regression formula and the clustering target value corresponding to the index data value in step S1033 specifically includes steps S331 to S333.

Step S331, if the index data value is larger than the center point value of the clustering center point corresponding to the index data value, performing data regression on the index data value according to the clustering target value corresponding to the index data value based on a first regression formula to obtain a standard data value corresponding to the index data value.

The first regression formula is:

$$\frac{a - b}{W} \times 0.5 + L - N$$

where a is the index data value, b is the center point value, W is the first difference value, L is the clustering target value, and N is 1/2 of the difference between two adjacent clustering target values.

Specifically, the first difference value is determined according to the differences between each clustering center point and the index data values corresponding to that clustering center point. For example, the maximum positive difference between the index data values of a clustering category and the center point value of its clustering center point is determined as the first difference value. For example, if the center point value of a clustering center point is 2.15 and the largest index data value corresponding to that clustering center point is 3.15, the maximum positive difference is 1, that is, the first difference value is 1.

Illustratively, the first difference value is 1.5, the clustering target value is 3, and the difference between two adjacent clustering target values is 1. If an index data value is 2.8, which is greater than the center point value 1.8 of its corresponding clustering center point, data regression is performed on the index data value based on the first regression formula. That is, the data is substituted into the first regression formula as (2.8-1.8)/1.5 × 0.5+3-0.5, and the standard data value corresponding to the index data value is 2.83333333.

Step S332, if the index data value is equal to the center point value of the cluster center point corresponding to the index data value, performing data regression on the index data value according to the cluster target value corresponding to the index data value based on a second regression formula to obtain a standard data value corresponding to the index data value.

The second regression formula is: L - N, where L is the clustering target value, and N is 1/2 of the difference between two adjacent clustering target values.

Illustratively, the clustering target value is 3 and the difference between two adjacent clustering target values is 1. If an index data value is 1.8, which is equal to the center point value 1.8 of the cluster center point corresponding to the index data value, data regression is performed on the index data value based on the second regression formula; that is, the data is substituted into the second regression formula, 3 - 0.5, to obtain a standard data value of 2.5 corresponding to the index data value.

Step S333, if the index data value is smaller than the center point value of the clustering center point corresponding to the index data value, performing data regression on the index data value according to the clustering target value corresponding to the index data value based on a third regression formula to obtain a standard data value corresponding to the index data value.

The third regression formula is: (a - b) ÷ V × (-0.5) + L - N, where a is the index data value, b is the center point value, V is the second difference value, L is the clustering target value, and N is 1/2 of the difference between two adjacent clustering target values.

Specifically, the second difference value is determined according to the differences between each cluster center point and the index data values corresponding to that cluster center point. For example, the minimum negative difference between the index data values corresponding to a cluster center point and that cluster center point is determined as the second difference value. For example, if the center point value of a cluster center point is 2.15 and the minimum index data value in the cluster category corresponding to the cluster center point is 1.0, the minimum negative difference is 1.0 - 2.15 = -1.15, that is, the second difference value is -1.15.

Illustratively, the second difference value is -1.15, the clustering target value is 3, and the difference between two adjacent clustering target values is 1. If an index data value is 0.8, which is smaller than the center point value 1.8 of the cluster center point corresponding to the index data value, data regression is performed on the index data value based on the third regression formula; that is, the data is substituted into the third regression formula, (0.8 - 1.8) ÷ (-1.15) × (-0.5) + 3 - 0.5, to obtain a standard data value of 2.065217391 corresponding to the index data value.
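Steps S331 to S333 amount to one piecewise function, which the following Python sketch reproduces together with the three worked examples. The function and parameter names are illustrative assumptions; only the formulas and the numbers are taken from the description.

```python
def standard_value(a, b, w, v, target, half_step):
    """Map index data value `a` to a standard data value.

    b: center point value; w: first difference (> 0); v: second difference (< 0);
    target: clustering target value L; half_step: N, half the gap between
    two adjacent clustering target values.
    """
    if a > b:  # step S331, first regression formula
        return (a - b) / w * 0.5 + target - half_step
    if a < b:  # step S333, third regression formula
        return (a - b) / v * -0.5 + target - half_step
    return target - half_step  # step S332, second regression formula


# Worked examples: center 1.8, W = 1.5, V = -1.15, L = 3, N = 0.5.
for value in (2.8, 1.8, 0.8):
    print(value, round(standard_value(value, 1.8, 1.5, -1.15, 3, 0.5), 9))
# 2.8 2.833333333
# 1.8 2.5
# 0.8 2.065217391
```

With N = 0.5, and W and V taken as the extreme gaps of the cluster, the largest member maps to L and the smallest to L - 1, so each cluster occupies its own unit-wide band.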

By performing data regression on the index data value to obtain the corresponding standard data value, data values spread over a large range can be mapped into a small range without destroying the distribution of the original data, and the index data value becomes more intuitive, which improves the accuracy of user preference prediction.

In an embodiment, after performing data regression on the index data value according to the index data value and the cluster center point corresponding to the index data value to obtain a standard data value corresponding to the index data value, the method specifically further includes: performing data processing on the standard data value, and taking the standard data value after the data processing as a standard data value corresponding to the index data value; the standard data values are managed based on a blockchain technique.

Illustratively, performing data processing on the standard data value includes rounding the standard data value to a fixed number of decimal places. For example, the standard data value obtained by performing data regression on the index data value based on the third regression formula is 2.065217391; rounding it to one decimal place gives 2.1. Rounding the standard data value to a fixed number of decimal places reduces the data volume and the amount of calculation while retaining a certain precision, which improves prediction efficiency.

It is emphasized that, to further ensure the privacy and security of the standard data value, the standard data value may also be stored in a node of a blockchain, and the standard data value is managed based on a blockchain technique.

In an embodiment, after performing data regression on the index data value according to the index data value and the cluster center point corresponding to the index data value to obtain a standard data value corresponding to the index data value, the method further includes: generating a user portrait numerical lookup table according to the standard data value.

Specifically, the user portrait numerical lookup table includes the indexes of the user and the standard data value corresponding to each index, where the indexes may include user browsing content, user collection content, user reading consultation, and the like. Setting up the user portrait numerical lookup table makes it convenient to query the standard data value corresponding to an index, which improves the efficiency of user preference prediction. Illustratively, to further ensure the privacy and security of the user portrait numerical lookup table, the lookup table may be managed based on blockchain technology.

Illustratively, the generated user portrait numerical lookup table is stored on a blockchain, and a user can obtain the lookup table from the blockchain. The blockchain includes a user management module, which is responsible for managing the identity information of users, including maintaining the generation of public and private keys (account management and key management) and maintaining the correspondence between a user's real identity and blockchain address (authority management). When a request from a user to view the user portrait numerical lookup table is obtained, the identity and corresponding authority of the user are verified based on blockchain technology, which ensures the security of the lookup table. Meanwhile, because the generated lookup table is stored on the blockchain, it is not easily tampered with, which ensures the accuracy of the lookup table and thereby improves the accuracy of user preference prediction.
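Setting the blockchain storage aside, the lookup table itself can be sketched in Python as a mapping from an index name and raw index data value to a rounded standard data value. The function name `build_lookup_table`, the index name `browsing_content`, and the tuple key layout are illustrative assumptions, not part of the application; the three value pairs reuse the worked regression examples.

```python
def build_lookup_table(index_name, pairs, decimals=1):
    """Return {(index name, index data value): rounded standard data value}."""
    return {(index_name, raw): round(std, decimals) for raw, std in pairs}


# (index data value, standard data value) pairs from the regression examples.
pairs = [(0.8, 2.065217391), (1.8, 2.5), (2.8, 2.833333333)]
table = build_lookup_table("browsing_content", pairs)
print(table[("browsing_content", 0.8)])  # 2.1
```

Rounding at table-build time keeps the stored values short, which matches the stated goal of reducing data volume while retaining a certain precision.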

Step S104, obtaining user portrait data to be predicted, wherein the user portrait data to be predicted comprises a target data value, and the target data value is an index data value of the preset index.

Illustratively, a server receives a user preference prediction request sent by a client, where the preference prediction request includes the user portrait data to be predicted; alternatively, the server obtains the user portrait data to be predicted according to the preference prediction request sent by the client. The user portrait data to be predicted includes index data values corresponding to multiple indexes. The user portrait data includes demographic attributes, interest characteristics, consumption characteristics, location characteristics, device attributes, behavior data, social data, and the like, and the index data values include those for user browsing content, user collection content, user reading consultation, and the like.

Step S105, determining a standard data value corresponding to the target index data value according to the standard data value corresponding to each index data value.

For example, the standard data value corresponding to the index data value in the user portrait data may be determined by querying a preset user portrait numerical lookup table, where the user portrait numerical lookup table includes index data values of a plurality of indexes and standard data values corresponding to the index data values.

Alternatively, the cluster category and the cluster center point corresponding to the index data value may be determined based on the kmeans model, and data regression may be performed on the index data value according to the index data value and the cluster center point corresponding to the index data value, so as to obtain a standard data value corresponding to the index data value.
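The two ways of resolving a target data value can be combined in a short Python sketch. Trying the lookup table first and falling back to cluster assignment plus regression is an assumption made here for illustration; the application presents the two as separate examples. All names and the dictionary layout of `clusters` are hypothetical.

```python
def nearest_cluster(value, centers):
    """Index of the center point closest to `value` (the k-means assignment rule)."""
    return min(range(len(centers)), key=lambda i: abs(value - centers[i]))


def resolve_standard_value(target_value, lookup, clusters, half_step=0.5):
    """Look the value up; if absent, assign it to a cluster and regress it."""
    if target_value in lookup:
        return lookup[target_value]
    c = clusters[nearest_cluster(target_value, [c["center"] for c in clusters])]
    gap = target_value - c["center"]
    if gap > 0:  # first regression formula
        return gap / c["w"] * 0.5 + c["target"] - half_step
    if gap < 0:  # third regression formula
        return gap / c["v"] * -0.5 + c["target"] - half_step
    return c["target"] - half_step  # second regression formula


# Hypothetical state: one table entry and two clusters with W, V and target values.
lookup = {1.8: 2.5}
clusters = [
    {"center": 1.8, "w": 1.5, "v": -1.15, "target": 3},
    {"center": 5.0, "w": 1.0, "v": -1.0, "target": 4},
]
print(resolve_standard_value(1.8, lookup, clusters))            # 2.5 (table hit)
print(round(resolve_standard_value(2.8, lookup, clusters), 4))  # 2.8333 (computed)
```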

Step S106, replacing the target data value in the user portrait data with the standard data value, and predicting corresponding preference data of the user according to the replaced user portrait data.

For example, after the index data value in the user portrait data is replaced with the determined standard data value corresponding to that index data value, the preference data of the corresponding user is predicted based on the user portrait data including the standard data value, that is, the replaced user portrait data.
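The replacement itself is a simple substitution, sketched below in Python. The portrait is modelled as a plain dictionary and the field names are hypothetical; fields without a standard data value are assumed to pass through unchanged.

```python
def replace_target_values(portrait, standard_values):
    """Return a copy of the portrait with mapped indexes replaced by standard values."""
    return {name: standard_values.get(name, value) for name, value in portrait.items()}


# Hypothetical portrait to be predicted and the standard values found for it.
portrait = {"browsing_content": 0.8, "collected_content": 2.8, "device": "phone"}
standard = {"browsing_content": 2.1, "collected_content": 2.8}
replaced = replace_target_values(portrait, standard)
print(replaced)
# {'browsing_content': 2.1, 'collected_content': 2.8, 'device': 'phone'}
```

Returning a new dictionary rather than mutating the input keeps the original portrait available for auditing.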

In one embodiment, predicting the preference data of the corresponding user according to the replaced user portrait data may include: predicting the preference data of the corresponding user according to the replaced user portrait data based on a preference prediction model.

For example, the preference prediction model may be a pre-trained decision tree model and/or a support vector machine model, and the preference prediction model is used for generating a prediction result of user preference according to the user portrait data, and the prediction result comprises preference data of the user.

In the user preference prediction method provided in the foregoing embodiment, portrait data of a plurality of users is obtained, where the portrait data of each user includes an index data value of a preset index. The index data values are clustered into a plurality of cluster categories, and a cluster center point corresponding to each cluster category is determined. Data regression is then performed on each index data value according to the index data value and the cluster center point corresponding to its cluster category, to obtain a standard data value corresponding to each index data value. Next, user portrait data to be predicted is obtained, where the user portrait data to be predicted includes a target data value, and the target data value is an index data value of the preset index. The standard data value corresponding to the target index data value is determined according to the standard data value corresponding to each index data value. Finally, the target data value in the user portrait data is replaced with the standard data value, and the preference data of the corresponding user is predicted according to the replaced user portrait data. In this way, the embodiment of the application can effectively solve the problems that the index values in a user portrait have non-uniform numerical ranges and that continuous data values are not intuitive: the index values in the user portrait are discretized into values distributed within a certain range, which improves the accuracy of user preference prediction.

Referring to fig. 5, fig. 5 is a schematic block diagram of a user preference prediction apparatus according to an embodiment of the present application, where the user preference prediction apparatus is configured to perform the user preference prediction method. Wherein, the user preference prediction device can be configured in a server or a terminal.

The server may be an independent server or a server cluster. The terminal can be an electronic device such as a mobile phone, a tablet computer, a notebook computer, a desktop computer, a personal digital assistant and a wearable device.

As shown in fig. 5, the user preference prediction apparatus 400 includes: an information acquisition module 401, a cluster processing module 402, a data regression module 403, a data acquisition module 404, a numerical value determination module 405, and a preference prediction module 406.

An information obtaining module 401, configured to obtain portrait data of multiple users, where the portrait data of each user includes an index data value of a preset index;

a clustering module 402, configured to perform clustering on the index data values, divide the index data values into multiple clustering categories, and determine a clustering center point corresponding to each clustering category;

a data regression module 403, configured to perform data regression on each index data value according to each index data value and a cluster center point corresponding to a cluster type of the index data value, to obtain a standard data value corresponding to each index data value;

a data obtaining module 404, configured to obtain user portrait data to be predicted, where the user portrait data to be predicted includes a target data value, and the target data value is an index data value of the preset index;

a numerical value determining module 405, configured to determine a standard data value corresponding to the target index data value according to the standard data value corresponding to each index data value;

a preference prediction module 406, configured to replace the target data value in the user portrait data with the standard data value, and predict preference data of a corresponding user according to the replaced user portrait data.

In an embodiment, as shown in fig. 6, the cluster processing module 402 includes a first clustering submodule 4021 and a second clustering submodule 4022.

The first clustering submodule 4021 is configured to determine a plurality of sample data values from the index data values, perform clustering processing on the plurality of sample data values based on a kmeans model, divide the plurality of sample data values into a plurality of clustering categories, and determine a first central point corresponding to each clustering category.

The second clustering submodule 4022 is configured to, based on the kmeans model, perform clustering processing on the plurality of index data values according to the first central points corresponding to the plurality of clustering categories, divide the plurality of index data values into the plurality of clustering categories, determine a second central point corresponding to each clustering category, and determine the second central points as the clustering center points.
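The division of work between the two clustering submodules can be sketched in Python with a small one-dimensional k-means. This is an illustration under stated assumptions, not the application's implementation: the sample is taken as every `step`-th sorted value, stage 1 starts from evenly spaced sample points, and all function names are hypothetical.

```python
def kmeans_1d(values, centers, iterations=50):
    """Lloyd's algorithm on scalars, starting from the given initial centers."""
    centers = list(centers)
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Keep the old center when a cluster ends up empty.
        updated = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
        if updated == centers:
            break
        centers = updated
    return centers


def two_stage_centers(values, k, step=2):
    """First center points from a sample, then second center points from all values."""
    sample = sorted(values)[::step]  # assumption: every `step`-th value is a sample
    # Assumption: stage 1 starts from k evenly spaced sample points.
    init = [sample[i * (len(sample) - 1) // max(k - 1, 1)] for i in range(k)]
    first_centers = kmeans_1d(sample, init)
    # Stage 2: cluster every index data value, seeded with the first center points.
    return kmeans_1d(values, first_centers)


index_values = [0.9, 1.0, 1.1, 4.9, 5.0, 5.1, 8.9, 9.0, 9.1]
centers = two_stage_centers(index_values, k=3)
print([round(c, 6) for c in centers])  # [1.0, 5.0, 9.0]
```

Seeding the full run with centers found on a sample means the second pass starts near its final centers and typically needs few iterations.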

In one embodiment, as shown in fig. 6, the data regression module 403 includes a numerical determination sub-module 4031, a difference determination sub-module 4032 and a data regression sub-module 4033.

The numerical value determining submodule 4031 is configured to sort the cluster center points according to the sizes of the center point values of the cluster center points, and determine a cluster target value of a cluster category corresponding to the cluster center point according to a sorting result.

The difference determining submodule 4032 is configured to determine, according to the clustering target value of the clustering category, a clustering target value corresponding to the index data value.

The data regression submodule 4033 is configured to perform data regression on the index data value according to a preset data regression formula and the clustering target value corresponding to the index data value, so as to obtain a standard data value corresponding to the index data value.
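The first two submodules can be sketched in Python as follows. Assigning consecutive integers starting at 1 in ascending order of center point value is an assumption, chosen because the worked examples use a difference of 1 between adjacent clustering target values; the function names and center values are hypothetical.

```python
def cluster_target_values(centers, start=1, step=1):
    """Map each center point value to a target value by ascending rank."""
    ordered = sorted(centers)
    return {c: start + rank * step for rank, c in enumerate(ordered)}


def target_for_value(value, targets):
    """Target value of the cluster whose center point is nearest to `value`."""
    nearest = min(targets, key=lambda c: abs(value - c))
    return targets[nearest]


# Hypothetical center point values; 1.8 ranks third and so receives target value 3.
targets = cluster_target_values([5.0, 0.4, 1.8, 1.0])
print(targets)                         # {0.4: 1, 1.0: 2, 1.8: 3, 5.0: 4}
print(target_for_value(2.8, targets))  # 3
```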

It should be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the apparatus and the modules and units described above may refer to the corresponding processes in the foregoing embodiment of the user preference prediction method, and are not described herein again.

The user preference prediction apparatus provided by the above-mentioned embodiment may be implemented in the form of a computer program, which can be run on a computer device as shown in fig. 7.

Referring to fig. 7, fig. 7 is a schematic block diagram illustrating a structure of a computer device according to an embodiment of the present disclosure. The computer device may be a server or a terminal device.

As shown in fig. 7, the computer device includes a processor, a memory, and a network interface connected by a system bus, wherein the memory may include a nonvolatile storage medium and an internal memory.

The non-volatile storage medium may store an operating system and a computer program. The computer program includes program instructions that, when executed, cause a processor to perform any of the user preference prediction methods.

The processor is used for providing calculation and control capability and supporting the operation of the whole computer equipment.

The internal memory provides an environment for running the computer program stored in the non-volatile storage medium; when the computer program is executed by the processor, it causes the processor to perform any of the user preference prediction methods.

The network interface is used for network communication, such as sending assigned tasks and the like. Those skilled in the art will appreciate that the architecture shown in fig. 7 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects apply, as particular computing devices may include more or less components than those shown, or may combine certain components, or have a different arrangement of components.

It should be understood that the Processor may be a Central Processing Unit (CPU), and the Processor may be other general purpose processors, Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs) or other Programmable logic devices, discrete Gate or transistor logic devices, discrete hardware components, etc. Wherein a general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.

Wherein, in one embodiment, the processor is configured to execute a computer program stored in the memory to implement the steps of:

In an embodiment, when implementing the clustering on the index data values, dividing the index data values into a plurality of cluster categories, and determining a cluster center point of each cluster category, the processor is configured to implement:

and clustering the index data values based on a kmeans model, dividing the index data values into a plurality of clustering categories, and determining a clustering center point corresponding to each clustering category.

In an embodiment, when implementing the clustering process on the index data values based on the kmeans model, dividing the index data values into a plurality of cluster categories, and determining a cluster center point corresponding to each cluster category, the processor is configured to implement:

determining a plurality of sample data values from the index data values, clustering the sample data values based on a kmeans model, dividing the sample data values into a plurality of clustering categories, and determining a first central point corresponding to each clustering category;

based on the kmeans model, clustering a plurality of index data values according to first central points corresponding to the plurality of clustering categories, dividing the plurality of index data values into the plurality of clustering categories, determining second central points corresponding to the clustering categories, and determining the second central points as clustering central points.

In one embodiment, the processor, when implementing the kmeans model-based clustering of the plurality of indicator data values, is configured to implement:

performing exception removal processing on a plurality of index data values;

and clustering the index data values subjected to exception removal processing based on a kmeans model.
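The exception removal step can be sketched in Python as below. Which exception removal rule the application intends is not stated in this passage; the interquartile-range rule used here is a common choice and is purely an assumption, as are the function name and the sample data.

```python
import statistics


def remove_outliers_iqr(values, factor=1.5):
    """Drop values outside [Q1 - factor * IQR, Q3 + factor * IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - factor * spread, q3 + factor * spread
    return [v for v in values if low <= v <= high]


# Hypothetical index data values with one abnormal entry.
raw = [1, 2, 3, 4, 5, 6, 7, 8, 9, 1000]
cleaned = remove_outliers_iqr(raw)
print(cleaned)  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```

The cleaned list would then be passed to the kmeans clustering, so that a single extreme value cannot pull a cluster center point away from the bulk of the data.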

In an embodiment, when the processor performs data regression on the index data value according to the index data value and the clustering center point corresponding to the index data value to obtain the standard data value corresponding to the index data value, the processor is configured to:

sorting the clustering central points according to the size of the central point values of the clustering central points, and determining clustering target values of clustering categories corresponding to the clustering central points according to sorting results;

determining a clustering target value corresponding to the index data value according to the clustering target value of the clustering category;

and performing data regression on the index data value according to a preset data regression formula and the clustering target value corresponding to the index data value to obtain a standard data value corresponding to the index data value.

In an embodiment, when implementing the data regression on the index data value according to a preset data regression formula and the clustering target value corresponding to the index data value, the processor is configured to implement:

if the index data value is larger than the central point value of the clustering central point corresponding to the index data value, performing data regression on the index data value according to the clustering target value corresponding to the index data value based on a first regression formula to obtain a standard data value corresponding to the index data value; and/or

if the index data value is equal to the central point value of the clustering central point corresponding to the index data value, performing data regression on the index data value according to the clustering target value corresponding to the index data value based on a second regression formula to obtain a standard data value corresponding to the index data value; and/or

if the index data value is smaller than the central point value of the clustering central point corresponding to the index data value, performing data regression on the index data value according to the clustering target value corresponding to the index data value based on a third regression formula to obtain a standard data value corresponding to the index data value.

In an embodiment, after the performing data regression on the index data value according to the index data value and the cluster center point corresponding to the index data value to obtain the standard data value corresponding to the index data value, the processor is further configured to perform:

generating a numerical value query table according to the index data value and a standard data value corresponding to the index data value;

performing data processing on the standard data value, and taking the standard data value after the data processing as a standard data value corresponding to the index data value to update the numerical value query table; and managing the numerical value lookup table based on a block chain technology.

Embodiments of the present application also provide a computer-readable storage medium storing a computer program, where the computer program includes program instructions; for the method implemented when the program instructions are executed, reference may be made to the embodiments of the user preference prediction method of the present application.

The computer-readable storage medium may be an internal storage unit of the computer device described in the foregoing embodiment, for example, a hard disk or a memory of the computer device. The computer readable storage medium may also be an external storage device of the computer device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like provided on the computer device.

The user preference prediction apparatus, the computer device, and the computer-readable storage medium provided in the foregoing embodiments obtain portrait data of a plurality of users, where the portrait data of each user includes an index data value of a preset index. The index data values are clustered into a plurality of cluster categories, and a cluster center point corresponding to each cluster category is determined. Data regression is then performed on each index data value according to the index data value and the cluster center point corresponding to its cluster category, to obtain a standard data value corresponding to each index data value. Next, user portrait data to be predicted is obtained, where the user portrait data to be predicted includes a target data value, and the target data value is an index data value of the preset index. The standard data value corresponding to the target index data value is determined according to the standard data value corresponding to each index data value. Finally, the target data value in the user portrait data is replaced with the standard data value, and the preference data of the corresponding user is predicted according to the replaced user portrait data. In this way, the embodiment of the application can effectively solve the problems that the index values in a user portrait have non-uniform numerical ranges and that continuous data values are not intuitive: the index values in the user portrait are discretized into values distributed within a certain range, which improves the accuracy of user preference prediction.

It is to be understood that the terminology used in the description of the present application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in the specification of the present application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.

It should also be understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items. It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.

The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments. While the invention has been described with reference to specific embodiments, the scope of the invention is not limited thereto, and those skilled in the art can easily conceive various equivalent modifications or substitutions within the technical scope of the invention. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims

1. A method for predicting user preferences, comprising:

2. The method according to claim 1, wherein the clustering the index data values, dividing the index data values into a plurality of cluster categories, and determining a cluster center point of each cluster category comprises:

3. The method according to claim 2, wherein the clustering the index data values based on the kmeans model, dividing the index data values into a plurality of cluster categories, and determining a cluster center point corresponding to each cluster category includes:

4. The method of claim 2, wherein the clustering the plurality of metric data values based on the kmeans model comprises:

performing exception removal processing on a plurality of index data values;

5. The method according to any one of claims 1 to 4, wherein performing data regression on the index data value according to the index data value and a cluster center point corresponding to the index data value to obtain a standard data value corresponding to the index data value includes:

6. The method according to claim 5, wherein performing data regression on the index data value according to a preset data regression formula and a clustering target value corresponding to the index data value includes:

7. The method according to claim 6, wherein after performing data regression on the index data value according to the index data value and a cluster center point corresponding to the index data value to obtain a standard data value corresponding to the index data value, the method further comprises:

performing data processing on the standard data value, and taking the standard data value after the data processing as a standard data value corresponding to the index data value to update the numerical value query table;

and managing the numerical value lookup table based on a block chain technology.

8. A user preference prediction apparatus, comprising:

9. A computer device, wherein the computer device comprises a memory and a processor;

the memory is used for storing a computer program;

the processor for executing the computer program and implementing the user preference prediction method of any of claims 1 to 7 when executing the computer program.

10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, causes the processor to implement the user preference prediction method according to any one of claims 1 to 7.