CN111784069B

CN111784069B - User preference prediction method, device, equipment and storage medium

Info

Publication number: CN111784069B
Application number: CN202010659275.1A
Authority: CN
Inventors: 余玉霞
Original assignee: Ping An International Smart City Technology Co Ltd
Current assignee: Ping An International Smart City Technology Co Ltd
Priority date: 2020-07-09
Filing date: 2020-07-09
Publication date: 2023-11-14
Anticipated expiration: 2040-07-09
Also published as: CN111784069A

Abstract

The application relates to artificial intelligence and discloses a user preference prediction method, device, equipment and storage medium. The method comprises: acquiring portrait data of a plurality of users, wherein the portrait data comprise index data values; clustering the index data values, dividing them into a plurality of clustering categories, and determining the clustering center point corresponding to each clustering category; performing data regression on each index data value according to the index data value and the clustering center point corresponding to its clustering category to obtain a standard data value corresponding to the index data value; acquiring a target data value of user portrait data to be predicted; determining the standard data value corresponding to the target data value according to the standard data values corresponding to the index data values; and replacing the target data value with the standard data value, and predicting the preference data of the corresponding user according to the replaced user portrait data. The index values in the user portrait are thereby discretized into discrete values distributed within a certain range, which improves the accuracy of user preference prediction.

Description

User preference prediction method, device, equipment and storage medium

Technical Field

The present application relates to the field of artificial intelligence technologies, and in particular, to a method, an apparatus, a device, and a storage medium for predicting user preference.

Background

Big data technology is an information processing technology that takes all the data resources of a system as its object and discovers the correlations expressed among the data. It is currently widely applied on the Internet to flow optimization, targeted message and advertisement pushing, and the personalization and improvement of user services. The user portrait is an important application of big data technology. Its goal is to establish descriptive tag attributes for users in a plurality of dimensions, so that the tag attributes outline the real personal characteristics of each user. The user portrait can then be used to explore user requirements, analyze user preferences, and provide users with information delivery that is more efficient, more targeted and closer to their personal habits. However, in existing user portrait processes the numerical ranges of index values are often not uniform and the data values are continuous, so the index values in the user portrait are not intuitive and user preferences cannot be accurately predicted from them.

Therefore, how to improve the accuracy of user preference prediction is a problem to be solved.

Disclosure of Invention

The embodiment of the application provides a user preference prediction method, a device, equipment and a storage medium, which can discretize index values in a user portrait into discrete values distributed in a certain range so as to improve the accuracy of user preference prediction.

In a first aspect, the present application provides a user preference prediction method, the method comprising:

acquiring portrait data of a plurality of users, wherein the portrait data of each user comprises index data values of preset indexes;

clustering is carried out on the index data values, the index data values are divided into a plurality of clustering categories, and clustering center points corresponding to the clustering categories are determined;

according to the index data values and the clustering center points corresponding to the clustering categories of the index data values, respectively carrying out data regression on the index data values to obtain standard data values corresponding to the index data values;

acquiring user portrait data to be predicted, wherein the user portrait data to be predicted comprises a target data value, and the target data value is an index data value of the preset index;

determining a standard data value corresponding to the target data value according to the standard data value corresponding to each index data value;

and replacing the target data value in the user portrait data with the standard data value, and predicting the preference data of the corresponding user according to the replaced user portrait data.

In a second aspect, the present application also provides a user preference prediction apparatus, the apparatus comprising:

the information acquisition module is used for acquiring the portrait data of a plurality of users, wherein the portrait data of each user comprises index data values of preset indexes;

the clustering processing module is used for carrying out clustering processing on the index data values, dividing the index data values into a plurality of clustering categories and determining a clustering center point corresponding to each clustering category;

the data regression module is used for respectively carrying out data regression on each index data value according to the index data value and the clustering center point corresponding to the clustering category of the index data value to obtain a standard data value corresponding to each index data value;

the data acquisition module is used for acquiring user portrait data to be predicted, wherein the user portrait data to be predicted comprises a target data value, and the target data value is an index data value of the preset index;

the numerical value determining module is used for determining the standard data value corresponding to the target data value according to the standard data value corresponding to each index data value;

and the preference prediction module is used for replacing the target data value in the user portrait data with the standard data value and predicting the preference data of the corresponding user according to the replaced user portrait data.

In a third aspect, the present application also provides a computer device comprising a memory and a processor; the memory is used for storing a computer program; the processor is configured to execute the computer program and implement the user preference prediction method as described above when the computer program is executed.

In a fourth aspect, the present application also provides a computer readable storage medium storing a computer program which, when executed by a processor, causes the processor to implement a user preference prediction method as described above.

The embodiment of the application discloses a user preference prediction method, a device, equipment and a storage medium. Portrait data of a plurality of users are acquired, wherein the portrait data of each user comprise index data values of preset indexes. The index data values are clustered and divided into a plurality of clustering categories, and the clustering center point corresponding to each clustering category is determined. Data regression is then performed on each index data value according to the index data value and the clustering center point corresponding to its clustering category, so as to obtain the standard data value corresponding to each index data value. User portrait data to be predicted are acquired, wherein the user portrait data to be predicted comprise a target data value, and the target data value is an index data value of the preset index. The standard data value corresponding to the target data value is determined according to the standard data values corresponding to the index data values. Finally, the target data value in the user portrait data is replaced with the standard data value, and the preference data of the corresponding user are predicted according to the replaced user portrait data. This addresses the problems that the numerical ranges of index values in the user portrait are not uniform and that continuous data values are not intuitive: the index values are discretized into discrete values distributed within a certain range, which improves the accuracy of user preference prediction.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required for the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a schematic flowchart of a user preference prediction method according to an embodiment of the present application;

FIG. 2 is a schematic flow chart of a sub-process of clustering a plurality of index data values based on the kmeans model shown in FIG. 1, dividing the plurality of index data values into a plurality of cluster categories, and determining a cluster center point of each cluster category;

FIG. 3 is a schematic flow chart of a sub-process for performing data regression on the index data values according to the index data values and the cluster center points corresponding to the index data values in FIG. 1 to obtain standard data values corresponding to the index data values;

FIG. 4 is a schematic flow chart illustrating a sub-process of performing data regression on the index data values according to a preset data regression formula and the clustering target values corresponding to the index data values in FIG. 3;

FIG. 5 is a schematic block diagram of a user preference prediction apparatus provided by an embodiment of the present application;

FIG. 6 is a schematic block diagram of another user preference prediction apparatus provided by an embodiment of the present application;

FIG. 7 is a schematic block diagram of a computer device according to an embodiment of the present application.

Detailed Description

The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.

The flow diagrams depicted in the figures are merely illustrative and not necessarily all of the elements and operations/steps are included or performed in the order described. For example, some operations/steps may be further divided, combined, or partially combined, so that the order of actual execution may be changed according to actual situations.

The embodiment of the application provides a user preference prediction method, a user preference prediction device, computer equipment and a computer readable storage medium. The user preference prediction method can be applied to terminal equipment or servers, wherein the terminal equipment can be mobile phones, tablet computers, notebook computers, desktop computers, personal digital assistants, wearable equipment and other electronic equipment, and the servers can be single servers or server clusters formed by a plurality of servers. The following explanation will be made taking an example in which the user preference prediction method is applied to a server.

Specifically, the user preference prediction method includes: acquiring a plurality of user portrait data, wherein the plurality of user portrait data comprise index data values of an index; clustering the index data values, dividing the index data values into a plurality of clustering categories, and determining a clustering center point of each clustering category; carrying out data regression on the index data value according to the index data value and the clustering center point corresponding to the index data value to obtain a standard data value corresponding to the index data value; acquiring user portrait data to be predicted, wherein the user portrait data comprises the index data value; determining a standard data value corresponding to the index data value in the user portrait data to be predicted; and replacing the index data value in the user portrait data with the standard data value, and predicting the preference data of the corresponding user according to the replaced user portrait data. The user preference prediction method can effectively solve the problems that the numerical range of index values in the user portrait is not uniform and the data value is continuous and is not intuitive, and discretizes the index values in the user portrait into discrete values distributed in a certain range so as to improve the accuracy of user preference prediction.

Some embodiments of the present application are described in detail below with reference to the accompanying drawings. The following embodiments and features of the embodiments may be combined with each other without conflict.

Referring to FIG. 1, FIG. 1 is a schematic flowchart of a user preference prediction method according to an embodiment of the present application.

As shown in FIG. 1, the user preference prediction method includes steps S101 to S106.

Step S101, portrait data of a plurality of users are obtained, wherein the portrait data of each user comprises index data values of preset indexes.

Illustratively, user portrait data of a plurality of users are obtained from a user information database. The user portrait data are used to draw a user portrait for predicting user preferences and comprise index data values for a plurality of indexes of the user. The user portrait data may comprise demographic attributes, interest features, consumption features, location features, device attributes, behavioral data, social data, and the like. The user portrait may be a user behavior portrait, a user health portrait, an enterprise credit portrait, a personal credit portrait, a static product portrait, a rotating device portrait, a social portrait, an economic portrait, and the like.

Specifically, each of the obtained user portrait data includes index data values of preset indexes. For example, the preset indexes include the number of contents browsed by the user, the number of contents collected by the user, the length of time the user spends reading consultation, and the like. The index data values of these indexes are output as continuous values with no defined data range. It will be appreciated that the user portrait data may include index data values of one preset index or of a plurality of preset indexes; the number of preset indexes is not limited here.

Step S102, clustering is carried out on the index data values, the index data values are divided into a plurality of clustering categories, and clustering center points corresponding to the clustering categories are determined.

Data clustering (cluster analysis) is a technique for static data analysis. It is a multivariate statistical analysis method that classifies samples or indexes according to the principle that like objects cluster together: similar objects are divided into different groups or subsets by a static classification method, so that member objects in the same subset have similar attributes. Clustering the index data values divides them into a plurality of clustering categories and determines the clustering center point of each clustering category.

Illustratively, clustering the plurality of index data values, dividing them into a plurality of clustering categories, and determining the clustering center point of each clustering category may specifically include: clustering the index data values based on a kmeans model, dividing the index data values into a plurality of clustering categories, and determining the clustering center point corresponding to each clustering category. For example, a kmeans model is used to cluster the plurality of index data values, dividing them into five clustering categories and determining the clustering center point of each clustering category.

The kmeans model is a clustering model that clusters according to distance and is applied to continuous data. The kmeans model works as follows: k data objects are selected from n data objects as initial clustering centers, wherein k can be set by the user and/or determined randomly; the distance from each remaining data object to each clustering center is calculated, and each data object is assigned to its nearest clustering center to form clusters; finally, the clustering center point value of each cluster, namely the average value of the cluster, is calculated, and the clustering center is reset according to that value.
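The working procedure described above can be sketched for one-dimensional index data values as follows. This is a minimal illustration rather than the implementation of the application: the function name `kmeans_1d`, its parameters, the iteration limit and the sample values are all assumptions introduced here.

```python
import random


def kmeans_1d(values, k, init_centers=None, max_iter=100, seed=0):
    """Cluster scalar values into k groups; returns (centers, labels).

    init_centers is optional; when omitted, k of the values are drawn at
    random as the initial clustering centers (assumption for this sketch).
    """
    rng = random.Random(seed)
    if init_centers is not None:
        centers = list(init_centers)
    else:
        centers = rng.sample(list(values), k)
    labels = []
    for _ in range(max_iter):
        # Assignment: each value joins the cluster with the nearest center.
        labels = [min(range(k), key=lambda j: abs(v - centers[j])) for v in values]
        # Update: each center is reset to the average of its members.
        new_centers = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            new_centers.append(sum(members) / len(members) if members else centers[j])
        if new_centers == centers:
            break  # centers no longer move
        centers = new_centers
    return centers, labels


# Hypothetical index data values with explicitly chosen initial centers.
values = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8, 9.0, 9.1, 8.9]
centers, labels = kmeans_1d(values, k=3, init_centers=[0.8, 5.0, 9.0])
```

With randomly drawn initial centers the result can depend on the draw, which is one reason a practical system might run the procedure several times or fix the initial centers.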

In an embodiment, the clustering process is performed on a plurality of index data values based on a kmeans model, and specifically includes the following steps: performing exception removal processing on a plurality of index data values; and clustering the index data values subjected to the exception removal processing based on a kmeans model. Illustratively, the performing exception removal processing on the plurality of index data values includes performing box division processing on the plurality of index data values to filter exception data and/or garbage data, so as to avoid influence of the exception data or the garbage data on a clustering result.

Specifically, the upper bound and lower bound of the box division are obtained for the index data values; each index data value greater than the box division upper bound is set to the maximum index data value that is smaller than the box division upper bound, and each index data value smaller than the box division lower bound is set to the minimum index data value that is greater than the box division lower bound. The box division upper and lower bounds are calculated as follows:

a. The upper quartile is set to U, which indicates that only 1/4 of all index data values are greater than U.

b. The lower quartile is set to L, which indicates that only 1/4 of all index data values are smaller than L.

c. The difference between the upper quartile and the lower quartile is set to IQR, i.e., IQR = U - L.

d. The box division upper bound is set to U + 1.5 IQR, and the box division lower bound is set to L - 1.5 IQR.

e. Box division processing is performed on the index data values according to the box division upper bound and the box division lower bound, so as to remove abnormal data.
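Steps a to e can be sketched as below. The sketch assumes that values outside the bounds are replaced by the nearest value inside the bounds, and it uses the "inclusive" quartile method of the Python standard library; the application does not specify a quartile method, and the function names and sample data are hypothetical.

```python
import statistics


def box_bounds(values):
    """Return (lower_bound, upper_bound) = (L - 1.5*IQR, U + 1.5*IQR)."""
    # quantiles(n=4) returns the three quartile cut points: L, median, U.
    lower_q, _, upper_q = statistics.quantiles(values, n=4, method="inclusive")
    iqr = upper_q - lower_q
    return lower_q - 1.5 * iqr, upper_q + 1.5 * iqr


def remove_exceptions(values):
    """Replace values outside the bounds with the nearest in-bound value."""
    lower, upper = box_bounds(values)
    inside = [v for v in values if lower <= v <= upper]
    largest_inside, smallest_inside = max(inside), min(inside)
    cleaned = []
    for v in values:
        if v > upper:
            cleaned.append(largest_inside)
        elif v < lower:
            cleaned.append(smallest_inside)
        else:
            cleaned.append(v)
    return cleaned


# Hypothetical index data values with one abnormal value (100).
raw = [1, 2, 3, 4, 5, 6, 7, 8, 100]
bounds = box_bounds(raw)
cleaned = remove_exceptions(raw)
```

For this sample the quartiles are 3 and 7, so IQR is 4 and the bounds are -3 and 13; the value 100 lies above the upper bound and is replaced by 8.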

Performing exception removal processing on the plurality of index data values avoids the influence of abnormal data or junk data on the clustering result and improves the accuracy of the clustering result.

In one embodiment, as shown in FIG. 2, clustering the plurality of index data values based on the kmeans model, dividing them into a plurality of clustering categories, and determining the clustering center point of each clustering category specifically includes step S1021 and step S1022.

Step S1021, determining a plurality of sample data values from a plurality of index data values, carrying out clustering processing on the plurality of sample data values based on a kmeans model, dividing the plurality of sample data values into a plurality of clustering categories, and determining a first center point corresponding to each clustering category.

For example, a plurality of sample data values are determined from the plurality of index data values by a preset selection rule, the sample data values being used for training of the kmeans model.

In one embodiment, the user preference prediction method further includes obtaining preference data of the user corresponding to the index data value, and determining a plurality of sample data values according to the preference data, wherein the preference data includes interest values of the user in a specific target. For example, the preference data may be manually annotated, or generated by a preference prediction model from the user portrait data. Determining the sample data values according to the preference data helps to select sample data values that are more representative of the users, thereby improving the accuracy of user preference prediction.

For example, according to a preset selection rule, the user portrait data whose preference data show an interest value of 0 for the specific target are rejected, and 20% of the remaining user portrait data are randomly selected as sample data values, which are used for training the kmeans model. Rejecting the user portrait data whose interest value for the specific target is 0 improves the accuracy of user preference prediction.
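The selection rule in this example can be sketched as follows. The record layout (pairs of index data value and interest value), the function name `select_samples` and the fixed random seed are assumptions made for illustration only.

```python
import random


def select_samples(records, fraction=0.2, seed=0):
    """Pick sample data values from (index_value, interest_value) pairs.

    Records with an interest value of 0 are rejected first; the given
    fraction of the remaining index data values is then drawn at random.
    """
    kept = [value for value, interest in records if interest != 0]
    if not kept:
        return []
    count = max(1, round(len(kept) * fraction))
    return random.Random(seed).sample(kept, count)


# Hypothetical records: every even-numbered user has interest value 0.
records = [(float(i), 0 if i % 2 == 0 else 1) for i in range(20)]
samples = select_samples(records)
```

Here 10 of the 20 records survive the rejection step, so 20% of the remainder yields 2 sample data values.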

Specifically, model training is performed on the kmeans model by using the plurality of sample data values. And dividing the plurality of sample data values into a plurality of clustering categories through clustering processing of a kmeans model, and determining a first center point of each clustering category.

For example, model training is performed on the kmeans model by using the plurality of sample data values, and the plurality of sample data values are divided into five clustering categories through clustering processing of the kmeans model, and a first center point of each clustering category is determined.

The clustering category and the first center point are stored as model parameters of a trained kmeans model, wherein the model parameters are used for clustering the index data values. And determining a plurality of sample data values from the index data values and training the kmeans model according to the plurality of sample data values, so that the clustering accuracy of the kmeans model is improved, and a more accurate clustering result is obtained.

Step S1022, based on the kmeans model, clustering is performed on the index data values according to the first center points corresponding to the clustering categories, the index data values are divided into the clustering categories, the second center points corresponding to the clustering categories are determined, and the second center points are determined to be the clustering center points.

The index data values are clustered according to the first center points, which were obtained by training on the sample data values determined from the index data values, and according to the number of clustering categories, so as to obtain the second center points. This further improves the clustering efficiency of the kmeans model and the accuracy of the clustering result.

It can be understood that the cluster center point corresponding to the index data value may be a first center point or a second center point.
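Steps S1021 and S1022 can be read as a two-stage procedure: first centers are fitted on the sample data values, and those first center points then serve as the starting centers when all index data values are clustered. The sketch below illustrates that reading for one-dimensional data; the helper `refine_centers`, the data and the choice of initial centers are hypothetical.

```python
def refine_centers(values, centers, max_iter=100):
    """Iterate assignment and averaging from the given starting centers."""
    centers = list(centers)
    for _ in range(max_iter):
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda j: abs(v - centers[j]))
            groups[nearest].append(v)
        # An empty group keeps its previous center.
        updated = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
        if updated == centers:
            break
        centers = updated
    return centers


# Hypothetical index data values and a sample drawn from them.
all_values = [0.9, 1.0, 1.1, 1.3, 4.8, 5.0, 5.2, 5.4]
sample_values = [1.0, 1.1, 5.0, 5.2]

# Stage 1 (S1021): first center points from the sample data values.
first_centers = refine_centers(sample_values, centers=[1.0, 5.2])
# Stage 2 (S1022): second center points, starting from the first ones.
second_centers = refine_centers(all_values, centers=first_centers)
```

Because the second stage starts near the final solution, it typically needs few iterations, which is consistent with the efficiency benefit described above.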

In one embodiment, the clustering the plurality of index data values based on the kmeans model in step S102 includes: performing exception removal processing on a plurality of index data values; and clustering the index data values subjected to the exception removal processing based on a kmeans model.

In one embodiment, the method for predicting user preference further includes performing exception removal processing on the plurality of index data values, and determining a plurality of sample data values from the plurality of index data values in step S1021 includes: and determining a plurality of sample data values from the index data values subjected to the exception removal processing.

Step S103, respectively carrying out data regression on each index data value according to the index data value and the clustering center point corresponding to the clustering category of the index data value to obtain the standard data value corresponding to each index data value.

The data regression is performed on the index data value according to the index data value, the clustering center point corresponding to the index data value and a preset data regression formula, so as to obtain a standard data value corresponding to the index data value.

In one embodiment, as shown in FIG. 3, the data regression is performed on the index data value according to the index data value and the cluster center point corresponding to the index data value, so as to obtain the standard data value corresponding to the index data value, which specifically includes steps S1031 to S1033.

Step S1031, sorting the cluster center points according to the magnitude of the center point value of the cluster center point, and determining a cluster target value of the cluster category corresponding to the cluster center point according to the sorting result.

Illustratively, according to the magnitude of the central point value of the cluster central point, the cluster central points are subjected to ascending order, and the cluster target value of the cluster category corresponding to the cluster central point is determined according to the ascending order result.

For example, the clustering target value may be determined according to the numerical range expected after discretization of the index data values and the number of clustering categories. For example, if the expected numerical range after discretization is 0 to 5 and the number of clustering categories is 5, the clustering target values corresponding to the clustering center points of the 5 clustering categories are determined to be 1, 2, 3, 4 and 5, or 0, 1, 2, 3 and 4, respectively. If the expected numerical range after discretization is 0 to 10 and the number of clustering categories is 5, the clustering target values corresponding to the clustering center points of the 5 clustering categories are determined to be 2, 4, 6, 8 and 10, or 0, 2, 4, 6 and 8, or 1, 3, 5, 7 and 9, respectively. It can be appreciated that the clustering target value may be the upper limit, the lower limit or the central value of the standard data values expected to be obtained after discretization of the index data values of the corresponding clustering category.

Specifically, the clustering center points are sorted in ascending order of center point value. For example, the kmeans model determines five clustering categories and their five clustering center points. After the five clustering center points are sorted in ascending order of center point value, the clustering target values of the five clustering categories are determined to be 1, 2, 3, 4 and 5 in sequence: the clustering target value corresponding to the clustering center point with the smallest center point value is 1, and the clustering target value corresponding to the clustering center point with the largest center point value is 5. Standard data values in the range of 0 to 5 are then obtained after data regression is performed on the index data values.
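The ascending-order assignment of clustering target values can be sketched as follows. The function name and the center point values are hypothetical; only the rule (smallest center receives the smallest target value) comes from the description.

```python
def assign_target_values(centers, targets):
    """Map each cluster index to a target value by ascending center value.

    centers[j] is the center point value of cluster j; targets is the
    list of clustering target values in ascending order.
    """
    order = sorted(range(len(centers)), key=lambda j: centers[j])
    return {cluster: targets[rank] for rank, cluster in enumerate(order)}


# Hypothetical center point values of five clustering categories.
centers = [7.4, 0.9, 3.2, 12.5, 5.0]
target_by_cluster = assign_target_values(centers, targets=[1, 2, 3, 4, 5])
```

Cluster 1 has the smallest center (0.9) and receives target value 1, while cluster 3 has the largest center (12.5) and receives target value 5.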

Step S1032, determining the clustering target value corresponding to the index data value according to the clustering target value of the clustering class.

Illustratively, a cluster category corresponding to the index data value is determined according to the index data value, and a cluster target value corresponding to the index data value is determined according to a cluster target value of the cluster category.
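One way to carry out this lookup is to assign the index data value to the clustering category with the nearest center point and then read that category's clustering target value. The sketch below assumes nearest-center assignment, as in the kmeans model; the function name and the values are hypothetical.

```python
def target_for_value(value, centers, target_by_cluster):
    """Return (cluster_index, cluster_target_value) for one index data value."""
    # The cluster category is the one whose center point is nearest.
    cluster = min(range(len(centers)), key=lambda j: abs(value - centers[j]))
    return cluster, target_by_cluster[cluster]


# Hypothetical centers and their clustering target values.
centers = [0.9, 3.2, 5.0]
target_by_cluster = {0: 1, 1: 2, 2: 3}
cluster, target = target_for_value(2.8, centers, target_by_cluster)
```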

Step S1033, carrying out data regression on the index data value according to a preset data regression formula and a clustering target value corresponding to the index data value to obtain a standard data value corresponding to the index data value.

The data regression formula is set in advance according to the type of the user preference prediction, and is used for carrying out data regression on the index data value so as to improve the accuracy of the user preference prediction.

In an embodiment, as shown in FIG. 4, in step S1033, the data regression is performed on the index data value according to a preset data regression formula and a clustering target value corresponding to the index data value, and specifically includes steps S331 to S333.

Step S331, if the index data value is greater than the center point value of the clustering center point corresponding to the index data value, carrying out data regression on the index data value according to the clustering target value corresponding to the index data value based on a first regression formula so as to obtain the standard data value corresponding to the index data value.

The first regression formula is: $\frac{a-b}{W} \times 0.5 + L - N$, wherein a is the index data value, b is the center point value, W is the first difference value, L is the clustering target value, and N is 1/2 of the difference between two adjacent clustering target values.

Specifically, the first difference value is determined according to the differences between each clustering center point and the index data values corresponding to that clustering center point. For example, the maximum positive difference between the index data values corresponding to a clustering center point and that clustering center point is determined as the first difference value. For example, if the center point value of a clustering center point is 2.15 and the maximum index data value among the index data values corresponding to that clustering center point is 3.15, the maximum positive difference is 1, that is, the first difference value is 1.

For example, suppose the first difference value is 1.5, the clustering target value is 3, and the difference between two adjacent clustering target values is 1. If an index data value is 2.8, which is greater than the center point value 1.8 of its corresponding clustering center point, data regression is performed on the index data value based on the first regression formula. Substituting the data into the first regression formula gives $\frac{2.8-1.8}{1.5} \times 0.5 + 3 - 0.5$, so the standard data value corresponding to the index data value is 2.83333333.

Step S332, if the index data value is equal to the center point value of the cluster center point corresponding to the index data value, performing data regression on the index data value according to the cluster target value corresponding to the index data value based on a second regression formula, so as to obtain a standard data value corresponding to the index data value.

The second regression formula is: $L - N$, where L is the cluster target value and N is 1/2 of the difference between two adjacent cluster target values.

For example, the cluster target value is 3 and the difference between two adjacent cluster target values is 1. If an index data value is 1.8, which is equal to the center point value 1.8 of its corresponding cluster center point, data regression is performed on the index data value based on the second regression formula, that is, the data is substituted into the second regression formula as 3 - 0.5, giving a standard data value of 2.5.

Step S333, if the index data value is smaller than the center point value of the cluster center point corresponding to the index data value, performing data regression on the index data value according to the cluster target value corresponding to the index data value based on a third regression formula, so as to obtain a standard data value corresponding to the index data value.

The third regression formula is: $(a - b) / V \times (-0.5) + L - N$, where a is the index data value, b is the center point value, V is the second difference value, L is the cluster target value, and N is 1/2 of the difference between two adjacent cluster target values.

Specifically, the second difference value is determined according to the difference between each cluster center point and the index data values corresponding to that cluster center point. For example, the minimum negative difference between the index data values corresponding to a cluster center point and that cluster center point is determined as the second difference value. For example, if the center point value of a cluster center point is 2.15 and the smallest index data value in the cluster category corresponding to the cluster center point is 1.0, the minimum negative difference is -1.15, that is, the second difference value is -1.15.

For example, the second difference value is -1.15, the cluster target value is 3, and the difference between two adjacent cluster target values is 1. If an index data value is 0.8, which is smaller than the center point value 1.8 of its corresponding cluster center point, data regression is performed on the index data value based on the third regression formula, that is, the data is substituted into the third regression formula as (0.8 - 1.8)/(-1.15) × (-0.5) + 3 - 0.5, giving a standard data value of 2.065217391.

Performing data regression on the index data value to obtain the corresponding standard data value maps data values spread over a large range into a small range without destroying the distribution of the original data, and makes the index data values more intuitive, thereby improving the accuracy of the user preference prediction.
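The three regression formulas of steps S331 to S333 can be combined into one piecewise function, sketched below and checked against the three worked examples in the text. The function name `regress`, its parameter names, and the default `step=1.0` (the difference between adjacent cluster target values used in the examples) are assumptions made for illustration.

```python
def regress(a, b, target, w, v, step=1.0):
    """Map an index data value to its standard data value.

    a: index data value          b: center point value of its cluster
    target: cluster target value L
    w: first difference value (positive)
    v: second difference value (negative)
    step: difference between two adjacent cluster target values
    """
    n = step / 2  # N is half the difference between adjacent cluster target values
    if a > b:
        return (a - b) / w * 0.5 + target - n    # first regression formula
    if a < b:
        return (a - b) / v * -0.5 + target - n   # third regression formula
    return target - n                            # second regression formula


print(regress(2.8, 1.8, 3, 1.5, -1.15))  # about 2.8333
print(regress(1.8, 1.8, 3, 1.5, -1.15))  # 2.5
print(regress(0.8, 1.8, 3, 1.5, -1.15))  # about 2.0652
```

When W and V are the extreme differences of a cluster, (a - b)/W and (a - b)/V both lie in (0, 1], so with N = 0.5 every value of the cluster lands in the interval from L - 1 to L, and the ordering of values inside the cluster is preserved.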

In an embodiment, after the performing data regression on the index data value according to the index data value and the cluster center point corresponding to the index data value to obtain the standard data value corresponding to the index data value, the method specifically further includes: performing data processing on the standard data value, and taking the standard data value after data processing as a standard data value corresponding to the index data value; the standard data values are managed based on a blockchain technique.

Illustratively, the data processing of the standard data value includes rounding the standard data value to a preset number of decimal places. For example, the standard data value obtained by performing data regression on an index data value based on the third regression formula is 2.065217391; rounding it to one decimal place gives 2.1. Keeping only a few decimal places reduces the data volume and the amount of calculation while maintaining a certain precision, thereby improving prediction efficiency.

It is emphasized that to further guarantee the privacy and security of the standard data values, the standard data values may also be stored in a blockchain node, the standard data values being managed based on blockchain technology.

In an embodiment, after the performing data regression on the index data value according to the index data value and the cluster center point corresponding to the index data value to obtain the standard data value corresponding to the index data value, the method further includes: and generating a user portrait numerical lookup table according to the standard data value.

Specifically, the user portrait numerical value lookup table includes indexes of the user and standard data values corresponding to the indexes, and the indexes can include user browsing content, user collecting content, user reading consultation and the like. By setting the user portrait numerical value lookup table, standard data values corresponding to indexes can be conveniently queried, so that the efficiency of user preference prediction is improved. Illustratively, to further ensure privacy and security of the user portrait value lookup table, the user portrait value lookup table may be managed based on a blockchain technique.
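A lookup table of this kind can be sketched as a nested dictionary. The layout below (index name, then index data value, then rounded standard data value), the function name `build_lookup_table`, and the index name "browsing" are assumptions; the text does not fix a storage format.

```python
def build_lookup_table(index_values, standard_values, places=1):
    """Pair each index data value with its rounded standard data value.

    index_values:    {index name: [index data values]}
    standard_values: {index name: [standard data values, in the same order]}
    """
    table = {}
    for name, raws in index_values.items():
        table[name] = {
            raw: round(std, places)  # keep only `places` decimal places
            for raw, std in zip(raws, standard_values[name])
        }
    return table


table = build_lookup_table(
    {"browsing": [0.8, 1.8, 2.8]},
    {"browsing": [2.065217391, 2.5, 2.833333333]},
)
print(table["browsing"][0.8])  # 2.1
```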

The generated user portrait value lookup table is stored on a blockchain, from which a user can obtain it. The blockchain includes a user management module, which manages the identity information of users, including maintaining public and private key generation (account management), key management, and maintaining the correspondence between a user's real identity and blockchain address (authority management). When a user's request to view the user portrait value lookup table is received, the identity and corresponding authority of the user are verified based on the blockchain technology, which ensures the security of the lookup table. In addition, because the table is stored on the blockchain, it is not easily tampered with, which ensures the accuracy of the table and in turn improves the accuracy of user preference prediction.

Step S104, obtaining user portrait data to be predicted, wherein the user portrait data to be predicted comprises target data values, and the target data values are index data values of the preset indexes.

The server receives a user preference prediction request sent by a client, where the preference prediction request includes the user portrait data to be predicted; alternatively, the server obtains the user portrait data to be predicted according to the preference prediction request sent by the client. The user portrait data to be predicted includes index data values corresponding to a plurality of indexes. The user portrait data includes population attributes, interest features, consumption features, location features, equipment attributes, behavior data, social data and the like, and the indexes include user browsing content, user collecting content, user reading consultation and the like.

Step S105, determining a standard data value corresponding to the target index data value according to the standard data value corresponding to each index data value.

For example, the standard data value corresponding to the index data value in the user portrait data may be determined by querying a preset user portrait value lookup table, where the user portrait value lookup table includes index data values of a plurality of indexes and standard data values corresponding to the index data values.

For example, the cluster category and the cluster center point corresponding to the index data value may be determined based on the kmeans model, and the data regression may be performed on the index data value according to the index data value and the cluster center point corresponding to the index data value, so as to obtain the standard data value corresponding to the index data value.
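The two options above can be sketched together as a table lookup with a computed fallback. Combining them in this order is an assumption (the text presents them as separate examples), and `standard_value_for`, `table`, and the placeholder `fallback` are hypothetical names; in practice the fallback would assign the value to a cluster and apply the regression formulas.

```python
def standard_value_for(index, target_value, table, regress_fn):
    """Return the standard data value for one target data value.

    table:      {index name: {index data value: standard data value}}
    regress_fn: callable used when the table has no entry for the value
    """
    known = table.get(index, {})
    if target_value in known:
        return known[target_value]
    return regress_fn(index, target_value)


table = {"browsing": {0.8: 2.1, 1.8: 2.5}}
fallback = lambda index, value: 2.8  # placeholder standing in for cluster + regression

print(standard_value_for("browsing", 0.8, table, fallback))  # 2.1 (from the table)
print(standard_value_for("browsing", 2.8, table, fallback))  # 2.8 (from the fallback)
```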

Step S106, replacing the target data value in the user portrait data with the standard data value, and predicting the preference data of the corresponding user according to the replaced user portrait data.

For example, after replacing the index data value in the user portrait data with the standard data value corresponding to the determined index data value, preference data of the corresponding user is predicted based on the user portrait data including the standard data value, that is, the user portrait data after replacement.

In an embodiment, the predicting the preference data of the corresponding user according to the replaced user portrait data may include: and predicting the preference data of the corresponding user according to the replaced user portrait data based on a preference prediction model.

The preference prediction model may be a pre-trained decision tree model and/or a support vector machine model, and is used for generating a prediction result of user preference from user portrait data, the prediction result comprising preference data of the user.
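Step S106 can be sketched as a replacement pass followed by a call to the model. Everything named here is hypothetical: `replace_targets`, `predict_preference`, the index names, the preference labels, and in particular `toy_model`, which is a hand-written rule standing in for the pre-trained decision tree or support vector machine model.

```python
def replace_targets(portrait, standard_of):
    """Return a copy of the portrait with every target data value replaced."""
    return {index: standard_of(index, value) for index, value in portrait.items()}


def predict_preference(portrait, standard_of, model):
    """Replace the target data values, then pass the portrait to the model."""
    return model(replace_targets(portrait, standard_of))


# Hypothetical lookup table and toy model, for illustration only.
table = {"browsing": {0.8: 2.1, 2.8: 2.8}, "collecting": {1.8: 2.5}}
lookup = lambda index, value: table[index][value]
toy_model = lambda p: "news" if p["browsing"] >= p["collecting"] else "video"

print(predict_preference({"browsing": 2.8, "collecting": 1.8}, lookup, toy_model))  # news
```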

According to the user preference prediction method provided by the above embodiment, portrait data of a plurality of users is acquired, where the portrait data of each user includes an index data value of a preset index. The plurality of index data values are clustered and divided into a plurality of cluster categories, and the cluster center point corresponding to each cluster category is determined. Data regression is then performed on each index data value according to the index data value and the cluster center point corresponding to its cluster category, to obtain the standard data value corresponding to each index data value. Next, user portrait data to be predicted is acquired, which includes a target data value, the target data value being an index data value of the preset index, and the standard data value corresponding to the target data value is determined according to the standard data values corresponding to the index data values. Finally, the target data value in the user portrait data is replaced with the standard data value, and the preference data of the corresponding user is predicted according to the replaced user portrait data.

Referring to fig. 5, fig. 5 is a schematic block diagram of a user preference prediction apparatus for performing the aforementioned user preference prediction method according to an embodiment of the present application. Wherein the user preference prediction apparatus may be configured in a server or a terminal.

The servers may be independent servers or may be server clusters. The terminal can be electronic equipment such as a mobile phone, a tablet computer, a notebook computer, a desktop computer, a personal digital assistant, wearable equipment and the like.

As shown in fig. 5, the user preference prediction apparatus 400 includes: an information acquisition module 401, a cluster processing module 402, a data regression module 403, a data acquisition module 404, a numerical value determination module 405, and a preference prediction module 406.

An information acquisition module 401, configured to acquire portrait data of a plurality of users, where the portrait data of each user includes an index data value of a preset index;

a cluster processing module 402, configured to perform clustering on a plurality of the index data values, divide the plurality of the index data values into a plurality of cluster categories, and determine a cluster center point corresponding to each of the cluster categories;

the data regression module 403 is configured to perform data regression on each index data value according to the index data value and the cluster center point corresponding to the cluster category thereof, so as to obtain a standard data value corresponding to each index data value;

The data acquisition module 404 is configured to acquire user portrait data to be predicted, where the user portrait data to be predicted includes a target data value, and the target data value is an index data value of the preset index;

a numerical value determining module 405, configured to determine a standard data value corresponding to the target index data value according to the standard data value corresponding to each index data value;

and a preference prediction module 406, configured to replace the target data value in the user portrait data with the standard data value, and predict preference data of a corresponding user according to the replaced user portrait data.

In an embodiment, as shown in fig. 6, the cluster processing module 402 includes a first cluster sub-module 4021 and a second cluster sub-module 4022.

The first clustering sub-module 4021 is configured to determine a plurality of sample data values from a plurality of index data values, perform clustering processing on the plurality of sample data values based on a kmeans model, divide the plurality of sample data values into a plurality of cluster categories, and determine a first center point corresponding to each of the cluster categories.

The second clustering sub-module 4022 is configured to perform clustering processing on the plurality of index data values according to the first center points corresponding to the plurality of cluster categories based on the kmeans model, divide the plurality of index data values into the plurality of cluster categories, determine second center points corresponding to the cluster categories, and determine the second center points as cluster center points.
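The two-stage clustering performed by the two sub-modules can be sketched in pure Python for one-dimensional index data values. This is an illustration under stated assumptions: the sample is drawn at random with a fixed seed, the first-stage initial centers are taken at evenly spaced positions of the sorted sample, and `kmeans_1d` and `two_stage_centers` are hypothetical names; the text only requires a kmeans model.

```python
import random


def kmeans_1d(values, centers, max_iter=100):
    """Lloyd's algorithm on scalars, starting from the given center points."""
    centers = list(centers)
    for _ in range(max_iter):
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # An empty cluster keeps its previous center point.
        updated = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
        if updated == centers:
            break
        centers = updated
    return centers


def two_stage_centers(values, k, sample_size, seed=0):
    """Cluster a sample first, then use its centers to initialise the full run."""
    sample = sorted(random.Random(seed).sample(values, sample_size))
    init = [sample[(2 * i + 1) * len(sample) // (2 * k)] for i in range(k)]
    first = kmeans_1d(sample, init)   # first center points, from the sample
    return kmeans_1d(values, first)   # second center points, from all values


data = [1.0, 1.1, 0.9, 5.0, 5.1, 4.9, 9.0, 9.1, 8.9]
print(two_stage_centers(data, k=3, sample_size=len(data)))
```

Starting the second pass from centers that are already close to the final ones typically means it needs fewer iterations over the full data set than a run from arbitrary starting points.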

In one embodiment, as shown in FIG. 6, the data regression module 403 includes a value determination submodule 4031, a difference determination submodule 4032, and a data regression submodule 4033.

The numerical value determining submodule 4031 is configured to rank the cluster center points according to the magnitude of the center point value of the cluster center point, and determine a cluster target value of a cluster class corresponding to the cluster center point according to a ranking result.

The difference determining submodule 4032 is configured to determine a cluster target value corresponding to the index data value according to the cluster target value of the cluster category.

The data regression submodule 4033 is configured to perform data regression on the index data value according to a preset data regression formula and a clustering target value corresponding to the index data value, so as to obtain a standard data value corresponding to the index data value.
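The ranking step performed by the numerical value determining submodule can be sketched as follows. Assigning consecutive target values 1, 2, 3, ... in ascending order of center point value is an assumption chosen to match the worked examples, where adjacent cluster target values differ by 1; `cluster_target_values` is a hypothetical name.

```python
def cluster_target_values(centers, step=1):
    """Give the cluster with the i-th smallest center point the target value i * step."""
    order = sorted(range(len(centers)), key=lambda i: centers[i])
    targets = [0] * len(centers)
    for rank, idx in enumerate(order, start=1):
        targets[idx] = rank * step
    return targets


# Three clusters listed in arbitrary order; the result keeps that order.
print(cluster_target_values([5.0, 1.8, 9.0]))  # [2, 1, 3]
```

Each index data value then inherits the cluster target value of the cluster category it belongs to.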

It should be noted that, for convenience and brevity of description, the specific working processes of the above-described apparatus and of each module and unit may refer to the corresponding processes in the foregoing user preference prediction method embodiments, and will not be described herein again.

The user preference prediction apparatus provided by the above-described embodiments may be implemented in the form of a computer program which is executable on a computer device as shown in fig. 7.

Referring to fig. 7, fig. 7 is a schematic block diagram of a computer device according to an embodiment of the present application. The computer device may be a server or a terminal device.

As shown in fig. 7, the computer device includes a processor, a memory, and a network interface connected by a system bus, wherein the memory may include a non-volatile storage medium and an internal memory.

The non-volatile storage medium may store an operating system and a computer program. The computer program comprises program instructions that, when executed, cause the processor to perform any one of the user preference prediction methods.

The processor is used to provide computing and control capabilities to support the operation of the entire computer device.

The internal memory provides an environment for running the computer program stored in the non-volatile storage medium; when the computer program is executed by the processor, it causes the processor to perform any one of the user preference prediction methods.

The network interface is used for network communication such as transmitting assigned tasks and the like. It will be appreciated by those skilled in the art that the structure shown in FIG. 7 is merely a block diagram of some of the structures associated with the present inventive arrangements and is not limiting of the computer device to which the present inventive arrangements may be applied, and that a particular computer device may include more or fewer components than shown, or may combine some of the components, or have a different arrangement of components.

It should be appreciated that the processor may be a central processing unit (Central Processing Unit, CPU), but may also be other general purpose processors, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), field-programmable gate arrays (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or the like. Wherein the general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.

Wherein in one embodiment the processor is configured to run a computer program stored in the memory to implement the steps of:

In one embodiment, when implementing the clustering processing on the plurality of index data values, the processor is configured to implement:

and carrying out clustering processing on the index data values based on a kmeans model, dividing the index data values into a plurality of clustering categories, and determining a clustering center point corresponding to each clustering category.

In one embodiment, when the processor performs clustering processing on the index data values based on the kmeans model, the processor divides the index data values into a plurality of cluster categories, and determines a cluster center point corresponding to each cluster category, the processor is configured to perform:

Determining a plurality of sample data values from a plurality of index data values, carrying out clustering processing on the plurality of sample data values based on a kmeans model, dividing the plurality of sample data values into a plurality of clustering categories, and determining a first center point corresponding to each clustering category;

based on the kmeans model, clustering is carried out on the index data values according to the first center points corresponding to the clustering categories, the index data values are divided into the clustering categories, the second center points corresponding to the clustering categories are determined, and the second center points are determined to be the clustering center points.

In one embodiment, when implementing the kmeans model to cluster a plurality of the index data values, the processor is configured to implement:

performing exception removal processing on a plurality of index data values;

and clustering the index data values subjected to the exception removal processing based on a kmeans model.
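The exception removal step is not specified further here. One common choice, used below purely as a stand-in, is the interquartile range (IQR) rule, which drops values lying more than 1.5 IQR outside the first and third quartiles. The function name `remove_exceptions` and the factor 1.5 are assumptions.

```python
import statistics


def remove_exceptions(values, factor=1.5):
    """Drop values outside [Q1 - factor * IQR, Q3 + factor * IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # the three quartile cut points
    spread = q3 - q1
    low, high = q1 - factor * spread, q3 + factor * spread
    return [v for v in values if low <= v <= high]


raw = [1, 2, 3, 4, 5, 6, 7, 8, 100]
print(remove_exceptions(raw))  # [1, 2, 3, 4, 5, 6, 7, 8]
```

The filtered list is what would then be passed to the kmeans clustering.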

In one embodiment, when implementing the data regression on the index data value according to the index data value and the cluster center point corresponding to the index data value to obtain the standard data value corresponding to the index data value, the processor is configured to implement:

Sorting the clustering center points according to the magnitude of the center point values of the clustering center points, and determining a clustering target value of a clustering category corresponding to the clustering center points according to a sorting result;

determining a clustering target value corresponding to the index data value according to the clustering target value of the clustering class;

and carrying out data regression on the index data value according to a preset data regression formula and a clustering target value corresponding to the index data value so as to obtain a standard data value corresponding to the index data value.

In one embodiment, when implementing the data regression on the index data value according to a preset data regression formula and the cluster target value corresponding to the index data value, the processor is configured to implement:

if the index data value is larger than the center point value of the clustering center point corresponding to the index data value, carrying out data regression on the index data value according to the clustering target value corresponding to the index data value based on a first regression formula so as to obtain a standard data value corresponding to the index data value; and/or

If the index data value is equal to the center point value of the clustering center point corresponding to the index data value, carrying out data regression on the index data value according to the clustering target value corresponding to the index data value based on a second regression formula so as to obtain a standard data value corresponding to the index data value; and/or

And if the index data value is smaller than the center point value of the clustering center point corresponding to the index data value, carrying out data regression on the index data value according to the clustering target value corresponding to the index data value based on a third regression formula so as to obtain the standard data value corresponding to the index data value.

In one embodiment, after implementing the data regression on the index data value according to the index data value and the cluster center point corresponding to the index data value, the processor is further configured to implement:

generating a numerical value lookup table according to the index data value and a standard data value corresponding to the index data value;

performing data processing on the standard data value, and taking the standard data value after data processing as a standard data value corresponding to the index data value so as to update the numerical value lookup table; the numerical lookup table is managed based on a blockchain technique.

Embodiments of the present application also provide a computer readable storage medium having a computer program stored thereon, the computer program comprising program instructions; for the method implemented when the program instructions are executed, reference may be made to the various embodiments of the user preference prediction method of the present application.

The computer readable storage medium may be an internal storage unit of the computer device according to the foregoing embodiment, for example, a hard disk or a memory of the computer device. The computer readable storage medium may also be an external storage device of the computer device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), or the like, which are provided on the computer device.

The user preference prediction apparatus, the computer device and the computer readable storage medium provided in the foregoing embodiments acquire portrait data of a plurality of users, where the portrait data of each user includes an index data value of a preset index. The plurality of index data values are clustered and divided into a plurality of cluster categories, and the cluster center point corresponding to each cluster category is determined. Data regression is performed on each index data value according to the index data value and the cluster center point corresponding to its cluster category, to obtain the standard data value corresponding to each index data value. User portrait data to be predicted is then acquired, which includes a target data value, the target data value being an index data value of the preset index, and the standard data value corresponding to the target index data value is determined according to the standard data values corresponding to the index data values. Finally, the target data value in the user portrait data is replaced with the standard data value, and the preference data of the corresponding user is predicted according to the replaced user portrait data. In this way, the index values in the user portrait are discretized into discrete values distributed within a certain range, which improves the accuracy of user preference prediction.

It is to be understood that the terminology used in the description of the application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.

It should also be understood that the term "and/or" as used in the present specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations. It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.

The foregoing embodiment numbers of the present application are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments. While the application has been described with reference to certain preferred embodiments, it will be understood by those skilled in the art that various equivalent modifications and substitutions may be made without departing from the scope of the application. Therefore, the protection scope of the application is subject to the protection scope of the claims.

Claims

1. A method for predicting user preferences, comprising:

according to the index data values and the clustering center points corresponding to the clustering categories of the index data values, respectively carrying out data regression on the index data values to obtain standard data values corresponding to the index data values; acquiring user portrait data to be predicted, wherein the user portrait data to be predicted comprises a target data value, and the target data value is an index data value of the preset index;

replacing the target data value in the user portrait data with the standard data value, and predicting corresponding user preference data according to the replaced user portrait data;

the data regression is performed on the index data value according to the index data value and the clustering center point corresponding to the index data value to obtain a standard data value corresponding to the index data value, including:

and carrying out data regression on the index data value according to a preset data regression formula and a clustering target value corresponding to the index data value to obtain a standard data value corresponding to the index data value.

2. The method of claim 1, wherein the clustering the plurality of index data values to divide the plurality of index data values into a plurality of cluster categories and determining a cluster center point of each of the cluster categories comprises:

3. The method according to claim 2, wherein the clustering the plurality of index data values based on the kmeans model, dividing the plurality of index data values into a plurality of cluster categories, and determining a cluster center point corresponding to each of the cluster categories, comprises:

4. The method according to claim 2, wherein the clustering of the plurality of index data values based on the kmeans model includes:

Performing exception removal processing on a plurality of index data values;

5. The method for predicting user preference according to claim 1, wherein the performing data regression on the index data value according to a preset data regression formula and a clustering target value corresponding to the index data value includes:

6. The method for predicting user preference according to claim 5, wherein the performing data regression on the index data value according to the index data value and the cluster center point corresponding to the index data value, to obtain the standard data value corresponding to the index data value, further comprises:

performing data processing on the standard data value, and taking the standard data value after data processing as the standard data value corresponding to the index data value to update the numerical value lookup table;

the numerical lookup table is managed based on a blockchain technique.

7. A user preference prediction apparatus, comprising:

the preference prediction module is used for replacing the target data value in the user portrait data with the standard data value and predicting the preference data of the corresponding user according to the replaced user portrait data;

the data regression module is specifically configured to: sorting the clustering center points according to the magnitude of the center point values of the clustering center points, and determining a clustering target value of a clustering category corresponding to the clustering center points according to a sorting result; determining a clustering target value corresponding to the index data value according to the clustering target value of the clustering class; and carrying out data regression on the index data value according to a preset data regression formula and a clustering target value corresponding to the index data value to obtain a standard data value corresponding to the index data value.

8. A computer device, the computer device comprising a memory and a processor;

the memory is used for storing a computer program;

the processor for executing the computer program and for implementing the user preference prediction method according to any one of claims 1 to 6 when the computer program is executed.

9. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program which, when executed by a processor, causes the processor to implement the user preference prediction method according to any one of claims 1 to 6.