WO2022105129A1 - Content data recommendation method and apparatus, and computer device, and storage medium - Google Patents

Content data recommendation method and apparatus, and computer device, and storage medium

Info

Publication number
WO2022105129A1
Authority
WO
WIPO (PCT)
Prior art keywords
order
crowd
data
attribute data
model
Prior art date
Application number
PCT/CN2021/091067
Other languages
French (fr)
Chinese (zh)
Inventor
陈婷婷
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2022105129A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • the present application relates to the field of data processing of big data, and in particular, to a content data recommendation method, device, computer equipment and storage medium.
  • a better way to solve this problem is to introduce a recommendation method, which picks out the content that the user is really interested in from a large amount of information, so that the user can obtain the content he or she really prefers from the recommended content.
  • the present application provides a content data recommendation method, device, computer equipment and storage medium, which perform crowd feature extraction, index analysis and scene adaptation on user data, determine the user's content recommendation label, automatically match content data, and make recommendations to the user, so that content data can be accurately recommended to the user, user experience satisfaction is improved, and the effectiveness of content data recommendation is improved.
  • a content data recommendation method comprising:
  • the data to be recommended includes consumption attribute data, social attribute data, access attribute data and traffic service attribute data;
  • the content preference model is a multi-order model based on a two-step clustering method and a decision tree; the content preference model includes a first-order crowd clustering model and a second-order index subdivision model;
  • scene adaptation is performed on the traffic service attribute data to obtain the theme scene corresponding to the user;
  • a content data recommendation device comprising:
  • an acquisition module configured to acquire user data of a user, preprocess the user data, and obtain data to be recommended; the data to be recommended includes consumption attribute data, social attribute data, access attribute data and traffic service attribute data;
  • the content preference model is a multi-order model based on two-step clustering method and decision tree;
  • the content preference model includes a first-order crowd clustering model and a second-order index subdivision model;
  • an identification module configured to perform crowd feature extraction on the consumption attribute data and the social attribute data through the first-order crowd clustering model to obtain a first-order crowd classification result corresponding to the user, and to perform scene adaptation on the traffic service attribute data through the scene recommendation model to obtain a theme scene corresponding to the user;
  • an analysis module configured to perform index analysis on the first-order crowd classification result and the access attribute data through the second-order index subdivision model, and determine a crowd preference label corresponding to the user;
  • a determining module configured to determine a content recommendation tag corresponding to the user according to the crowd preference tag corresponding to the user and the theme scene;
  • a recommendation module configured to acquire content data matching the content recommendation tag from the content database, and recommend the acquired content data to the user.
  • a computer device comprising a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, the processor implementing the following steps when executing the computer-readable instructions:
  • the data to be recommended includes consumption attribute data, social attribute data, access attribute data and traffic service attribute data;
  • the content preference model is a multi-order model based on a two-step clustering method and a decision tree; the content preference model includes a first-order crowd clustering model and a second-order index subdivision model;
  • scene adaptation is performed on the traffic service attribute data to obtain the theme scene corresponding to the user;
  • One or more readable storage media storing computer-readable instructions that, when executed by one or more processors, cause the one or more processors to perform the following steps:
  • the data to be recommended includes consumption attribute data, social attribute data, access attribute data and traffic service attribute data;
  • the content preference model is a multi-order model based on a two-step clustering method and a decision tree; the content preference model includes a first-order crowd clustering model and a second-order index subdivision model;
  • scene adaptation is performed on the traffic service attribute data to obtain the theme scene corresponding to the user;
  • in the content data recommendation method, device, computer equipment and storage medium provided by this application, user data of a user is acquired and preprocessed to obtain data to be recommended, which includes consumption attribute data, social attribute data, access attribute data and traffic service attribute data; the consumption attribute data, the social attribute data and the access attribute data corresponding to the user are input into the content preference model, and at the same time the traffic service attribute data is input into the scene recommendation model;
  • the first-order crowd clustering model performs crowd feature extraction on the consumption attribute data and the social attribute data to obtain a first-order crowd classification result corresponding to the user, and the scene recommendation model performs scene adaptation on the traffic service attribute data to obtain the theme scene corresponding to the user; index analysis is performed on the first-order crowd classification result and the access attribute data through the second-order index subdivision model to determine the crowd preference tag corresponding to the user; a content recommendation tag corresponding to the user is determined according to the crowd preference tag and the theme scene; content data matching the content recommendation tag is acquired from a content database and recommended to the user.
  • in this way, through the content preference model and the scene recommendation model, crowd feature extraction, index analysis and scene adaptation are performed on the user data, the user's content recommendation label is determined, content data is automatically matched, and recommendations are made to the user. Content data can thus be recommended accurately, preferred content is recommended to users, disliked content is kept from being displayed to them, user experience satisfaction is improved, and the effectiveness of content data recommendation is improved.
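  • the final two stages of this flow (deriving a content recommendation tag from the crowd preference tag and the theme scene, then matching it against the content database) might be sketched in Python as follows. The application does not specify how the two inputs are combined or how the database is organised, so the pairing rule, the `ContentItem` type and the sample records are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContentItem:
    """Hypothetical content record; field names are assumptions."""
    title: str
    preference_tag: str  # e.g. "fantasy"
    theme_scene: str     # e.g. "reading" or "video"


def content_recommendation_tag(preference_tag, theme_scene):
    """Assumed combination rule: the tag is simply the (scene, preference) pair."""
    return (theme_scene, preference_tag)


def recommend(content_db, tag):
    """Return the titles of all items whose scene and preference match the tag."""
    return [item.title for item in content_db
            if (item.theme_scene, item.preference_tag) == tag]


# Illustrative in-memory stand-in for the content database.
CONTENT_DB = [
    ContentItem("Dragon Saga (novel)", "fantasy", "reading"),
    ContentItem("Dragon Saga (series)", "fantasy", "video"),
    ContentItem("City Notes", "life", "reading"),
]

tag = content_recommendation_tag("fantasy", "reading")
titles = recommend(CONTENT_DB, tag)
```

  • with these sample records, a user tagged "fantasy" in a low-traffic "reading" scene is matched to the novel rather than the video series.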
  • FIG. 1 is a schematic diagram of an application environment of a method for recommending content data in an embodiment of the present application
  • FIG. 2 is a flowchart of a content data recommendation method in an embodiment of the present application
  • FIG. 3 is a flowchart of step S30 of a content data recommendation method in an embodiment of the present application
  • FIG. 4 is a flowchart of step S303 of the content data recommendation method in an embodiment of the present application.
  • FIG. 5 is a flowchart of step S304 of the content data recommendation method in an embodiment of the present application.
  • FIG. 6 is a flowchart of step S40 of a content data recommendation method in an embodiment of the present application.
  • FIG. 7 is a flowchart of step S401 of a content data recommendation method in an embodiment of the present application.
  • FIG. 8 is a flowchart of step S403 of the content data recommendation method in an embodiment of the present application.
  • FIG. 9 is a schematic block diagram of an apparatus for recommending content data in an embodiment of the present application.
  • FIG. 10 is a schematic diagram of a computer device in an embodiment of the present application.
  • the content data recommendation method provided by the present application can be applied in the application environment as shown in FIG. 1 , in which the client (computer device) communicates with the server through the network.
  • the client (computer equipment) includes but is not limited to various personal computers, notebook computers, smart phones, tablet computers, cameras and portable wearable devices.
  • the server can be implemented as an independent server or a server cluster composed of multiple servers.
  • a method for recommending content data is provided, and its technical solution mainly includes the following steps S10-S60:
  • S10 Acquire user data of the user, preprocess the user data, and obtain data to be recommended; the data to be recommended includes consumption attribute data, social attribute data, access attribute data, and traffic service attribute data.
  • the user data is data of related attributes corresponding to the user in the server corresponding to the application software, and the user data includes attributes such as consumption attributes, social attributes, access attributes, and traffic service attributes corresponding to the user.
  • the preprocessing performs regular expression processing, missing value supplementation or de-extreme value processing on the user data. The regular expression processing uniformly converts the data of an attribute into the data format required by that attribute; the missing value supplementation uniformly converts data whose attribute is empty into the fill value corresponding to that attribute; the de-extreme value processing replaces any data of an attribute that exceeds the upper limit or falls below the lower limit set for that attribute with the adjacent limit value.
  • the preprocessed user data is determined as the data to be recommended, and the data to be recommended includes the consumption attribute data, the social attribute data, the access attribute data and the traffic service attribute data.
  • the consumption attribute data is data of attributes related to user consumption.
  • the social attribute data is data of attributes related to the user's basic social identity, the terminals the user holds, the services the user enjoys, and the like.
  • the access attribute data is data related to the user's access records, access behavior, etc.
  • the traffic service attribute data is the attribute data related to the user's operator services, such as the service operator, data package and so on.
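  • a minimal Python sketch of the three preprocessing operations described for step S10 is given below. The helper names, the date format chosen as the "required format", the fill value and the limits are illustrative assumptions; the application does not fix any of them.

```python
import re

# Assumed raw date formats such as "2021/4/9" or "2021-04-09".
DATE_PATTERN = re.compile(r"(\d{4})[-/](\d{1,2})[-/](\d{1,2})")


def to_iso_date(raw):
    """Regular expression processing: convert a raw date string to YYYY-MM-DD.

    Returns None when the value does not match the expected pattern.
    """
    match = DATE_PATTERN.fullmatch(raw.strip())
    if match is None:
        return None
    year, month, day = match.groups()
    return f"{year}-{int(month):02d}-{int(day):02d}"


def fill_missing(values, fill_value):
    """Missing value supplementation: replace empty entries with the fill value."""
    return [fill_value if v is None else v for v in values]


def clip_extremes(values, lower, upper):
    """De-extreme value processing: replace out-of-range data with the nearest limit."""
    return [min(max(v, lower), upper) for v in values]


monthly_spend = [35.0, None, 9999.0, -5.0]
cleaned = clip_extremes(fill_missing(monthly_spend, 0.0), 0.0, 500.0)
```

  • here `cleaned` holds the values after filling and clipping, and `to_iso_date("2021/4/9")` yields a uniformly formatted date string.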
  • S20 Input the consumption attribute data, the social attribute data and the access attribute data corresponding to the user into a content preference model, and at the same time input the traffic service attribute data into a scene recommendation model;
  • the content preference model is a multi-order model based on the two-step clustering method and a decision tree;
  • the content preference model includes a first-order crowd clustering model and a second-order index segmentation model.
  • the content preference model is an already constructed multi-order model based on the two-step clustering method and the decision tree algorithm; the content preference model includes a first-order crowd clustering model and a second-order index subdivision model.
  • the two-step clustering method performs preliminary clustering through hierarchical clustering or density clustering to obtain a preliminary clustering result, and then applies a partitioning (segmentation) clustering method to perform secondary clustering on that preliminary result.
  • the decision tree algorithm is an algorithm that uses a tree structure to establish a decision model according to the attributes of the data, and the content preference model can automatically generate the user's crowd preference label according to the user's consumption attribute data, social attribute data and access attribute data.
  • the crowd preference tag marks the user's preference; for example, the crowd preference tags include inspirational, contemporary, youth literature, life, social science, fantasy, etc. in the field of reading; travel, romance, etc. in the field of animation; and fantasy, martial arts, reality shows, etc. in the field of video.
  • the scene recommendation model is a trained neural network model, and the network structure of the scene recommendation model can be set according to requirements, such as the network structure of a BP neural network model, the network structure of an LSTM neural network model, etc.
  • the scene recommendation model can automatically identify the theme scene suitable for the user according to the user's traffic service attribute data. The theme scene is a themed scene suited to the user, such as a video scene that requires a large amount of traffic, a reading scene that requires little traffic, etc.
  • the crowd features are the features relevant to classifying crowds.
  • the crowd feature extraction is the process of extracting the features that capture the attribute differences between crowds.
  • the crowd feature extraction may include crowd feature exploration, and analysis and path restoration in the decision tree algorithm.
  • the crowd feature exploration includes crowd density clustering and crowd feature clustering.
  • the first-order crowd classification result is a crowd type, that is, one of the first-order crowd types.
  • the scene adaptation is a process of identifying the traffic service attribute data after convolution
  • the theme scene adapted to the user is automatically identified through the traffic service attribute data.
  • the first-order crowd clustering model may be a clustering model based on density clustering and the decision tree algorithm, or may be a clustering model based on hierarchical clustering and a BP neural network. The first-order crowd clustering model can automatically extract crowd features from the consumption attribute data and the social attribute data, classify according to the extracted crowd features, and output the user's crowd type.
  • before step S30, that is, before performing crowd feature extraction on the consumption attribute data and the social attribute data by using the first-order crowd clustering model, the method includes:
  • the sample data set includes consumption attribute sample data, social attribute sample data and access attribute sample data.
  • the first-order attributes include consumption attributes and social attributes, where the consumption attributes are attributes related to user consumption, and the social attributes are attributes related to the user's basic social identity, held terminal attributes, enjoyed business attributes, and the like.
  • S303 Input the first-order attribute data set into a two-step clustering model, and perform crowd feature exploration on the first-order attribute data through the two-step clustering model to obtain a first-order crowd clustering result.
  • the two-step clustering model is a model based on the two-step clustering method. The two-step clustering method performs preliminary clustering by means of hierarchical clustering or density clustering to obtain a preliminary clustering result, and then uses a partitioning (segmentation) clustering method to perform secondary clustering on that preliminary result.
  • the crowd feature exploration standardizes the first-order attribute data set and then applies crowd density clustering and crowd feature clustering, exploring the similarities and differences in attributes between crowds so as to obtain the first-order crowd clustering result. The first-order crowd clustering result is the set of initially explored crowd types, for example 9 types of crowds.
  • in step S303, performing crowd feature exploration on the first-order attribute data through the two-step clustering model to obtain a first-order crowd clustering result includes:
  • the two-step clustering model includes a density clustering model and a K-means clustering model.
  • the normalization processing is to perform the regular expression processing, the missing value supplementation, the de-extreme value processing, the one-hot encoding conversion processing, and the regularization processing on the first-order attribute data set.
  • the one-hot encoding conversion is also called one-bit effective encoding: an N-bit state register is used to encode N states, each state has its own register bit, and only one bit is active at any time.
  • the regularization processing either takes the sum of the absolute values of the components of each sample's vector as the norm, or takes the square root of the sum of the squares of the components as the norm, and then divides each component by the norm to obtain the normalized vector of the sample.
  • the first-order attribute data to be processed is obtained by performing the normalization processing on the first-order attribute data set.
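  • the one-hot encoding and the two norm-based regularization variants described above can be illustrated with a short Python sketch. The function names and the example attribute values (network types) are assumptions made for illustration only.

```python
import math


def one_hot(value, states):
    """Encode one of N states as N bits with exactly one bit set."""
    if value not in states:
        raise ValueError(f"unknown state: {value!r}")
    return [1 if state == value else 0 for state in states]


def normalize(vector, norm="l1"):
    """Divide each component by the vector's norm.

    "l1": norm is the sum of absolute values.
    "l2": norm is the square root of the sum of squares.
    A zero vector is returned unchanged to avoid division by zero.
    """
    if norm == "l1":
        size = sum(abs(x) for x in vector)
    elif norm == "l2":
        size = math.sqrt(sum(x * x for x in vector))
    else:
        raise ValueError("norm must be 'l1' or 'l2'")
    if size == 0:
        return list(vector)
    return [x / size for x in vector]


network_bits = one_hot("4G", ["3G", "4G", "5G"])
l1_vector = normalize([1.0, -3.0], "l1")
l2_vector = normalize([3.0, -4.0], "l2")
```

  • after regularization every sample vector has unit length under the chosen norm, which keeps attributes with large raw magnitudes from dominating the distance computations used in clustering.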
  • the two-step clustering model includes a density clustering model and a K-means clustering model, and the first-order attribute data to be processed is data provided to the two-step clustering model for clustering.
  • the DBSCAN (Density-Based Spatial Clustering of Applications with Noise, a density-based clustering algorithm) algorithm determines the clusters from the density of each region and isolates the outliers, treating them as a class of their own.
  • the crowd density clustering is the clustering process that uses the DBSCAN algorithm to determine all crowd types, and the transition clustering data result is the set of crowd types obtained after the crowd density clustering, for example 8 crowd types, where all outliers are assigned to one crowd type (the anomaly type).
  • the density clustering model is a model that uses the DBSCAN algorithm to perform clustering to distinguish crowd types.
  • the K-means algorithm is a partitioning (segmentation) clustering algorithm that takes the mean value as the "center" of each class. A partitioning clustering algorithm randomly selects objects from the data set as the prototypes of the clusters, and then assigns each of the other objects to the most similar class, that is, the class whose prototype is closest. The K-means clustering model is the model that uses the K-means algorithm to cluster the transition clustering data result to determine the crowd types.
  • the crowd feature clustering is the clustering process that uses the K-means algorithm to determine all crowd types on the basis of the transition clustering data result, wherein the first-order crowd clustering result includes the crowd type corresponding to the anomaly type in the transition clustering data result.
  • the present application standardizes the first-order attribute data set through the two-step clustering model to obtain the first-order attribute data to be processed, performs crowd density clustering on that data with the DBSCAN algorithm through the density clustering model, and performs crowd feature clustering with the K-means algorithm through the K-means clustering model to obtain the first-order crowd clustering result. In this way, crowd density clustering and crowd feature clustering are realized through preprocessing, the DBSCAN algorithm and the K-means algorithm, which can improve the accuracy of crowd classification.
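  • the two-step procedure (density clustering first, then K-means on the result, with outliers kept as their own anomaly type) might look roughly like the pure-Python sketch below. Several choices here are assumptions rather than details from the application: K-means is seeded with the centroids of the DBSCAN clusters instead of randomly chosen objects so that the example is deterministic, the number of K-means clusters equals the number of DBSCAN clusters, and `eps`, `min_pts` and the sample points are arbitrary.

```python
import math

NOISE = -1  # label used for outliers (the "anomaly type")


def dbscan(points, eps, min_pts):
    """Density clustering: one label per point, NOISE for outliers."""
    labels = [None] * len(points)

    def neighbours(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nearby = neighbours(j)
            if len(nearby) >= min_pts:  # core point: keep expanding
                queue.extend(nearby)
        cluster += 1
    return labels


def centroid(points):
    return tuple(sum(axis) / len(points) for axis in zip(*points))


def kmeans(points, centres, iterations=20):
    """Assign each point to its nearest centre, then re-average, until stable."""
    assign = []
    for _ in range(iterations):
        assign = [min(range(len(centres)), key=lambda k: math.dist(p, centres[k]))
                  for p in points]
        updated = []
        for k in range(len(centres)):
            members = [p for p, a in zip(points, assign) if a == k]
            updated.append(centroid(members) if members else centres[k])
        if updated == centres:
            break
        centres = updated
    return assign


def two_step_cluster(points, eps, min_pts):
    """DBSCAN, then K-means over the non-outlier points; outliers stay NOISE."""
    first = dbscan(points, eps, min_pts)
    kept = [i for i, label in enumerate(first) if label != NOISE]
    n_clusters = max(first, default=NOISE) + 1
    centres = [centroid([points[i] for i in kept if first[i] == k])
               for k in range(n_clusters)]
    final = list(first)
    if centres:
        for i, label in zip(kept, kmeans([points[i] for i in kept], centres)):
            final[i] = label
    return final


sample = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
crowd_types = two_step_cluster(sample, eps=1.5, min_pts=3)
```

  • in this toy data set the two dense groups become two crowd types and the isolated point is reported as the anomaly type.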
  • the decision tree algorithm is an algorithm that uses a tree structure to establish a decision model according to the attributes of the data, and the analysis and path restoration are the inverse process in the decision tree algorithm: the correspondence between the first-order valid data set and the first-order crowd clustering result is analyzed, the decision nodes of each attribute of the sample users are analyzed, and the paths through the decision nodes are traced back, so as to restore the path by which the first-order valid data set is decided into the crowd types in the first-order crowd clustering result. The variables corresponding to the decision nodes of attributes whose pass count along these paths reaches the threshold are then extracted and determined as the classification variables.
  • in step S304, that is, performing analysis and path restoration on the first-order crowd clustering result and the first-order valid data set through the decision tree algorithm to obtain at least one classification variable corresponding to the first-order crowd clustering result, the method includes:
  • S3041 associate the first-order crowd types corresponding to the same sample users with the first-order valid data, and determine the associated first-order valid data set as a decision-making data set;
  • the first-order crowd clustering result includes the first-order crowd types corresponding to the sample users in the first-order valid data set;
  • the first-order valid data set includes the first-order valid data corresponding to the sample users one-to-one.
  • the two-step clustering model can be used to divide the sample users in the first-order valid data set into crowd types, so that the crowd type to which each sample user's first-order valid data belongs, that is, the first-order crowd type, can be determined. Determining the associated first-order valid data set as the decision data set indicates that all the first-order valid data have been associated.
  • the decision inversion model is a model in which variable parameters for identifying crowd types are derived by inversely deriving the decision data according to the tree structure of the decision tree.
  • the decision tree algorithm is an algorithm that uses a tree structure to establish a decision model according to the attributes of the data; that is, the decision data set is divided according to its data features until all the features have been used or all the data in each divided subset have the same crowd type. The division is then moved toward the first-order crowd types associated with the first-order valid data in the decision data set, the variable parameters that can divide the data into the first-order crowd types are deduced and analyzed continuously, and the initial variable parameters are updated until the division fully agrees; the initial variable parameters at that point are determined as the updated initial variable parameters.
  • S3044 Perform path restoration according to the updated initial variable parameter, and extract the classification variable corresponding to the first-order crowd clustering result.
  • the path restoration retraces the division path of each item of first-order valid data and confirms whether the corresponding first-order crowd type can be reached.
  • the way of determining the classification variables can be set according to requirements, for example counting the number of times each division node is passed through and determining the variable parameter in any node whose count is greater than or equal to a preset number as a classification variable, or determining the variable parameter in any node whose count is greater than or equal to the mean count over all nodes as a classification variable, etc.
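  • the two selection rules mentioned above (a preset pass count, or the mean pass count over all nodes) can be sketched in Python as follows. The representation of a restored path as a list of variable names, and the variable names themselves, are hypothetical.

```python
from collections import Counter


def select_classification_variables(paths, min_count=None):
    """Pick variables whose decision nodes are passed through often enough.

    paths: one list of variable names per restored decision path.
    min_count: preset pass count; when None, the mean count over all
               variables is used as the threshold instead.
    """
    counts = Counter(variable for path in paths for variable in path)
    if not counts:
        return []
    threshold = min_count if min_count is not None else sum(counts.values()) / len(counts)
    return sorted(v for v, c in counts.items() if c >= threshold)


# Hypothetical restored paths for three sample users.
restored_paths = [
    ["monthly_spend", "age"],
    ["monthly_spend", "terminal_type"],
    ["monthly_spend", "age", "plan_tier"],
]

by_mean = select_classification_variables(restored_paths)
by_preset = select_classification_variables(restored_paths, min_count=3)
```

  • with these paths the pass counts are 3, 2, 1 and 1, so the mean rule (threshold 1.75) keeps two variables while a preset count of 3 keeps only the most frequently used one.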
  • the present application associates the first-order crowd types and the first-order valid data corresponding to the same sample users and determines the associated first-order valid data set as a decision data set; through the decision inversion model with its initial variable parameters, the decision tree algorithm is used to analyze the decision data set and the initial variable parameters are updated; path restoration is performed according to the updated initial variable parameters, and the classification variables corresponding to the first-order crowd clustering result are extracted.
  • S305 perform model reconstruction according to all the classification variables, the first-order crowd clustering result, and the first-order valid data set, construct a first-order crowd clustering model, determine the first-order crowd types corresponding to the first-order crowd clustering model, and label each item of first-order valid data in the first-order valid data set with its corresponding crowd type to obtain a first-order data set; the first-order crowd types include at least one crowd type.
  • the first-order crowd types include at least one crowd type, for example 11 crowd types, and each item of first-order valid data in the first-order valid data set is marked with its corresponding crowd type.
  • the present application obtains a sample data set; screens the sample data set according to the first-order attributes to filter out a first-order attribute data set; inputs the first-order attribute data set into a two-step clustering model, which performs crowd feature exploration on the first-order attribute data to obtain a first-order crowd clustering result; and, through the decision tree algorithm, performs analysis and path restoration on the first-order crowd clustering result and the first-order valid data set.
  • S40 Perform index analysis on the first-order crowd classification result and the access attribute data by using the second-order index subdivision model, and determine a crowd preference label corresponding to the user.
  • the first-order crowd classification result is a crowd type, that is, one of the first-order crowd types.
  • the second-order index subdivision model is an already constructed clustering model for user subdivision.
  • the second-order index subdivision model can perform index analysis according to the crowd type in the obtained first-order crowd classification result and the access attribute data, and can analyze and match the user's crowd preference label; the crowd preference label identifies the user's preferences.
  • before step S40, that is, before the index analysis is performed on the first-order crowd classification result and the access attribute data through the second-order index subdivision model, the following steps are included:
  • the access attribute sample data in the sample data set is correspondingly added to the first-order data set, that is, the access attribute sample data of a user is added to the first-order data corresponding to the user.
  • the access attribute sample data may be inserted after the first-order valid data, thereby combining to obtain the second-order attribute data set.
  • in step S401, that is, combining and generating a second-order attribute data set according to the first-order data set and the access attribute sample data in the sample data set, the method includes:
  • the fields are selected at random from all the access attribute sample data, so that a scattered data distribution can be analyzed and the user's access behavior can be analyzed more objectively, and the extracted field data is determined as the attribute data to be processed.
  • S4012 Perform missing value processing and extreme value processing on the attribute data to be processed to obtain attribute data to be added.
  • the missing value processing and the extreme value processing are performed on all the attribute data to be processed. The missing value processing includes deleting data that contains missing values and interpolating missing values with possible values; that is, for the attribute data of some fields, the data containing missing values is deleted, and for the attribute data of other fields, the missing values are filled in by interpolating possible values.
  • the attribute data to be added is inserted after the first-order valid data, so as to combine to obtain the second-order attribute data set.
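  • the two missing-value strategies (deletion for some fields, interpolation for others) could be combined as in the Python sketch below. Using the field mean as the "possible value" is an assumption, as are the record layout and the field names.

```python
def handle_missing(records, drop_if_missing, fill_with_mean):
    """Apply per-field missing value handling to a list of dict records.

    drop_if_missing: fields for which a record with a missing value is deleted.
    fill_with_mean:  fields whose missing values are replaced by the mean of
                     the values present in the remaining records (assumed
                     choice of "possible value").
    """
    kept = [dict(r) for r in records
            if all(r.get(field) is not None for field in drop_if_missing)]
    for field in fill_with_mean:
        present = [r[field] for r in kept if r.get(field) is not None]
        fill = sum(present) / len(present) if present else 0.0
        for r in kept:
            if r.get(field) is None:
                r[field] = fill
    return kept


access_samples = [
    {"user": "u1", "visits": 10, "minutes": 30.0},
    {"user": "u2", "visits": None, "minutes": 12.0},
    {"user": "u3", "visits": 4, "minutes": None},
    {"user": "u4", "visits": 6, "minutes": 10.0},
]

to_add = handle_missing(access_samples,
                        drop_if_missing=["visits"],
                        fill_with_mean=["minutes"])
```

  • in this example the record with no visit count is deleted, and the record with no duration receives the mean duration of the records that remain.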
  • the index feature extraction is a process of calculating the contribution degree of each index according to the second-order attribute data set and extracting the indexes whose contribution degree meets the threshold requirement; the contribution degree is the degree to which the index contributes to the user's access behavior data, that is, its proportion.
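  • reading the contribution degree as a simple proportion, the extraction step might be sketched in Python as below. How the preference behavior model actually computes contribution is not given in this text, so the share-of-total formula, the index names and the threshold are assumptions.

```python
def contribution_degrees(index_totals):
    """Contribution of each index as its share of the overall total."""
    total = sum(index_totals.values())
    if total == 0:
        return {name: 0.0 for name in index_totals}
    return {name: value / total for name, value in index_totals.items()}


def extract_indexes(index_totals, threshold):
    """Keep the indexes whose contribution degree meets the threshold."""
    shares = contribution_degrees(index_totals)
    return [name for name, share in shares.items() if share >= threshold]


# Hypothetical aggregated access behavior counts.
totals = {"reading_visits": 60, "video_visits": 30, "animation_visits": 10}
selected = extract_indexes(totals, threshold=0.2)
```

  • with these counts the shares are 0.6, 0.3 and 0.1, so a threshold of 0.2 retains the first two indexes.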
  • S403 Perform segmental analysis on the second-order attribute data set according to the first-order crowd type results and all the comprehensive index variables, and construct a second-order index subdivision model.
  • each of the first-order crowd types is subdivided, according to the comprehensive index variables, into segments corresponding one-to-one to the comprehensive index variables, and each segment is then analyzed according to the segment into which the second-order attribute data falls. The segmental analysis is the process of analyzing the proportion and the weight of each segment, that is, merging or splitting adjacent segments so that finally the proportion of each processed segment is greater than or equal to the preset proportion and its weight is greater than the preset weight; the second-order index subdivision model can then be constructed from the segments that meet the requirements.
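  • the merging half of this segmental analysis can be illustrated with a short Python sketch that repeatedly merges the smallest segment into its smaller adjacent neighbour until every segment reaches the preset proportion. It covers only merging and only the proportion criterion; splitting and the weight criterion are omitted, and the merge order is an assumed heuristic.

```python
def merge_small_segments(segments, min_share):
    """Merge adjacent segments until each holds at least min_share of the data.

    segments: ordered list of (lower, upper, count) tuples.
    Returns a new list of (lower, upper, count) tuples.
    """
    segs = [list(s) for s in segments]
    total = sum(s[2] for s in segs)
    while len(segs) > 1:
        smallest = min(range(len(segs)), key=lambda k: segs[k][2])
        if total == 0 or segs[smallest][2] / total >= min_share:
            break  # every segment already meets the preset proportion
        neighbours = [j for j in (smallest - 1, smallest + 1) if 0 <= j < len(segs)]
        partner = min(neighbours, key=lambda k: segs[k][2])
        left, right = min(smallest, partner), max(smallest, partner)
        segs[left] = [segs[left][0], segs[right][1], segs[left][2] + segs[right][2]]
        del segs[right]
    return [tuple(s) for s in segs]


# Hypothetical segments of one index within one crowd type.
raw_segments = [(0, 10, 5), (10, 20, 40), (20, 30, 3), (30, 40, 52)]
merged = merge_small_segments(raw_segments, min_share=0.1)
```

  • here the two segments holding 3% and 5% of the records are absorbed by their neighbours, leaving two segments that each exceed the 10% minimum.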
  • the present application generates a second-order attribute data set by combining the access attribute sample data in the first-order data set and the sample data set; performs index feature extraction on the second-order attribute data set through a preference behavior model to obtain at least one comprehensive index variable; and performs segmental analysis on the second-order attribute data set according to the first-order crowd type results and all the comprehensive index variables to construct a second-order index subdivision model.
  • the preference behavior model is used to extract index features, and after segmental analysis a second-order index subdivision model is constructed, so that the model can be built accurately, scientifically and objectively, improving the accuracy and reliability of crowd segmentation.
  • step S403, that is, performing segmental analysis on the second-order attribute data set according to the first-order crowd type results and all the comprehensive index variables and constructing a second-order index subdivision model, includes:
  • S4031 Perform feature analysis and dimension reduction processing on all the comprehensive index variables to obtain principal component index variables.
  • because the comprehensive index variables include many indexes, and the number of indexes is too large to study user feature extraction and user segmentation directly, the indexes need to be integrated and the correlation of the attribute data further analyzed, so dimension reduction processing is performed on the indexes.
  • the feature analysis calculates the similarity of each index in each dimension and analyzes the maximum similarity value between each index and each dimension; then, according to the contribution of each index, pairs of dimensions are merged and classified so that the tolerance between the total contribution and the average contribution of the merged dimension is smallest, and the merged dimension is determined as an advanced dimension.
  • the total contribution degree is the sum of the contribution degrees of each index under the advanced dimension.
  • the dimensionality reduction process sets a weight parameter for each index in the advanced dimension; the weight parameter is the proportion of the index within its advanced dimension, that is, the ratio of the index's contribution to the total contribution of that advanced dimension; the mean of the weight parameters under all the advanced dimensions is calculated, and the indexes whose weight parameter is greater than the weight mean are identified together with their corresponding advanced dimensions.
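The weight-and-mean filter can be written out as a short sketch. The dimension and index names and the contribution values are hypothetical, and treating the surviving indexes as the candidates for principal component index variables is an interpretation of the text rather than something it states outright.

```python
def indexes_above_mean_weight(dimensions):
    """Keep (dimension, index) pairs whose weight exceeds the overall mean weight.

    dimensions: {dimension_name: {index_name: contribution}}
    weight = index contribution / total contribution of its advanced dimension
    """
    weights = {}
    for dim, contribs in dimensions.items():
        total = sum(contribs.values())
        for idx, c in contribs.items():
            weights[(dim, idx)] = c / total if total else 0.0
    if not weights:
        return {}
    mean_weight = sum(weights.values()) / len(weights)
    return {key: w for key, w in weights.items() if w > mean_weight}


# Hypothetical advanced dimensions and index contributions
dims = {
    "entertainment": {"video": 0.6, "music": 0.2, "games": 0.2},
    "information": {"news": 0.3, "finance": 0.1},
}
kept = indexes_above_mean_weight(dims)
```

With these numbers the weights are 0.6, 0.2, 0.2, 0.75 and 0.25, the mean is 0.4, and only `video` and `news` pass the filter.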
  • associating the crowd type corresponding to the same user with the second-order attribute data is equivalent to assigning a crowd type label to the second-order attribute data; the first-order crowd type result includes the crowd type corresponding to the user.
  • S4033 Perform segmental analysis on the data set to be subdivided according to the principal component index variable, and construct the second-order index subdivision model.
  • the segmental analysis is an analysis process of dividing each item of second-order attribute data in the data set to be subdivided into the segments of each principal component index variable; learning is performed on the degree of clustering between the second-order attribute data and the principal component index variables, using unsupervised clustering, so as to construct the second-order index subdivision model.
  • the present application obtains the principal component index variables by performing feature analysis and dimension reduction processing on all the comprehensive index variables; associates the crowd type corresponding to the same user with the second-order attribute data and determines the associated second-order attribute data set as the data set to be subdivided; and performs segmental analysis on the data set to be subdivided according to the principal component index variables to construct the second-order index subdivision model. Feature analysis and dimension reduction processing allow the population to be subdivided more directly and improve the reliability and accuracy of population segmentation.
  • the content recommendation tag that matches both the crowd preference label and the theme scene is mapped; the content recommendation tag is the tag to which the content data recommended to the user belongs, and it can be set according to requirements, for example video fantasy, video martial arts, and so on.
  • S60 Acquire content data matching the content recommendation tag from a content database, and recommend the acquired content data to the user.
  • the content database holds all content data stored, within a period of time or on the current day, in the server corresponding to the application software; the content data is content information on the mobile Internet, and each item is marked with a content tag; the content data whose content tag matches the content recommendation tag is found in the content database, obtained, and recommended on the interface of the application software of the mobile terminal corresponding to the user for the user to view.
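The tag mapping and the lookup in the content database can be sketched together. The crowd preference labels, theme scenes, record layout and the mapping table itself are hypothetical; only the example tags "video fantasy" and "video martial arts" come from the text.

```python
# Hypothetical mapping from (crowd preference label, theme scene) to a tag
TAG_MAP = {
    ("young_video_fans", "commute"): "video fantasy",
    ("young_video_fans", "evening"): "video martial arts",
}


def recommendation_tag(preference_label, theme_scene, tag_map=TAG_MAP):
    """Return the content recommendation tag for the pair, or None if unmapped."""
    return tag_map.get((preference_label, theme_scene))


def fetch_content(content_db, tag):
    """Return the content items whose content tags include the given tag."""
    if tag is None:
        return []
    return [item for item in content_db if tag in item["tags"]]


# Hypothetical content database: each item carries its content tags
content_db = [
    {"id": 1, "title": "Sword of Dawn", "tags": ["video martial arts"]},
    {"id": 2, "title": "Sky Realm", "tags": ["video fantasy"]},
    {"id": 3, "title": "Market Today", "tags": ["news finance"]},
]
tag = recommendation_tag("young_video_fans", "commute")
recommended = fetch_content(content_db, tag)
```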
  • the present application acquires the user data of the user and preprocesses it to obtain the data to be recommended, which includes consumption attribute data, social attribute data, access attribute data and traffic service attribute data;
  • the consumption attribute data, the social attribute data and the access attribute data are input into the content preference model, while the traffic service attribute data is input into the scene recommendation model;
  • the first-order crowd clustering model performs crowd feature extraction on the consumption attribute data and the social attribute data to obtain the first-order crowd classification result corresponding to the user;
  • the scene recommendation model performs scene adaptation on the traffic service attribute data to obtain the theme scene corresponding to the user.
  • in this way, the content preference model and the scene recommendation model perform crowd feature extraction, index analysis and scene adaptation on the user data, the user's content recommendation tag is determined, and matching content data is automatically found and recommended to the user; this improves the accuracy of content data recommendation, recommends preferred content to users, avoids displaying disliked content, and improves user satisfaction and the effectiveness of content data recommendation.
  • a content data recommendation apparatus is provided, and the content data recommendation apparatus corresponds one-to-one with the content data recommendation method in the above-mentioned embodiment.
  • the content data recommendation apparatus includes an acquisition module 11, an input module 12, an identification module 13, an analysis module 14, a determination module 15 and a recommendation module 16.
  • the detailed description of each functional module is as follows:
  • the acquisition module 11 is configured to acquire user data of the user and preprocess the user data to obtain the data to be recommended; the data to be recommended includes consumption attribute data, social attribute data, access attribute data and traffic service attribute data;
  • the input module 12 is configured to input the consumption attribute data, the social attribute data and the access attribute data corresponding to the user into the content preference model, and at the same time input the traffic service attribute data into the scene recommendation model; the content preference model is a multi-order model based on a two-step clustering method and a decision tree, and includes a first-order crowd clustering model and a second-order index subdivision model;
  • the identification module 13 is configured to perform crowd feature extraction on the consumption attribute data and the social attribute data through the first-order crowd clustering model to obtain a first-order crowd classification result corresponding to the user, and at the same time perform scene adaptation on the traffic service attribute data through the scene recommendation model to obtain the theme scene corresponding to the user;
  • the analysis module 14 is configured to perform index analysis on the first-order crowd classification result and the access attribute data through the second-order index subdivision model, and determine a crowd preference label corresponding to the user;
  • the determination module 15 is configured to determine a content recommendation tag corresponding to the user according to the crowd preference label corresponding to the user and the theme scene;
  • the recommendation module 16 is configured to acquire content data matching the content recommendation tag from the content database, and recommend the acquired content data to the user.
  • Each module in the above-mentioned content data recommendation apparatus may be implemented in whole or in part by software, hardware and combinations thereof.
  • the above modules can be embedded in or independent of the processor in the computer device in the form of hardware, or stored in the memory in the computer device in the form of software, so that the processor can call and execute the operations corresponding to the above modules.
  • a computer device is provided, and the computer device may be a server, and its internal structure diagram may be as shown in FIG. 10 .
  • the computer device includes a processor, a memory, a network interface, and a database connected by a system bus. The processor of the computer device provides computing and control capabilities.
  • the memory of the computer device includes a readable storage medium and an internal memory.
  • the readable storage medium stores an operating system, computer readable instructions and a database.
  • the internal memory provides an environment for the execution of the operating system and computer-readable instructions in the readable storage medium.
  • the network interface of the computer device is used to communicate with an external terminal through a network connection.
  • the computer-readable instructions when executed by a processor, implement a content data recommendation method.
  • the readable storage medium provided by this embodiment includes a non-volatile readable storage medium and a volatile readable storage medium.
  • a computer device is provided, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor; when the processor executes the computer-readable instructions, it implements the content data recommendation method in the above embodiments.
  • one or more readable storage media storing computer-readable instructions are provided; the readable storage media provided in this embodiment include non-volatile readable storage media and volatile readable storage media; when the computer-readable instructions stored on the readable storage media are executed by one or more processors, they cause the one or more processors to implement the content data recommendation method in the foregoing embodiments.
  • Nonvolatile memory may include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory may include random access memory (RAM) or external cache memory.
  • RAM is available in various forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

Abstract

A content data recommendation method and apparatus, and a computer device, and a storage medium. The method comprises: performing preprocessing on obtained user data to obtain data to be recommended; inputting consumption attribute data, social attribute data, and access attribute data, which correspond to a user, to a content preference model, and inputting traffic service attribute data to a scenario recommendation model; performing population feature extraction by means of a first-order population clustering model to obtain a first-order population classification result, and performing scenario adaptation by means of the scenario recommendation model to obtain a theme scenario; performing index analysis on the first-order population classification result and the access attribute data by means of a second-order index subdivision model to determine a population preference tag; determining a content recommendation tag according to the population preference tag and the theme scenario; and obtaining content data and recommending same to the user. According to the method, population feature extraction, index analysis, and scenario adaptation are performed on the user data, and accurate recommendation is provided for the user.

Description

Content data recommendation method, apparatus, computer device and storage medium
This application claims priority to the Chinese patent application filed with the China Patent Office on November 17, 2020, with application number 202011285730.2 and the invention title "Content data recommendation method, apparatus, computer device and storage medium", the entire contents of which are incorporated into this application by reference.
Technical field
The present application relates to the field of big-data processing, and in particular to a content data recommendation method, apparatus, computer device and storage medium.
Background
The inventor found that, with the rapid development of the mobile Internet, it has become increasingly common for people to obtain the content information they want from the mobile Internet through apps on their mobile terminals. However, as the Internet has developed rapidly the volume of information has also grown substantially, so that users facing a large amount of information cannot quickly obtain from an app the information they really need, which lowers the app's usage rate. A better way to solve this problem is to introduce a recommendation method, which can recommend, from a large amount of information, the content that users are really interested in, so that users can obtain the content information they really prefer from the recommended content.
Summary of the invention
The present application provides a content data recommendation method, apparatus, computer device and storage medium that perform crowd feature extraction, index analysis and scene adaptation on user data, determine the user's content recommendation tag, automatically match content data and recommend it to the user. Content data can thus be recommended to users accurately, improving user satisfaction and the effectiveness of content data recommendation.
A content data recommendation method, comprising:
acquiring user data of a user and preprocessing the user data to obtain data to be recommended, the data to be recommended including consumption attribute data, social attribute data, access attribute data and traffic service attribute data;
inputting the consumption attribute data, the social attribute data and the access attribute data corresponding to the user into a content preference model, and at the same time inputting the traffic service attribute data into a scene recommendation model, wherein the content preference model is a multi-order model based on a two-step clustering method and a decision tree and includes a first-order crowd clustering model and a second-order index subdivision model;
performing crowd feature extraction on the consumption attribute data and the social attribute data through the first-order crowd clustering model to obtain a first-order crowd classification result corresponding to the user, and at the same time performing scene adaptation on the traffic service attribute data through the scene recommendation model to obtain a theme scene corresponding to the user;
performing index analysis on the first-order crowd classification result and the access attribute data through the second-order index subdivision model to determine a crowd preference label corresponding to the user;
determining a content recommendation tag corresponding to the user according to the crowd preference label corresponding to the user and the theme scene;
acquiring content data matching the content recommendation tag from a content database, and recommending the acquired content data to the user.
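The way these steps chain together can be sketched as a single function. Every callable, dictionary key and sample value below is a hypothetical stand-in: the application does not define a programming interface, and the toy lambdas only take the place of the trained two-step clustering, decision-tree and scene models.

```python
def recommend(user_data, preprocess, first_order_model, scene_model,
              second_order_model, tag_map, content_db):
    """Chain the steps of the method; all callables are hypothetical stand-ins."""
    # Preprocess raw user data into the four attribute groups
    data = preprocess(user_data)
    # First-order crowd classification from consumption and social attributes
    crowd_type = first_order_model(data["consumption"], data["social"])
    # Scene adaptation from traffic service attributes
    theme_scene = scene_model(data["traffic_service"])
    # Second-order index analysis on crowd type plus access attributes
    preference_label = second_order_model(crowd_type, data["access"])
    # Map (preference label, theme scene) to a content recommendation tag
    tag = tag_map.get((preference_label, theme_scene))
    if tag is None:
        return []
    # Fetch the content whose tags match the recommendation tag
    return [item for item in content_db if tag in item["tags"]]


# Toy usage with stub models
result = recommend(
    user_data={"consumption": {"spend": 80}, "social": {"age": 24},
               "access": {"video_share": 0.7}, "traffic_service": {"plan": "5G"}},
    preprocess=lambda d: d,
    first_order_model=lambda consumption, social: "young_high_spend",
    scene_model=lambda traffic: "commute",
    second_order_model=lambda crowd, access: crowd + "_video",
    tag_map={("young_high_spend_video", "commute"): "video fantasy"},
    content_db=[{"id": 1, "tags": ["video fantasy"]}, {"id": 2, "tags": ["news"]}],
)
```

Passing the models in as parameters keeps the sketch independent of how each model is trained, which the description covers separately.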
A content data recommendation apparatus, comprising:
an acquisition module, configured to acquire user data of a user and preprocess the user data to obtain data to be recommended, the data to be recommended including consumption attribute data, social attribute data, access attribute data and traffic service attribute data;
an input module, configured to input the consumption attribute data, the social attribute data and the access attribute data corresponding to the user into a content preference model, and at the same time input the traffic service attribute data into a scene recommendation model, wherein the content preference model is a multi-order model based on a two-step clustering method and a decision tree and includes a first-order crowd clustering model and a second-order index subdivision model;
an identification module, configured to perform crowd feature extraction on the consumption attribute data and the social attribute data through the first-order crowd clustering model to obtain a first-order crowd classification result corresponding to the user, and at the same time perform scene adaptation on the traffic service attribute data through the scene recommendation model to obtain a theme scene corresponding to the user;
an analysis module, configured to perform index analysis on the first-order crowd classification result and the access attribute data through the second-order index subdivision model to determine a crowd preference label corresponding to the user;
a determination module, configured to determine a content recommendation tag corresponding to the user according to the crowd preference label corresponding to the user and the theme scene;
a recommendation module, configured to acquire content data matching the content recommendation tag from a content database, and recommend the acquired content data to the user.
A computer device, comprising a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, wherein the processor implements the following steps when executing the computer-readable instructions:
acquiring user data of a user and preprocessing the user data to obtain data to be recommended, the data to be recommended including consumption attribute data, social attribute data, access attribute data and traffic service attribute data;
inputting the consumption attribute data, the social attribute data and the access attribute data corresponding to the user into a content preference model, and at the same time inputting the traffic service attribute data into a scene recommendation model, wherein the content preference model is a multi-order model based on a two-step clustering method and a decision tree and includes a first-order crowd clustering model and a second-order index subdivision model;
performing crowd feature extraction on the consumption attribute data and the social attribute data through the first-order crowd clustering model to obtain a first-order crowd classification result corresponding to the user, and at the same time performing scene adaptation on the traffic service attribute data through the scene recommendation model to obtain a theme scene corresponding to the user;
performing index analysis on the first-order crowd classification result and the access attribute data through the second-order index subdivision model to determine a crowd preference label corresponding to the user;
determining a content recommendation tag corresponding to the user according to the crowd preference label corresponding to the user and the theme scene;
acquiring content data matching the content recommendation tag from a content database, and recommending the acquired content data to the user.
One or more readable storage media storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the following steps:
acquiring user data of a user and preprocessing the user data to obtain data to be recommended, the data to be recommended including consumption attribute data, social attribute data, access attribute data and traffic service attribute data;
inputting the consumption attribute data, the social attribute data and the access attribute data corresponding to the user into a content preference model, and at the same time inputting the traffic service attribute data into a scene recommendation model, wherein the content preference model is a multi-order model based on a two-step clustering method and a decision tree and includes a first-order crowd clustering model and a second-order index subdivision model;
performing crowd feature extraction on the consumption attribute data and the social attribute data through the first-order crowd clustering model to obtain a first-order crowd classification result corresponding to the user, and at the same time performing scene adaptation on the traffic service attribute data through the scene recommendation model to obtain a theme scene corresponding to the user;
performing index analysis on the first-order crowd classification result and the access attribute data through the second-order index subdivision model to determine a crowd preference label corresponding to the user;
determining a content recommendation tag corresponding to the user according to the crowd preference label corresponding to the user and the theme scene;
acquiring content data matching the content recommendation tag from a content database, and recommending the acquired content data to the user.
In the content data recommendation method, apparatus, computer device and storage medium provided by this application, user data of a user is acquired and preprocessed to obtain data to be recommended, which includes consumption attribute data, social attribute data, access attribute data and traffic service attribute data; the consumption attribute data, the social attribute data and the access attribute data corresponding to the user are input into the content preference model, and at the same time the traffic service attribute data is input into the scene recommendation model; crowd feature extraction is performed on the consumption attribute data and the social attribute data through the first-order crowd clustering model to obtain a first-order crowd classification result corresponding to the user, and at the same time scene adaptation is performed on the traffic service attribute data through the scene recommendation model to obtain the theme scene corresponding to the user; index analysis is performed on the first-order crowd classification result and the access attribute data through the second-order index subdivision model to determine the crowd preference label corresponding to the user; a content recommendation tag corresponding to the user is determined according to the crowd preference label and the theme scene; and content data matching the content recommendation tag is acquired from a content database and recommended to the user. In this way, the content preference model and the scene recommendation model perform crowd feature extraction, index analysis and scene adaptation on the user data, the user's content recommendation tag is determined, and content data is automatically matched and recommended to the user. Content data can thus be recommended accurately: the accuracy of content data recommendation is improved, preferred content is recommended to users, disliked content is not displayed to them, and both user satisfaction and the effectiveness of content data recommendation are improved.
The details of one or more embodiments of the application are set forth in the accompanying drawings and the description below; other features and advantages of the application will become apparent from the description, the drawings and the claims.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below are only some embodiments of the present application; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic diagram of an application environment of the content data recommendation method in an embodiment of the present application;
FIG. 2 is a flowchart of the content data recommendation method in an embodiment of the present application;
FIG. 3 is a flowchart of step S30 of the content data recommendation method in an embodiment of the present application;
FIG. 4 is a flowchart of step S303 of the content data recommendation method in an embodiment of the present application;
FIG. 5 is a flowchart of step S304 of the content data recommendation method in an embodiment of the present application;
FIG. 6 is a flowchart of step S40 of the content data recommendation method in an embodiment of the present application;
FIG. 7 is a flowchart of step S401 of the content data recommendation method in an embodiment of the present application;
FIG. 8 is a flowchart of step S403 of the content data recommendation method in an embodiment of the present application;
FIG. 9 is a schematic block diagram of the content data recommendation apparatus in an embodiment of the present application;
FIG. 10 is a schematic diagram of a computer device in an embodiment of the present application.
具体实施方式Detailed ways
下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例是本申请一部分实施例,而不是全部的实施例。基于本申请中的实施例,本领域普通技术人员在没有作出创造性劳动前提下所获得的所有其他实施例,都属于本申请保护的范围。The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application. Obviously, the described embodiments are part of the embodiments of the present application, not all of the embodiments. Based on the embodiments in the present application, all other embodiments obtained by those of ordinary skill in the art without creative work fall within the protection scope of the present application.
本申请提供的内容数据推荐方法,可应用在如图1的应用环境中,其中,客户端(计算机设备)通过网络与服务器进行通信。其中,客户端(计算机设备)包括但不限于为各种个人计算机、笔记本电脑、智能手机、平板电脑、摄像头和便携式可穿戴设备。服务器可以用独立的服务器或者是多个服务器组成的服务器集群来实现。The content data recommendation method provided by the present application can be applied in the application environment as shown in FIG. 1 , in which the client (computer device) communicates with the server through the network. Among them, the client (computer equipment) includes but is not limited to various personal computers, notebook computers, smart phones, tablet computers, cameras and portable wearable devices. The server can be implemented as an independent server or a server cluster composed of multiple servers.
In an embodiment, as shown in FIG. 2, a content data recommendation method is provided, whose technical solution mainly includes the following steps S10-S60:
S10: acquire user data of a user and preprocess the user data to obtain data to be recommended; the data to be recommended includes consumption attribute data, social attribute data, access attribute data and traffic service attribute data.
It can be understood that, when the user wants to obtain desired content information in application software on the user's mobile terminal, a recommendation instruction is triggered on the interface of the application software and the user data of the user is acquired. The user data is the data of the attributes associated with the user on the server corresponding to the application software, and includes data of attributes such as the consumption attributes, social attributes, access attributes and traffic service attributes corresponding to the user. The preprocessing applies processes such as regular expression processing, missing value supplementation or extreme value removal to the user data. The regular expression processing uniformly converts the data of an attribute into the data format required for that attribute; the missing value supplementation uniformly replaces empty values of an attribute with the filling data corresponding to that attribute; and the extreme value removal replaces any value of an attribute that is above or below the limits set for that attribute with the nearest limit. The user data after the preprocessing is determined as the data to be recommended, which includes the consumption attribute data, the social attribute data, the access attribute data and the traffic service attribute data. The consumption attribute data is data of attributes related to the user's consumption; the social attribute data is data of attributes related to the user's basic social identity, terminals held, services enjoyed and the like; the access attribute data is data of attributes related to the user's access data, behavior and the like; and the traffic service attribute data is data of attributes related to the user's operator services, such as the operator and the data package.
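As a non-limiting illustration, the three preprocessing operations described for step S10 may be sketched as follows. The field names, fill values and limits in `RULES` and the function `preprocess` are hypothetical and chosen only for this example; the application does not specify them.

```python
import re

# Hypothetical per-attribute rules (illustrative values only):
# a fill value for missing data and lower/upper limits for clamping.
RULES = {
    "monthly_spend": {"fill": 0.0, "low": 0.0, "high": 1000.0},
    "data_gb": {"fill": 5.0, "low": 0.0, "high": 100.0},
}

def preprocess(record, rules):
    """Apply format unification, missing value filling and limit clamping."""
    cleaned = {}
    for field, rule in rules.items():
        raw = record.get(field)
        text = "" if raw is None else str(raw)
        # Regular expression processing: strip everything except digits,
        # the decimal point and a minus sign, e.g. "1,200.50" -> "1200.50".
        text = re.sub(r"[^\d.\-]", "", text)
        if re.fullmatch(r"-?\d+(\.\d+)?", text):
            value = float(text)
        else:
            # Missing value supplementation: empty or unparsable values
            # are replaced with the filling data for this attribute.
            value = rule["fill"]
        # Extreme value removal: values beyond a limit are replaced
        # with the nearest limit.
        cleaned[field] = min(max(value, rule["low"]), rule["high"])
    return cleaned
```

For instance, `preprocess({"monthly_spend": "1,200.50", "data_gb": None}, RULES)` unifies the spend value to a number, clamps it to the upper limit, and fills the missing data volume.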
S20: input the consumption attribute data, the social attribute data and the access attribute data corresponding to the user into a content preference model, and at the same time input the traffic service attribute data into a scene recommendation model; the content preference model is a multi-order model based on a two-step clustering method and a decision tree; the content preference model includes a first-order crowd clustering model and a second-order index subdivision model.
It can be understood that the content preference model is an already constructed multi-order model based on the two-step clustering method and the decision tree algorithm, and includes the first-order crowd clustering model and the second-order index subdivision model. The two-step clustering method performs a preliminary clustering by hierarchical clustering or density clustering to obtain a preliminary clustering result, and then applies a partitional clustering method to the preliminary result to perform a second clustering. The decision tree algorithm builds a decision model with a tree structure according to the attributes of the data. The content preference model can automatically generate the crowd preference label of a user from the user's consumption attribute data, social attribute data and access attribute data. The crowd preference label marks the preferences of the user; for example, crowd preference labels include inspirational, contemporary, youth literature, lifestyle, social science and fantasy romance in the reading field, time travel and romance in the animation field, and fantasy, martial arts and reality shows in the video field. The scene recommendation model is a trained neural network model whose network structure can be set as required, for example the network structure of a BP neural network model or of an LSTM neural network model. The scene recommendation model can automatically identify, from the user's traffic service attribute data, the theme scene suited to the user. A theme scene is a scene suited to themes relevant to the user, such as a video scene that requires heavy data traffic or a reading scene that requires little data traffic.
S30: perform crowd feature extraction on the consumption attribute data and the social attribute data through the first-order crowd clustering model to obtain a first-order crowd classification result corresponding to the user, and at the same time perform scene adaptation on the traffic service attribute data through the scene recommendation model to obtain a theme scene corresponding to the user.
It can be understood that the crowd features are features relevant to the classification of crowds, and the crowd feature extraction is the process of extracting the features that capture attribute differences between crowds. The crowd feature extraction may include crowd feature exploration and the analysis and path restoration of the decision tree algorithm; the crowd feature exploration includes crowd density clustering and crowd feature clustering. The first-order crowd classification result is one crowd type, namely one of the crowd types in the first-order crowd categories described herein. The scene adaptation is the process of convolving the traffic service attribute data and then performing recognition, so that the theme scene suited to the user is automatically identified from the traffic service attribute data.
The first-order crowd clustering model may be a clustering model based on density clustering and the decision tree algorithm, or a clustering model based on hierarchical clustering and a BP neural network. The first-order crowd clustering model can automatically extract crowd features from the consumption attribute data and the social attribute data, classify according to the extracted crowd features, and output the crowd type of the user.
In an embodiment, as shown in FIG. 3, before step S30, that is, before the crowd feature extraction is performed on the consumption attribute data and the social attribute data through the first-order crowd clustering model, the method includes:
S301: acquire a sample data set.
It can be understood that the sample data set includes consumption attribute sample data, social attribute sample data and access attribute sample data.
S302: filter the sample data set according to first-order attributes to obtain a first-order attribute data set.
It can be understood that the first-order attributes include consumption attributes and social attributes. The consumption attributes are attributes related to the user's consumption, and the social attributes are attributes related to the user's basic social identity, terminals held, services enjoyed and the like.
S303: input the first-order attribute data set into a two-step clustering model, and perform crowd feature exploration on the first-order attribute data set through the two-step clustering model to obtain a first-order crowd clustering result.
It can be understood that the two-step clustering model is a model based on the two-step clustering method, which performs a preliminary clustering by hierarchical clustering or density clustering to obtain a preliminary clustering result, and then applies a partitional clustering method to the preliminary result to perform a second clustering. The crowd feature exploration is the process of applying standardization, crowd density clustering and crowd feature clustering to the first-order attribute data set in order to discover the attribute similarities and dissimilarities between crowds, thereby obtaining the first-order crowd clustering result. The first-order crowd clustering result is the set of crowd types found by this initial exploration, for example 9 crowd types.
In an embodiment, as shown in FIG. 4, in step S303, performing crowd feature exploration on the first-order attribute data set through the two-step clustering model to obtain the first-order crowd clustering result includes:
S3031: standardize the first-order attribute data set through the two-step clustering model to obtain first-order attribute data to be processed; the two-step clustering model includes a density clustering model and a K-means clustering model.
It can be understood that the standardization applies the regular expression processing, the missing value supplementation, the extreme value removal, one-hot encoding conversion, normalization and similar processing to the first-order attribute data set. The one-hot encoding conversion, also called one-bit-effective encoding, mainly uses an N-bit status register to encode N states, each state being assigned its own code in which exactly one bit is active. The normalization either takes the sum of the absolute values of the components of each sample vector as the norm and divides each component by this norm (L1 normalization) to obtain the normalized vector of the sample, or takes the square root of the sum of the squared components of each sample vector as the norm and then divides (L2 normalization). Applying the standardization to the first-order attribute data set yields the first-order attribute data to be processed.
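As a non-limiting illustration, the one-hot encoding conversion and the two norm-based normalizations described for step S3031 may be sketched as follows. The function names `one_hot` and `normalize` and the example category list are hypothetical and are introduced only for this example.

```python
import math

def one_hot(value, categories):
    """Encode one of N states as an N-position vector with a single 1."""
    return [1 if value == category else 0 for category in categories]

def normalize(vector, norm="l1"):
    """Divide each component of a sample vector by the vector's norm.

    norm="l1": the norm is the sum of the absolute values.
    norm="l2": the norm is the square root of the sum of squares.
    """
    if norm == "l1":
        size = sum(abs(x) for x in vector)
    elif norm == "l2":
        size = math.sqrt(sum(x * x for x in vector))
    else:
        raise ValueError("norm must be 'l1' or 'l2'")
    if size == 0:
        # An all-zero sample has no direction; return it unchanged.
        return list(vector)
    return [x / size for x in vector]
```

For example, a terminal type of "4G" among the assumed categories `["3G", "4G", "5G"]` is encoded as `[0, 1, 0]`, and the sample vector `[1, -1, 2]` has an L1 norm of 4.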
The two-step clustering model includes the density clustering model and the K-means clustering model, and the first-order attribute data to be processed is the data supplied to the two-step clustering model for clustering.
S3032: using the DBSCAN algorithm, perform crowd density clustering on the first-order attribute data to be processed through the density clustering model to obtain a transitional clustering data result.
It can be understood that the DBSCAN (Density-Based Spatial Clustering of Applications with Noise) algorithm is a density-based clustering algorithm that determines the clusters in each region from the density of that region, isolates the outliers, and treats the outliers as one class. The crowd density clustering is the clustering process that uses the DBSCAN algorithm to determine all crowd types. The transitional clustering data result is the set of crowd types obtained after the crowd density clustering, for example 8 crowd types, in which all outliers are assigned to a single crowd type (the abnormal type).
The density clustering model is a model that clusters with the DBSCAN algorithm to distinguish crowd types.
S3033: using the K-means algorithm, perform crowd feature clustering on the transitional clustering data result through the K-means clustering model to obtain the first-order crowd clustering result.
It can be understood that the K-means algorithm is a partitional clustering algorithm that takes the mean as the "center" of a class. A partitional clustering algorithm randomly selects objects from the data set as cluster prototypes and then assigns each remaining object to the most similar class represented by a prototype, that is, the class at the shortest distance. The K-means clustering model is a model that uses the K-means algorithm to cluster the transitional clustering data result and determine the crowd types. The crowd feature clustering is the clustering process that uses the K-means algorithm to determine all crowd types on the basis of the transitional clustering data result, and the first-order crowd clustering result includes a crowd type corresponding to the abnormal type in the transitional clustering data result.
In this way, the present application standardizes the first-order attribute data set through the two-step clustering model to obtain the first-order attribute data to be processed; performs crowd density clustering on the first-order attribute data to be processed through the density clustering model using the DBSCAN algorithm to obtain the transitional clustering data result; and performs crowd feature clustering on the transitional clustering data result through the K-means clustering model using the K-means algorithm to obtain the first-order crowd clustering result. Crowd density clustering and crowd feature clustering are thus carried out through preprocessing, the DBSCAN algorithm and the K-means algorithm to obtain the first-order crowd clustering result, which can improve the accuracy of crowd classification.
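As a non-limiting illustration, steps S3032 and S3033 may be sketched as a DBSCAN pass followed by a K-means pass. The application does not state how the transitional result is handed to K-means; seeding K-means with the centroid of each DBSCAN group, with the outliers treated as one group, is an assumption made only for this sketch. The function names and the parameters `eps` and `min_pts` are likewise hypothetical.

```python
import math

def region(points, i, eps):
    """Indices of all points within distance eps of point i (including i)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Plain DBSCAN. Returns one label per point; -1 marks outliers."""
    labels = [None] * len(points)          # None means not yet visited
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = -1                 # provisionally an outlier
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster        # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = region(points, j, eps)
            if len(reachable) >= min_pts:  # j is a core point: keep expanding
                queue.extend(reachable)
    return labels

def kmeans(points, centers, iterations=10):
    """Lloyd's K-means from the given initial centers; returns assignments."""
    centers = [list(c) for c in centers]
    assign = []
    for _ in range(iterations):
        assign = [min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
                  for p in points]
        for k in range(len(centers)):
            members = [p for p, a in zip(points, assign) if a == k]
            if members:                    # the mean is the "center" of the class
                centers[k] = [sum(c) / len(members) for c in zip(*members)]
    return assign

def two_step(points, eps, min_pts):
    """Assumed coupling: DBSCAN groups (outliers as one group) seed K-means."""
    coarse = dbscan(points, eps, min_pts)
    seeds = []
    for group in sorted(set(coarse)):
        members = [p for p, label in zip(points, coarse) if label == group]
        seeds.append([sum(c) / len(members) for c in zip(*members)])
    return coarse, kmeans(points, seeds)
```

Under this assumed coupling the outlier group keeps its own center, so the second-stage result still contains a type corresponding to the abnormal type, consistent with the description of S3033.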
S304: through the decision tree algorithm, perform analysis and path restoration on the first-order crowd clustering result and the first-order valid data set, and extract at least one classification variable corresponding to the first-order crowd clustering result.
It can be understood that the decision tree algorithm builds a decision model with a tree structure according to the attributes of the data, and the analysis and path restoration is the backward derivation process of the decision tree algorithm. The correspondence between the first-order valid data set and the first-order crowd clustering result is analyzed, the decision nodes for each attribute of the sample users are analyzed, and the paths through the decision nodes are derived backward, so that the paths leading from the first-order valid data set to the crowd types in the first-order crowd clustering result are restored. The variables corresponding to the decision nodes of those attributes that the paths pass through a number of times reaching a threshold are then extracted and determined as the classification variables.
In an embodiment, as shown in FIG. 5, in step S304, performing analysis and path restoration on the first-order crowd clustering result and the first-order valid data set through the decision tree algorithm and extracting at least one classification variable corresponding to the first-order crowd clustering result includes:
S3041: associate the first-order crowd type and the first-order valid data that correspond to the same sample user, and determine the first-order valid data set after association as a decision data set; the first-order crowd clustering result includes the first-order crowd types corresponding to the sample users in the first-order valid data set; the first-order valid data set includes the first-order valid data in one-to-one correspondence with the sample users.
It can be understood that the two-step clustering model can divide the sample users in the first-order valid data set into crowd types, so that the crowd type to which the first-order valid data of each sample user belongs, namely the first-order crowd type, can be determined. Determining the first-order valid data set after association as the decision data set means that every item of first-order valid data has been associated.
S3042: input the decision data set into a decision backward-derivation model containing initial variable parameters.
It can be understood that the decision backward-derivation model is a model that, following the tree structure of a decision tree, derives backward from the decision data set the variable parameters that identify the crowd types.
S3043: using the decision tree algorithm, analyze the decision data set through the decision backward-derivation model and update the initial variable parameters.
It can be understood that the decision tree algorithm builds a decision model with a tree structure according to the attributes of the data. That is, the decision data set is split according to its data features until every feature has been used for splitting, or until all data in a resulting subset share the same crowd type. The splits are then fitted toward the first-order crowd types associated with the first-order valid data in the decision data set: variable parameters capable of separating the first-order crowd types are derived iteratively and the initial variable parameters are updated, until the splits fully agree with the associated types. The variable parameters at that point are taken as the updated initial variable parameters.
S3044: perform path restoration according to the updated initial variable parameters, and extract the classification variables corresponding to the first-order crowd clustering result.
It can be understood that the path restoration restores the splitting path of each item of first-order valid data and confirms whether the path reaches the first-order crowd type corresponding to that item. After all paths have been restored, the classification variables are determined according to the number of nodes on which the paths coincide. The manner of determination can be set as required: for example, the number of times each split node is passed through is counted and the variable parameters in the nodes passed through at least a preset number of times are determined as the classification variables, or the variable parameters in the nodes passed through at least as often as the average over all nodes are determined as the classification variables, and so on.
In this way, the present application associates the first-order crowd type and the first-order valid data that correspond to the same sample user and determines the first-order valid data set after association as the decision data set; inputs the decision data set into the decision backward-derivation model containing the initial variable parameters; analyzes the decision data set through the decision backward-derivation model using the decision tree algorithm and updates the initial variable parameters; and performs path restoration according to the updated initial variable parameters to extract the classification variables corresponding to the first-order crowd clustering result. The subdivision rules can thus be analyzed and restored as paths through the decision backward-derivation model so that the classification variables are extracted; by using the decision tree algorithm, the classification variables are extracted more scientifically, which improves the quality and accuracy of crowd subdivision.
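As a non-limiting illustration, the path restoration and node counting described for step S3044 may be sketched as follows. The tree `TREE`, its features and thresholds, and the function names are hypothetical; in this sketch each feature appears in only one split node, so counting by feature is equivalent to counting by node.

```python
from collections import Counter

# Hypothetical fitted tree: an internal node tests `feature <= threshold`,
# a leaf carries a first-order crowd type.
TREE = {
    "feature": "monthly_spend", "threshold": 100,
    "left": {
        "feature": "age", "threshold": 30,
        "left": {"leaf": "A"},
        "right": {"leaf": "B"},
    },
    "right": {"leaf": "C"},
}

def restore_path(tree, record):
    """Return the split variables passed through and the crowd type reached."""
    path = []
    node = tree
    while "leaf" not in node:
        path.append(node["feature"])
        go_left = record[node["feature"]] <= node["threshold"]
        node = node["left"] if go_left else node["right"]
    return path, node["leaf"]

def select_variables(tree, records, min_visits=None):
    """Pick variables whose split nodes are passed through often enough.

    With min_visits given, the cutoff is that preset number; otherwise the
    cutoff is the average number of visits over all visited nodes.
    """
    visits = Counter()
    for record in records:
        path, _ = restore_path(tree, record)
        visits.update(path)
    if not visits:
        return []
    if min_visits is None:
        cutoff = sum(visits.values()) / len(visits)
    else:
        cutoff = min_visits
    return sorted(name for name, count in visits.items() if count >= cutoff)
```

Both determination rules mentioned above are covered: a preset count through `min_visits`, or the average over all nodes when `min_visits` is omitted.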
S305: perform model reconstruction according to all the classification variables, the first-order crowd clustering result and the first-order valid data set to construct the first-order crowd clustering model, determine the first-order crowd categories corresponding to the first-order crowd clustering model, and label each item of first-order valid data in the first-order valid data set with its corresponding crowd type to obtain a first-order data set; the first-order crowd categories include at least one crowd type.
It can be understood that, through the decision tree algorithm, model reconstruction is performed again on all the classification variables, the first-order crowd clustering result and the first-order valid data set, thereby constructing the first-order crowd clustering model. The first-order crowd categories include at least one crowd type, for example 11 crowd types, and each item of first-order valid data in the first-order valid data set is labeled with its corresponding crowd type.
In this way, the present application acquires the sample data set; filters the sample data set according to the first-order attributes to obtain the first-order attribute data set; inputs the first-order attribute data set into the two-step clustering model and performs crowd feature exploration on it through the two-step clustering model to obtain the first-order crowd clustering result; performs analysis and path restoration on the first-order crowd clustering result and the first-order valid data set through the decision tree algorithm to extract at least one classification variable corresponding to the first-order crowd clustering result; and performs model reconstruction according to all the classification variables, the first-order crowd clustering result and the first-order valid data set to construct the first-order crowd clustering model and determine the corresponding first-order crowd categories. Crowd feature exploration and path restoration are thus carried out through the two-step clustering model and the decision tree algorithm to derive the classification variables, so that the first-order crowd clustering model can be constructed accurately and the accuracy of crowd classification is improved.
S40: perform index analysis on the first-order crowd classification result and the access attribute data through the second-order index subdivision model, and determine the crowd preference label corresponding to the user.
It can be understood that the first-order crowd classification result is one crowd type, namely one of the crowd types in the first-order crowd categories described herein. The second-order index subdivision model is an already constructed clustering model for user subdivision. It can perform index analysis according to the crowd type in the acquired first-order crowd classification result and the access attribute data, and can analyze and match the crowd preference label of the user; the crowd preference label marks the preferences of the user.
In an embodiment, as shown in FIG. 6, before step S40, that is, before the index analysis is performed on the first-order crowd classification result and the access attribute data through the second-order index subdivision model, the method includes:
S401: merge the first-order data set with the access attribute sample data in the sample data set to generate a second-order attribute data set.
It can be understood that the access attribute sample data in the sample data set is added to the first-order data set record by record; that is, the access attribute sample data of a user is added to the first-order valid data corresponding to that user in the first-order data set. The access attribute sample data may be inserted after the first-order valid data, so that the second-order attribute data set is obtained by merging.
In an embodiment, as shown in FIG. 7, in step S401, merging the first-order data set with the access attribute sample data in the sample data set to generate the second-order attribute data set includes:
S4011: randomly extract fields from the access attribute sample data to obtain attribute data to be processed.
It can be understood that the random field extraction randomly extracts fields from all the access attribute sample data, so that the analysis is based on a dispersed data distribution and the user's access behavior can be analyzed more objectively. The extracted field data is determined as the attribute data to be processed.
S4012: perform missing value processing and extreme value processing on the attribute data to be processed to obtain attribute data to be added.
It can be understood that the missing value processing and the extreme value processing are performed on all the attribute data to be processed. The missing value processing includes deleting data that contains missing values and imputing missing values with plausible values; that is, for the attribute data of some fields the data containing missing values is deleted, and for the attribute data of other fields the missing values are filled in by imputation. The extreme value processing removes every value of an attribute that is above or below the limits set for that attribute, or replaces it with a uniform value.
S4013: add the attribute data to be added to the corresponding records of the first-order data set to generate the second-order attribute data set.
It can be understood that the attribute data to be added is inserted after the first-order valid data, so that the second-order attribute data set is obtained by merging.
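As a non-limiting illustration, steps S4011 to S4013 may be sketched as follows. The function `build_second_order`, its parameters and the field names used in the usage note are hypothetical. For brevity the sketch only imputes missing values and clamps extreme values to the limits; deleting records that contain missing values, which the description also allows, is not shown.

```python
import random

def build_second_order(first_order, access_samples, k, limits, fill, rng=None):
    """Merge k randomly chosen access fields into each user's first-order record.

    first_order:    {user_id: {attribute: value, ...}}  (labeled first-order data)
    access_samples: {user_id: {field: value or None}}   (access attribute samples)
    limits:         {field: (low, high)}                (assumed limits per field)
    fill:           {field: value}                      (assumed imputation values)
    """
    rng = rng or random.Random()
    all_fields = sorted({f for row in access_samples.values() for f in row})
    chosen = rng.sample(all_fields, k)            # S4011: random field extraction
    merged = {}
    for user, base in first_order.items():
        row = dict(base)                          # keep the first-order data first
        access = access_samples.get(user, {})
        for field in chosen:
            value = access.get(field)
            if value is None:
                value = fill[field]               # S4012: impute a missing value
            low, high = limits[field]
            row[field] = min(max(value, low), high)   # S4012: handle extremes
        merged[user] = row                        # S4013: append to the record
    return merged
```

A record such as `{"spend": 120.0, "crowd_type": 3}` would, for example, be extended with assumed fields like `video_minutes` and `reading_minutes` after the existing first-order values.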
S402: perform index feature extraction on the second-order attribute data set through a preference behavior model to obtain at least one comprehensive index variable.
It can be understood that the index feature extraction is the process of calculating the contribution degree of each index from the second-order attribute data set and extracting the indexes whose contribution degree meets a threshold requirement. The contribution degree of an index is its degree of contribution to the user's access behavior data, that is, its proportion.
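As a non-limiting illustration, reading the contribution degree as a simple proportion, the index feature extraction of step S402 may be sketched as follows. The function name `select_indexes`, the index names and the threshold are hypothetical; the application does not specify how the preference behavior model computes the contribution degree.

```python
def select_indexes(totals, threshold):
    """Keep the indexes whose share of the overall total meets the threshold.

    totals: {index_name: accumulated amount of access behavior}
    Returns {index_name: contribution degree} for the retained indexes.
    """
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    contribution = {name: amount / grand_total for name, amount in totals.items()}
    return {name: share for name, share in contribution.items() if share >= threshold}
```

With assumed totals of 60, 30 and 10 for video, reading and comics and a threshold of 0.2, only video and reading are retained.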
S403: perform segmental analysis on the second-order attribute data set according to the first-order crowd type results and all the comprehensive index variables, and construct the second-order index subdivision model.
It can be understood that, according to the first-order crowd type results and all the comprehensive index variables, each first-order crowd type is further subdivided by the comprehensive index variables into segments in one-to-one correspondence with the comprehensive index variables, and each segment is then analyzed according to the portion of the second-order attribute data set that falls into it. The segmental analysis is the process of analyzing the proportion and the weight of each segment: adjacent segments are merged or split until the proportion of every resulting segment is greater than or equal to a preset proportion and its weight is greater than a preset weight. The second-order index subdivision model is then constructed from the segments that meet these requirements.
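As a non-limiting illustration, the merging of adjacent segments described for step S403 may be sketched as follows. Only the proportion requirement is shown; the weight requirement and the splitting of segments are omitted because the application does not define how the weight is computed. The function name and the rule of merging the smallest segment into its smaller neighbor are assumptions made for this sketch, and the counts are assumed to have a positive total.

```python
def merge_small_segments(counts, min_share):
    """Merge adjacent segments until every segment holds at least min_share.

    counts: number of records falling into each initial segment, in order.
    Returns (segments, shares), where each segment lists the indices of the
    initial segments it now covers.
    """
    total = sum(counts)
    segments = [[i] for i in range(len(counts))]
    sizes = list(counts)
    while len(sizes) > 1:
        smallest = min(range(len(sizes)), key=sizes.__getitem__)
        if sizes[smallest] / total >= min_share:
            break                          # every segment meets the requirement
        # Choose the adjacent segment to merge with (the smaller neighbor).
        if smallest == 0:
            other = 1
        elif smallest == len(sizes) - 1:
            other = smallest - 1
        elif sizes[smallest - 1] <= sizes[smallest + 1]:
            other = smallest - 1
        else:
            other = smallest + 1
        low, high = min(smallest, other), max(smallest, other)
        segments[low:high + 1] = [segments[low] + segments[high]]
        sizes[low:high + 1] = [sizes[low] + sizes[high]]
    return segments, [size / total for size in sizes]
```

For example, with initial counts of 5, 10, 40 and 45 and a preset proportion of 0.2, the first three segments are merged into one and the last segment is kept.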
In the present application, a second-order attribute data set is generated by merging the first-order data set with the access attribute sample data in the sample data set; index feature extraction is performed on the second-order attribute data set through a preference behavior model to obtain at least one comprehensive index variable; and segmented analysis is performed on the second-order attribute data set according to the first-order crowd type result and all the comprehensive index variables to construct a second-order index subdivision model. Index features are thus extracted through the preference behavior model and, after segmented analysis, used to build the second-order index subdivision model, so that the model can be constructed accurately, scientifically and objectively, which improves the accuracy and reliability of crowd segmentation.
In an embodiment, as shown in FIG. 8, step S403, that is, performing segmented analysis on the second-order attribute data set according to the first-order crowd type result and all the comprehensive index variables and constructing a second-order index subdivision model, includes:
S4031: Perform feature analysis and dimension reduction on all the comprehensive index variables to obtain principal component index variables.
It can be understood that the comprehensive index variables contain too many indexes for user feature extraction and user segmentation to be studied directly. The indexes therefore need to be consolidated, the correlation among the attribute data analyzed further, and the dimensionality of the indexes reduced. The feature analysis calculates the similarity of each index in each dimension and analyzes these similarity values: using the maximum similarity value between each index and each dimension, together with the contribution degree of each index, pairs of dimensions are merged and classified so that the difference between the total contribution degree of the merged dimension and the average contribution degree is minimized. The dimension obtained by merging two dimensions is determined as an advanced dimension, and its total contribution degree is the sum of the contribution degrees of the indexes under that advanced dimension.
The dimension reduction sets a weight parameter for each index under an advanced dimension. The weight parameter is the proportion of the index within its advanced dimension, that is, the ratio of the index's contribution degree to the total contribution degree of the advanced dimension. The mean of the weight parameters over all the advanced dimensions is calculated, and each index whose weight parameter is greater than this mean is combined with its advanced dimension and determined as a principal component index variable. In this way, multiple advanced dimensions are reduced to a few representative principal component index variables, which indicate the main factors of the user's content preference.
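The weight-parameter step of the dimension reduction can be illustrated with a short sketch. The dimension names, index names and contribution values are invented for the example, and the merging of dimensions into advanced dimensions is assumed to have been done already.

```python
def principal_indicators(dimensions):
    """Select (dimension, index) pairs whose weight exceeds the mean weight.

    dimensions: {advanced_dimension: {index_name: contribution_degree}} (hypothetical format).
    """
    weights = {}
    for dim, indicators in dimensions.items():
        # Total contribution degree of the advanced dimension.
        total = sum(indicators.values())
        for name, contribution in indicators.items():
            # Weight parameter: share of the index within its advanced dimension.
            weights[(dim, name)] = contribution / total if total else 0.0
    if not weights:
        return []
    mean_weight = sum(weights.values()) / len(weights)
    return sorted(key for key, w in weights.items() if w > mean_weight)


dims = {
    "entertainment": {"video": 0.5, "music": 0.1},
    "information": {"news": 0.3, "finance": 0.1},
}
principal = principal_indicators(dims)
```

In this example the weights are roughly 0.83, 0.17, 0.75 and 0.25, their mean is about 0.5, and only "video" and "news" are retained as principal component index variables.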
S4032: Associate the crowd type and the second-order attribute data that correspond to the same user, and determine the associated second-order attribute data set as the data set to be subdivided. The first-order crowd type result includes the crowd type corresponding to each user, and the second-order attribute data set includes the second-order attribute data in one-to-one correspondence with the users.
It can be understood that associating the crowd type of a user with that user's second-order attribute data is equivalent to attaching a crowd type label to the second-order attribute data.
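The association in S4032 amounts to a join on the user. A minimal sketch, assuming each record carries a hypothetical `user_id` field and that users without a crowd type are skipped (the application does not say how such users are handled):

```python
def attach_crowd_type(crowd_types, second_order_rows):
    """Label each second-order attribute record with the crowd type of the same user.

    crowd_types: {user_id: crowd_type}, taken from the first-order crowd type result.
    second_order_rows: list of dicts, one per user, each with a "user_id" key (assumed).
    """
    labeled = []
    for row in second_order_rows:
        user_id = row["user_id"]
        if user_id in crowd_types:
            # Copy the record and add the crowd type as a label.
            labeled.append({**row, "crowd_type": crowd_types[user_id]})
    return labeled


types = {"u1": "young urban", "u2": "senior rural"}
rows = [{"user_id": "u1", "video_share": 0.5},
        {"user_id": "u3", "video_share": 0.2}]
to_subdivide = attach_crowd_type(types, rows)
```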
S4033: Perform segmented analysis on the data set to be subdivided according to the principal component index variables, and construct the second-order index subdivision model.
It can be understood that the segmented analysis here is the process of assigning each piece of second-order attribute data in the data set to be subdivided to the segments of the principal component index variables. It is a learning process based on the degree of clustering between the second-order attribute data and the principal component index variables, carried out by unsupervised clustering, from which the second-order index subdivision model is constructed.
In the present application, feature analysis and dimension reduction are performed on all the comprehensive index variables to obtain principal component index variables; the crowd type and the second-order attribute data that correspond to the same user are associated, and the associated second-order attribute data set is determined as the data set to be subdivided; and segmented analysis is performed on the data set to be subdivided according to the principal component index variables to construct the second-order index subdivision model. Through feature analysis and dimension reduction, the crowd can be subdivided more directly, which improves the reliability and accuracy of crowd segmentation.
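The application does not name the unsupervised clustering algorithm used in this step. Purely as an illustration, the sketch below applies a one-dimensional K-means to the values of a single principal component index variable; the choice of K-means, the deterministic initialization and all names are assumptions.

```python
def kmeans_1d(values, k, iterations=20):
    """Cluster scalar values into k groups with a basic K-means loop.

    Returns (centers, clusters). Assumes values is non-empty and iterations >= 1.
    """
    ordered = sorted(values)
    # Deterministic initialization: spread the initial centers over the sorted values.
    centers = [ordered[(i * (len(ordered) - 1)) // max(k - 1, 1)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            # Assign each value to its nearest center.
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Recompute each center as the mean of its cluster; keep it if the cluster is empty.
        new_centers = [sum(c) / len(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters


centers, clusters = kmeans_1d([1, 2, 3, 10, 11, 12], k=2)
```

The resulting clusters play the role of the segments to which the second-order attribute data is assigned.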
S50: Determine a content recommendation tag corresponding to the user according to the crowd preference tag and the theme scene corresponding to the user.
It can be understood that the determined crowd preference tag and theme scene corresponding to the user are mapped to the content recommendation tag that matches both of them. The content recommendation tag is the tag to which the content data recommended to the user belongs, and it can be set as required; for example, a content tag may be video fantasy, video martial arts, and so on.
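The mapping in S50 can be pictured as a lookup keyed on the pair of crowd preference tag and theme scene. The table below is entirely hypothetical; only the tag values "video fantasy" and "video martial arts" come from the text.

```python
# Hypothetical mapping table: (crowd preference tag, theme scene) -> content recommendation tag.
TAG_TABLE = {
    ("fantasy fan", "video"): "video fantasy",
    ("martial arts fan", "video"): "video martial arts",
}


def recommend_tag(preference_tag, theme_scene, table=TAG_TABLE):
    """Return the content recommendation tag matching both inputs, or None if there is none."""
    return table.get((preference_tag, theme_scene))


tag = recommend_tag("fantasy fan", "video")
```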
S60: Acquire content data matching the content recommendation tag from a content database, and recommend the acquired content data to the user.
It can be understood that the content database holds all the content data stored, over a period of time or on the current day, in the server corresponding to the application software. The content data is content information on the mobile Internet, and every item of content data is marked with a content tag. The content data whose content tag matches the content recommendation tag is looked up in the content database, retrieved, and pushed to the interface of the application software on the mobile terminal corresponding to the user, so that the user can view it.
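Retrieval in S60 reduces to filtering the tagged content by the recommendation tag. The sketch below stands in for the content database with an in-memory list; the record fields `title` and `tag` and the optional `limit` parameter are assumptions.

```python
def fetch_content(content_db, recommendation_tag, limit=None):
    """Return content records whose content tag equals the recommendation tag.

    content_db: iterable of dicts with a "tag" key (hypothetical stand-in for the database).
    limit: optional maximum number of records to return (assumed convenience parameter).
    """
    matches = [item for item in content_db if item.get("tag") == recommendation_tag]
    return matches if limit is None else matches[:limit]


db = [
    {"title": "A", "tag": "video fantasy"},
    {"title": "B", "tag": "video martial arts"},
    {"title": "C", "tag": "video fantasy"},
]
recommended = fetch_content(db, "video fantasy")
```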
In the present application, user data of a user is acquired and preprocessed to obtain data to be recommended, which includes consumption attribute data, social attribute data, access attribute data and traffic service attribute data. The consumption attribute data, the social attribute data and the access attribute data corresponding to the user are input into a content preference model, and at the same time the traffic service attribute data is input into a scene recommendation model. Crowd feature extraction is performed on the consumption attribute data and the social attribute data through the first-order crowd clustering model to obtain a first-order crowd classification result corresponding to the user, and at the same time scene adaptation is performed on the traffic service attribute data through the scene recommendation model to obtain a theme scene corresponding to the user. Index analysis is performed on the first-order crowd classification result and the access attribute data through the second-order index subdivision model to determine a crowd preference tag corresponding to the user. A content recommendation tag corresponding to the user is determined according to the crowd preference tag and the theme scene, and content data matching the content recommendation tag is acquired from a content database and recommended to the user. In this way, crowd feature extraction, index analysis and scene adaptation are performed on the user data through the content preference model and the scene recommendation model, the user's content recommendation tag is determined, and matching content data is found automatically and recommended to the user. Content data can therefore be recommended accurately: the user is shown preferred content rather than disliked content, which improves the accuracy and effectiveness of content data recommendation and the user's satisfaction.
In an embodiment, a content data recommendation apparatus is provided, which corresponds one-to-one to the content data recommendation method in the above embodiments. As shown in FIG. 9, the content data recommendation apparatus includes an acquisition module 11, an input module 12, an identification module 13, an analysis module 14, a determination module 15 and a recommendation module 16. The functional modules are described in detail as follows:
The acquisition module 11 is configured to acquire user data of a user and preprocess the user data to obtain data to be recommended; the data to be recommended includes consumption attribute data, social attribute data, access attribute data and traffic service attribute data;
The input module 12 is configured to input the consumption attribute data, the social attribute data and the access attribute data corresponding to the user into a content preference model, and at the same time input the traffic service attribute data into a scene recommendation model; the content preference model is a multi-order model based on a two-step clustering method and a decision tree; the content preference model includes a first-order crowd clustering model and a second-order index subdivision model;
The identification module 13 is configured to perform crowd feature extraction on the consumption attribute data and the social attribute data through the first-order crowd clustering model to obtain a first-order crowd classification result corresponding to the user, and at the same time perform scene adaptation on the traffic service attribute data through the scene recommendation model to obtain a theme scene corresponding to the user;
The analysis module 14 is configured to perform index analysis on the first-order crowd classification result and the access attribute data through the second-order index subdivision model, and determine a crowd preference tag corresponding to the user;
The determination module 15 is configured to determine a content recommendation tag corresponding to the user according to the crowd preference tag and the theme scene corresponding to the user;
The recommendation module 16 is configured to acquire content data matching the content recommendation tag from a content database, and recommend the acquired content data to the user.
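The way the six modules hand data to one another can be sketched as a single function. The models are passed in as plain callables, and every name, dictionary key and stub below is a placeholder for illustration, not the apparatus's actual interface.

```python
def recommend(user_data, preprocess, crowd_model, scene_model,
              segment_model, tag_mapper, content_db):
    """Chain the module responsibilities described above (all callables are hypothetical)."""
    # Acquisition module 11: preprocess raw user data into the four attribute groups.
    data = preprocess(user_data)
    # Input module 12 and identification module 13: route the groups to the two models.
    crowd = crowd_model(data["consumption"], data["social"])
    scene = scene_model(data["traffic_service"])
    # Analysis module 14: refine the crowd result with access attributes.
    preference = segment_model(crowd, data["access"])
    # Determination module 15: map the preference tag and scene to a recommendation tag.
    tag = tag_mapper(preference, scene)
    # Recommendation module 16: select the content carrying that tag.
    return [item for item in content_db if item["tag"] == tag]


result = recommend(
    {"raw": 1},
    preprocess=lambda raw: {"consumption": 1, "social": 2,
                            "access": 3, "traffic_service": 4},
    crowd_model=lambda consumption, social: "young urban",
    scene_model=lambda traffic: "video",
    segment_model=lambda crowd, access: "fantasy fan",
    tag_mapper=lambda preference, scene: scene + " fantasy",
    content_db=[{"title": "A", "tag": "video fantasy"},
                {"title": "B", "tag": "video martial arts"}],
)
```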
For the specific limitations of the content data recommendation apparatus, refer to the limitations of the content data recommendation method above, which are not repeated here. Each module in the content data recommendation apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in or independent of a processor of a computer device in the form of hardware, or stored in a memory of the computer device in the form of software, so that the processor can invoke them and perform the operations corresponding to each module.
In an embodiment, a computer device is provided. The computer device may be a server, and its internal structure may be as shown in FIG. 10. The computer device includes a processor, a memory, a network interface and a database connected through a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a readable storage medium and an internal memory. The readable storage medium stores an operating system, computer-readable instructions and a database. The internal memory provides an environment for running the operating system and the computer-readable instructions stored in the readable storage medium. The network interface of the computer device is used to communicate with an external terminal through a network connection. The computer-readable instructions, when executed by the processor, implement a content data recommendation method. The readable storage medium provided in this embodiment includes a non-volatile readable storage medium and a volatile readable storage medium.
In an embodiment, a computer device is provided, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor. When the processor executes the computer-readable instructions, the content data recommendation method in the above embodiments is implemented.
In an embodiment, one or more readable storage media storing computer-readable instructions are provided. The readable storage media provided in this embodiment include non-volatile readable storage media and volatile readable storage media. When the computer-readable instructions stored on the readable storage media are executed by one or more processors, the one or more processors implement the content data recommendation method in the above embodiments.
Those of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be completed by computer-readable instructions instructing the relevant hardware. The computer-readable instructions may be stored in a non-volatile computer-readable storage medium or a volatile readable storage medium and, when executed, may include the processes of the above method embodiments. Any reference to memory, storage, database or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
Those skilled in the art can clearly understand that, for convenience and brevity of description, the division into the above functional units and modules is used only as an example. In practical applications, the above functions may be assigned to different functional units or modules as required; that is, the internal structure of the apparatus may be divided into different functional units or modules to complete all or part of the functions described above.
The above embodiments are only intended to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced. Such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application, and shall all fall within the protection scope of the present application.

Claims (20)

  1. A content data recommendation method, comprising:
    acquiring user data of a user, and preprocessing the user data to obtain data to be recommended, wherein the data to be recommended includes consumption attribute data, social attribute data, access attribute data and traffic service attribute data;
    inputting the consumption attribute data, the social attribute data and the access attribute data corresponding to the user into a content preference model, and at the same time inputting the traffic service attribute data into a scene recommendation model, wherein the content preference model is a multi-order model based on a two-step clustering method and a decision tree, and the content preference model includes a first-order crowd clustering model and a second-order index subdivision model;
    performing crowd feature extraction on the consumption attribute data and the social attribute data through the first-order crowd clustering model to obtain a first-order crowd classification result corresponding to the user, and at the same time performing scene adaptation on the traffic service attribute data through the scene recommendation model to obtain a theme scene corresponding to the user;
    performing index analysis on the first-order crowd classification result and the access attribute data through the second-order index subdivision model, and determining a crowd preference tag corresponding to the user;
    determining a content recommendation tag corresponding to the user according to the crowd preference tag and the theme scene corresponding to the user;
    acquiring content data matching the content recommendation tag from a content database, and recommending the acquired content data to the user.
  2. The content data recommendation method according to claim 1, wherein before the performing crowd feature extraction on the consumption attribute data and the social attribute data through the first-order crowd clustering model, the method comprises:
    acquiring a sample data set;
    screening the sample data set according to first-order attributes to obtain a first-order attribute data set;
    inputting the first-order attribute data set into a two-step clustering model, and performing crowd feature exploration on the first-order attribute data through the two-step clustering model to obtain a first-order crowd clustering result;
    analyzing the first-order crowd clustering result and the first-order valid data set and performing path restoration through a decision tree algorithm, and extracting at least one classification variable corresponding to the first-order crowd clustering result;
    performing model reconstruction according to all the classification variables, the first-order clustering result and the first-order valid data set to construct the first-order crowd clustering model, determining first-order crowd categories corresponding to the first-order crowd clustering model, and labeling each piece of first-order valid data in the first-order valid data set with its corresponding crowd type to obtain a first-order data set, wherein the first-order crowd categories include at least one crowd type.
  3. The content data recommendation method according to claim 2, wherein the performing crowd feature exploration on the first-order attribute data through the two-step clustering model to obtain a first-order crowd clustering result comprises:
    standardizing the first-order attribute data set through the two-step clustering model to obtain first-order attribute data to be processed, wherein the two-step clustering model includes a density clustering model and a K-means clustering model;
    performing crowd density clustering on the first-order attribute data to be processed through the density clustering model using the DBSCAN algorithm to obtain a transitional clustering data result;
    performing crowd feature clustering on the transitional clustering data result through the K-means clustering model using the K-means algorithm to obtain the first-order crowd clustering result.
  4. The content data recommendation method according to claim 2, wherein the analyzing the first-order crowd clustering result and the first-order valid data set and performing path restoration through a decision tree algorithm, and extracting at least one classification variable corresponding to the first-order crowd clustering result comprises:
    associating the first-order crowd type and the first-order valid data that correspond to the same sample user, and determining the associated first-order valid data set as a decision data set, wherein the first-order crowd clustering result includes the first-order crowd type corresponding to each sample user in the first-order valid data set, and the first-order valid data set includes the first-order valid data in one-to-one correspondence with the sample users;
    inputting the decision data set into a decision inversion model containing initial variable parameters;
    analyzing the decision data set through the decision inversion model using the decision tree algorithm, and updating the initial variable parameters;
    performing path restoration according to the updated initial variable parameters, and extracting the classification variable corresponding to the first-order crowd clustering result.
  5. The content data recommendation method according to claim 2, wherein before the performing index analysis on the first-order crowd classification result and the access attribute data through the second-order index subdivision model, the method comprises:
    merging the first-order data set with the access attribute sample data in the sample data set to generate a second-order attribute data set;
    performing index feature extraction on the second-order attribute data set through a preference behavior model to obtain at least one comprehensive index variable;
    performing segmented analysis on the second-order attribute data set according to the first-order crowd type result and all the comprehensive index variables, and constructing the second-order index subdivision model.
  6. The content data recommendation method according to claim 5, wherein the merging the first-order data set with the access attribute sample data in the sample data set to generate a second-order attribute data set comprises:
    randomly extracting fields from the access attribute sample data to obtain attribute data to be processed;
    performing missing value processing and extreme value processing on the attribute data to be processed to obtain attribute data to be added;
    adding the attribute data to be added to the corresponding entries of the first-order data set to generate the second-order attribute data set.
  7. The content data recommendation method according to claim 5, wherein the performing segmented analysis on the second-order attribute data set according to the first-order crowd type result and all the comprehensive index variables, and constructing the second-order index subdivision model comprises:
    performing feature analysis and dimension reduction on all the comprehensive index variables to obtain principal component index variables;
    associating the crowd type and the second-order attribute data that correspond to the same user, and determining the associated second-order attribute data set as a data set to be subdivided, wherein the first-order crowd type result includes the crowd type corresponding to the user, and the second-order attribute data set includes the second-order attribute data in one-to-one correspondence with the users;
    performing segmented analysis on the data set to be subdivided according to the principal component index variables, and constructing the second-order index subdivision model.
  8. A content data recommendation apparatus, comprising:
    an acquisition module, configured to acquire user data of a user and preprocess the user data to obtain data to be recommended, wherein the data to be recommended includes consumption attribute data, social attribute data, access attribute data and traffic service attribute data;
    an input module, configured to input the consumption attribute data, the social attribute data and the access attribute data corresponding to the user into a content preference model, and at the same time input the traffic service attribute data into a scene recommendation model, wherein the content preference model is a multi-order model based on a two-step clustering method and a decision tree, and the content preference model includes a first-order crowd clustering model and a second-order index subdivision model;
    an identification module, configured to perform crowd feature extraction on the consumption attribute data and the social attribute data through the first-order crowd clustering model to obtain a first-order crowd classification result corresponding to the user, and at the same time perform scene adaptation on the traffic service attribute data through the scene recommendation model to obtain a theme scene corresponding to the user;
    an analysis module, configured to perform index analysis on the first-order crowd classification result and the access attribute data through the second-order index subdivision model, and determine a crowd preference tag corresponding to the user;
    a determination module, configured to determine a content recommendation tag corresponding to the user according to the crowd preference tag and the theme scene corresponding to the user;
    a recommendation module, configured to acquire content data matching the content recommendation tag from a content database, and recommend the acquired content data to the user.
  9. A computer device, comprising a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, wherein the processor implements the following steps when executing the computer-readable instructions:
    acquiring user data of a user and preprocessing the user data to obtain data to be recommended; the data to be recommended includes consumption attribute data, social attribute data, access attribute data and traffic service attribute data;
    inputting the consumption attribute data, the social attribute data and the access attribute data corresponding to the user into a content preference model, and at the same time inputting the traffic service attribute data into a scene recommendation model; the content preference model is a multi-order model based on a two-step clustering method and a decision tree; the content preference model includes a first-order crowd clustering model and a second-order index subdivision model;
    performing crowd feature extraction on the consumption attribute data and the social attribute data through the first-order crowd clustering model to obtain a first-order crowd classification result corresponding to the user, and at the same time performing scene adaptation on the traffic service attribute data through the scene recommendation model to obtain a theme scene corresponding to the user;
    performing index analysis on the first-order crowd classification result and the access attribute data through the second-order index subdivision model to determine a crowd preference tag corresponding to the user;
    determining a content recommendation tag corresponding to the user according to the crowd preference tag and the theme scene corresponding to the user;
    acquiring, from a content database, content data matching the content recommendation tag, and recommending the acquired content data to the user.
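The last two steps of claim 9 (deriving a content recommendation tag from the crowd preference tag and the theme scene, then fetching matching content) can be illustrated with a minimal sketch. The claim does not specify how the two inputs are combined or how matching is done, so the pairing rule, the subset-based matching, and all names below (`ContentItem`, `content_recommendation_tag`, `recommend`, the sample tags) are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContentItem:
    """Hypothetical content database record: a title plus descriptive tags."""
    title: str
    tags: frozenset


def content_recommendation_tag(crowd_preference_tag, theme_scene):
    # Assumption: the recommendation tag is simply the pair of both inputs.
    return (crowd_preference_tag, theme_scene)


def recommend(content_db, tag):
    # Assumption: an item matches when it carries every part of the tag.
    wanted = set(tag)
    return [item for item in content_db if wanted <= item.tags]


content_db = [
    ContentItem("Budget travel guide", frozenset({"price_sensitive", "travel"})),
    ContentItem("Luxury resort review", frozenset({"premium", "travel"})),
    ContentItem("Discount grocery tips", frozenset({"price_sensitive", "daily_life"})),
]

tag = content_recommendation_tag("price_sensitive", "travel")
picked = recommend(content_db, tag)
print([item.title for item in picked])
```

A real system would likely rank candidates rather than filter on exact tag containment; the filter is used here only because it keeps the example short.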
  10. The computer device according to claim 9, wherein before the crowd feature extraction is performed on the consumption attribute data and the social attribute data through the first-order crowd clustering model, the processor further implements the following steps when executing the computer-readable instructions:
    acquiring a sample data set;
    screening the sample data set according to first-order attributes to obtain a first-order attribute data set;
    inputting the first-order attribute data set into a two-step clustering model, and performing crowd feature exploration on the first-order attribute data through the two-step clustering model to obtain a first-order crowd clustering result;
    analyzing the first-order crowd clustering result and the first-order valid data set and performing path restoration through a decision tree algorithm, to extract at least one classification variable corresponding to the first-order crowd clustering result;
    performing model reconstruction according to all the classification variables, the first-order clustering result and the first-order valid data set to construct the first-order crowd clustering model, determining first-order crowd categories corresponding to the first-order crowd clustering model, and labeling each piece of first-order valid data in the first-order valid data set with its corresponding crowd type to obtain a first-order data set; the first-order crowd categories include at least one crowd type.
  11. The computer device according to claim 10, wherein the performing crowd feature exploration on the first-order attribute data through the two-step clustering model to obtain a first-order crowd clustering result comprises:
    standardizing the first-order attribute data set through the two-step clustering model to obtain first-order attribute data to be processed; the two-step clustering model includes a density clustering model and a K-means clustering model;
    performing crowd density clustering on the first-order attribute data to be processed through the density clustering model using the DBSCAN algorithm, to obtain a transitional clustering data result;
    performing crowd feature clustering on the transitional clustering data result through the K-means clustering model using the K-means algorithm, to obtain the first-order crowd clustering result.
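The three steps of claim 11 (standardization, DBSCAN density clustering, then K-means on the transitional result) can be sketched in plain Python as follows. The claim does not say what the "transitional clustering data result" contains; this sketch assumes it is the set of points DBSCAN did not mark as noise. The parameter values (`eps`, `min_pts`, `k`), the helper names, and the toy data are likewise assumptions, not values taken from the application.

```python
import math
import random
import statistics


def standardize(rows):
    """Z-score every column; a constant column is divided by 1.0 instead of 0."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]
    return [tuple((v - m) / s for v, m, s in zip(r, means, stds)) for r in rows]


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters and -1 for noise."""
    labels = [None] * len(points)

    def neighbors(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:  # j is itself a core point, keep expanding
                queue.extend(nbrs)
        cluster += 1
    return labels


def nearest(p, centers):
    return min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd iterations with seeded random initial centers."""
    centers = random.Random(seed).sample(points, k)
    for _ in range(iters):
        assign = [nearest(p, centers) for p in points]
        new_centers = []
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return [nearest(p, centers) for p in points], centers


# Toy first-order attributes: two dense groups of users plus one outlier.
users = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.2),
         (8.0, 8.0), (8.2, 7.9), (7.9, 8.1), (8.1, 8.2),
         (4.5, 20.0)]

scaled = standardize(users)                               # step 1
density_labels = dbscan(scaled, eps=0.5, min_pts=3)       # step 2
kept = [p for p, lab in zip(scaled, density_labels) if lab != -1]
crowd_ids, centers = kmeans(kept, k=2)                    # step 3
print(density_labels, crowd_ids)
```

Running DBSCAN first removes the outlier, so the K-means centers are not dragged toward it; that is one plausible motivation for chaining the two algorithms, though the application may intend a different hand-off between them.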
  12. The computer device according to claim 10, wherein the analyzing the first-order crowd clustering result and the first-order valid data set and performing path restoration through a decision tree algorithm, to extract at least one classification variable corresponding to the first-order crowd clustering result, comprises:
    associating the first-order crowd type and the first-order valid data that correspond to the same sample user, and determining the associated first-order valid data set as a decision data set; the first-order crowd clustering result includes the first-order crowd types corresponding to the sample users in the first-order valid data set; the first-order valid data set includes the first-order valid data in one-to-one correspondence with the sample users;
    inputting the decision data set into a decision inversion model containing initial variable parameters;
    analyzing the decision data set through the decision inversion model using the decision tree algorithm, and updating the initial variable parameters;
    performing path restoration according to the updated initial variable parameters, to extract the classification variable corresponding to the first-order crowd clustering result.
  13. The computer device according to claim 10, wherein before the index analysis is performed on the first-order crowd classification result and the access attribute data through the second-order index subdivision model, the processor further implements the following steps when executing the computer-readable instructions:
    merging the first-order data set with access attribute sample data in the sample data set to generate a second-order attribute data set;
    performing index feature extraction on the second-order attribute data set through a preference behavior model to obtain at least one comprehensive index variable;
    performing segmented analysis on the second-order attribute data set according to the first-order crowd type result and all the comprehensive index variables, and constructing the second-order index subdivision model.
  14. The computer device according to claim 13, wherein the merging the first-order data set with access attribute sample data in the sample data set to generate a second-order attribute data set comprises:
    randomly extracting fields from the access attribute sample data to obtain attribute data to be processed;
    performing missing value processing and extreme value processing on the attribute data to be processed to obtain attribute data to be added;
    correspondingly adding the attribute data to be added to the first-order data set to generate the second-order attribute data set.
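The data preparation in claim 14 can be sketched as below. The claim names the operations (random field extraction, missing value processing, extreme value processing, merging) but not the concrete techniques, so median imputation, clipping at the median plus or minus three median absolute deviations, the dictionary layout, and every name in the code are assumptions chosen for illustration.

```python
import random
import statistics


def sample_fields(field_names, n, seed=0):
    """Randomly pick n access-attribute fields (seeded so runs are repeatable)."""
    return sorted(random.Random(seed).sample(field_names, n))


def fill_missing(values):
    """Assumed missing-value rule: replace None with the median of present values."""
    present = [v for v in values if v is not None]
    med = statistics.median(present)
    return [med if v is None else v for v in values]


def clip_extremes(values, k=3.0):
    """Assumed extreme-value rule: clip to median +/- k * median absolute deviation."""
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    if mad == 0:
        return list(values)  # no spread to measure against; leave values unchanged
    lo, hi = med - k * mad, med + k * mad
    return [min(max(v, lo), hi) for v in values]


def build_second_order(first_order, access_samples, n_fields, seed=0):
    """Add cleaned, randomly sampled access fields to each first-order row."""
    users = list(first_order)
    all_fields = sorted({f for row in access_samples.values() for f in row})
    fields = sample_fields(all_fields, n_fields, seed)
    second_order = {u: dict(first_order[u]) for u in users}
    for f in fields:
        column = [access_samples.get(u, {}).get(f) for u in users]
        column = clip_extremes(fill_missing(column))
        for u, v in zip(users, column):
            second_order[u][f] = v
    return second_order


# Hypothetical first-order data set: each user already carries a crowd type.
first_order = {f"u{i}": {"crowd_type": "A" if i < 3 else "B"} for i in range(6)}

# Hypothetical access attribute sample data with a gap and an extreme value.
access_samples = {
    "u0": {"visits": 3, "dwell_min": 10.0, "clicks": 1},
    "u1": {"visits": 5, "dwell_min": 12.0, "clicks": 2},
    "u2": {"visits": None, "dwell_min": 11.0, "clicks": 2},
    "u3": {"visits": 4, "dwell_min": None, "clicks": 3},
    "u4": {"visits": 6, "dwell_min": 13.0, "clicks": 1},
    "u5": {"visits": 500, "dwell_min": 9.0, "clicks": 2},
}

second_order = build_second_order(first_order, access_samples, n_fields=2)
print(second_order["u5"])
```

For the `visits` column, the gap is filled with the median 5 and the value 500 is clipped to 8, so a single anomalous user no longer dominates later index extraction.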
  15. One or more readable storage media storing computer-readable instructions, wherein the computer-readable instructions, when executed by one or more processors, cause the one or more processors to perform the following steps:
    acquiring user data of a user and preprocessing the user data to obtain data to be recommended; the data to be recommended includes consumption attribute data, social attribute data, access attribute data and traffic service attribute data;
    inputting the consumption attribute data, the social attribute data and the access attribute data corresponding to the user into a content preference model, and at the same time inputting the traffic service attribute data into a scene recommendation model; the content preference model is a multi-order model based on a two-step clustering method and a decision tree; the content preference model includes a first-order crowd clustering model and a second-order index subdivision model;
    performing crowd feature extraction on the consumption attribute data and the social attribute data through the first-order crowd clustering model to obtain a first-order crowd classification result corresponding to the user, and at the same time performing scene adaptation on the traffic service attribute data through the scene recommendation model to obtain a theme scene corresponding to the user;
    performing index analysis on the first-order crowd classification result and the access attribute data through the second-order index subdivision model to determine a crowd preference tag corresponding to the user;
    determining a content recommendation tag corresponding to the user according to the crowd preference tag and the theme scene corresponding to the user;
    acquiring, from a content database, content data matching the content recommendation tag, and recommending the acquired content data to the user.
  16. The readable storage medium according to claim 15, wherein before the crowd feature extraction is performed on the consumption attribute data and the social attribute data through the first-order crowd clustering model, the computer-readable instructions, when executed by the one or more processors, cause the one or more processors to further perform the following steps:
    acquiring a sample data set;
    screening the sample data set according to first-order attributes to obtain a first-order attribute data set;
    inputting the first-order attribute data set into a two-step clustering model, and performing crowd feature exploration on the first-order attribute data through the two-step clustering model to obtain a first-order crowd clustering result;
    analyzing the first-order crowd clustering result and the first-order valid data set and performing path restoration through a decision tree algorithm, to extract at least one classification variable corresponding to the first-order crowd clustering result;
    performing model reconstruction according to all the classification variables, the first-order clustering result and the first-order valid data set to construct the first-order crowd clustering model, determining first-order crowd categories corresponding to the first-order crowd clustering model, and labeling each piece of first-order valid data in the first-order valid data set with its corresponding crowd type to obtain a first-order data set; the first-order crowd categories include at least one crowd type.
  17. The readable storage medium according to claim 16, wherein the performing crowd feature exploration on the first-order attribute data through the two-step clustering model to obtain a first-order crowd clustering result comprises:
    standardizing the first-order attribute data set through the two-step clustering model to obtain first-order attribute data to be processed; the two-step clustering model includes a density clustering model and a K-means clustering model;
    performing crowd density clustering on the first-order attribute data to be processed through the density clustering model using the DBSCAN algorithm, to obtain a transitional clustering data result;
    performing crowd feature clustering on the transitional clustering data result through the K-means clustering model using the K-means algorithm, to obtain the first-order crowd clustering result.
  18. The readable storage medium according to claim 16, wherein the analyzing the first-order crowd clustering result and the first-order valid data set and performing path restoration through a decision tree algorithm, to extract at least one classification variable corresponding to the first-order crowd clustering result, comprises:
    associating the first-order crowd type and the first-order valid data that correspond to the same sample user, and determining the associated first-order valid data set as a decision data set; the first-order crowd clustering result includes the first-order crowd types corresponding to the sample users in the first-order valid data set; the first-order valid data set includes the first-order valid data in one-to-one correspondence with the sample users;
    inputting the decision data set into a decision inversion model containing initial variable parameters;
    analyzing the decision data set through the decision inversion model using the decision tree algorithm, and updating the initial variable parameters;
    performing path restoration according to the updated initial variable parameters, to extract the classification variable corresponding to the first-order crowd clustering result.
  19. The readable storage medium according to claim 16, wherein before the index analysis is performed on the first-order crowd classification result and the access attribute data through the second-order index subdivision model, the computer-readable instructions, when executed by the one or more processors, cause the one or more processors to further perform the following steps:
    merging the first-order data set with access attribute sample data in the sample data set to generate a second-order attribute data set;
    performing index feature extraction on the second-order attribute data set through a preference behavior model to obtain at least one comprehensive index variable;
    performing segmented analysis on the second-order attribute data set according to the first-order crowd type result and all the comprehensive index variables, and constructing the second-order index subdivision model.
  20. The readable storage medium according to claim 19, wherein the merging the first-order data set with access attribute sample data in the sample data set to generate a second-order attribute data set comprises:
    randomly extracting fields from the access attribute sample data to obtain attribute data to be processed;
    performing missing value processing and extreme value processing on the attribute data to be processed to obtain attribute data to be added;
    correspondingly adding the attribute data to be added to the first-order data set to generate the second-order attribute data set.
PCT/CN2021/091067 2020-11-17 2021-04-29 Content data recommendation method and apparatus, and computer device, and storage medium WO2022105129A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011285730.2 2020-11-17
CN202011285730.2A CN112395500B (en) 2020-11-17 2020-11-17 Content data recommendation method, device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
WO2022105129A1 (en) 2022-05-27

Family

ID=74599815

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/091067 WO2022105129A1 (en) 2020-11-17 2021-04-29 Content data recommendation method and apparatus, and computer device, and storage medium

Country Status (2)

Country Link
CN (1) CN112395500B (en)
WO (1) WO2022105129A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116450925A (en) * 2022-12-27 2023-07-18 深圳市网新新思软件有限公司 User relationship analysis method and system based on artificial intelligence
CN116595342A (en) * 2023-07-07 2023-08-15 北京数巅科技有限公司 Crowd circling method, device and equipment and storage medium
CN117056613A (en) * 2023-10-12 2023-11-14 中质国优测评技术(北京)有限公司 Evaluation optimization method and system based on user joint preference

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112395500B (en) * 2020-11-17 2023-09-05 平安科技(深圳)有限公司 Content data recommendation method, device, computer equipment and storage medium
CN113094578B (en) * 2021-03-16 2023-03-31 平安普惠企业管理有限公司 Deep learning-based content recommendation method, device, equipment and storage medium
CN113095570B (en) * 2021-04-14 2023-06-06 上海市城市建设设计研究总院(集团)有限公司 Bicycle riding path recommending method based on demand difference
CN113837859B (en) * 2021-08-25 2024-05-14 天元大数据信用管理有限公司 Image construction method for small and micro enterprises
CN117763228A (en) * 2023-12-11 2024-03-26 广州小白信息技术有限公司 Creative expression dynamic adaptation method based on multi-culture framework

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107798557A (en) * 2017-09-30 2018-03-13 平安科技(深圳)有限公司 Electronic installation, the service location based on LBS data recommend method and storage medium
US20200250715A1 (en) * 2019-01-31 2020-08-06 Salesforce.Com, Inc. Automatic rule generation for recommendation engine using hybrid machine learning
CN111582932A (en) * 2020-03-25 2020-08-25 平安壹钱包电子商务有限公司 Inter-scene information pushing method and device, computer equipment and storage medium
CN111897861A (en) * 2020-06-30 2020-11-06 苏宁金融科技(南京)有限公司 Content recommendation method and device, computer equipment and storage medium
CN112395500A (en) * 2020-11-17 2021-02-23 平安科技(深圳)有限公司 Content data recommendation method and device, computer equipment and storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7165105B2 (en) * 2001-07-16 2007-01-16 Netgenesis Corporation System and method for logical view analysis and visualization of user behavior in a distributed computer network
US20200073953A1 (en) * 2018-08-30 2020-03-05 Salesforce.Com, Inc. Ranking Entity Based Search Results Using User Clusters
CN109376759A (en) * 2018-09-10 2019-02-22 平安科技(深圳)有限公司 User information classification method, device, computer equipment and storage medium
CN110046634B (en) * 2018-12-04 2021-04-27 创新先进技术有限公司 Interpretation method and device of clustering result

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116450925A (en) * 2022-12-27 2023-07-18 深圳市网新新思软件有限公司 User relationship analysis method and system based on artificial intelligence
CN116450925B (en) * 2022-12-27 2023-12-15 深圳市网新新思软件有限公司 User relationship analysis method and system based on artificial intelligence
CN116595342A (en) * 2023-07-07 2023-08-15 北京数巅科技有限公司 Crowd circling method, device and equipment and storage medium
CN116595342B (en) * 2023-07-07 2023-09-29 北京数巅科技有限公司 Crowd circling method, device and equipment and storage medium
CN117056613A (en) * 2023-10-12 2023-11-14 中质国优测评技术(北京)有限公司 Evaluation optimization method and system based on user joint preference
CN117056613B (en) * 2023-10-12 2024-06-04 中质国优测评技术(北京)有限公司 Evaluation optimization method and system based on user joint preference

Also Published As

Publication number Publication date
CN112395500A (en) 2021-02-23
CN112395500B (en) 2023-09-05

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application (Ref document number: 21893285; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 EP: PCT application non-entry in European phase (Ref document number: 21893285; Country of ref document: EP; Kind code of ref document: A1)