WO2022105129A1

WO2022105129A1 - Content data recommendation method and apparatus, and computer device, and storage medium

Info

Publication number: WO2022105129A1
Application number: PCT/CN2021/091067
Authority: WO
Inventors: 陈婷婷
Original assignee: 平安科技（深圳）有限公司
Priority date: 2020-11-17
Filing date: 2021-04-29
Publication date: 2022-05-27
Also published as: CN112395500A; CN112395500B

Abstract

A content data recommendation method and apparatus, and a computer device, and a storage medium. The method comprises: performing preprocessing on obtained user data to obtain data to be recommended; inputting consumption attribute data, social attribute data, and access attribute data, which correspond to a user, to a content preference model, and inputting traffic service attribute data to a scenario recommendation model; performing population feature extraction by means of a first-order population clustering model to obtain a first-order population classification result, and performing scenario adaptation by means of the scenario recommendation model to obtain a theme scenario; performing index analysis on the first-order population classification result and the access attribute data by means of a second-order index subdivision model to determine a population preference tag; determining a content recommendation tag according to the population preference tag and the theme scenario; and obtaining content data and recommending same to the user. According to the method, population feature extraction, index analysis, and scenario adaptation are performed on the user data, and accurate recommendation is provided for the user.

Description

Content data recommendation method, device, computer equipment and storage medium

This application claims priority to Chinese patent application No. 202011285730.2, filed on November 17, 2020 and entitled "Content data recommendation method, device, computer equipment and storage medium", the entire contents of which are incorporated herein by reference.

Technical Field

The present application relates to the field of data processing of big data, and in particular, to a content data recommendation method, device, computer equipment and storage medium.

Background

The inventor found that, with the rapid development of the mobile Internet, it has become increasingly common for people to obtain the content information they want through an APP on a mobile terminal. However, when faced with a large amount of information, users cannot quickly obtain the information they really need from the APP, which reduces the usage rate of the APP. A better way to solve this problem is to introduce a recommendation method that picks out, from a large amount of information, the content the user is really interested in, so that the user can obtain the content information he really prefers from the recommended content.

SUMMARY OF THE INVENTION

The present application provides a content data recommendation method, device, computer equipment and storage medium, which perform crowd feature extraction, index analysis and scene adaptation on user data, determine the user's content recommendation label, automatically match content data, and recommend it to the user. Content data can thus be accurately recommended to the user, which improves user satisfaction and the effectiveness of content data recommendation.

A content data recommendation method, comprising:

Obtaining user data of the user, preprocessing the user data, and obtaining data to be recommended; the data to be recommended includes consumption attribute data, social attribute data, access attribute data and traffic service attribute data;

Input the consumption attribute data, the social attribute data and the access attribute data corresponding to the user into the content preference model, and at the same time input the traffic service attribute data into the scene recommendation model; the content preference model is a multi-order model based on the two-step clustering method and a decision tree; the content preference model includes a first-order crowd clustering model and a second-order index subdivision model;

Perform crowd feature extraction on the consumption attribute data and the social attribute data through the first-order crowd clustering model to obtain a first-order crowd classification result corresponding to the user, and at the same time perform scene adaptation on the traffic service attribute data through the scene recommendation model to obtain the theme scene corresponding to the user;

Perform index analysis on the first-order crowd classification result and the access attribute data through the second-order index subdivision model, and determine a crowd preference label corresponding to the user;

determining a content recommendation tag corresponding to the user according to the crowd preference tag corresponding to the user and the theme scene;

Acquire content data matching the content recommendation tag from a content database, and recommend the acquired content data to the user.

A content data recommendation device, comprising:

an acquisition module, configured to acquire user data of a user, preprocess the user data, and obtain data to be recommended; the data to be recommended includes consumption attribute data, social attribute data, access attribute data and traffic service attribute data;

an input module, configured to input the consumption attribute data, the social attribute data and the access attribute data corresponding to the user into the content preference model, and at the same time input the traffic service attribute data into the scene recommendation model; the content preference model is a multi-order model based on two-step clustering method and decision tree; the content preference model includes a first-order crowd clustering model and a second-order index subdivision model;

an identification module, configured to perform crowd feature extraction on the consumption attribute data and the social attribute data through the first-order crowd clustering model to obtain a first-order crowd classification result corresponding to the user, and to perform scene adaptation on the traffic service attribute data through the scene recommendation model to obtain a theme scene corresponding to the user;

an analysis module, configured to perform index analysis on the first-order crowd classification result and the access attribute data through the second-order index subdivision model, and determine a crowd preference label corresponding to the user;

a determining module, configured to determine a content recommendation tag corresponding to the user according to the crowd preference tag corresponding to the user and the theme scene;

A recommendation module, configured to acquire content data matching the content recommendation tag from the content database, and recommend the acquired content data to the user.

A computer device, comprising a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, the processor implementing the following steps when executing the computer-readable instructions:

One or more readable storage media storing computer-readable instructions that, when executed by one or more processors, cause the one or more processors to perform the following steps:

In the content data recommendation method, device, computer equipment and storage medium provided by this application, user data of a user is acquired and preprocessed to obtain data to be recommended, which includes consumption attribute data, social attribute data, access attribute data and traffic service attribute data; the consumption attribute data, the social attribute data and the access attribute data corresponding to the user are input into the content preference model, and at the same time the traffic service attribute data is input into the scene recommendation model; crowd feature extraction is performed on the consumption attribute data and the social attribute data through the first-order crowd clustering model to obtain a first-order crowd classification result corresponding to the user, and at the same time scene adaptation is performed on the traffic service attribute data through the scene recommendation model to obtain the theme scene corresponding to the user; index analysis is performed on the first-order crowd classification result and the access attribute data through the second-order index subdivision model to determine the crowd preference tag corresponding to the user; a content recommendation tag corresponding to the user is determined according to the crowd preference tag and the theme scene; and content data matching the content recommendation tag is acquired from a content database and recommended to the user. In this way, crowd feature extraction, index analysis and scene adaptation are performed on the user data through the content preference model and the scene recommendation model, the user's content recommendation tag is determined, and content data is automatically matched and recommended to the user. This improves the accuracy of content data recommendation, recommends preferred content data to users, avoids displaying disliked content data to users, improves user satisfaction, and improves the effectiveness of content data recommendation.

The details of one or more embodiments of the application are set forth in the accompanying drawings and the description below, and other features and advantages of the application will become apparent from the description, drawings, and claims.

Description of Drawings

In order to illustrate the technical solutions of the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application; those of ordinary skill in the art can obtain other drawings from these drawings without creative labor.

FIG. 1 is a schematic diagram of an application environment of a content data recommendation method in an embodiment of the present application;

FIG. 2 is a flowchart of a content data recommendation method in an embodiment of the present application;

FIG. 3 is a flowchart of step S30 of a content data recommendation method in an embodiment of the present application;

FIG. 4 is a flowchart of step S303 of the content data recommendation method in an embodiment of the present application;

FIG. 5 is a flowchart of step S304 of the content data recommendation method in an embodiment of the present application;

FIG. 6 is a flowchart of step S40 of a content data recommendation method in an embodiment of the present application;

FIG. 7 is a flowchart of step S401 of a content data recommendation method in an embodiment of the present application;

FIG. 8 is a flowchart of step S403 of the content data recommendation method in an embodiment of the present application;

FIG. 9 is a schematic block diagram of a content data recommendation apparatus in an embodiment of the present application;

FIG. 10 is a schematic diagram of a computer device in an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application. Obviously, the described embodiments are part of the embodiments of the present application, not all of the embodiments. Based on the embodiments in the present application, all other embodiments obtained by those of ordinary skill in the art without creative work fall within the protection scope of the present application.

The content data recommendation method provided by the present application can be applied in the application environment shown in FIG. 1, in which the client (computer device) communicates with the server through the network. The client (computer device) includes, but is not limited to, various personal computers, notebook computers, smart phones, tablet computers, cameras and portable wearable devices. The server can be implemented as an independent server or as a server cluster composed of multiple servers.

In one embodiment, as shown in FIG. 2 , a method for recommending content data is provided, and its technical solution mainly includes the following steps S10-S60:

S10: Acquire user data of the user, preprocess the user data, and obtain data to be recommended; the data to be recommended includes consumption attribute data, social attribute data, access attribute data, and traffic service attribute data.

Understandably, when the user wants to obtain content information in the application software on the user's mobile terminal, a recommendation instruction is triggered on the interface of the application software, and the user data of the user is acquired. The user data is the data of the attributes corresponding to the user held on the server of the application software, and covers attributes such as consumption attributes, social attributes, access attributes and traffic service attributes. The preprocessing performs regular expression processing, missing value supplementation or de-extreme value processing on the user data: regular expression processing uniformly converts the data of an attribute into the data format required for that attribute; missing value supplementation uniformly converts data whose attribute is empty into the filling data corresponding to that attribute; and de-extreme value processing replaces data of an attribute that exceeds the upper limit or falls below the lower limit set for that attribute with the adjacent limit value. The preprocessed user data is determined as the data to be recommended, which includes the consumption attribute data, the social attribute data, the access attribute data and the traffic service attribute data. The consumption attribute data is data of attributes related to user consumption; the social attribute data is data of attributes related to the user's basic social identity, held terminals, enjoyed services and the like; the access attribute data is data of attributes related to the user's access records, behavior and the like; and the traffic service attribute data is data of attributes related to the user's operator service, such as the operator service provider and the data package.
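The three preprocessing operations described for step S10 can be sketched as follows. This is only an illustrative sketch: the field contents, the limit values and the fill value are hypothetical, and the source does not say which regular expressions or limits are actually used.

```python
import re


def normalize_format(value):
    """Regular expression processing: pull the first number out of a raw field."""
    if value is None:
        return None
    match = re.search(r"-?\d+(?:\.\d+)?", str(value))
    return float(match.group()) if match else None


def fill_missing(values, fill_value):
    """Missing value supplementation: replace empty entries with the fill value."""
    return [fill_value if v is None else v for v in values]


def clip_extremes(values, lower, upper):
    """De-extreme value processing: replace out-of-range values with the adjacent limit."""
    return [min(max(v, lower), upper) for v in values]


def preprocess_attribute(raw_values, lower, upper, fill_value):
    parsed = [normalize_format(v) for v in raw_values]
    filled = fill_missing(parsed, fill_value)
    return clip_extremes(filled, lower, upper)


# Hypothetical raw values of one consumption attribute (monthly spend).
raw_monthly_spend = ["35.5 yuan", None, "1200", "-3", "80"]
cleaned = preprocess_attribute(raw_monthly_spend, lower=0.0, upper=500.0, fill_value=0.0)
print(cleaned)  # [35.5, 0.0, 500.0, 0.0, 80.0]
```

Each attribute would carry its own format rule, fill value and limits; the sketch applies one set of them to a single attribute for brevity.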

S20: Input the consumption attribute data, the social attribute data and the access attribute data corresponding to the user into a content preference model, and at the same time input the traffic service attribute data into a scene recommendation model; the content preference model is a multi-order model based on the two-step clustering method and a decision tree; the content preference model includes a first-order crowd clustering model and a second-order index subdivision model.

Understandably, the content preference model is an already constructed multi-order model based on the two-step clustering method and the decision tree algorithm, and includes a first-order crowd clustering model and a second-order index subdivision model. The two-step clustering method first performs preliminary clustering through hierarchical clustering or density clustering, and then performs secondary clustering on the preliminary clustering results using a partitional clustering method. The decision tree algorithm uses a tree structure to build a decision model from the attributes of the data. The content preference model can automatically generate the user's crowd preference label from the user's consumption attribute data, social attribute data and access attribute data. The crowd preference label marks the user's preferences; for example, it includes inspirational, contemporary, youth literature, life, social science and fantasy in the field of reading; travel and love in the field of animation; and fantasy, martial arts and reality shows in the field of video. The scene recommendation model is a trained neural network model whose network structure can be set according to requirements, for example the network structure of a BP neural network model or of an LSTM neural network model. The scene recommendation model automatically identifies the theme scene suitable for the user from the user's traffic service attribute data; the theme scene is a scene whose theme suits the user, such as a video scene that requires a large amount of traffic or a reading scene that requires little traffic.

S30: Perform crowd feature extraction on the consumption attribute data and the social attribute data by using the first-order crowd clustering model to obtain a first-order crowd classification result corresponding to the user, and at the same time perform scene adaptation on the traffic service attribute data by using the scene recommendation model to obtain the theme scene corresponding to the user.

Understandably, crowd features are the features relevant to classifying the crowd, and crowd feature extraction is the process of extracting the features of attribute differences between crowds. Crowd feature extraction may include crowd feature exploration and the analysis and path restoration of the decision tree algorithm, where crowd feature exploration includes crowd density clustering and crowd feature clustering. The first-order crowd classification result is one crowd type, namely one of the first-order crowd types described herein. Scene adaptation is the process of convolving the traffic service attribute data and then identifying it, so that the theme scene adapted to the user is automatically identified from the traffic service attribute data.

The first-order crowd clustering model may be a clustering model based on density clustering and the decision tree algorithm, or a clustering model based on hierarchical clustering and a BP neural network. It automatically extracts crowd features from the consumption attribute data and the social attribute data, classifies according to the extracted crowd features, and outputs the user's crowd type.

In one embodiment, as shown in FIG. 3 , before the step S30, that is, before performing crowd feature extraction on the consumption attribute data and the social attribute data by using the first-order crowd clustering model, the method includes:

S301, obtain a sample data set.

Understandably, the sample data set includes consumption attribute sample data, social attribute sample data and access attribute sample data.

S302: Filter the sample data set according to the first-order attributes to obtain a first-order attribute data set.

Understandably, the first-order attributes include consumption attributes and social attributes, where the consumption attributes are attributes related to user consumption, and the social attributes are attributes related to the user's basic social identity, held terminal attributes, enjoyed business attributes, and the like.
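As a small illustration of step S302, the filtering can be thought of as keeping only the consumption and social attributes of each sample record. The attribute names below are hypothetical placeholders; the source does not enumerate the concrete attributes.

```python
# Hypothetical first-order attributes (consumption and social attributes).
FIRST_ORDER_ATTRIBUTES = {"monthly_spend", "terminal_type", "membership_level"}
KEY_FIELD = "user_id"  # kept so that results can later be associated with sample users


def select_first_order(sample_data_set):
    """Keep the key field and the first-order attributes of every sample record."""
    keep = FIRST_ORDER_ATTRIBUTES | {KEY_FIELD}
    return [{k: v for k, v in record.items() if k in keep} for record in sample_data_set]


samples = [
    {"user_id": "u1", "monthly_spend": 35.5, "terminal_type": "phone", "page_views": 120},
    {"user_id": "u2", "monthly_spend": 80.0, "membership_level": "gold", "page_views": 15},
]
first_order = select_first_order(samples)
print(first_order[0])
```

The access attribute (here `page_views`) is dropped at this stage because, per the description, it is only used later by the second-order index subdivision model.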

S303: Input the first-order attribute data set into a two-step clustering model, and perform crowd feature exploration on the first-order attribute data through the two-step clustering model to obtain a first-order crowd clustering result.

Understandably, the two-step clustering model is a model based on the two-step clustering method, which first performs preliminary clustering by hierarchical clustering or density clustering and then performs secondary clustering on the preliminary clustering results using a partitional clustering method. Crowd feature exploration is the process of standardizing the first-order attribute data set and then performing crowd density clustering and crowd feature clustering on it, in which the similarities and differences of attributes between crowds are explored to obtain the first-order crowd clustering result. The first-order crowd clustering result is the set of initially explored crowd types, for example 9 crowd types.

In one embodiment, as shown in FIG. 4, in step S303, performing crowd feature exploration on the first-order attribute data through the two-step clustering model to obtain a first-order crowd clustering result includes:

S3031, standardize the first-order attribute data set by using the two-step clustering model to obtain first-order attribute data to be processed; the two-step clustering model includes a density clustering model and a K-means clustering model.

Understandably, the standardization processing performs the regular expression processing, the missing value supplementation, the de-extreme value processing, one-hot encoding conversion and normalization processing on the first-order attribute data set. One-hot encoding, also called one-bit effective encoding, uses an N-bit state register to encode N states, so that each state has its own register bit and only one bit is valid at any time. The normalization processing takes the sum of the absolute values of the components of each sample's vector as the norm and divides each component by that norm to obtain the normalized vector of the sample; alternatively, it takes the square root of the sum of the squares of the components as the norm and divides each component by it. Performing the standardization processing on the first-order attribute data set yields the first-order attribute data to be processed.
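The one-hot encoding and the two norm-based scalings described above can be sketched as follows. The category list and sample vector are hypothetical, and the function names are chosen only for this illustration.

```python
import math


def one_hot(value, categories):
    """Encode one of N states with N bits, exactly one of which is set."""
    return [1.0 if value == c else 0.0 for c in categories]


def normalize(vector, norm="l2"):
    """Divide each component by the L1 norm (sum of absolute values)
    or by the L2 norm (square root of the sum of squares)."""
    if norm == "l1":
        length = sum(abs(x) for x in vector)
    else:
        length = math.sqrt(sum(x * x for x in vector))
    if length == 0:
        return list(vector)  # an all-zero sample is left unchanged
    return [x / length for x in vector]


terminal_types = ["phone", "tablet", "watch"]  # hypothetical categories
encoded = one_hot("tablet", terminal_types)

sample = [3.0, 4.0]
l1_scaled = normalize(sample, "l1")  # components sum to 1 in absolute value
l2_scaled = normalize(sample, "l2")  # vector has unit Euclidean length
print(encoded, l1_scaled, l2_scaled)
```

After either scaling, samples differ only in the direction of their attribute vectors, which keeps attributes with large raw magnitudes from dominating the distance computations used by the clustering steps.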

The two-step clustering model includes a density clustering model and a K-means clustering model, and the first-order attribute data to be processed is data provided to the two-step clustering model for clustering.

S3032: Using the DBSCAN algorithm, perform crowd density clustering on the first-order attribute data to be processed through the density clustering model to obtain a transitional clustering data result.

Understandably, the DBSCAN (Density-Based Spatial Clustering of Applications with Noise) algorithm is a density-based clustering algorithm that determines the classes from the density of each region and isolates the outliers, which are treated as one class. Crowd density clustering is the clustering process that uses the DBSCAN algorithm to determine all crowd types, and the transitional clustering data result is the set of crowd types obtained after crowd density clustering, for example 8 crowd types, in which all outliers are assigned to one crowd type (the abnormal type).

Wherein, the density clustering model is a model that uses the DBSCAN algorithm to perform clustering to distinguish crowd types.
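A minimal pure-Python sketch of density clustering in the style of DBSCAN is given below, with the outliers collected into a single abnormal type as described above. The neighborhood radius `eps`, the minimum neighbor count `min_pts` and the sample points are assumptions made for the illustration; the source does not state parameter values.

```python
import math

NOISE = -1


def region_query(points, index, eps):
    """Indices of all points within eps of points[index] (the point itself included)."""
    return [j for j, q in enumerate(points) if math.dist(points[index], q) <= eps]


def dbscan(points, eps, min_pts):
    labels = [None] * len(points)
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        labels[i] = cluster_id
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins the cluster, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # core point: keep expanding
                queue.extend(j_neighbors)
        cluster_id += 1
    return labels


# Two dense groups plus two isolated points (hypothetical 2-D attribute vectors).
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50), (-40, 30)]
labels = dbscan(points, eps=1.5, min_pts=3)

# All outliers are assigned to one extra crowd type (the abnormal type).
anomaly_type = max(labels) + 1
crowd_types = [anomaly_type if label == NOISE else label for label in labels]
print(crowd_types)  # [0, 0, 0, 0, 1, 1, 1, 1, 2, 2]
```

Here two dense crowd types and one abnormal type are produced, which corresponds to the transitional clustering data result on a toy scale.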

S3033, using the K-means algorithm, perform crowd feature clustering on the transition clustering data result by using the K-means clustering model to obtain the first-order crowd clustering result.

Understandably, the K-means algorithm is a partitional clustering algorithm that takes the mean value as the "center" of each class: objects are randomly selected from the data set as the prototypes of the clusters, and every other object is assigned to the class represented by the most similar (that is, the closest) prototype. The K-means clustering model is a model that uses the K-means algorithm to cluster the transitional clustering data result and determine the crowd types. Crowd feature clustering is the clustering process that uses the K-means algorithm to determine all crowd types on the basis of the transitional clustering data result, and the first-order crowd clustering result includes the crowd type corresponding to the abnormal type in the transitional clustering data result.

In the present application, the first-order attribute data set is standardized through the two-step clustering model to obtain the first-order attribute data to be processed; using the DBSCAN algorithm, crowd density clustering is performed on the first-order attribute data to be processed through the density clustering model to obtain the transitional clustering data result; and using the K-means algorithm, crowd feature clustering is performed on the transitional clustering data result through the K-means clustering model to obtain the first-order crowd clustering result. In this way, crowd density clustering and crowd feature clustering are realized through preprocessing, the DBSCAN algorithm and the K-means algorithm to obtain the first-order crowd clustering result, which can improve the accuracy of crowd classification.
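The assign-and-update loop of K-means can be sketched in plain Python as follows. The source describes random selection of the initial prototypes; in this sketch the prototypes are passed in explicitly so that the result is reproducible, and how the transitional clustering data result is fed into K-means is not specified in the source, so generic points are used.

```python
import math


def nearest(point, centers):
    """Index of the center closest to the point."""
    return min(range(len(centers)), key=lambda c: math.dist(point, centers[c]))


def mean_point(points):
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def kmeans(points, initial_centers, max_iter=100):
    centers = [tuple(c) for c in initial_centers]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each object joins the class of its closest prototype.
        labels = [nearest(p, centers) for p in points]
        # Update step: each prototype moves to the mean of its members.
        new_centers = []
        for c in range(len(centers)):
            members = [p for p, label in zip(points, labels) if label == c]
            new_centers.append(mean_point(members) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]
# Prototypes fixed here for reproducibility; random.sample(points, 2) would
# correspond to the random selection described in the text.
labels, centers = kmeans(points, initial_centers=[points[0], points[4]])
print(labels)   # [0, 0, 0, 0, 1, 1, 1, 1]
print(centers)  # [(0.5, 0.5), (10.5, 10.5)]
```

The number of prototypes fixes the number of crowd types produced, so in practice it would be chosen to match the crowd types sought in the first-order crowd clustering result.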

S304: Using a decision tree algorithm, perform analysis and path restoration on the first-order crowd clustering result and the first-order valid data set, and extract at least one categorical variable corresponding to the first-order crowd clustering result.

Understandably, the decision tree algorithm uses a tree structure to build a decision model from the attributes of the data, and the analysis and path restoration are the inverse process of the decision tree algorithm. By analyzing the correspondence between the first-order valid data set and the first-order crowd clustering result, the decision nodes of each attribute of the sample users are analyzed and the paths through the decision nodes are traced back, thereby restoring the decision paths from the first-order valid data set to the crowd types in the first-order crowd clustering result. The variables corresponding to the decision nodes of attributes that the paths pass through a number of times reaching a threshold are then extracted and determined as the categorical variables.

In one embodiment, as shown in FIG. 5, in step S304, performing analysis and path restoration on the first-order crowd clustering result and the first-order valid data set by the decision tree algorithm and extracting at least one categorical variable corresponding to the first-order crowd clustering result includes:

S3041, associate the first-order crowd types corresponding to the same sample users with the first-order valid data, and determine the associated first-order valid data set as a decision-making data set; the first-order crowd clustering result includes The first-order crowd types corresponding to the sample users in the first-order valid data set; the first-order valid data set includes the first-order valid data corresponding to the sample users one-to-one.

Understandably, the two-step clustering model divides the sample users in the first-order valid data set into crowd types, so the crowd type, that is, the first-order crowd type, to which each piece of first-order valid data (in one-to-one correspondence with a sample user) belongs can be determined. Determining the associated first-order valid data set as the decision data set means that all the first-order valid data have been associated with their first-order crowd types.

S3042, inputting the decision data set into a decision inversion model containing initial variable parameters;

Understandably, the decision inversion model is a model that derives the variable parameters for identifying crowd types by working backwards through the decision data set according to the tree structure of the decision tree.

S3043, using a decision tree algorithm, analyze the decision data set through the decision inversion model, and update the initial variable parameters.

Understandably, the decision tree algorithm uses a tree structure to build a decision model from the attributes of the data: the decision data set is divided according to its data features until all features have been used for division or all data in each divided subset have the same crowd type. The division is then drawn toward the first-order crowd types associated with the first-order valid data in the decision data set, the variable parameters that can divide the data into those first-order crowd types are continuously deduced and analyzed, and the initial variable parameters are updated until the division fully matches; the variable parameters at that point are determined as the updated initial variable parameters.

S3044: Perform path restoration according to the updated initial variable parameters, and extract the categorical variables corresponding to the first-order crowd clustering result.

Understandably, path restoration restores the division path of each piece of first-order valid data and confirms whether the corresponding first-order crowd type can be reached. After all paths are restored, the categorical variables are determined according to the number of nodes the paths share. The way of determining the categorical variables can be set according to requirements: for example, the number of times each division node is passed through is counted, and the variable parameters in nodes whose count is greater than or equal to a preset number are determined as categorical variables; or the variable parameters in nodes whose count is greater than or equal to the mean count over all nodes are determined as categorical variables.

In the present application, the first-order crowd types and the first-order valid data corresponding to the same sample users are associated, and the associated first-order valid data set is determined as the decision data set; the decision data set is input into a decision inversion model containing initial variable parameters; using the decision tree algorithm, the decision data set is analyzed through the decision inversion model and the initial variable parameters are updated; and path restoration is performed according to the updated initial variable parameters to extract the categorical variables corresponding to the first-order crowd clustering result. In this way, the subdivision rules can be analyzed and restored through the decision inversion model, and the categorical variables are extracted more scientifically using the decision tree algorithm, which improves the quality and accuracy of crowd segmentation.
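The two selection rules mentioned above (a preset count, or the mean count over all nodes) can be illustrated as follows. The restored paths are represented here simply as lists of the variables tested along each path, and the variable names are hypothetical; how the paths themselves are restored from the decision inversion model is not shown.

```python
from collections import Counter
from statistics import mean


def count_path_nodes(paths):
    """Count how many times each division variable is passed through over all paths."""
    counts = Counter()
    for path in paths:
        counts.update(path)
    return counts


def select_by_threshold(counts, min_count):
    """Variables whose pass count is greater than or equal to a preset number."""
    return sorted(v for v, n in counts.items() if n >= min_count)


def select_by_mean(counts):
    """Variables whose pass count is greater than or equal to the mean count."""
    average = mean(counts.values())
    return sorted(v for v, n in counts.items() if n >= average)


# Hypothetical restored division paths, one per piece of first-order valid data.
paths = [
    ["monthly_spend", "terminal_type", "age_band"],
    ["monthly_spend", "terminal_type"],
    ["monthly_spend", "membership_level"],
    ["monthly_spend", "terminal_type", "membership_level"],
]
counts = count_path_nodes(paths)
print(select_by_threshold(counts, min_count=3))  # ['monthly_spend', 'terminal_type']
print(select_by_mean(counts))                    # ['monthly_spend', 'terminal_type']
```

In this toy data the mean count is 2.5, so both rules keep the same two variables; on real paths the two rules can select different sets.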

S305, perform model reconstruction according to all the classification variables, the first-order clustering results, and the first-order valid data set, construct a first-order crowd clustering model, and determine the difference between the first-order crowd clustering model and the first-order crowd clustering model. corresponding first-order crowd types, and label each first-order valid data in the first-order valid data set with its corresponding crowd type to obtain a first-order data set; the first-order crowd types include at least one of the crowd types.

Understandably, through the decision tree algorithm, all the classification variables, the first-order clustering results and the first-order valid data sets are remodeled to construct a first-order crowd clustering model, and the first-order crowd clustering model is constructed. The first-order population type includes at least one population type, such as 11 population types, and each first-order valid data set in the first-order valid data set is marked with its corresponding population type.

In the present application, a sample data set is obtained; the sample data set is screened according to the first-order attributes to obtain a first-order attribute data set; the first-order attribute data set is input into a two-step clustering model, which performs crowd feature exploration on the first-order attribute data to obtain a first-order crowd clustering result; through a decision tree algorithm, the first-order crowd clustering result and the first-order valid data set are analyzed and path restoration is performed, and at least one classification variable corresponding to the first-order crowd clustering result is extracted; model reconstruction is performed according to all the classification variables, the first-order clustering result and the first-order valid data set to construct a first-order crowd clustering model, and the first-order crowd types corresponding to the first-order crowd clustering model are determined. In this way, the two-step clustering model and the decision tree algorithm are used to explore crowd features and restore paths so that the classification variables can be derived, which allows the first-order crowd clustering model to be constructed accurately and improves the accuracy of crowd classification.

S40: Perform index analysis on the first-order crowd classification result and the access attribute data by using the second-order index subdivision model, and determine a crowd preference label corresponding to the user.

Understandably, the first-order crowd classification result is a crowd type, that is, one of the first-order crowd types described above, and the second-order index subdivision model is a trained clustering model for user subdivision. The second-order index subdivision model performs index analysis according to the crowd type in the obtained first-order crowd classification result and the access attribute data, and matches the crowd preference label of the user; the crowd preference label identifies the user's preferences.

In one embodiment, as shown in FIG. 6, before step S40, that is, before the index analysis is performed on the first-order crowd classification result and the access attribute data through the second-order index subdivision model, the method includes the following steps:

S401. Combine the first-order data set and the access attribute sample data in the sample data set to generate a second-order attribute data set.

Understandably, the access attribute sample data in the sample data set is added to the first-order data set on a per-user basis, that is, the access attribute sample data of a user is added to the first-order valid data corresponding to that user in the first-order data set. For example, the access attribute sample data may be inserted after the first-order valid data, and the combination yields the second-order attribute data set.

In one embodiment, as shown in FIG. 7, step S401, that is, combining the first-order data set and the access attribute sample data in the sample data set to generate a second-order attribute data set, includes:

S4011: Randomly extract fields from the access attribute sample data to obtain attribute data to be processed.

Understandably, the fields are selected at random from all the fields of the access attribute sample data, so that scattered data distributions are covered and the user's access behavior can be analyzed more objectively; the data of the extracted fields is determined as the attribute data to be processed.
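By way of illustration only, random field extraction of the kind described in step S4011 might be sketched in Python as follows. The field names, the sample values and the helper `extract_random_fields` are hypothetical and do not come from the application; the fixed seed is used only to make the example repeatable.

```python
import random


def extract_random_fields(samples, num_fields, seed=None):
    """Choose num_fields field names at random and keep only those fields per user."""
    rng = random.Random(seed)
    # Union of all field names, sorted so that a given seed always picks the same fields.
    all_fields = sorted({name for fields in samples.values() for name in fields})
    chosen = rng.sample(all_fields, num_fields)
    # A field that a user does not have is kept as None (a missing value).
    return {
        user: {name: fields.get(name) for name in chosen}
        for user, fields in samples.items()
    }


# Hypothetical access attribute sample data, keyed by user.
access_samples = {
    "u1": {"video_minutes": 120, "news_clicks": 4, "game_sessions": 0},
    "u2": {"video_minutes": 15, "news_clicks": 30, "game_sessions": 7},
}

to_be_processed = extract_random_fields(access_samples, num_fields=2, seed=0)
```

Every user keeps the same randomly chosen fields, so the resulting records remain comparable in the later processing steps.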

S4012: Perform missing value processing and extreme value processing on the attribute data to be processed to obtain attribute data to be added.

Understandably, the missing value processing and the extreme value processing are performed on all the attribute data to be processed. The missing value processing includes deleting data that contains missing values and interpolating missing values with plausible values, that is, for some fields the attribute data containing missing values is deleted, and for other fields the missing values are filled in by interpolation. The extreme value processing is a process of removing extreme values from the data or replacing them with a unified value.
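The following Python sketch illustrates one possible reading of step S4012. It is not the implementation of the application: the choice of the median as the interpolated value, the clipping bounds, and all field names and records are assumptions made for the example.

```python
from statistics import median


def drop_missing(records, field):
    """Delete records whose value for `field` is missing."""
    return [r for r in records if r.get(field) is not None]


def fill_missing(records, field):
    """Fill missing values of `field` with the median of the observed values (assumed rule)."""
    observed = [r[field] for r in records if r.get(field) is not None]
    fill = median(observed)
    return [{**r, field: fill if r.get(field) is None else r[field]} for r in records]


def clip_extremes(records, field, lower, upper):
    """Replace values outside [lower, upper] with the nearest bound."""
    return [{**r, field: min(max(r[field], lower), upper)} for r in records]


# Hypothetical attribute data to be processed; None marks a missing value.
raw = [
    {"user_id": "u1", "video_minutes": 30, "news_clicks": 4},
    {"user_id": "u2", "video_minutes": None, "news_clicks": 9},
    {"user_id": "u3", "video_minutes": 10000, "news_clicks": 7},
    {"user_id": "u4", "video_minutes": 60, "news_clicks": None},
    {"user_id": "u5", "video_minutes": 45, "news_clicks": 1},
]

without_gaps = drop_missing(raw, "news_clicks")        # deletion for one field
filled = fill_missing(without_gaps, "video_minutes")   # interpolation for another field
to_be_added = clip_extremes(filled, "video_minutes", lower=0, upper=600)
```

Here deletion is applied to one field and interpolation to another, mirroring the per-field treatment described above; which treatment suits which field would depend on the actual data.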

S4013. Correspondingly add the attribute data to be added to the first-order data set to generate the second-order attribute data set.

Understandably, the attribute data to be added is inserted after the first-order valid data, so as to combine to obtain the second-order attribute data set.
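As a minimal sketch of this combination step, the Python snippet below appends each user's attribute data to that user's first-order record. The record layout (dictionaries keyed by a user identifier), the field names and the helper `merge_by_user` are assumptions for illustration.

```python
def merge_by_user(first_order_data, attributes_to_add):
    """Append each user's added attributes after that user's first-order fields."""
    merged = {}
    for user_id, record in first_order_data.items():
        extra = attributes_to_add.get(user_id, {})
        # Dicts keep insertion order, so the added fields follow the first-order fields.
        merged[user_id] = {**record, **extra}
    return merged


# Hypothetical first-order data set (already labeled with a crowd type).
first_order_data = {
    "u1": {"monthly_spend": 80, "contacts": 35, "crowd_type": "type_3"},
    "u2": {"monthly_spend": 20, "contacts": 5, "crowd_type": "type_7"},
}
# Hypothetical attribute data to be added.
attributes_to_add = {
    "u1": {"video_minutes": 120, "news_clicks": 4},
    "u2": {"video_minutes": 15, "news_clicks": 30},
}

second_order = merge_by_user(first_order_data, attributes_to_add)
```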

S402. Perform index feature extraction on the second-order attribute data set through a preference behavior model to obtain at least one comprehensive index variable.

Understandably, the index feature extraction is a process of calculating the contribution degree of each index according to the second-order attribute data set and extracting the indices whose contribution degree meets a threshold requirement; the contribution degree of an index is the degree to which it contributes to the user's access behavior data, that is, its proportion.
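For illustration, the sketch below treats the contribution degree of an index as its share of the summed index values and keeps the indices whose share reaches a threshold. The application does not state the exact formula, so this share-of-total definition, the threshold value and the index names are all assumptions; the index values are also assumed to be on a comparable scale.

```python
def contribution_degrees(index_totals):
    """Share of each index in the overall total (assumed definition of contribution)."""
    grand_total = sum(index_totals.values())
    return {name: value / grand_total for name, value in index_totals.items()}


def select_indices(index_totals, threshold):
    """Keep the indices whose contribution degree is at least `threshold`."""
    shares = contribution_degrees(index_totals)
    return [name for name, share in shares.items() if share >= threshold]


# Hypothetical per-index totals aggregated over the second-order attribute data set.
index_totals = {"video_minutes": 600, "news_clicks": 300, "game_sessions": 100}

shares = contribution_degrees(index_totals)
comprehensive_indices = select_indices(index_totals, threshold=0.2)
```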

S403: Perform segmental analysis on the second-order attribute data set according to the first-order crowd type results and all the comprehensive index variables, and construct a second-order index subdivision model.

Understandably, according to the first-order crowd type results and all the comprehensive index variables, each first-order crowd type is subdivided into segments that correspond one-to-one to the comprehensive index variables, and each segment is then analyzed according to the segment into which the second-order attribute data falls. The segmental analysis is a process of analyzing the proportion and weight of each segment, that is, merging or splitting adjacent segments until the proportion of each resulting segment is greater than or equal to a preset proportion and its weight is greater than a preset weight, so that the second-order index subdivision model can be constructed from the segments that meet these requirements.
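The merging of adjacent segments can be pictured with the following Python sketch. It covers only the proportion criterion: a segment whose share of the data is below a preset proportion is merged into its neighbor. The weight criterion and the splitting of segments are omitted, and the merge direction, the counts and the helper name are assumptions rather than details of the application.

```python
def merge_small_segments(counts, min_share):
    """Merge undersized segments into an adjacent one until every share >= min_share."""
    total = sum(counts)
    sizes = list(counts)
    members = [[i] for i in range(len(counts))]  # original segment ids in each segment
    i = 0
    while i < len(sizes) and len(sizes) > 1:
        if sizes[i] / total < min_share:
            # Merge with the right neighbor, or with the left one for the last segment.
            j = i + 1 if i + 1 < len(sizes) else i - 1
            lo, hi = min(i, j), max(i, j)
            sizes[lo] += sizes[hi]
            members[lo] += members[hi]
            del sizes[hi], members[hi]
            i = lo  # re-check the merged segment
        else:
            i += 1
    return members, sizes


# Hypothetical number of users falling into five adjacent segments.
segment_counts = [5, 40, 3, 2, 50]
members, sizes = merge_small_segments(segment_counts, min_share=0.1)
# members == [[0, 1], [2, 3, 4]], sizes == [45, 55]
```

Each merge shortens the list by one, so the loop always terminates; when only one segment remains it is returned as is.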

In the present application, a second-order attribute data set is generated by combining the first-order data set with the access attribute sample data in the sample data set; index feature extraction is performed on the second-order attribute data set through a preference behavior model to obtain at least one comprehensive index variable; and segmental analysis is performed on the second-order attribute data set according to the first-order crowd type results and all the comprehensive index variables to construct a second-order index subdivision model. In this way, index features are extracted by the preference behavior model and, after segmental analysis, the second-order index subdivision model can be constructed accurately and objectively, which improves the accuracy and reliability of crowd segmentation.

In one embodiment, as shown in FIG. 8, step S403, that is, performing segmental analysis on the second-order attribute data set according to the first-order crowd type results and all the comprehensive index variables and constructing a second-order index subdivision model, includes:

S4031: Perform feature analysis and dimension reduction processing on all the comprehensive index variables to obtain principal component index variables.

Understandably, because the comprehensive index variables contain too many indices to be used directly for user feature extraction and user segmentation, the indices need to be integrated, the correlation of the attribute data further analyzed, and dimension reduction performed on the indices. The feature analysis calculates the similarity of the indices in each dimension and analyzes the similarity value of each index in each dimension, that is, the maximum similarity value between each index and each dimension; then, taking the contribution degree of each index into account, dimensions are merged and classified in pairs so that the tolerance between the total contribution degree and the average contribution degree of the merged dimension is smallest, and the dimension obtained by merging the two dimensions is determined as an advanced dimension. The total contribution degree is the sum of the contribution degrees of the indices under the advanced dimension.

The dimension reduction processing sets a weight parameter for each index in an advanced dimension. The weight parameter is the proportion of the index within its advanced dimension, that is, the ratio of the contribution degree of the index to the total contribution degree of that advanced dimension. The mean of the weight parameters under all the advanced dimensions is calculated, and each index whose weight parameter is greater than this mean is merged with its advanced dimension and determined as a principal component index variable. In this way, multiple advanced dimensions are reduced to several representative principal component index variables, which indicate the main factors of the user's content preference.

S4032: Associate the crowd type corresponding to the same user with the second-order attribute data, and determine the associated second-order attribute data set as the data set to be subdivided; the first-order crowd type result includes the crowd type corresponding to the user, and the second-order attribute data set includes the second-order attribute data corresponding one-to-one to the users.
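The weighting rule of the dimension reduction processing can be sketched in Python as follows. The dimension names, index names and contribution values are hypothetical, and the helper `principal_indices` is introduced only for this example.

```python
def principal_indices(advanced_dimensions):
    """Return (dimension, index) pairs whose weight exceeds the mean weight, plus the mean."""
    weights = {}
    for dimension, contributions in advanced_dimensions.items():
        total = sum(contributions.values())  # total contribution of the dimension
        for index, contribution in contributions.items():
            # Weight parameter: the index's share of its dimension's total contribution.
            weights[(dimension, index)] = contribution / total
    mean_weight = sum(weights.values()) / len(weights)
    selected = [key for key, weight in weights.items() if weight > mean_weight]
    return selected, mean_weight


# Hypothetical advanced dimensions with the contribution degree of each index.
advanced_dimensions = {
    "entertainment": {"video_minutes": 30, "game_sessions": 10},
    "information": {"news_clicks": 15, "search_queries": 5, "reading_minutes": 5},
}

selected, mean_weight = principal_indices(advanced_dimensions)
# selected == [("entertainment", "video_minutes"), ("information", "news_clicks")]
```

In this example the weights are 0.75 and 0.25 in the first dimension and 0.6, 0.2 and 0.2 in the second, so the mean weight is 0.4 and only the two dominant indices are retained.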

Understandably, associating the crowd type corresponding to the same user with the second-order attribute data is equivalent to assigning a crowd type label to the second-order attribute data, and the first-order crowd type result includes: the crowd type corresponding to the user.

S4033: Perform segmental analysis on the data set to be subdivided according to the principal component index variable, and construct the second-order index subdivision model.

Understandably, the segmental analysis is a process of assigning each item of second-order attribute data in the data set to be subdivided to a segment of each principal component index variable, and refers to a learning process based on the degree of clustering between the second-order attribute data and the principal component index variables; the learning method is unsupervised clustering, by which the second-order index subdivision model is constructed.

In the present application, feature analysis and dimension reduction processing are performed on all the comprehensive index variables to obtain the principal component index variables; the crowd type corresponding to the same user is associated with the second-order attribute data, and the associated second-order attribute data set is determined as the data set to be subdivided; and segmental analysis is performed on the data set to be subdivided according to the principal component index variables to construct the second-order index subdivision model. In this way, feature analysis and dimension reduction allow the crowd to be subdivided more directly and improve the reliability and accuracy of crowd segmentation.

S50: Determine a content recommendation tag corresponding to the user according to the crowd preference label corresponding to the user and the theme scene.

Understandably, according to the determined crowd preference label and theme scene corresponding to the user, the content recommendation tag that matches both the crowd preference label and the theme scene is obtained by mapping. The content recommendation tag is the tag to which the content data recommended to the user belongs, and it can be set according to requirements; for example, the content recommendation tag may be video fantasy, video martial arts, and so on.
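One simple way to picture this mapping is a lookup table keyed by the pair of crowd preference label and theme scene, as in the Python sketch below. The label names, scene names and table entries are hypothetical; only the two example tags are taken from the text above, and the application does not specify how the mapping is stored.

```python
# Hypothetical mapping from (crowd preference label, theme scene) to a recommendation tag.
TAG_TABLE = {
    ("fantasy_fan", "commute"): "video fantasy",
    ("martial_arts_fan", "commute"): "video martial arts",
}


def content_recommendation_tag(preference_label, theme_scene, table=TAG_TABLE):
    """Return the tag matching both inputs, or None when no mapping is configured."""
    return table.get((preference_label, theme_scene))


tag = content_recommendation_tag("fantasy_fan", "commute")
# tag == "video fantasy"
```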

S60: Acquire content data matching the content recommendation tag from a content database, and recommend the acquired content data to the user.

Understandably, the content database holds all the content data stored, within a period of time or on the current day, on the server corresponding to the application software. The content data is content information on the mobile Internet, and each item of content data is marked with a content tag. The content data whose content tag matches the content recommendation tag is looked up in the content database, and the content data found is recommended to the user on the interface of the application software on the user's mobile terminal for the user to view.

In the present application, the user data of the user is acquired and preprocessed to obtain data to be recommended, which includes consumption attribute data, social attribute data, access attribute data and traffic service attribute data; the consumption attribute data, the social attribute data and the access attribute data are input into the content preference model, while the traffic service attribute data is input into the scene recommendation model; crowd feature extraction is performed on the consumption attribute data and the social attribute data through the first-order crowd clustering model to obtain the first-order crowd classification result corresponding to the user, and at the same time scene adaptation is performed on the traffic service attribute data through the scene recommendation model to obtain the theme scene corresponding to the user; index analysis is performed on the first-order crowd classification result and the access attribute data through the second-order index subdivision model to determine the crowd preference label corresponding to the user; the content recommendation tag corresponding to the user is determined according to the crowd preference label and the theme scene; and content data matching the content recommendation tag is obtained from the content database and recommended to the user. In this way, crowd feature extraction, index analysis and scene adaptation are performed on the user data through the content preference model and the scene recommendation model, the user's content recommendation tag is determined, and matching content data is automatically retrieved and recommended. This improves the accuracy and effectiveness of content data recommendation, presents preferred content to the user while avoiding the display of disliked content, and improves user satisfaction.
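As a minimal sketch of the lookup in step S60, the Python snippet below filters an in-memory list standing in for the content database. The record layout, the titles and the `limit` parameter are assumptions for illustration; a real content database would be queried through its own interface.

```python
def recommend(content_db, recommendation_tag, limit=10):
    """Return up to `limit` content items whose content tags include the recommendation tag."""
    matches = [item for item in content_db if recommendation_tag in item["tags"]]
    return matches[:limit]


# Hypothetical content data, each item marked with one or more content tags.
content_db = [
    {"title": "Dragon Realm, Episode 1", "tags": ["video fantasy"]},
    {"title": "Sword of the North", "tags": ["video martial arts"]},
    {"title": "Dragon Realm, Episode 2", "tags": ["video fantasy"]},
]

recommended = recommend(content_db, "video fantasy", limit=2)
```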

In one embodiment, a content data recommendation apparatus is provided, and the content data recommendation apparatus corresponds one-to-one with the content data recommendation method in the above-mentioned embodiment. As shown in FIG. 9 , the content data recommendation apparatus includes an acquisition module 11 , an input module 12 , an identification module 13 , an analysis module 14 , a determination module 15 and a recommendation module 16 . The detailed description of each functional module is as follows:

The acquisition module 11 is configured to acquire user data of the user and preprocess the user data to obtain the data to be recommended; the data to be recommended includes consumption attribute data, social attribute data, access attribute data and traffic service attribute data;

The input module 12 is configured to input the consumption attribute data, the social attribute data and the access attribute data corresponding to the user into the content preference model, and at the same time input the traffic service attribute data into the scene recommendation model; the content preference model is a multi-order model based on a two-step clustering method and a decision tree; the content preference model includes a first-order crowd clustering model and a second-order index subdivision model;

The identification module 13 is configured to perform crowd feature extraction on the consumption attribute data and the social attribute data through the first-order crowd clustering model to obtain a first-order crowd classification result corresponding to the user, and at the same time perform scene adaptation on the traffic service attribute data through the scene recommendation model to obtain the theme scene corresponding to the user;

An analysis module 14, configured to perform index analysis on the first-order crowd classification result and the access attribute data through the second-order index subdivision model, and determine a crowd preference label corresponding to the user;

A determination module 15, configured to determine a content recommendation tag corresponding to the user according to the crowd preference tag corresponding to the user and the theme scene;

The recommendation module 16 is configured to acquire content data matching the content recommendation tag from the content database, and recommend the acquired content data to the user.

For the specific limitation of the content data recommendation apparatus, please refer to the limitation of the content data recommendation method above, which will not be repeated here. Each module in the above-mentioned content data recommendation apparatus may be implemented in whole or in part by software, hardware and combinations thereof. The above modules can be embedded in or independent of the processor in the computer device in the form of hardware, or stored in the memory in the computer device in the form of software, so that the processor can call and execute the operations corresponding to the above modules.

In one embodiment, a computer device is provided; the computer device may be a server, and its internal structure may be as shown in FIG. 10. The computer device includes a processor, a memory, a network interface and a database connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a readable storage medium and an internal memory. The readable storage medium stores an operating system, computer-readable instructions and a database. The internal memory provides an environment for running the operating system and the computer-readable instructions stored in the readable storage medium. The network interface of the computer device is used to communicate with an external terminal through a network connection. The computer-readable instructions, when executed by the processor, implement a content data recommendation method. The readable storage medium provided by this embodiment includes a non-volatile readable storage medium and a volatile readable storage medium.

In one embodiment, a computer device is provided, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor; when the processor executes the computer-readable instructions, the content data recommendation method in the above embodiments is implemented.

In one embodiment, one or more readable storage media storing computer-readable instructions are provided. The readable storage media provided in this embodiment include non-volatile readable storage media and volatile readable storage media. Computer-readable instructions are stored on the readable storage media and, when executed by one or more processors, cause the one or more processors to implement the content data recommendation method in the foregoing embodiments.

Those of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be implemented by computer-readable instructions instructing the relevant hardware. The computer-readable instructions can be stored in a non-volatile readable storage medium or a volatile readable storage medium and, when executed, may include the processes of the foregoing method embodiments. Any reference to memory, storage, database or other medium used in the various embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in various forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

Those skilled in the art can clearly understand that, for convenience and brevity of description, only the above division of functional units and modules is used as an example. In practical applications, the above functions can be allocated to different functional units and modules as required, that is, the internal structure of the device can be divided into different functional units or modules to complete all or part of the functions described above.

The above embodiments are only used to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that the technical solutions described in the above embodiments can still be modified, or some of their technical features can be equivalently replaced; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application, and shall all fall within the scope of protection of the present application.

Claims

A content data recommendation method, comprising:

Obtaining user data of the user, preprocessing the user data, and obtaining data to be recommended; the data to be recommended includes consumption attribute data, social attribute data, access attribute data and traffic service attribute data;

Input the consumption attribute data, the social attribute data and the access attribute data corresponding to the user into the content preference model, and at the same time input the traffic service attribute data into the scene recommendation model; the content preference model is a multi-order model based on a two-step clustering method and a decision tree; the content preference model includes a first-order crowd clustering model and a second-order index subdivision model;

Perform crowd feature extraction on the consumption attribute data and the social attribute data through the first-order crowd clustering model to obtain a first-order crowd classification result corresponding to the user, and at the same time perform scene adaptation on the traffic service attribute data through the scene recommendation model to obtain the theme scene corresponding to the user;

Perform index analysis on the first-order crowd classification result and the access attribute data through the second-order index subdivision model, and determine a crowd preference label corresponding to the user;

determining a content recommendation tag corresponding to the user according to the crowd preference tag corresponding to the user and the theme scene;

Acquire content data matching the content recommendation tag from a content database, and recommend the acquired content data to the user.
The content data recommendation method according to claim 1, wherein before performing crowd feature extraction on the consumption attribute data and the social attribute data by using the first-order crowd clustering model, the method comprises:

Get a sample dataset;

Screening the sample data set according to the first-order attribute, and filtering out the first-order attribute data set;

Inputting the first-order attribute data set into a two-step clustering model, and performing crowd feature exploration on the first-order attribute data through the two-step clustering model to obtain a first-order crowd clustering result;

Through a decision tree algorithm, performing analysis and path restoration on the first-order crowd clustering result and the first-order valid data set, and extracting at least one classification variable corresponding to the first-order crowd clustering result;

Performing model reconstruction according to all the classification variables, the first-order clustering result and the first-order valid data set to construct a first-order crowd clustering model, determining the first-order crowd types corresponding to the first-order crowd clustering model, and labeling each item of first-order valid data in the first-order valid data set with its corresponding crowd type to obtain a first-order data set; the first-order crowd types include at least one crowd type.
The method for recommending content data according to claim 2, wherein, performing crowd feature exploration on the first-order attribute data through the two-step clustering model to obtain a first-order crowd clustering result, comprising:

The first-order attribute data set is standardized by the two-step clustering model to obtain first-order attribute data to be processed; the two-step clustering model includes a density clustering model and a K-means clustering model;

Using the DBSCAN algorithm, crowd density clustering is performed on the first-order attribute data to be processed through the density clustering model to obtain a transitional clustering data result;

The K-means algorithm is used to perform crowd feature clustering on the transition clustering data results through the K-means clustering model to obtain the first-order crowd clustering result.
The content data recommendation method according to claim 2, wherein performing analysis and path restoration on the first-order crowd clustering result and the first-order valid data set through a decision tree algorithm, and extracting at least one classification variable corresponding to the first-order crowd clustering result, comprises:

Associating the first-order crowd type and the first-order valid data corresponding to the same sample user, and determining the associated first-order valid data set as a decision data set; the first-order crowd clustering result includes the first-order crowd types corresponding to the sample users in the first-order valid data set; the first-order valid data set includes the first-order valid data corresponding one-to-one to the sample users;

Inputting the decision data set into a decision inversion model containing initial variable parameters;

Using a decision tree algorithm, the decision data set is analyzed through the decision inversion model, and the initial variable parameters are updated;

The path restoration is performed according to the updated initial variable parameters, and the classification variable corresponding to the first-order crowd clustering result is extracted.
The content data recommendation method according to claim 2, wherein before performing the index analysis on the first-order crowd classification result and the access attribute data by using the second-order index subdivision model, the method comprises:

According to the first-order data set and the access attribute sample data in the sample data set, merge to generate a second-order attribute data set;

Perform index feature extraction on the second-order attribute data set through a preference behavior model to obtain at least one comprehensive index variable;

According to the results of the first-order crowd type and all the comprehensive index variables, segmental analysis is performed on the second-order attribute data set, and a second-order index subdivision model is constructed.
The method for recommending content data according to claim 5, wherein the combining to generate a second-order attribute data set according to the first-order data set and the access attribute sample data in the sample data set comprises:

Randomly extracting fields from the access attribute sample data, and extracting the attribute data to be processed;

Perform missing value processing and extreme value processing on the attribute data to be processed to obtain attribute data to be added;

The attribute data to be added is correspondingly added to the first-order data set to generate the second-order attribute data set.
The content data recommendation method according to claim 5, wherein performing segmental analysis on the second-order attribute data set according to the first-order crowd type results and all the comprehensive index variables, and constructing a second-order index subdivision model, comprises:

Perform feature analysis and dimensionality reduction processing on all the comprehensive index variables to obtain principal component index variables;

Associating the crowd type corresponding to the same user with the second-order attribute data, and determining the associated second-order attribute data set as the data set to be subdivided; the first-order crowd type result includes the crowd type corresponding to the user; the second-order attribute data set includes the second-order attribute data corresponding one-to-one to the users;

According to the principal component index variable, segmental analysis is performed on the data set to be subdivided, and the second-order index subdivision model is constructed.
A content data recommendation device, comprising:

an acquisition module, configured to acquire user data of the user, preprocess the user data, and obtain data to be recommended; the data to be recommended includes consumption attribute data, social attribute data, access attribute data and traffic service attribute data;

an input module, configured to input the consumption attribute data, the social attribute data and the access attribute data corresponding to the user into the content preference model, and at the same time input the traffic service attribute data into the scene recommendation model; the content preference model is a multi-order model based on two-step clustering method and decision tree; the content preference model includes a first-order crowd clustering model and a second-order index subdivision model;

an identification module, configured to perform crowd feature extraction on the consumption attribute data and the social attribute data through the first-order crowd clustering model to obtain a first-order crowd classification result corresponding to the user, and at the same time perform scene adaptation on the traffic service attribute data through the scene recommendation model to obtain a theme scene corresponding to the user;

an analysis module, configured to perform index analysis on the first-order crowd classification result and the access attribute data through the second-order index subdivision model, and determine a crowd preference label corresponding to the user;

a determining module, configured to determine a content recommendation tag corresponding to the user according to the crowd preference tag corresponding to the user and the theme scene;

A recommendation module, configured to acquire content data matching the content recommendation tag from the content database, and recommend the acquired content data to the user.
A computer device comprising a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, wherein the processor implements the following steps when executing the computer-readable instructions:

Obtaining user data of the user, preprocessing the user data, and obtaining data to be recommended; the data to be recommended includes consumption attribute data, social attribute data, access attribute data and traffic service attribute data;

Input the consumption attribute data, the social attribute data and the access attribute data corresponding to the user into the content preference model, and at the same time input the traffic service attribute data into the scenario recommendation model; the content preference model is a multi-order model based on a two-step clustering method and a decision tree; the content preference model includes a first-order crowd clustering model and a second-order index subdivision model;

Perform crowd feature extraction on the consumption attribute data and the social attribute data through the first-order crowd clustering model to obtain a first-order crowd classification result corresponding to the user, and perform scene adaptation on the traffic service attribute data through the scenario recommendation model to obtain a theme scene corresponding to the user;

Perform index analysis on the first-order crowd classification result and the access attribute data through the second-order index subdivision model, and determine a crowd preference label corresponding to the user;

determining a content recommendation tag corresponding to the user according to the crowd preference tag corresponding to the user and the theme scene;

Acquire content data matching the content recommendation tag from a content database, and recommend the acquired content data to the user.
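The last two steps above (deriving a content recommendation tag and fetching matching content) can be sketched as follows. This is a minimal illustration only: the claim does not say how the crowd preference tag and the theme scene are combined, so the composite-key format, the function names `content_recommendation_tag` and `recommend`, and the in-memory list standing in for the content database are all assumptions.

```python
def content_recommendation_tag(crowd_preference_tag, theme_scene):
    """Combine both inputs into one lookup key (hypothetical format)."""
    return f"{theme_scene}|{crowd_preference_tag}"


def recommend(content_db, tag, limit=3):
    """Return up to `limit` titles whose tag set contains `tag`."""
    return [item["title"] for item in content_db if tag in item["tags"]][:limit]


# Hypothetical content database: each item carries a set of recommendation tags.
content_db = [
    {"title": "Weekend fuel discounts", "tags": {"road_trip|price_sensitive"}},
    {"title": "Premium car care guide", "tags": {"road_trip|quality_seeker"}},
    {"title": "Commuter toll tips", "tags": {"commute|price_sensitive"}},
]

tag = content_recommendation_tag("price_sensitive", "road_trip")
print(recommend(content_db, tag))
```

A composite key keeps the lookup a simple set-membership test; a real system might instead score content against the two inputs separately.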
The computer device according to claim 9, wherein before the crowd feature extraction is performed on the consumption attribute data and the social attribute data by using the first-order crowd clustering model, the processor further implements the following steps when executing the computer-readable instructions:

Obtaining a sample data set;

Screening the sample data set according to the first-order attribute, and filtering out the first-order attribute data set;

Inputting the first-order attribute data set into a two-step clustering model, and performing crowd feature exploration on the first-order attribute data through the two-step clustering model to obtain a first-order crowd clustering result;

Through a decision tree algorithm, analysis and path restoration are performed on the first-order crowd clustering result and the first-order valid data set, and at least one categorical variable corresponding to the first-order crowd clustering result is extracted;

Performing model reconstruction according to all the categorical variables, the first-order crowd clustering result and the first-order valid data set, constructing a first-order crowd clustering model, determining the first-order crowd types corresponding to the first-order crowd clustering model, and labeling each piece of first-order valid data with its corresponding crowd type to obtain a first-order data set; the first-order crowd types include at least one crowd type.
The computer device according to claim 10, wherein performing crowd feature exploration on the first-order attribute data through the two-step clustering model to obtain a first-order crowd clustering result comprises:

The first-order attribute data set is standardized by the two-step clustering model to obtain first-order attribute data to be processed; the two-step clustering model includes a density clustering model and a K-means clustering model;

Using the DBSCAN algorithm, crowd density clustering is performed on the first-order attribute data to be processed through the density clustering model to obtain a transitional clustering data result;

The K-means algorithm is used to perform crowd feature clustering on the transitional clustering data result through the K-means clustering model to obtain the first-order crowd clustering result.
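The three steps of this claim (standardization, DBSCAN density clustering, K-means clustering) can be sketched in plain Python as below. The claim does not state what the transitional clustering data result contains or how it seeds K-means; this sketch assumes DBSCAN is used to discard noise points and that the dense clusters it finds supply the initial K-means centroids. The function names and the `eps` and `min_pts` defaults are illustrative, not taken from the source.

```python
import math
from statistics import mean, pstdev


def standardize(rows):
    """Z-score every column; constant columns map to 0.0."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]
    return [tuple((v - m) / s for v, m, s in zip(r, mus, sds)) for r in rows]


def dbscan(points, eps, min_pts):
    """Return one label per point; -1 marks noise."""
    labels = [None] * len(points)

    def neighbors(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = neighbors(j)
            if len(nb) >= min_pts:  # core point: keep expanding
                queue.extend(nb)
    return labels


def kmeans(points, centroids, iters=20):
    """Lloyd's algorithm starting from the given centroids."""
    for _ in range(iters):
        assign = [min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        new = []
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assign) if a == c]
            new.append(tuple(mean(col) for col in zip(*members)) if members else centroids[c])
        if new == centroids:
            break
        centroids = new
    return assign, centroids


def two_step_cluster(rows, eps=0.5, min_pts=2):
    """Standardize, drop DBSCAN noise, then refine with K-means (assumed hand-off)."""
    z = standardize(rows)
    labels = dbscan(z, eps, min_pts)
    kept = [i for i, lab in enumerate(labels) if lab != -1]
    seeds = []
    for c in sorted(set(labels) - {-1}):
        members = [z[i] for i in kept if labels[i] == c]
        seeds.append(tuple(mean(col) for col in zip(*members)))
    result = [-1] * len(rows)
    if seeds:
        assign, _ = kmeans([z[i] for i in kept], seeds)
        for i, a in zip(kept, assign):
            result[i] = a
    return result
```

For example, `two_step_cluster([(1, 1), (1.2, 0.9), (0.9, 1.1), (8, 8), (8.1, 7.9), (7.9, 8.2), (4.5, 20)])` groups the first three and the next three rows into two clusters and marks the last row as noise.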
The computer device according to claim 10, wherein performing analysis and path restoration on the first-order crowd clustering result and the first-order valid data set by using a decision tree algorithm, and extracting at least one categorical variable corresponding to the first-order crowd clustering result, comprises:

Associating the first-order crowd types corresponding to the same sample users with the first-order valid data, and determining the associated first-order valid data set as a decision data set; the first-order crowd clustering result includes the first-order crowd types corresponding to the sample users in the first-order valid data set; the first-order valid data set includes the first-order valid data in one-to-one correspondence with the sample users;

Inputting the decision data set into a decision inversion model containing initial variable parameters;

Using the decision tree algorithm, the decision data set is analyzed by the decision inversion model, and the initial variable parameters are updated;

The path restoration is performed according to the updated initial variable parameters, and the categorical variable corresponding to the first-order crowd clustering result is extracted.
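One way to picture "path restoration" is to fit a decision tree that predicts the crowd type from the valid data and then read back the root-to-leaf conditions for each crowd type. The claim does not define the decision inversion model or its variable parameters, so the sketch below substitutes a small CART-style tree with Gini impurity; the feature names, sample values and helper names are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Return (feature index, threshold) that lowers weighted Gini most, or None."""
    best, best_score = None, gini(labels)
    for f in range(len(rows[0])):
        values = sorted(set(r[f] for r in rows))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score - 1e-12:
                best, best_score = (f, t), score
    return best


def build_tree(rows, labels, depth=0, max_depth=3):
    """Internal nodes are (feature, threshold, left, right); leaves are labels."""
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (f, t,
            build_tree(*zip(*left), depth + 1, max_depth),
            build_tree(*zip(*right), depth + 1, max_depth))


def restore_paths(tree, names, path=()):
    """Yield (crowd type, list of conditions) for every root-to-leaf path."""
    if not isinstance(tree, tuple):
        yield tree, list(path)
        return
    f, t, left, right = tree
    yield from restore_paths(left, names, path + (f"{names[f]} <= {t:g}",))
    yield from restore_paths(right, names, path + (f"{names[f]} > {t:g}",))


# Hypothetical decision data set: (age, monthly_spend) labeled with a crowd type.
rows = [(22, 300), (25, 280), (48, 900), (52, 950), (50, 150), (47, 120)]
crowd = ["A", "A", "B", "B", "C", "C"]
for crowd_type, conditions in restore_paths(build_tree(rows, crowd), ["age", "monthly_spend"]):
    print(crowd_type, conditions)
```

The variables that appear in the conditions for a crowd type are the candidates for that type's categorical variables; note that a crowd type reached through several leaves would yield several paths.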
The computer device according to claim 10, wherein before the index analysis is performed on the first-order crowd classification result and the access attribute data by the second-order index subdivision model, the processor further implements the following steps when executing the computer-readable instructions:

According to the first-order data set and the access attribute sample data in the sample data set, merge to generate a second-order attribute data set;

Perform index feature extraction on the second-order attribute data set through a preference behavior model to obtain at least one comprehensive index variable;

According to the results of the first-order crowd type and all the comprehensive index variables, segmental analysis is performed on the second-order attribute data set, and a second-order index subdivision model is constructed.
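The claim does not define the preference behavior model. Because the text elsewhere refers to a "principal component index variable", the sketch below assumes the comprehensive index variable is a first principal component score, computed here by power iteration, and that subdivision splits each crowd type at the median score. Both choices, and all names and sample values, are assumptions made for illustration.

```python
import math
from statistics import mean, median


def first_component(rows, iters=200):
    """Leading principal axis and per-row scores via power iteration."""
    mus = [mean(c) for c in zip(*rows)]
    x = [[v - m for v, m in zip(r, mus)] for r in rows]
    d = len(mus)
    cov = [[sum(r[i] * r[j] for r in x) / len(x) for j in range(d)] for i in range(d)]
    w = [1.0] * d  # start vector; assumed not orthogonal to the leading axis
    for _ in range(iters):
        w = [sum(cov[i][j] * w[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(v * v for v in w))
        w = [v / norm for v in w]
    scores = [sum(a * b for a, b in zip(r, w)) for r in x]
    return w, scores


def subdivide(crowd_types, scores):
    """Label each user as high or low relative to the median of their crowd type."""
    groups = {}
    for t, s in zip(crowd_types, scores):
        groups.setdefault(t, []).append(s)
    cut = {t: median(v) for t, v in groups.items()}
    return [f"{t}/{'high' if s > cut[t] else 'low'}" for t, s in zip(crowd_types, scores)]


# Hypothetical access attributes: (visits per week, minutes per week).
access = [(1, 10), (2, 22), (9, 95), (10, 100)]
crowd_types = ["A", "A", "B", "B"]
_, scores = first_component(access)
print(subdivide(crowd_types, scores))
```

In practice the columns would usually be standardized first so that no single field dominates the component.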
The computer device according to claim 13, wherein the combining to generate a second-order attribute data set according to the first-order data set and the access attribute sample data in the sample data set comprises:

Randomly extracting fields from the access attribute sample data, and extracting the attribute data to be processed;

Perform missing value processing and extreme value processing on the attribute data to be processed to obtain attribute data to be added;

The attribute data to be added is correspondingly added to the first-order data set to generate the second-order attribute data set.
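A minimal sketch of these three steps follows. The claim does not specify how missing or extreme values are handled, so median imputation and percentile clipping are assumptions, as are the function names, the dictionary layout keyed by user id, and the seeded random field selection.

```python
import random
from statistics import median


def clean_column(values, lower_q=0.05, upper_q=0.95):
    """Fill missing values with the median, then clip to percentile bounds.

    Raises statistics.StatisticsError if every value is missing.
    """
    present = sorted(v for v in values if v is not None)
    fill = median(present)
    lo = present[int(lower_q * (len(present) - 1))]
    hi = present[int(upper_q * (len(present) - 1))]
    return [min(max(fill if v is None else v, lo), hi) for v in values]


def build_second_order(first_order, access, n_fields, seed=0):
    """Add `n_fields` randomly chosen, cleaned access fields to each user's record."""
    users = sorted(first_order)
    all_fields = sorted({f for rec in access.values() for f in rec})
    fields = random.Random(seed).sample(all_fields, n_fields)
    merged = {u: dict(first_order[u]) for u in users}
    for f in fields:
        cleaned = clean_column([access.get(u, {}).get(f) for u in users])
        for u, v in zip(users, cleaned):
            merged[u][f] = v
    return merged


# Hypothetical inputs keyed by user id.
first_order = {"u1": {"crowd_type": "A"}, "u2": {"crowd_type": "A"}, "u3": {"crowd_type": "B"}}
access = {"u1": {"visits": 3, "minutes": 40}, "u2": {"visits": None, "minutes": 5000}, "u3": {"visits": 5}}
print(build_second_order(first_order, access, n_fields=2))
```

Clipping rather than dropping extreme rows keeps the one-to-one correspondence between users and second-order attribute data.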
One or more readable storage media storing computer-readable instructions, wherein the computer-readable instructions, when executed by one or more processors, cause the one or more processors to perform the following steps:

Obtaining user data of the user, preprocessing the user data, and obtaining data to be recommended; the data to be recommended includes consumption attribute data, social attribute data, access attribute data and traffic service attribute data;

Input the consumption attribute data, the social attribute data and the access attribute data corresponding to the user into the content preference model, and at the same time input the traffic service attribute data into the scenario recommendation model; the content preference model is a multi-order model based on a two-step clustering method and a decision tree; the content preference model includes a first-order crowd clustering model and a second-order index subdivision model;

Perform crowd feature extraction on the consumption attribute data and the social attribute data through the first-order crowd clustering model to obtain a first-order crowd classification result corresponding to the user, and perform scene adaptation on the traffic service attribute data through the scenario recommendation model to obtain a theme scene corresponding to the user;

Perform index analysis on the first-order crowd classification result and the access attribute data through the second-order index subdivision model, and determine a crowd preference label corresponding to the user;

determining a content recommendation tag corresponding to the user according to the crowd preference tag corresponding to the user and the theme scene;

Acquire content data matching the content recommendation tag from a content database, and recommend the acquired content data to the user.
16. The readable storage medium of claim 15, wherein before performing crowd feature extraction on the consumption attribute data and the social attribute data by using the first-order crowd clustering model, the computer-readable instructions, when executed by the one or more processors, cause the one or more processors to further perform the following steps:

Obtaining a sample data set;

Screening the sample data set according to the first-order attribute, and filtering out the first-order attribute data set;

Inputting the first-order attribute data set into a two-step clustering model, and performing crowd feature exploration on the first-order attribute data through the two-step clustering model to obtain a first-order crowd clustering result;

Through a decision tree algorithm, analysis and path restoration are performed on the first-order crowd clustering result and the first-order valid data set, and at least one categorical variable corresponding to the first-order crowd clustering result is extracted;

Performing model reconstruction according to all the categorical variables, the first-order crowd clustering result and the first-order valid data set, constructing a first-order crowd clustering model, determining the first-order crowd types corresponding to the first-order crowd clustering model, and labeling each piece of first-order valid data with its corresponding crowd type to obtain a first-order data set; the first-order crowd types include at least one crowd type.
The readable storage medium according to claim 16, wherein performing crowd feature exploration on the first-order attribute data by using the two-step clustering model to obtain a first-order crowd clustering result comprises:

The first-order attribute data set is standardized by the two-step clustering model to obtain first-order attribute data to be processed; the two-step clustering model includes a density clustering model and a K-means clustering model;

Using the DBSCAN algorithm, crowd density clustering is performed on the first-order attribute data to be processed through the density clustering model to obtain a transitional clustering data result;

The K-means algorithm is used to perform crowd feature clustering on the transitional clustering data results through the K-means clustering model to obtain the first-order crowd clustering result.
The readable storage medium according to claim 16, wherein performing analysis and path restoration on the first-order crowd clustering result and the first-order valid data set by using a decision tree algorithm, and extracting at least one categorical variable corresponding to the first-order crowd clustering result, comprises:

Associating the first-order crowd types corresponding to the same sample users with the first-order valid data, and determining the associated first-order valid data set as a decision data set; the first-order crowd clustering result includes the first-order crowd types corresponding to the sample users in the first-order valid data set; the first-order valid data set includes the first-order valid data in one-to-one correspondence with the sample users;

Inputting the decision data set into a decision inversion model containing initial variable parameters;

Using a decision tree algorithm, the decision data set is analyzed through the decision inversion model, and the initial variable parameters are updated;

The path restoration is performed according to the updated initial variable parameters, and the categorical variable corresponding to the first-order crowd clustering result is extracted.
The readable storage medium of claim 16, wherein before the index analysis is performed on the first-order crowd classification result and the access attribute data by the second-order index subdivision model, the computer-readable instructions, when executed by the one or more processors, cause the one or more processors to further perform the following steps:

According to the first-order data set and the access attribute sample data in the sample data set, merge to generate a second-order attribute data set;

Perform index feature extraction on the second-order attribute data set through a preference behavior model to obtain at least one comprehensive index variable;

According to the results of the first-order crowd type and all the comprehensive index variables, segmental analysis is performed on the second-order attribute data set, and a second-order index subdivision model is constructed.
The readable storage medium according to claim 19, wherein the combining to generate a second-order attribute data set according to the first-order data set and the access attribute sample data in the sample data set comprises:

Randomly extracting fields from the access attribute sample data, and extracting to-be-processed attribute data;

Perform missing value processing and extreme value processing on the attribute data to be processed to obtain attribute data to be added;

The attribute data to be added is correspondingly added to the first-order data set to generate the second-order attribute data set.