CN110046943B

CN110046943B - Optimization method and optimization system for network consumer subdivision

Info

Publication number: CN110046943B
Application number: CN201910398178.9A
Authority: CN
Inventors: 王伟军; 黄英辉; 刘辉; 李伟卿
Original assignee: Central China Normal University
Current assignee: Central China Normal University
Priority date: 2019-05-14
Filing date: 2019-05-14
Publication date: 2023-01-03
Anticipated expiration: 2039-05-14
Also published as: CN110046943A

Abstract

The invention belongs to the technical field of network consumer information processing and discloses an optimization method and an optimization system for network consumer segmentation. First, using psycholexical evidence, two main psychological segmentation dictionaries are constructed from the word-use behavior of consumers, and the psychological map features of the consumers are obtained. Second, a clustering method is applied with different cluster distances to obtain consumer segment clusters. Third, a machine learning prediction model of user consumption preference is constructed within each cluster, and the reliability and validity of the clusters are determined by comparison with a reference method. Finally, the optimal user preference prediction model is selected, the user clusters in that model are manually inspected, and market segmentation labels are assigned. Measured by two evaluation indices (RMSE and R²) and compared with unoptimized user segments, the invention can find the network consumer segments that most significantly improve user preference prediction.

Description

Optimization method and optimization system for network consumer subdivision

Technical Field

The invention belongs to the technical field of network consumer behavior information processing, and particularly relates to an optimization method and an optimization system for network consumer subdivision.

Background

Currently, the closest prior art:

Existing market segmentation techniques segment mainly on data such as user demographics, psychological maps and behavior indices. A pre-segmentation method uses predefined segment labels and classifies users with supervised machine learning such as support vector machines, decision trees and random forests. Post hoc segmentation automatically generates segment clusters with clustering methods such as K-means, and the clusters are then labeled by manual observation, which yields segmented consumer groups and the corresponding market segmentation labels. A hybrid segmentation method performs cluster analysis on the result of pre-segmentation. Regardless of which segmentation variables and methods are selected, the resulting segments must be operable and useful in supporting the formulation and execution of marketing strategy. Specifically, five criteria distinguish a successful segmentation from a failed one: identifiability (whether segments can be identified), substantiality (the size of segments), accessibility (whether marketers can easily reach the segments with campaigns), differentiability (whether the segments differ from one another), and operability (whether segments are consistent with enterprise competitiveness).

Generally, in a network scenario such as e-commerce, the problems of the prior art are:

(1) The existing pre-segmentation method has problems of substantiality, accessibility, operability and the like. Pre-segmentation takes predefined labels as segmentation targets. Most of these labels derive from past experience and may not match real user data, so segments may be unidentifiable, segment sizes may differ too widely, marketers may find it hard to run campaigns, and the segmentation may run counter to enterprise competitiveness.

(2) The existing post hoc segmentation method has problems of identifiability, accessibility, operability and the like. Post hoc segmentation automatically identifies segment clusters in user data with unsupervised methods such as clustering. The resulting segments differ only in a data sense and are not manually inspected, so the segmentation effect is open to question, marketing activities are hard to carry out, and the segments may not serve enterprise interests.

(3) The existing hybrid segmentation method inherits the advantages of the two methods, but also carries, to varying degrees, their disadvantages in performance, accessibility and operability.

In the network context in particular, automated and intelligent marketing has become mainstream, yet existing segmentation methods and systems are not integrated with marketing activities that focus on user preference and demand, such as the positioning, promotion and delivery of online enterprise services and products. Existing methods therefore generally suffer from prominent accessibility and operability problems.

The difficulty of solving the technical problems is as follows:

To predict consumption demand and preference accurately and to interpret them dynamically, network consumer behavior must be fully extracted and mined. However, on the one hand, consumer behavior in a network environment is dynamic and heterogeneous, and existing market segmentation methods can hardly support efficient mining of heterogeneous data. On the other hand, to meet the requirements of market segmentation for substantiality, accessibility and operability in consumer preference prediction, the segmentation method must adapt to dynamic consumer preferences. Existing consumer segmentation methods lack an implementation concept, an operating method and a system implementation for this functional requirement.

The significance of solving the technical problems is as follows:

the invention realizes accurate prediction and dynamic explanation of network consumer user segmentation, optimizes network market segmentation functional modules and technical routes, and provides support for electronic marketing decision making such as positioning, popularization and delivery of network services and products and development of intelligent electronic marketing components.

Disclosure of Invention

Aiming at the problems in the prior art, the invention provides an optimization method and an optimization system for network consumer segmentation.

The invention is realized in such a way that a method for optimizing network consumer segment comprises the following steps:

step one, constructing a psychological subdivision dictionary for the word use behaviors of a user by using psychological lexical evidences, and acquiring the characteristics of a user psychological map;

step two, selecting different cluster distances by using a clustering method to obtain user segment clusters;

step three, constructing a machine learning prediction model facing user consumption preference in each cluster, and determining the reliability and effectiveness of the cluster through comparison with a reference method;

and step four, selecting an optimal user preference prediction model, manually inspecting the user cluster in the model, and endowing market segmentation labels.

Further, step one further comprises: automatically constructing a Schwartz Value Survey (SVS) word bank and a Big Five Factor (BFF) personality model word bank by using a natural language processing method, the word banks being used to support acquisition of the user psychological map.

Further, the first step specifically comprises:

1) Automatic acquisition of a psychological map word library:

calculating the similarity between word vectors generated by word embedding by using cosine similarity; let $w_1 = (v_{11}, v_{12}, \ldots, v_{1m})$ and $w_2 = (v_{21}, v_{22}, \ldots, v_{2m})$ be two word vectors, where m is the word vector dimension; based on the word vectors, the invention measures the semantic similarity of words by cosine similarity, with the specific calculation formula:

$$\mathrm{sim}(w_1, w_2) = \frac{\sum_{k=1}^{m} v_{1k}\, v_{2k}}{\sqrt{\sum_{k=1}^{m} v_{1k}^{2}}\ \sqrt{\sum_{k=1}^{m} v_{2k}^{2}}}$$

based on cosine similarity, calculating the Top 10 words most similar to each seed word in the synonym library by using a natural language word embedding algorithm; setting 0.45 as the threshold of the similarity value, and traversing the word embeddings trained on the Internet corpus to obtain, for each seed word in the corresponding psychological map candidate word library, its 10 most similar words; after threshold filtering, adding the expanded words to the candidate word set; repeating the same word-embedding calculation on the updated candidate word set until no new words are extracted;

calculating the score of the candidate word in each psychological subdivision dimension according to the membership formula of the psychological map dimension of the candidate word, where $w_{ext}$ is the psychologically subdivided expanded word set and $w_{seed_1}, w_{seed_2}, \ldots, w_{seed_p}$ are the seed words of each dimension of each psychological map:

$$\mathrm{SVS\_scores}(w_{ext}, w_{seed_1}) = \max\{\mathrm{sim}(w_{ext}, w_{seed_2}),\ \mathrm{sim}(w_{ext}, w_{seed_3}),\ \ldots,\ \mathrm{sim}(w_{ext}, w_{seed_p})\};$$
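The cosine similarity and the max-based membership score can be illustrated with a minimal Python sketch. The function names and the three-dimensional toy vectors are assumptions made for illustration, and the sketch reads the membership formula as "the score of an expanded word on a dimension is its largest similarity to that dimension's seed words", which is one interpretation of the formula rather than a confirmed implementation.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)

def membership_score(word_vec, seed_vecs):
    """Largest similarity between an expanded word and a dimension's seed words."""
    return max(cosine_similarity(word_vec, seed) for seed in seed_vecs)

# Hypothetical embeddings, for illustration only.
w_ext = [1.0, 0.0, 0.0]
seeds = [[0.0, 1.0, 0.0], [1.0, 1.0, 0.0]]
score = membership_score(w_ext, seeds)
print(round(score, 4))
```

A word would then be assigned a score on every dimension by calling `membership_score` once per dimension with that dimension's seed vectors.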

2) The method comprises the following steps of automatically identifying a user psychological map based on a word bank:

defining a p-dimensional psychological map as $L = \{L_1, L_2, \ldots, L_p\}$ and the unstructured data set of user reviews as $\{r_1, r_2, \ldots, r_m\}$, where m is the total number of user comments and each $r_i$ is $\{w_{i1}, w_{i2}, \ldots, w_{in}\}$, n being the total number of words in the data; according to the obtained psychological map dictionary, vocabulary accumulation is adopted, where $L_u^p$ is the score for each dimension of the psychological map.

Further, the second step further comprises: setting different cluster distances, and identifying the network user market subdivision by using a DBSCAN density clustering algorithm.

Further, the second step further comprises:

a) Consumer psycho-subdivision cluster acquisition based on DBSCAN:

performing DBSCAN clustering on the scores of the network user psychological map; in the clustering cluster, according to the subdivision group where the user is located, predicting consumption preference;

b) Consumer preference prediction integrating psychological segmentation and deep neural networks:

capturing the nonlinear relation between user psychological subdivision and product preference by using a deep neural network (DNN), and performing higher-level data representation by using complex abstract coding; if there are M users and N products, R represents the training data set matrix and a corresponding matrix represents the test data set; let $r_{ui}$ be the preference of consumer u for product i and $\hat{r}_{ui}$ the predicted preference score;

on this basis, a DNN system architecture is constructed with an input layer, a plurality of hidden layers, and an output layer formed by nodes of a specific number of categories; the network is trained by minimizing the mean square error between $r_{ui}$ and $\hat{r}_{ui}$, where g is a linear combination of node values in the hidden layers of the network and h is an activation function, and the weights among the hidden layers are dynamically updated by using gradient descent and back propagation.

Further, the third step further comprises: and verifying the accuracy of the obtained user sub-groups by using the purchasing preference prediction of the user based on the deep neural network.

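Further, in the fourth step, an optimal user preference prediction model is selected, and the root mean square error (RMSE) is defined as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{(u,i)} \left(\hat{y}_{u,i} - y_{u,i}\right)^{2}}$$

where the sum runs over the (user, product) pairs of the test data set, $\hat{y}_{u,i}$ is the predicted score, $y_{u,i}$ is the true score, $\bar{y}_u$ is the average score of user u, and n is the size of the test data set.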

Another object of the present invention is to provide an optimization control system for network consumer segment, which implements the optimization method for network consumer segment.

Another object of the present invention is to provide an optimization terminal for network consumer segment implementing the optimization method for network consumer segment.

In summary, the advantages and positive effects of the invention are:

The invention is optimized for two typical user psychographic segmentation methods (SVS and BFF). Some experimental results are shown in Table 1: for specific segments (ClusterID column) and specific psychographic segmentation variables, the best results are obtained by the corresponding clustering and preference regression algorithms (e.g., in the Beauty product category, the smallest preference prediction error, 0.6923, is obtained by the "Cluster_2" segment with BFF segmentation variables, DBSCAN clustering and the DNN component).

Statistical analysis of the optimization results in Table 1 shows that, under two evaluation indices (RMSE and R²), the optimized methods (BFF and SVS) have higher user preference interpretability (R²) and a smaller preference prediction error (RMSE) than the user segmentation of the control group (e.g., random in Fig. 3, Fig. 4 and Table 2).

Table 1. Effects of the optimization method for network consumer segmentation provided by the embodiment of the present invention (partial)

Table 2. Effect comparison of the optimization method for network consumer segmentation provided by the embodiment of the present invention (differences between segmentation variables in preference prediction)

In addition, the invention analyzes and compares the performance of the DNN, the core component of the optimization method and system. On preference prediction, the DNN has a significantly lower root mean square error (the RMSE of the DNN algorithm is around 0.82, as shown in Fig. 5) than support vector regression (SVR), random forest (RF) and linear regression (LR) (the RMSE of these three methods is greater than 0.95, as shown in Fig. 5). In the interpretability of consumer preference, the R² of the DNN (around 0.14, as shown in Fig. 6) is significantly higher than that of the other methods (R² below 0.05, as shown in Table 3 and Fig. 6). In Table 3, the DNN component has the smallest prediction error (RMSE) and the greatest predictive explanatory power (R²) relative to the reference components (LR, SVR and RF).

Table 3. Effect comparison of core components in the embodiment of the present invention

In general, the invention effectively combines user segmentation with electronic marketing activities such as user preference prediction in a network scenario. User preference prediction serves as the evaluation standard for segmentation, and segmentation in turn serves as a concrete means of preference prediction. On the one hand, by identifying the segments with the best preference prediction effect, the method provides a basis for the manual inspection and use of those segments and for optimizing their substantiality and accessibility; on the other hand, the segmentation result is consistent with enterprise competitiveness and with the purpose of electronic marketing components, which improves the accessibility and operability of segmentation.

Drawings

Fig. 1 is a flowchart of a method for optimizing network consumer segments according to an embodiment of the present invention.

Fig. 2 is a schematic diagram of an optimization method for network consumer segment according to an embodiment of the present invention.

FIG. 3 is a comparison graph (RMSE) of consumer segment optimization results provided by embodiments of the present invention.

FIG. 4 is a comparison graph (R²) of consumer segment optimization results provided by embodiments of the present invention.

Fig. 5 is a comparison graph (RMSE) of the performance of a preference prediction algorithm provided by an embodiment of the present invention.

Fig. 6 is a comparison graph (R²) of the performance of the preference prediction algorithm provided by the embodiment of the present invention.

Fig. 7 is a DNN training and testing diagram provided by an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and do not limit the invention.

The existing subdivision technology depends on manual labels to determine the effectiveness and reliability of the subdivision method. But the agility and accuracy of the method are insufficient, so that the preference behavior of the consumer in the network environment cannot be accurately predicted and dynamically explained.

To solve the above problems, the present invention will be described in detail with reference to the accompanying drawings.

The optimization method for network consumer segmentation provided by the embodiment of the invention comprises the following steps: firstly, two main psychological subdivision dictionaries are constructed aiming at word use behaviors of consumers by utilizing psychological lexical evidences, and psychological map features of the consumers are obtained.

And secondly, selecting different clustering distances by using a clustering method to obtain consumer fine clusters.

And thirdly, constructing a machine learning prediction model facing the user consumption preference in each cluster, and determining the reliability and effectiveness of the cluster through comparison with a reference method.

And finally, selecting an optimal user preference prediction model, manually inspecting the user cluster in the model, and endowing market subdivision labels.

As shown in fig. 1, the method for optimizing network consumer segments provided in the embodiment of the present invention specifically includes:

1) Acquisition of consumer psychological maps:

Two word banks, one for the Schwartz Value Survey (SVS) and one for the Big Five Factor (BFF) personality model, are automatically constructed by a natural language processing method to support the acquisition of the psychological map of the consumer. First, two seed word sets are obtained from the usage characteristics of vocabulary related to the Schwartz values and the Big Five personality traits. Second, the seed words are expanded with semantic knowledge from a synonym thesaurus to obtain a candidate word set. Third, associated vocabulary is extracted from a large-scale Internet corpus by a word embedding method to further expand the candidate word set. Finally, words with large deviations are removed by manual inspection, which verifies the reliability of the lexicon.

2) Consumer psychological segmentation optimization based on user preference prediction:

the invention sets different clustering distances, utilizes a DBSCAN density clustering algorithm to identify the market segmentation of network consumers, and utilizes the purchasing preference prediction of consumers based on the deep neural network to verify the accuracy of the obtained consumer segmentation group.

3) The user preference prediction evaluation method based on the psychological subdivision comprises the following steps:

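The root mean square error (RMSE) is defined as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{(u,i) \in y_{test}} \left(\hat{y}_{u,i} - y_{u,i}\right)^{2}}$$

where $\hat{y}_{u,i}$ is the predicted score, $y_{u,i}$ is the true score, $\bar{y}_u$ is the average score of consumer u, $y_{test}$ is the test data set, and n is the size of the test data set.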
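As an illustration of the evaluation step, the sketch below computes RMSE and R² for a handful of hypothetical scores. The function names and sample values are assumptions. R² is computed here in the usual coefficient-of-determination form with the overall mean of the true scores as the baseline, which may differ from a formulation based on the per-consumer average mentioned in the text.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error over paired true and predicted scores."""
    n = len(y_true)
    return math.sqrt(sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / n)

def r_squared(y_true, y_pred):
    """1 - SS_res / SS_tot, using the overall mean as baseline (an assumption)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical ratings on a 1-5 scale.
y_true = [5.0, 3.0, 4.0, 1.0]
y_pred = [4.0, 3.0, 5.0, 2.0]
print(round(rmse(y_true, y_pred), 4), round(r_squared(y_true, y_pred), 4))
```

A lower RMSE and a higher R² would indicate that the segment-specific model predicts and explains the ratings better than a reference model.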

The step 1) specifically comprises the following steps:

1.1 Automatic acquisition of a psychographic thesaurus:

In the invention, the similarity between word vectors generated by word embedding is calculated by using cosine similarity. Let $w_1 = (v_{11}, v_{12}, \ldots, v_{1m})$ and $w_2 = (v_{21}, v_{22}, \ldots, v_{2m})$ be two word vectors, where m is the word vector dimension. Based on the word vectors, the invention measures the semantic similarity of words by cosine similarity, with the specific calculation formula:

$$\mathrm{sim}(w_1, w_2) = \frac{\sum_{k=1}^{m} v_{1k}\, v_{2k}}{\sqrt{\sum_{k=1}^{m} v_{1k}^{2}}\ \sqrt{\sum_{k=1}^{m} v_{2k}^{2}}}$$

Based on cosine similarity, the Top 10 words most similar to each seed word in the synonym library are calculated by using a natural language word embedding algorithm. The method sets 0.45 as the threshold of the similarity value and traverses the word embeddings trained on the Internet corpus to obtain, for each seed word in the corresponding psychological map candidate word library, its 10 most similar words. After threshold filtering, the expanded words are added to the candidate word set. The same process is then repeated on the updated candidate word set until no new words are extracted.
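The iterative expansion described above can be sketched as a small fixed-point loop. This is a minimal sketch under stated assumptions: `expand_lexicon`, the toy words and their three-dimensional vectors are hypothetical, and the loop assumes that only words newly added in one round are used as queries in the next round.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def expand_lexicon(seeds, embeddings, threshold=0.45, top_k=10):
    """Grow a candidate word set from seed words until no new word is added."""
    candidates = set(seeds)
    frontier = set(seeds)
    while frontier:
        new_words = set()
        for word in frontier:
            if word not in embeddings:
                continue
            # Rank every other word by similarity to the current word.
            ranked = sorted(
                ((cosine(embeddings[word], vec), other)
                 for other, vec in embeddings.items() if other != word),
                reverse=True,
            )
            for sim, other in ranked[:top_k]:
                if sim >= threshold and other not in candidates:
                    new_words.add(other)
        candidates |= new_words
        frontier = new_words  # only newly added words are expanded next round
    return candidates

# Hypothetical embeddings, for illustration only.
embeddings = {
    "power": [1.0, 0.0, 0.0],
    "authority": [0.8, 0.6, 0.0],
    "obey": [0.0, 0.8, 0.6],
    "banana": [0.0, 0.0, -1.0],
}
expanded = expand_lexicon(["power"], embeddings)
print(sorted(expanded))
```

In this toy run "obey" is reached only through "authority", which shows why the process is repeated on the updated candidate set rather than run once on the seeds.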

According to the membership formula of the psychological map dimension of the candidate words, the invention calculates the score of the candidate words in each psychological subdivision dimension, where $w_{ext}$ is the psychologically subdivided expanded word set and $w_{seed_1}, w_{seed_2}, \ldots, w_{seed_p}$ are the seed words of each dimension of each psychological map:

$$\mathrm{SVS\_scores}(w_{ext}, w_{seed_1}) = \max\{\mathrm{sim}(w_{ext}, w_{seed_2}),\ \mathrm{sim}(w_{ext}, w_{seed_3}),\ \ldots,\ \mathrm{sim}(w_{ext}, w_{seed_p})\}$$

1.2 Thesaurus-based automatic identification of consumer psychographic maps:

The invention defines a p-dimensional psychological map as $L = \{L_1, L_2, \ldots, L_p\}$ and an unstructured data set such as consumer reviews as $\{r_1, r_2, \ldots, r_m\}$, where m is the total number of user comments and each $r_i$ is $\{w_{i1}, w_{i2}, \ldots, w_{in}\}$, n being the total number of words in the data. According to the psychological map dictionary obtained in 1.1, vocabulary accumulation is adopted, where $L_u^p$ is the score for each dimension of the psychological map.
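A minimal sketch of dictionary-based score accumulation follows. The exact accumulation formula is not reproduced in this text, so the sketch assumes that a consumer's score on a dimension is the sum of the dictionary weights of the matched words across all of that consumer's reviews. The lexicon contents, weights and function name are hypothetical.

```python
def psychographic_scores(reviews, lexicon):
    """Accumulate per-dimension scores for one consumer.

    reviews: list of token lists, one per review.
    lexicon: mapping dimension -> {word: weight} (assumed structure).
    """
    scores = {dimension: 0.0 for dimension in lexicon}
    for tokens in reviews:
        for token in tokens:
            for dimension, words in lexicon.items():
                scores[dimension] += words.get(token, 0.0)
    return scores

# Hypothetical two-dimension lexicon and tokenized reviews.
lexicon = {
    "achievement": {"success": 1.0, "win": 0.5},
    "openness": {"novel": 1.0},
}
reviews = [["great", "success", "win"], ["novel", "win"]]
scores = psychographic_scores(reviews, lexicon)
print(scores)
```

The resulting dictionary plays the role of the p-dimensional score vector that is later passed to clustering.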

In the step 2), the method specifically comprises the following steps:

2.1 DBSCAN-based consumer psycho-segmentation cluster acquisition:

Unlike K-means, DBSCAN density clustering does not require the number of clusters to be specified in advance and can find clusters of arbitrary shape. DBSCAN clustering is performed on the psychological map scores of network consumers; within the resulting clusters, the invention predicts consumption preference according to the segment to which each consumer belongs.
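To show how the choice of cluster distance changes the resulting segments, the sketch below implements a compact DBSCAN in plain Python and runs it with two neighborhood radii. The embodiment reports using scikit-learn's DBSCAN; this standalone version, the parameter names `eps` and `min_samples`, and the two-dimensional toy scores are assumptions made for illustration.

```python
import math

NOISE = -1

def region_query(points, idx, eps):
    """Indices of all points within eps of points[idx], including idx itself."""
    return [j for j, q in enumerate(points) if math.dist(points[idx], q) <= eps]

def dbscan(points, eps, min_samples):
    """Return one cluster label per point; NOISE marks unclustered points."""
    labels = [None] * len(points)
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_samples:
            labels[i] = NOISE
            continue
        cluster_id += 1
        labels[i] = cluster_id
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_samples:  # j is a core point: keep expanding
                queue.extend(j_neighbors)
    return labels

# Hypothetical 2-D psychographic scores: two dense groups and one outlier.
points = [(0.0, 0.0), (0.0, 0.1), (0.1, 0.0),
          (1.0, 1.0), (1.0, 1.1), (1.1, 1.0),
          (5.0, 5.0)]
cluster_counts = {}
for eps in (0.2, 2.0):
    labels = dbscan(points, eps, min_samples=3)
    cluster_counts[eps] = len(set(labels) - {NOISE})
print(cluster_counts)
```

With the small radius the two dense groups stay separate, while the large radius merges them, so each cluster distance yields a different candidate segmentation to be evaluated by preference prediction.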

2.2 Consumer preference prediction integrating psychological segmentation and deep neural networks:

The deep neural network (DNN) is an advanced, rapidly developing artificial intelligence technique with significant advantages over traditional intelligent algorithms. It can effectively capture the nonlinear relation between consumer psychological subdivision and product preference and can use complex abstract coding for higher-level data representation. If there are M consumers and N products, R represents the training data set matrix and a corresponding matrix represents the test data set. Let $r_{ui}$ be the preference of consumer u for product i and $\hat{r}_{ui}$ the predicted preference score. On this basis, a DNN architecture is constructed with an input layer, a plurality of hidden layers, and an output layer formed by nodes of a specific number of categories. The network is trained by minimizing the mean square error (MSE) between $r_{ui}$ and $\hat{r}_{ui}$, where g is a linear combination of node values in the hidden layers of the network and h is an activation function (typically a sigmoid or hyperbolic tangent function); the weights between the hidden layers are dynamically updated using gradient descent and back propagation.
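The training principle (linear combination g, activation h, squared-error loss, gradient descent with back propagation) can be sketched with a deliberately small network in plain Python. This is not the Keras model of the embodiment: it uses a single hidden layer for brevity, and the function name, layer size, learning rate and toy data are all assumptions.

```python
import math
import random

def train_dnn(X, y, hidden=4, lr=0.05, epochs=300, seed=0):
    """Train a one-hidden-layer tanh network by per-sample gradient descent."""
    rng = random.Random(seed)
    n_in = len(X[0])
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    W2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    b2 = 0.0

    def forward(x):
        # g: linear combination of inputs; h: tanh activation.
        g = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(W1, b1)]
        a = [math.tanh(v) for v in g]
        return a, sum(w * ai for w, ai in zip(W2, a)) + b2

    mse_history = []
    for _ in range(epochs):
        total = 0.0
        for x, target in zip(X, y):
            a, pred = forward(x)
            err = pred - target
            total += err * err
            for j in range(hidden):
                # Error back-propagated through the tanh unit j.
                grad_g = err * W2[j] * (1.0 - a[j] ** 2)
                W2[j] -= lr * err * a[j]
                b1[j] -= lr * grad_g
                for k in range(n_in):
                    W1[j][k] -= lr * grad_g * x[k]
            b2 -= lr * err
        mse_history.append(total / len(X))
    return (lambda x: forward(x)[1]), mse_history

# Hypothetical psychographic scores (inputs) and ratings (targets).
X = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.9], [0.1, 0.7], [0.5, 0.5], [0.7, 0.3]]
y = [5.0, 4.0, 1.0, 2.0, 3.0, 4.0]
predict, mse_history = train_dnn(X, y)
print(round(mse_history[0], 3), round(mse_history[-1], 3))
```

The recorded per-epoch error corresponds to the kind of training curve used to pick the number of epochs; a deeper network would repeat the hidden-layer update for each additional layer.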

In the embodiment of the invention, the overall framework principle of the optimization method of the network consumer subdivision is shown in fig. 2.

The present invention will be further described with reference to effects.

The invention is optimized for two typical user psychographic segmentation methods (SVS and BFF). The results show that, under two evaluation indices (RMSE and R²) and compared with the unoptimized user segment (e.g., random in Figs. 3 and 4), the invention can find the network consumer segments that most significantly improve user preference prediction (e.g., BFF and SVS in Figs. 3 and 4).

The invention analyzes and compares the performance of the DNN, the core component of the optimization method and the optimization system. On preference prediction, the DNN has a significantly lower root mean square error (the RMSE of the DNN algorithm is around 0.82, as shown in Fig. 5) than support vector regression (SVR), random forest (RF) and linear regression (LR) (the RMSE of these three methods is greater than 0.95, as shown in Fig. 5). In the interpretability of consumer preference, the R² of the DNN (around 0.14, as shown in Fig. 6) is significantly higher than that of the other methods (R² below 0.05, as shown in Fig. 6).

The invention is further described below in connection with the experimental procedures.

The invention constructs positively and negatively correlated e-commerce psychographic dictionaries, namely the SVS-pos, SVS-neg, BFF-pos and BFF-neg dictionaries, and segments e-commerce consumers based on the identified SVS and BFF scores and the DBSCAN clustering algorithm. The invention further provides a DNN method for constructing a rating regression model for the segmented consumers. The invention is then tested on Amazon online shopping data.

1) The data set describes:

Amazon is one of the largest e-commerce platforms in the world and has accumulated massive amounts of user purchasing behavior data. The Amazon review dataset published by McAuley et al. contains 142.8 million product reviews and metadata from amazon.com, collected from May 1996 to July 2014. The invention selects 5 review datasets from 5 product categories with a "K-core" value of 10, which ensures that each remaining user or item has at least 10 reviews. The invention considers 10 reviews (with an average review length of 189 words) sufficient for consumer/product psychographic inference, compared with 25 tweets. Table 4 gives a detailed description of the data sets.
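The effect of a K-core constraint can be illustrated with a short filter that repeatedly drops users and items with fewer than k reviews until the data stops changing. The embodiment uses datasets that are already published in K-core form, so this sketch only illustrates the constraint; the function name and the toy (user, item) pairs are hypothetical, and k is set to 2 to keep the example small.

```python
from collections import Counter

def k_core(reviews, k):
    """Keep only (user, item) pairs whose user and item both have >= k reviews.

    Filtering is repeated because removing one review can push another
    user or item below the threshold.
    """
    current = list(reviews)
    while True:
        user_counts = Counter(user for user, _ in current)
        item_counts = Counter(item for _, item in current)
        kept = [(user, item) for user, item in current
                if user_counts[user] >= k and item_counts[item] >= k]
        if len(kept) == len(current):
            return kept
        current = kept

# Hypothetical review pairs.
reviews = [("u1", "a"), ("u1", "b"), ("u2", "a"),
           ("u2", "b"), ("u3", "a"), ("u3", "c")]
core = k_core(reviews, k=2)
print(core)
```

Here item "c" is removed first for having a single review, which then leaves user "u3" with one review and removes that user as well.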

Table 4. Description of the experimental data sets

The following is detailed information of the review sample:

{
  "reviewerID": "A2SUAM1J3GDNN3B",
  "asin": "0000013714",
  "reviewerName": "J. McDonald",
  "helpful": [2, 3],
  "reviewText": "I bought this for my husband who plays the piano. He is having a wonderful time playing these old hymns. The music is at times hard to read because we think the book was published for singing from more than playing from. Great purchase though!",
  "overall": 5.0,
  "summary": "Heavenly Highway Hymns",
  "unixReviewTime": 1252800000,
  "reviewTime": "09 13, 2009"
}

2) The experimental process comprises the following steps:

the data processing process is divided into the following steps.

First, the invention retains the "reviewerID", "asin", "overall", "reviewText" and "summary" fields in the above seven data sets, and merges "reviewText" and "summary" as the word-use behavior from which online psychological maps are identified.
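The field selection and text merge can be sketched with the standard `json` module. The helper name `to_record`, the added `text` key and the shortened sample line are assumptions; only the five retained field names come from the description.

```python
import json

KEEP = ("reviewerID", "asin", "overall", "reviewText", "summary")

def to_record(line):
    """Parse one JSON review line, keep five fields, and merge the two text fields."""
    raw = json.loads(line)
    record = {key: raw.get(key) for key in KEEP}
    parts = (record["reviewText"], record["summary"])
    record["text"] = " ".join(part for part in parts if part)  # hypothetical merged field
    return record

# Shortened sample line, for illustration only.
sample = ('{"reviewerID": "A2SUAM1J3GDNN3B", "asin": "0000013714", '
          '"helpful": [2, 3], "reviewText": "Great purchase though!", '
          '"overall": 5.0, "summary": "Heavenly Highway Hymns", '
          '"unixReviewTime": 1252800000}')
record = to_record(sample)
print(record["text"])
```

Fields such as "helpful" and "unixReviewTime" are dropped, and the merged text is what later preprocessing and dictionary matching would consume.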

Second, the invention uses Python, the machine learning tool scikit-learn and the Natural Language Toolkit (NLTK) for text preprocessing, including normalization, tokenization, stop-word removal and stemming. Normalization converts a list of words into a more uniform sequence. Tokenization cuts a given character sequence and a defined document unit into pieces called tokens. Some words in the reviews are so common that they are of little value in selecting text that meets the needs of the invention and should be excluded from the vocabulary; these words are called stop words. Stemming reduces the inflected and derived forms of a word to a common base form. The invention stems the word forms in the dictionaries and reviews with the Lancaster stemmer of NLTK. Stop words are deleted using the English stop-word list in NLTK, English words are lowercased with the lowercase conversion method in Python, and Z-score normalization is performed with the scale method in scikit-learn. Based on the SVS-pos, SVS-neg, BFF-pos and BFF-neg dictionaries and all the preprocessing steps described above, the invention calculates psychographic scores by matching the words in the reviews against the words in these dictionaries. The invention thus obtains the SVS and BFF scores of each Amazon consumer and product.
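The preprocessing chain can be sketched without external libraries. The embodiment names NLTK and scikit-learn for these steps; here a tiny hand-written stop-word set and a naive suffix stripper stand in for the NLTK stop-word list and Lancaster stemmer, and a population-standard-deviation Z-score stands in for the scikit-learn scaler. All of these stand-ins are assumptions and produce cruder output than the real tools.

```python
import re
import statistics

# Tiny stand-in for a full English stop-word list.
STOP_WORDS = {"i", "this", "for", "my", "who", "the", "is", "a", "he", "these"}

def naive_stem(token):
    """Crude suffix stripping; a stand-in for a real stemmer, not equivalent to it."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def preprocess(text):
    """Lowercase, tokenize on letters, drop stop words, then stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [naive_stem(token) for token in tokens if token not in STOP_WORDS]

def z_scores(values):
    """Standardize values to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0] * len(values)
    return [(value - mean) / std for value in values]

tokens = preprocess("I bought this for my husband who plays the piano.")
standardized = z_scores([1.0, 2.0, 3.0])
print(tokens, [round(v, 3) for v in standardized])
```

The token list would be matched against the psychographic dictionaries, and the standardized values correspond to per-dimension scores after Z-score normalization.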

Third, for each product category, the invention runs the DBSCAN clustering algorithm on the SVS or BFF scores of the consumers using scikit-learn to obtain the consumers' positively and negatively correlated psychographic scores and the corresponding psychological segmentation labels. The invention then constructs a score prediction data set that combines the psychographic scores (independent variables) with the ratings given to products by consumers (dependent variable). For each product category, the invention also constructs a feature set of random values between 0 and 1 as a control group. For each data set, the parameters of the DNN are optimized by a gradient descent algorithm. The optimal number of epochs of the neural network (an epoch being one pass through the complete training set) is determined by performance on the validation set. For the support vector machine algorithm, the invention uses a validation set to optimize the cost parameter C. The baselines, namely linear regression (LR), SVM (radial basis function kernel) and random forest, are implemented with scikit-learn, and the DNN is implemented with the Keras interface of Google TensorFlow. 5-fold cross-validation is used to select the training and test data sets for each pass, avoiding overfitting of linear regression, SVM, RF and the DNN. Fig. 7 shows an example of how the RMSE of the DNN evolves over epochs; in Fig. 7, the best epoch is around 15.
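The 5-fold splitting used for model selection can be sketched in a few lines. The embodiment does not state which splitting utility was used, so the generator below, its name and its shuffling seed are assumptions; it only shows how each sample lands in exactly one test fold.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k roughly equal folds
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

splits = list(k_fold_indices(10, k=5))
for train, test in splits:
    print(len(train), len(test))
```

Each model would be fitted on the training indices and scored on the held-out indices, and the five scores would then be averaged.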

Fourth, feature importance of different psychographic sub-dimensions in understanding consumer online preferences is studied using feature ranking and recursive feature elimination methods. A recursive feature elimination (RFE-SVM) method based on a support vector machine is a commonly used feature selection and subsequent regression task technique, especially in consumer preference prediction. Each iteration trains a linear SVM, and the next step is to consider the deletion of one or more "bad" features. The quality of the features is determined by the absolute values of the respective weights used in the SVM. The features that remain after many iterations are considered to be the most useful features for analyzing data. By incorporating RFE-SVM into segment consumer score prediction, the present invention can explore from the sub-dimensions of SVS and BFF whether these sub-dimensions are effective in predicting and interpreting preferences.

Finally, the invention performed an online consumer preference prediction experiment covering 5 product categories × 4 prediction algorithms (LR, SVM, RF, DNN) × 3 psychographic variables (random, SVS and BFF) × 2 clustering settings (with or without DBSCAN-based consumer clustering).
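The size of this experimental design can be checked by enumerating the configurations. The category names below are placeholders, since the five categories are not all named in the text; the algorithm and variable labels follow the description.

```python
from itertools import product

categories = ["category_%d" % i for i in range(1, 6)]  # placeholder names
algorithms = ["LR", "SVM", "RF", "DNN"]
variables = ["random", "SVS", "BFF"]
clustered = [False, True]  # without / with DBSCAN-based clustering

configs = list(product(categories, algorithms, variables, clustered))
print(len(configs))  # 5 * 4 * 3 * 2
```

Each tuple identifies one run whose RMSE and R² would be recorded for comparison.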

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims

1. A method for optimizing a network consumer segment, the method comprising:

step one, constructing a psychological subdivision dictionary for the word use behaviors of a user by using psychological lexical evidences, and acquiring the characteristics of a user psychological map; the method specifically comprises the following steps:

1) Automatic acquisition of a psychological map word library:

Where m is the word vector dimension; based on the word vectors, the Cosine distance is utilized to measure the semantic similarity of the vocabularies, and the specific calculation formula is as follows:

based on cosine similarity, calculating the top 10 words most similar to each seed word in the synonym library by using a natural language word embedding algorithm; setting 0.45 as the similarity threshold, and traversing the word embeddings trained on the Internet corpus to obtain the 10 words most similar to the seed words of the corresponding psychological map candidate word library; after filtering by the threshold, adding the expanded words to the candidate word set; based on the updated candidate word set, repeating the same calculation with the natural language word embedding algorithm until no new word is extracted;

calculating the score of the candidate word in each psychological subdivision dimension according to the membership formula of the psychological map dimension of the candidate word, wherein $w_{ext}$ is the expanded word set of the psychological subdivision, and $w_{seed_1}, w_{seed_2}, \ldots, w_{seed_p}$ are the seed words of each dimension of each psychological map:

the psychological map of p dimensions is defined as $L = \{L_1, L_2, \ldots, L_p\}$, and the unstructured user review data set is $r_1, r_2, \ldots, r_m$, the total number of user comments being m, where each $r_i$ is $\{w_{i1}, w_{i2}, \ldots, w_{in}\}$ and n is the total number of words in the data; according to the obtained psychological map dictionary, adopting

vocabulary accumulation, where $L_u^p$ is the score for each dimension of the psychological map;

and step four, selecting an optimal user preference prediction model, manually inspecting the user clusters in the model, and assigning market segment labels.

2. The method for optimizing network consumer segments of claim 1, wherein step one further comprises: automatically constructing a Schwartz Value Survey (SVS) word library and a Big Five Factor (BFF) personality model word library by using a natural language processing method, wherein the word libraries are used for supporting the acquisition of the psychological map of the consumer.

3. The method for optimizing network consumer segments of claim 1, wherein step two further comprises: setting different cluster distances, and identifying the network user market segments by using the DBSCAN density clustering algorithm.

4. The method for optimizing network consumer segments of claim 1, wherein step two further comprises:

a) Obtaining user psychology fine clustering based on DBSCAN:

performing DBSCAN clustering on the scores of the psychological map of the network consumers; in the cluster, according to the subdivision group where the user is located, consumption preference is predicted;

b) Integrating psychological segmentation with user preference prediction for deep neural networks:

representing a test dataset matrix; let $r_{ui}$ be the preference of user u for product i,

being the predicted preference score;

on this basis, a DNN architecture is constructed, comprising one input layer, a plurality of hidden layers, and an output layer composed of nodes of a specific number of categories; by minimizing

Wherein

g is a linear combination of the node values in the hidden layer of the network, h is an activation function, the network is trained using the least mean square error as the evaluation index, and the weights among the multiple hidden layers are dynamically updated by using gradient descent and back propagation.

5. The method for optimizing network consumer segments of claim 1, wherein step three further comprises: verifying, based on the deep neural network, the accuracy of the obtained user sub-groups in predicting the purchase preference of the user.

6. The method for optimizing network consumer segments according to claim 1, wherein in step four, an optimal user preference prediction model is selected, and the root mean square error is defined as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{u,i} \left(\hat{y}_{u,i} - y_{u,i}\right)^{2}}$$

wherein $\hat{y}_{u,i}$ is the predicted score, $y_{u,i}$ is the true score, $\bar{y}_{u}$ is the average score for user u, and n is the size of the test data set.

7. An optimization control system for network consumer segment implementing the optimization method for network consumer segment of claim 1.

8. An optimization terminal for network consumer segment implementing the method for optimizing network consumer segment of claim 1.