CN116385037A

CN116385037A - User portrait construction method and system based on feature fusion of improved LDA

Info

Publication number: CN116385037A
Application number: CN202310226593.2A
Authority: CN
Inventors: 曹亚东; 马小宁; 孙知信; 孙哲; 赵学健; 宫婧; 汪胡青; 胡冰; 徐玉华
Original assignee: Nanjing University of Posts and Telecommunications
Current assignee: Nanjing University of Posts and Telecommunications
Priority date: 2023-03-10
Filing date: 2023-03-10
Publication date: 2023-07-04

Abstract

The invention discloses a user portrait construction method and system based on feature fusion of improved LDA, relating to the technical field of data mining. The method comprises the following steps: acquiring user data of product consumers and preprocessing the data; extracting basic attribute data of the users, and extracting user behavior and interest features according to the basic attribute data; and fusing the basic attributes, behaviors and interests to construct a final user portrait, analyzing the portrait, and providing advertisement recommendation dimensions. The method performs feature cross fusion on the basic attributes, behavior data and interest data of consumers by improving a probabilistic topic model. The invention improves the completeness and accuracy of user portrait construction, deepens merchants' understanding of consumer purchasing behavior, and helps merchants optimize their products; it also helps merchants formulate accurate advertisement recommendation strategies for targeted groups, improving advertisement recommendation accuracy and conversion benefits.

Description

User portrait construction method and system based on feature fusion of improved LDA

Technical Field

The invention relates to the technical field of data mining, in particular to a user portrait construction method and system based on feature fusion of improved LDA.

Background

The large amount of data that users leave behind when using the internet through mobile terminals, including identity data, browsing data, purchasing data and social data, reflects their living habits and willingness to consume to a certain extent. A user portrait is a network-based virtual image of a user; it is built on a large amount of user data processed with technologies such as data mining, machine learning and deep learning, so as to display the characteristics of the user.

Constructing user portraits helps merchants better understand their own customers, target audiences during advertisement delivery, and find target groups for accurate recommendation and marketing.

At present, user portraits are commonly constructed with the following models, each with its own advantages. Methods based on mathematical statistics analyze user characteristics by quantizing data values; they are good at processing structured data but cannot analyze unstructured data such as images, audio and text. Methods based on the vector space model take unstructured data into account and represent a user portrait in vector form. Methods based on topic models are suited to unstructured data and represent a user with a lower-dimensional model; such a model segments text data into words and determines the topics of a text and their probability proportions according to word-topic probabilities, and it has important applications in natural language processing. LDA (Latent Dirichlet Allocation) represents a text as a random mixture of several topics and each topic as a probability distribution over words, and is a typical bag-of-words model. Ontology-based methods make the label system describing user portraits more hierarchical and better connected, but depend excessively on expert definitions of the logic between terms. Neural-network-based methods imitate animal neural behavior, and their advantages are distributed storage, large-scale parallel data processing and nonlinear operation.

Today, with networks developing rapidly, the massive heterogeneous data that users generate in network interaction, such as the text they post, the videos they browse, the people they follow and the goods they purchase, cannot be described in a single manner. With the application of data mining, machine learning and other technologies, mining unstructured as well as structured data shows greater value.

The user portrait construction method based on feature fusion of improved LDA collects data from multiple aspects, adopts different modeling methods for different types of data, and fuses the basic attribute, interest and behavior models to represent the user portrait of product consumers. This improves the completeness and accuracy of the portrait and yields a user portrait suitable for e-commerce consumption platforms.

Disclosure of Invention

This section is intended to outline some aspects of embodiments of the invention and to briefly introduce some preferred embodiments. Some simplifications or omissions may be made in this section as well as in the description summary and in the title of the application, to avoid obscuring the purpose of this section, the description summary and the title of the invention, which should not be used to limit the scope of the invention.

The present invention has been made in view of the above-described problems.

Therefore, the technical problems solved by the invention are as follows: how to deepen merchants' understanding of the basic attributes, behavior and interest psychology of consumers, optimize customer-facing aspects of the product, help merchants formulate accurate advertisement recommendation strategies for targeted groups, and improve advertisement recommendation accuracy and conversion benefits.

In order to solve the technical problems, the invention provides the following technical scheme: a user portrait construction method based on feature fusion of improved LDA comprises the following steps:

acquiring user data of a product consumer and preprocessing the data;

extracting basic attribute data of a user, and extracting user behaviors and interest features according to the basic attribute data of the user;

fusing the basic attributes, behaviors and interests to construct a final user portrait, analyzing the portrait, and providing advertisement recommendation dimensions.

As a preferred scheme of the user portrait construction method based on feature fusion of improved LDA according to the present invention, acquiring the user data of product consumers comprises:

collecting base-layer attribute information of platform users, and collecting dynamic information of the users on the platform;

the information comprises gender, region, age, online active time, historical like and favorite data, and historical purchasing behavior data;

the preprocessing comprises splitting and deduplicating the collected data related to user behavior, and extracting key content from the source data.

As a preferred scheme of the user portrait construction method based on feature fusion of improved LDA according to the present invention, the preprocessing further includes: data cleaning, word segmentation and stop-word removal;

the data cleaning comprises removing noise and redundant data, screening and checking the data, standardizing the data, and filling in missing values so that the data dimensions follow the same standard;

the word segmentation comprises dividing a text into words with independent meanings according to reasonable rules;

the stop-word removal comprises filtering out words that are meaningless for classification, and continuously maintaining and expanding the stop-word list during data processing.

As a preferable scheme of the user portrait construction method based on improved LDA feature fusion of the present invention, the basic attribute data includes: gender, region, active time, mobile terminal model, occupation;

extracting user behavior and interest characteristics comprises extracting user behavior characteristics and interest preference characteristics based on an improved LDA model;

the improved LDA model is expressed as: [formula not reproduced]

wherein the symbols of the formula denote, respectively, the number of people reached, the portrait precision, the time, and the dimension of the interest-behavior feature;

when the required number of people reached is met, the feature dimension, the portrait precision and the time feature dimension take corresponding values, and the feature dimension is guaranteed to lie within a specified range [range not reproduced].

As a preferred scheme of the user portrait construction method based on feature fusion of improved LDA according to the present invention, the extraction of user behavior and interest features is expressed as follows:

the classification according to crowd characteristics is expressed as: [formula not reproduced]

the distribution of individual words under each category over the entire corpus is expressed as: [formula not reproduced]

wherein the fixed value $M$ represents the total number of texts in the dataset; $m$ represents a single text; $N_m$ represents the total number of words in the text; $z$ represents a topic; $w$ represents a word vector of the text; $\theta$ represents the topic distribution; $\alpha$ is the hyperparameter of the Dirichlet distribution of $\theta$; $\varphi$ represents the word distribution; $\beta$ is the hyperparameter of the Dirichlet distribution of $\varphi$; and the two expressions above represent, respectively, the classification by crowd characteristics and the distribution of individual words under each category over the entire corpus.

As a preferred scheme of the user portrait construction method based on feature fusion of improved LDA according to the present invention, constructing the final user portrait includes: selecting and fusing the user interest preference features and the purchasing behavior features to generate a user portrait model;

the feature fusion is expressed as: [formula not reproduced]

wherein the two operands represent the interest preference features and the purchasing behavior features respectively; a and c denote features of different sets in the interest preference portrait; the corresponding symbol on the purchasing side denotes a purchasing behavior feature; the subscript numbers index the individual text data; and the operator denotes the Cartesian product.

As a preferred scheme of the user portrait construction method based on feature fusion of improved LDA according to the present invention, the construction of the final user portrait is expressed as: [formula not reproduced]

wherein the terms represent, respectively, the basic attribute features (including age, gender and region), the purchasing behavior features, and the interest preference features; and the parameters denote, respectively, the time period, the portrait feature dimension, and the number of people reached.

Therefore, the technical problems solved by the invention are as follows: how to extract key content from source data, convert the data into a unified and identifiable structure, effectively extract the features most relevant to the product consumer group while achieving a certain reach, and remove redundancy.

In order to solve the technical problems, the invention provides the following technical scheme: a user portrayal construction system based on feature fusion of improved LDA, comprising:

the system comprises a data acquisition module, a data preprocessing module, a data mining module and a data analysis module;

as a preferable scheme of the user portrait construction system based on the feature fusion of the improved LDA, the data acquisition module is a device for acquiring user data, and is used for extracting a user behavior model and a user interest model and transmitting the acquired data to the data preprocessing module;

as a preferable scheme of the user portrait construction system based on the improved LDA feature fusion, the data preprocessing module is a device for processing missing and redundant data and is used for extracting key contents from the data acquisition module and converting the data into a unified and identifiable structure;

as a preferred scheme of the user portrait construction system based on the feature fusion of the improved LDA, the data mining module is a device for extracting user behaviors and interest features based on an improved LDA model, and final user portraits are generated by carrying out weight measurement on basic attribute features, behavior tags and interest tags on data extracted by a data preprocessing module, and carrying out feature selection and cross fusion;

as a preferable scheme of the user portrait construction system based on the feature fusion of the improved LDA, the data analysis module is a device for providing advertisement recommendation dimension through analyzing portraits, considers the influence of time factors on the user portraits, and analyzes consumer information in time intervals according to the ordering time of products.

A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method as described above when executing the computer program.

A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method as described above.

The invention has the following beneficial effects. The user portrait construction method based on feature fusion of improved LDA uses data mining and machine learning for user modeling: it collects basic attribute data of users and uses the platform's records of user activity to analyze their behavior and interest features, which improves the completeness and accuracy of the user portrait. Features are extracted from the differently structured data of basic attributes, purchasing behavior and interest preferences, and a new factor is constructed to optimize the probabilistic topic model, so that the features most relevant to the product consumer group are extracted effectively while a certain reach is achieved, and redundancy is removed. Data mining and natural language processing are used to cross-fuse the behavior and interest preference features into a consumer group portrait, which provides a new idea for advertisement audience targeting.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art. Wherein:

FIG. 1 is a general flow chart of a user portrayal construction method based on feature fusion of improved LDA according to an embodiment of the present invention;

FIG. 2 is a block diagram of a user portrayal construction system based on feature fusion with improved LDA according to a second embodiment of the present invention;

FIG. 3 is a comparison chart of the redundancy-removal effect of a user portrait construction method based on feature fusion of improved LDA according to a fourth embodiment of the present invention;

FIG. 4 is a graph showing user portrait accuracy contrast of a user portrait construction method based on feature fusion of improved LDA according to a fourth embodiment of the present invention.

Description of the embodiments

So that the manner in which the above recited objects, features and advantages of the present invention can be understood in detail, a more particular description of the invention, briefly summarized above, may be had by reference to the embodiments, some of which are illustrated in the appended drawings. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments of the present invention without making any inventive effort, shall fall within the scope of the present invention.

In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, but the present invention may be practiced in other ways other than those described herein, and persons skilled in the art will readily appreciate that the present invention is not limited to the specific embodiments disclosed below.

Further, reference herein to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic can be included in at least one implementation of the invention. The appearances of the phrase "in one embodiment" in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments.

While the embodiments of the present invention have been illustrated and described in detail in the drawings, the cross-sectional view of the device structure is not to scale in the general sense for ease of illustration, and the drawings are merely exemplary and should not be construed as limiting the scope of the invention. In addition, the three-dimensional dimensions of length, width and depth should be included in actual fabrication.

Also in the description of the present invention, it should be noted that the orientation or positional relationship indicated by the terms "upper, lower, inner and outer", etc. are based on the orientation or positional relationship shown in the drawings, are merely for convenience of describing the present invention and simplifying the description, and do not indicate or imply that the apparatus or elements referred to must have a specific orientation, be constructed and operated in a specific orientation, and thus should not be construed as limiting the present invention. Furthermore, the terms "first, second, or third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.

The terms "mounted, connected, and coupled" should be construed broadly in this disclosure unless otherwise specifically indicated and defined, such as: can be fixed connection, detachable connection or integral connection; it may also be a mechanical connection, an electrical connection, or a direct connection, or may be indirectly connected through an intermediate medium, or may be a communication between two elements. The specific meaning of the above terms in the present invention will be understood in specific cases by those of ordinary skill in the art.

Example 1

Referring to fig. 1, for one embodiment of the present invention, there is provided a user portrait construction method based on feature fusion of improved LDA, including:

acquiring data related to the historical consumers of a product on a consumption platform and performing preprocessing operations.

Constructing basic attribute tags plays a role in building the user portrait. Although such tags are poor at identifying nuances among users, the data is relatively easy to acquire, and for a new user who enters for the first time with no other interaction information, the tags are effective at distinguishing user groups.

First, the base-layer attribute information of platform users is captured. Many mainstream platforms currently require users to fill in specific information such as gender, region and age at registration. This data is basic and easy to acquire; however, because the network environment is complex and users pay increasing attention to personal privacy, users may enter false or overly simple information even though the platform guides them to use their real identities, so this part of the data needs to be preprocessed.

Second, dynamic information of the user on the platform is collected, including online active time, historical like and favorite data, and historical purchasing behavior data, which provides more diverse references for describing the user.

The preprocessing operation is as follows: the collected source datasets are generally mixed and irregular and usually contain missing and redundant data, so preprocessing is needed to extract key content from the source data and convert the data into a unified and identifiable structure.

First, data cleaning is performed: noise and redundant data are removed, the data is screened and checked, the data is standardized, and missing values are filled in so that the data dimensions follow the same standard.

Data cleaning is performed using Python. Considering that the data being analyzed is network data, cleaning code is added to keep the processing general.

Four cleaning rules are defined: reply-to-someone information that may appear in the text is removed; special characters, including emoticons, kaomoji and other special symbols, are removed, because network language is rich in them and too many symbols adversely affect word segmentation; superfluous symbols in comments, such as blank spaces, are removed, because they are useless and slow down data processing; and the letter case in the data is unified, because, unlike in academic writing, everyday network text often expresses one meaning in different cases, such as "FRIEND" and "friend", and without case conversion the system would recognize them as two different words, which interferes with classification.

Then word segmentation is performed: the text is divided into words with independent meanings according to reasonable rules that summarize people's word usage habits; machine recognition methods are effective at recognizing new words.

Finally, stop words are removed: words that are meaningless for classification are filtered out, which improves retrieval efficiency and saves memory. Meaningless words are collected manually to form a stop-word list, which is continuously maintained and expanded during data processing.
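The cleaning, segmentation and stop-word steps described above can be sketched in Python as follows. This is a minimal illustration, not the code of the invention: the reply-marker pattern, the stop-word list, and all function names are assumptions, and whitespace splitting stands in for a real word segmenter.

```python
import re

# Hypothetical stop-word list; the text says the list is collected manually
# and extended during processing, but does not give its contents.
STOP_WORDS = {"the", "a", "an", "is", "of", "and", "to"}

# Assumed format of a reply marker such as "Reply @name:" (not given in the text).
REPLY_RE = re.compile(r"(?:reply\s+)?@\w+\s*:?", re.IGNORECASE)
# Anything that is not a letter, digit or whitespace (emoji, kaomoji parts, punctuation).
SPECIAL_RE = re.compile(r"[^\w\s]|_")
# Runs of whitespace.
SPACE_RE = re.compile(r"\s+")


def clean_text(text):
    """Apply the four cleaning rules: replies, special characters, extra spaces, case."""
    text = REPLY_RE.sub(" ", text)
    text = SPECIAL_RE.sub(" ", text)
    text = SPACE_RE.sub(" ", text).strip()
    return text.lower()


def tokenize(text):
    """Whitespace split, used here only as a stand-in for a word segmenter."""
    return text.split()


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in stop_words]


def preprocess(comments):
    """Deduplicate comments after cleaning, then segment and filter each one."""
    seen, result = set(), []
    for raw in comments:
        cleaned = clean_text(raw)
        if not cleaned or cleaned in seen:
            continue
        seen.add(cleaned)
        result.append(remove_stop_words(tokenize(cleaned)))
    return result
```

Unifying case before deduplication means that "FRIEND" and "friend" collapse to one token, which is the behavior the fourth cleaning rule asks for.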

Basic attribute data of the user is extracted.

And extracting basic attribute information of the user in the platform bottom layer data set, wherein the basic attribute information comprises gender, region, active time, mobile terminal model and occupation.

Features under the basic attributes are usually structured data; such data is quantized by mathematical statistics, calculating the count of each item and its proportion of the whole.

To normalize the representation of the results, the indices are quantized by rank.

Gender is male or female, with 1 for male and 0 for female. Age is divided into several intervals; following the World Health Organization's division into children, youth, middle-aged and elderly, 18 and below is 1, 19-23 (youth) is 2, 24-35 is 3, 36-59 (middle-aged) is 4, and 60 and above (elderly) is 5.

Region labels are divided by city tier: first-tier is 1, second-tier is 2, third- and fourth-tier are 3, and fifth- and sixth-tier are 4.
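The rank quantization above can be written as a small lookup, sketched here in Python. The function names and the dictionary keys of the user record are hypothetical; the codes themselves follow the values given in the text.

```python
def encode_gender(gender):
    """1 for male, 0 for female."""
    return {"male": 1, "female": 0}[gender.lower()]


def encode_age(age):
    """Map an age in years to the five rank codes used in the text."""
    if age < 0:
        raise ValueError("age must be non-negative")
    if age <= 18:
        return 1
    if age <= 23:
        return 2
    if age <= 35:
        return 3
    if age <= 59:
        return 4
    return 5


def encode_city_tier(tier):
    """First tier -> 1, second -> 2, third/fourth -> 3, fifth/sixth -> 4."""
    return {1: 1, 2: 2, 3: 3, 4: 3, 5: 4, 6: 4}[tier]


def encode_user(user):
    """Encode one user record; the key names are assumptions for illustration."""
    return {
        "gender": encode_gender(user["gender"]),
        "age": encode_age(user["age"]),
        "region": encode_city_tier(user["city_tier"]),
    }
```

For example, `encode_user({"gender": "Female", "age": 30, "city_tier": 4})` yields the codes 0, 3 and 3 for gender, age and region.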

User behavior and interest features are extracted according to the basic attribute data of the user.

The user's historical purchase records and shopping cart information are extracted to build the user's behavior set, and the user behavior features are extracted with the c-LDA model.

The interest and behavior datasets are imported separately and modeled with the c-LDA model (LDA: Latent Dirichlet Allocation) as follows:

for the entire dataset, the text length is drawn as $N_m \sim \mathrm{Poisson}(\xi)$;

for a single text $m$, the topic distribution is drawn as $\theta_m \sim \mathrm{Dirichlet}(\alpha)$;

for an interest topic $z$, the word distribution is drawn as $\varphi_z \sim \mathrm{Dirichlet}(\beta)$;

the $n$-th word $w$ in the $m$-th text is generated as follows: according to $\theta_m$, the topic of the word is generated, $z_{m,n} \sim \mathrm{Multinomial}(\theta_m)$; according to the generated topic, the word is generated, $w_{m,n} \sim \mathrm{Multinomial}(\varphi_{z_{m,n}})$.

The above process is repeated for every text in the corpus.
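The generative process can be simulated directly, which may help when checking an implementation against synthetic data. The following Python sketch covers only the standard LDA generative steps (Poisson length, Dirichlet topic and word distributions, multinomial draws), not the improved c-LDA of the invention; the function names, the default hyperparameter values, and the choice to force at least one word per text are assumptions.

```python
import math
import random


def sample_poisson(rng, lam):
    """Draw from Poisson(lam) with Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def sample_dirichlet(rng, alphas):
    """Draw from a Dirichlet distribution by normalizing Gamma(alpha, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def sample_categorical(rng, probs):
    """Single-trial multinomial draw: return the index of the chosen outcome."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate_corpus(num_docs, num_topics, vocab, alpha=0.5, beta=0.5, xi=8.0, seed=0):
    """Generate (topic, word) pairs for each text following the LDA generative steps."""
    rng = random.Random(seed)
    # One word distribution per topic, drawn from Dirichlet(beta).
    phi = [sample_dirichlet(rng, [beta] * len(vocab)) for _ in range(num_topics)]
    corpus = []
    for _ in range(num_docs):
        # Topic distribution of this text, drawn from Dirichlet(alpha).
        theta = sample_dirichlet(rng, [alpha] * num_topics)
        # Text length from Poisson(xi); at least one word is an added assumption.
        length = max(1, sample_poisson(rng, xi))
        doc = []
        for _ in range(length):
            z = sample_categorical(rng, theta)           # topic of this word
            w = vocab[sample_categorical(rng, phi[z])]   # word given the topic
            doc.append((z, w))
        corpus.append(doc)
    return corpus, phi
```

Fitting a topic model to such a synthetic corpus and comparing the recovered word distributions with `phi` is one way to sanity-check an inference routine.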

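From the model used, the joint distribution of the variables is obtained:

$$p(w_m, z_m, \theta_m, \Phi \mid \alpha, \beta) = \prod_{n=1}^{N_m} p(w_{m,n} \mid \varphi_{z_{m,n}})\, p(z_{m,n} \mid \theta_m) \cdot p(\theta_m \mid \alpha) \cdot p(\Phi \mid \beta)$$

where $\Phi$ denotes the set of word distributions $\varphi_z$ of all topics.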

The classification according to crowd characteristics is expressed as: [formula not reproduced]

The distribution of individual words under each category over the entire corpus is expressed as: [formula not reproduced]

wherein the fixed value $M$ represents the total number of texts in the dataset; $m$ represents a single text; $N_m$ represents the total number of words in the text; $z$ represents a topic; $w$ represents a word vector of the text; $\theta$ represents the topic distribution; $\alpha$ is the hyperparameter of the Dirichlet distribution of $\theta$; $\varphi$ represents the word distribution; $\beta$ is the hyperparameter of the Dirichlet distribution of $\varphi$; and the two expressions above represent, respectively, the classification by crowd characteristics and the distribution of individual words under each category over the entire corpus.

And taking the historical consumers of the product as seed groups to extract the characteristics.

The extracted interesting behavior features are used to match and touch all users in the user pool.

The feature topic dimension c extracted by the LDA model is improved to ensure that new user groups are reached, expressed as: [formula not reproduced]

wherein the symbols of the formula denote, respectively, the number of people reached, the portrait precision, the time, and the dimension of the interest-behavior feature.

The number of people reached and the portrait precision are introduced to adjust the topic dimension.

Suppose that the required number of people reached is just satisfied; the feature dimension, the portrait precision and the time feature dimension then take corresponding values, and the feature dimension is guaranteed to lie within a specified range [range not reproduced].

The topic dimension affects both the size of the targeted audience package and the precision of the user portrait: when the dimension is too small, the audience package reached is small; when the dimension is too large, the user portrait precision is poor.

Feature selection and cross fusion are performed on the user purchasing behavior model and the interest preference model to generate the final user portrait, expressed as: [formula not reproduced]

wherein the two operands represent the interest preference features and the purchasing behavior features respectively; a and c denote features of different sets in the interest preference portrait; and the operator denotes the Cartesian product.
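Cross fusion by Cartesian product pairs every interest preference feature with every purchasing behavior feature. The Python sketch below illustrates only that pairing step; the function names and the sample feature labels are hypothetical, and any feature selection or weighting applied by the invention is not shown.

```python
from itertools import product


def cross_features(interest_features, behavior_features):
    """Return every (interest, behavior) pair: the Cartesian product of the two sets."""
    return list(product(interest_features, behavior_features))


def cross_labels(interest_features, behavior_features, sep=" x "):
    """Render each crossed pair as a single readable tag."""
    return [sep.join(pair) for pair in cross_features(interest_features, behavior_features)]
```

With two interest features and two behavior features the result has four crossed tags, so the number of fused features grows as the product of the two set sizes, which is one reason feature selection precedes fusion.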

The final user portrait is constructed as: [formula not reproduced]

wherein the terms represent, respectively, the basic attribute features (including age, gender and region), the purchasing behavior features, and the interest preference features; and the parameters denote, respectively, the time period, the portrait feature dimension, and the number of people reached.

Example 2

Referring to FIG. 2, for one embodiment of the present invention, there is provided a user portrayal construction system based on feature fusion of improved LDA, comprising:

the system comprises a data acquisition module 100, a data preprocessing module 200, a data mining module 300 and a data analysis module 400;

the data acquisition module 100 is a device for acquiring user data, and is configured to extract a user behavior model and a user interest model, and transmit the acquired data to the data preprocessing module 200;

the data preprocessing module 200 is a device for processing missing and redundant data, and is used for extracting key content from the data acquisition module 100 and converting the data into a unified and identifiable structure;

the data mining module 300 is a device for extracting user behaviors and interest features based on an improved LDA model, and generates a final user portrait by performing weight measurement of basic attribute features, behavior tags and interest tags on the data extracted by the data preprocessing module 200, and performing feature selection and cross fusion;

the data analysis module 400 is a device for providing advertisement recommendation dimension by analyzing portraits, considers the influence of time factors on user portraits, and analyzes consumer information in time intervals according to the product ordering time.
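The time-interval analysis performed by the data analysis module 400 can be illustrated by bucketing order timestamps, as in the Python sketch below. The interval boundaries, labels and function names are assumptions chosen for illustration; the text does not specify how the intervals are defined.

```python
from collections import Counter
from datetime import datetime

# Hypothetical intervals as (start hour inclusive, end hour exclusive, label).
INTERVALS = [
    (0, 6, "night"),
    (6, 12, "morning"),
    (12, 18, "afternoon"),
    (18, 24, "evening"),
]


def interval_of(order_time):
    """Return the label of the interval that contains the order's hour of day."""
    for start, end, label in INTERVALS:
        if start <= order_time.hour < end:
            return label
    raise ValueError("hour outside 0-23")


def orders_per_interval(order_times):
    """Count orders in each time interval."""
    return Counter(interval_of(t) for t in order_times)
```

The resulting counts per interval could then be joined with the portrait features to see which groups order at which times.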

Example 3

One embodiment of the present invention, which is different from the first two embodiments, is:

the functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art or in a part of the technical solution, in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (RAM, random Access Memory), a magnetic disk, or an optical disk, or other various media capable of storing program codes.

Logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions for implementing logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.

More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). In addition, the computer readable medium may even be paper or other suitable medium on which the program is printed, as the program may be electronically captured, via, for instance, optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner, if necessary, and then stored in a computer memory.

It is to be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, the various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, may be implemented using any one or combination of the following techniques, as is well known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application specific integrated circuits having suitable combinational logic gates, programmable Gate Arrays (PGAs), field Programmable Gate Arrays (FPGAs), and the like.

Example 4

Referring to Figs. 3-4, this embodiment of the present invention provides a user portrait construction method based on feature fusion of improved LDA. To verify the beneficial effects of the invention, the method is evaluated through economic benefit calculation and simulation experiments.

In this embodiment, the method of the present invention is tested in use. Under identical preset experimental conditions, three groups of experiments are run on the existing conventional method and on the method of this embodiment. The completeness and accuracy of the user portrait under different conditions serve as the variables for evaluating portrait construction, and the economic benefit of the algorithm is measured. The experimental results are shown in Tables 1-2 below:

Table 1 Redundancy removal effect comparison

Build time comparison	Experiment 1	Experiment 2	Experiment 3
The method	8 s	11 s	9 s
Conventional method	16 s	14 s	15 s

Table 2 User portrait accuracy comparison

User portrait accuracy comparison	Experiment 1	Experiment 2	Experiment 3
The method	97%	95%	96%
Conventional method	88%	90%	86%

The comparison experiments indicate that the method provided by the invention constructs user portraits markedly faster than the prior art: across the three experiments the build time was 8-11 s, against 14-16 s for the conventional method. Portrait accuracy was 95-97%, against 86-90%, so the error rate is greatly reduced. The method also offers real-time performance.
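As a quick check on the figures in Tables 1-2, the per-method averages can be computed directly. The sketch below is illustrative only: the variable names are assumptions, and the numbers are copied from the two tables.

```python
from statistics import mean

# Values copied from Tables 1 and 2 (Experiments 1-3).
build_time_s = {"present": [8, 11, 9], "conventional": [16, 14, 15]}
accuracy_pct = {"present": [97, 95, 96], "conventional": [88, 90, 86]}

avg_time = {name: mean(values) for name, values in build_time_s.items()}
avg_acc = {name: mean(values) for name, values in accuracy_pct.items()}

# Relative reduction in mean build time, and accuracy gain in percentage points.
time_reduction = 1 - avg_time["present"] / avg_time["conventional"]
accuracy_gain = avg_acc["present"] - avg_acc["conventional"]

print(f"mean build time: {avg_time['present']:.2f} s vs {avg_time['conventional']:.2f} s")
print(f"relative time reduction: {time_reduction:.1%}")
print(f"accuracy gain: {accuracy_gain:.1f} percentage points")
```

On these figures the mean build time falls from 15 s to about 9.3 s (roughly 38% lower), and mean accuracy rises from 88% to 96%.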

In actual use, the method meets production and operation requirements, strengthens the management and application of user portrait construction, reduces operation, maintenance, and management costs, improves service quality, reduces labor costs, improves the quality and effect of dispatching and command, achieves better results than the conventional method, and ensures the accuracy of the constructed data.

By means of natural language processing and user portrait techniques, new parameters are introduced to improve the probabilistic topic model, and the consumers' basic attributes, behavior data, and interest data undergo feature cross fusion. The invention improves the completeness and accuracy of user portrait construction, deepens merchants' understanding of consumer purchasing behavior, and enables merchants to optimize their products in a targeted way. It also helps merchants formulate accurate advertisement recommendation strategies for target groups, improving advertisement recommendation accuracy and conversion.
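The feature cross fusion mentioned above pairs interest preference features with purchasing behavior features. The following is a minimal sketch, assuming the cross is a plain Cartesian product of two tag sets; the function name `cross_fuse` and the tag values are hypothetical and are not taken from the invention.

```python
from itertools import product

def cross_fuse(interest_features, purchase_features):
    """Return every (interest, purchase) pair, i.e. the Cartesian product."""
    return list(product(interest_features, purchase_features))

# Hypothetical tags for illustration only.
interest = ["sports", "digital"]
purchase = ["high_frequency", "discount_sensitive", "night_orders"]

crossed = cross_fuse(interest, purchase)
for pair in crossed:
    print(pair)
```

Two interest tags crossed with three behavior tags yield six combined features; in practice a selection step would be needed to keep the crossed dimension manageable.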

The influence of time is taken into account when constructing the user portrait. The probabilistic topic model is improved by taking the number of touches into account when extracting interest preference features and purchasing behavior features. Finally, these features undergo cross fusion, so that a user portrait suited to e-commerce consumption platforms is constructed.
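One way to account for the time factor is to group a consumer's orders by time interval before extracting features. The sketch below assumes four fixed day-part intervals and a list of order timestamps for one consumer; the interval boundaries, function names, and sample data are all hypothetical.

```python
from collections import Counter
from datetime import datetime

# Hypothetical day-part boundaries: (start hour inclusive, end hour exclusive, label).
INTERVALS = [(0, 6, "night"), (6, 12, "morning"), (12, 18, "afternoon"), (18, 24, "evening")]

def interval_label(ts: datetime) -> str:
    """Map an order timestamp to its day-part label."""
    for start, end, label in INTERVALS:
        if start <= ts.hour < end:
            return label
    raise ValueError(f"hour out of range: {ts.hour}")

def orders_per_interval(order_times):
    """Count the orders that fall in each interval for one consumer."""
    return Counter(interval_label(ts) for ts in order_times)

# Illustrative order times for a single consumer.
orders = [
    datetime(2023, 3, 1, 8, 15),
    datetime(2023, 3, 1, 21, 40),
    datetime(2023, 3, 2, 22, 5),
    datetime(2023, 3, 3, 13, 30),
]
counts = orders_per_interval(orders)
print(counts.most_common())
```

The resulting per-interval counts could then serve as the time period t attached to a portrait, under the assumptions stated above.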

It should be noted that the above embodiments only illustrate the technical solution of the present invention and do not limit it. Although the present invention has been described in detail with reference to the preferred embodiments, those skilled in the art should understand that the technical solution may be modified or equivalently substituted without departing from its spirit and scope, and all such modifications and substitutions are intended to be covered by the claims of the present invention.

Claims

1. The user portrait construction method based on the feature fusion of the improved LDA is characterized by comprising the following steps:

acquiring user data of a product consumer and preprocessing the data;

2. The user portrait construction method based on feature fusion of improved LDA of claim 1, wherein acquiring the user data of the product consumer comprises:

3. The user portrait construction method based on feature fusion of improved LDA according to claim 1 or 2, wherein the preprocessing further comprises: performing data cleaning, word segmentation, and stop-word removal;

4. The user portrait construction method based on feature fusion of improved LDA of claim 1, wherein the basic attribute data comprises: gender, region, active time, mobile terminal model, and occupation;

the improved LDA model is represented as

$c = f(e, f, t)$

wherein e is the number of people touched, f is the portrait precision, t is the time, and c is the feature dimension of the interest and behavior features;

the feature dimension when the number of touches e is met is $c_0$, and the feature dimension when the portrait precision f is met is $c_f$, so that the feature dimension is guaranteed to lie in the range $[c_0, c_f]$.

5. The user portrait construction method based on feature fusion of improved LDA of claim 4, wherein the extraction of user behavior and interest features is represented as:

wherein the fixed value M represents the total number of texts in the dataset; m represents a single text; n represents the total number of words in the text; z represents the topic; w represents the word vector of the text; θ represents the topic distribution, and α is the hyperparameter of the Dirichlet distribution of θ; φ represents the word distribution, and β is the hyperparameter of its Dirichlet distribution; $p(w_m \mid \alpha, \gamma, \beta)$ denotes classification by crowd feature, and $p(w \mid \alpha, \gamma, \beta)$ denotes the distribution of individual words under each category over the whole corpus.

6. The user portrait construction method based on feature fusion of improved LDA as recited in claim 1, wherein constructing the final user portrait comprises: selecting and fusing the user interest preference features and the purchasing behavior features to generate a user portrait model;

the feature fusion is denoted as

wherein A represents interest preference features, B represents purchasing behavior features, a and c represent features of different sets in the interest preference portrait, y represents purchasing behavior features, the subscript numbers represent the number of text data, and × denotes the Cartesian product.

7. The user portrait construction method based on feature fusion of improved LDA as claimed in claim 1 or 6, wherein the final user portrait is expressed as:

$P = \{B, A, I, t, c, e\}$

wherein B represents basic attribute features, including age, gender, and region; A represents purchasing behavior features; I represents interest preference features; the parameter t represents a time period; c represents the portrait feature dimension; and e represents the number of touches.

8. A user portrait construction system based on feature fusion of improved LDA, comprising:

a data acquisition module (100), a data preprocessing module (200), a data mining module (300), and a data analysis module (400);

the data acquisition module (100) is a device for acquiring user data, and is used for extracting a user behavior model and a user interest model and transmitting the acquired data to the data preprocessing module (200);

the data preprocessing module (200) is a device for processing missing and redundant data, and is used for extracting key contents from the data acquisition module (100) and converting the data into a unified and identifiable structure;

the data mining module (300) is a device for extracting user behaviors and interest features based on an improved LDA model, and generates a final user portrait by carrying out weight measurement on basic attribute features, behavior tags and interest tags on the data extracted by the data preprocessing module (200) and carrying out feature selection and cross fusion;

the data analysis module (400) is a device for providing advertisement recommendation dimensions through analysis of the portrait, and, taking into account the influence of time on user portraits, is used for analyzing consumer information by time interval according to product ordering time.

9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any of claims 1 to 7 when the computer program is executed.

10. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 7.
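For orientation, the portrait of claim 7 and the dimension bound of claim 4 can be sketched as a simple record. This is an illustration under stated assumptions: the claims do not give the form of f(e, f, t), so the sketch only clamps a candidate dimension into [c_0, c_f]. The class name, function name, field types, and sample values are hypothetical.

```python
from dataclasses import dataclass

def clamp_dimension(candidate: int, c_0: int, c_f: int) -> int:
    """Keep the feature dimension c inside the range [c_0, c_f]."""
    if c_0 > c_f:
        raise ValueError("c_0 must not exceed c_f")
    return max(c_0, min(candidate, c_f))

@dataclass
class Portrait:
    """P = {B, A, I, t, c, e}, with the symbol meanings given in claim 7."""
    B: dict  # basic attribute features (age, gender, region)
    A: list  # purchasing behavior features
    I: list  # interest preference features
    t: str   # time period
    c: int   # portrait feature dimension
    e: int   # number of touches

# Hypothetical sample values; the candidate dimension 50 exceeds c_f = 32.
portrait = Portrait(
    B={"age": 30, "gender": "F", "region": "Nanjing"},
    A=["high_frequency"],
    I=["sports", "digital"],
    t="evening",
    c=clamp_dimension(candidate=50, c_0=8, c_f=32),
    e=1000,
)
print(portrait.c)
```

Here the candidate dimension is cut back to the upper bound, so the stored value of c is 32.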