CN116385037A - User portrait construction method and system based on feature fusion of improved LDA - Google Patents

Publication number
CN116385037A
CN116385037A CN202310226593.2A CN202310226593A CN116385037A CN 116385037 A CN116385037 A CN 116385037A CN 202310226593 A CN202310226593 A CN 202310226593A CN 116385037 A CN116385037 A CN 116385037A
Authority
CN
China
Prior art keywords
data
user
portrayal
interest
feature fusion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310226593.2A
Other languages
Chinese (zh)
Inventor
曹亚东
马小宁
孙知信
孙哲
赵学健
宫婧
汪胡青
胡冰
徐玉华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications filed Critical Nanjing University of Posts and Telecommunications
Priority to CN202310226593.2A priority Critical patent/CN116385037A/en
Publication of CN116385037A publication Critical patent/CN116385037A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G06Q30/0203 Market surveys; Market polls
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241 Advertisements
    • G06Q30/0251 Targeted advertisements
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a user portrait construction method and system based on feature fusion of improved LDA, relating to the technical field of data mining. The method comprises the following steps: acquiring user data of a product's consumers and preprocessing the data; extracting basic attribute data of the user, and extracting user behavior and interest features according to the basic attribute data of the user; and fusing the basic attributes, behaviors and interests to construct the final user portrait, analyzing the portrait, and providing advertisement recommendation dimensions. By improving a probabilistic topic model, the method cross-fuses the features of consumers' basic attributes, behavior data and interest data. The invention improves the completeness and accuracy of user portrait construction, deepens merchants' understanding of consumers' purchasing behavior, and helps merchants optimize their products accordingly; it also helps merchants formulate accurate advertisement recommendation strategies for targeted groups, and improves advertisement recommendation accuracy and conversion benefits.

Description

User portrait construction method and system based on feature fusion of improved LDA
Technical Field
The invention relates to the technical field of data mining, in particular to a user portrait construction method and system based on feature fusion of improved LDA.
Background
The large amount of data that users leave behind while surfing the internet on mobile terminals, including identity data, access and browsing data, purchasing data, social data and the like, reflects their living habits and consumption intentions to a certain extent. A user portrait is a virtual, network-based image of a user; it is built on a large amount of user data and processed with technologies such as data mining, machine learning and deep learning, so as to present the user's characteristics.
Constructing user portraits helps merchants better understand their own customers, perform crowd targeting during advertisement placement, and find target groups for accurate recommendation and marketing.
At present, user portraits are commonly constructed with the following models, each of which has its own strengths. Methods based on mathematical statistics analyze user characteristics by quantizing data values; they are good at processing structured data but cannot analyze unstructured data such as images, audio and text. Methods based on the vector space model take unstructured data into account and represent the user portrait as a vector. Methods based on topic models are suited to unstructured data and, to a certain extent, represent a user with a lower-dimensional model; the model segments text data into words and determines the text's topics and their probability distribution according to word-topic probabilities, and it has important applications in natural language processing. LDA (Latent Dirichlet Allocation) represents a text as a random mixture of several topics and each topic as a probability distribution over words, and is a typical bag-of-words model. Ontology-based methods make the label system that describes user portraits more hierarchical and better connected, but depend excessively on expert definitions of the logic between terms. Neural-network-based methods imitate the way animals think and draw on animal neural behavior; their advantages are distributed storage, large-scale parallel data processing and nonlinear operation. Today, with the rapid development of networks, the massive heterogeneous data that users generate in network interaction, such as the text they post, the videos they browse, the people they follow and the goods they buy, cannot be described in a single way. With the application of data mining, machine learning and other technologies, mining not only structured data but also unstructured data reveals greater value.
In the user portrait construction method based on feature fusion of improved LDA, data are collected from multiple aspects, different modeling methods are adopted for different types of data, and the basic attribute, interest and behavior models are fused to represent the user portrait of product consumers. This improves the completeness and accuracy of the portrait and yields a user portrait suited to e-commerce consumption platforms.
Disclosure of Invention
This section is intended to outline some aspects of embodiments of the invention and to briefly introduce some preferred embodiments. Some simplifications or omissions may be made in this section as well as in the description summary and in the title of the application, to avoid obscuring the purpose of this section, the description summary and the title of the invention, which should not be used to limit the scope of the invention.
The present invention has been made in view of the above-described problems.
Therefore, the technical problem solved by the invention is: how to deepen merchants' understanding of consumers' basic attributes, behaviors and interest psychology, optimize the customer-facing aspects of the product, help merchants formulate accurate advertisement recommendation strategies for targeted groups, and improve advertisement recommendation accuracy and conversion benefits.
In order to solve the technical problems, the invention provides the following technical scheme: a user portrait construction method based on feature fusion of improved LDA comprises the following steps:
acquiring user data of a product consumer and preprocessing the data;
extracting basic attribute data of a user, and extracting user behaviors and interest features according to the basic attribute data of the user;
the basic attributes, behaviors and interests are fused to construct the final user portrait, the portrait is analyzed, and advertisement recommendation dimensions are provided.
As a preferred scheme of the user portrait construction method based on feature fusion of improved LDA according to the present invention, acquiring the user data of product consumers comprises:
collecting base-layer attribute information of platform users, and collecting dynamic information of the users on the platform;
the information comprises gender, region, age, online active time, historical like and favorite data, and historical purchasing behavior data;
the preprocessing comprises splitting and deduplicating the collected data related to user behaviors, and extracting key content from the source data.
As a preferred scheme of the user portrait construction method based on feature fusion of improved LDA according to the present invention, the preprocessing further comprises: data cleaning, word segmentation and stop-word removal;
the data cleaning comprises removing noise and redundant data, screening and checking the data, standardizing the data, and filling in missing values so that the data dimensions follow the same standard;
the word segmentation comprises dividing a text into several words with independent meanings according to reasonable rules;
the stop-word removal comprises filtering out words that are meaningless for classification, and continuously maintaining and expanding the stop-word list during data processing.
As a preferable scheme of the user portrait construction method based on improved LDA feature fusion of the present invention, the basic attribute data includes: gender, region, active time, mobile terminal model, occupation;
extracting user behavior and interest characteristics comprises extracting user behavior characteristics and interest preference characteristics based on an improved LDA model;
the improved LDA model is expressed as:
Figure SMS_1
where [Figure SMS_2] is the number of users reached, [Figure SMS_3] is the portrait precision, [Figure SMS_4] is the time, and [Figure SMS_5] is the dimension of the interest-behavior features;
when the number of users reached just satisfies [Figure SMS_6], the feature dimension is [Figure SMS_7], the portrait precision is [Figure SMS_8], and the time feature dimension is [Figure SMS_9], which ensures that the feature dimension lies in the range [Figure SMS_10].
As a preferred scheme of the user portrait construction method based on feature fusion of improved LDA according to the present invention, the extraction of the user behavior and interest features is expressed as:
Figure SMS_11
the classification according to crowd characteristics is expressed as:
Figure SMS_12
the distribution of individual words under each category over the whole corpus is expressed as:
Figure SMS_13
where the fixed value [Figure SMS_17] represents the total number of texts in the data set; [Figure SMS_19] represents a single text; [Figure SMS_23] represents the total number of words in the text; [Figure SMS_15] represents the topic; [Figure SMS_20] represents the word vector of the text; [Figure SMS_24] represents the topic distribution; [Figure SMS_26] is the hyperparameter of the Dirichlet distribution of [Figure SMS_14]; [Figure SMS_21] represents the word distribution; [Figure SMS_22] is the hyperparameter of the Dirichlet distribution of [Figure SMS_25]; [Figure SMS_16] represents the classification according to crowd characteristics; and [Figure SMS_18] represents the distribution of individual words under each category over the whole corpus.
As a preferred scheme of the user portrait construction method based on feature fusion of improved LDA according to the present invention, constructing the final user portrait comprises: selecting and fusing the user's interest preference features and purchasing behavior features to generate a user portrait model;
the feature fusion is expressed as:
Figure SMS_27
where [Figure SMS_28] represents the interest preference features; [Figure SMS_29] represents the purchasing behavior features; a and c represent features of different sets in the interest preference portrait; [Figure SMS_30] represents a purchasing behavior feature; the subscript numbers denote the index of each text datum; and [Figure SMS_31] denotes the Cartesian product.
As a preferred scheme of the user portrait construction method based on feature fusion of improved LDA according to the present invention, the final user portrait is expressed as:
Figure SMS_32
where [Figure SMS_33] represents the basic attribute features, including age, gender and region; [Figure SMS_34] represents the purchasing behavior features; [Figure SMS_35] represents the interest preference features; the parameter [Figure SMS_36] denotes the time period; [Figure SMS_37] represents the portrait feature dimension; and [Figure SMS_38] denotes the number of users reached.
A further technical problem solved by the invention is: how to extract key content from the source data, convert the data into a unified and identifiable structure, and effectively extract the most relevant features of the product's consumer group while still reaching a certain number of users, removing redundancy.
In order to solve the technical problems, the invention provides the following technical scheme: a user portrayal construction system based on feature fusion of improved LDA, comprising:
the system comprises a data acquisition module, a data preprocessing module, a data mining module and a data analysis module;
as a preferable scheme of the user portrait construction system based on the feature fusion of the improved LDA, the data acquisition module is a device for acquiring user data, and is used for extracting a user behavior model and a user interest model and transmitting the acquired data to the data preprocessing module;
as a preferable scheme of the user portrait construction system based on the improved LDA feature fusion, the data preprocessing module is a device for processing missing and redundant data and is used for extracting key contents from the data acquisition module and converting the data into a unified and identifiable structure;
as a preferred scheme of the user portrait construction system based on the feature fusion of the improved LDA, the data mining module is a device for extracting user behaviors and interest features based on an improved LDA model, and final user portraits are generated by carrying out weight measurement on basic attribute features, behavior tags and interest tags on data extracted by a data preprocessing module, and carrying out feature selection and cross fusion;
as a preferable scheme of the user portrait construction system based on the feature fusion of the improved LDA, the data analysis module is a device for providing advertisement recommendation dimension through analyzing portraits, considers the influence of time factors on the user portraits, and analyzes consumer information in time intervals according to the ordering time of products.
A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method as described above when executing the computer program.
A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method as described above.
The beneficial effects of the invention are as follows. The user portrait construction method based on feature fusion of improved LDA uses data mining and machine learning for user modeling: it collects the users' basic attribute data and uses the platform's records of the users to analyze their behavior and interest features, which improves the completeness and accuracy of the user portrait. Features are extracted from the differently structured data of the crowd's basic attributes, purchasing behavior and interest preferences, and new factors are introduced to optimize the probabilistic topic model, so that the most relevant features of the product's consumer group can be extracted effectively, and redundancy removed, while still reaching a certain number of users. Data mining and natural language processing technologies are used to cross-fuse the behavior and interest preference features into a portrait of the consumer group, which improves the completeness and accuracy of the user portrait and provides a new approach to crowd targeting for advertisements.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art. Wherein:
FIG. 1 is a general flow chart of a user portrayal construction method based on feature fusion of improved LDA according to an embodiment of the present invention;
FIG. 2 is a block diagram of a user portrayal construction system based on feature fusion with improved LDA according to a second embodiment of the present invention;
FIG. 3 is a comparison chart of removing redundant effects in a user portrait construction method based on feature fusion of improved LDA according to a fourth embodiment of the present invention;
FIG. 4 is a graph showing user portrait accuracy contrast of a user portrait construction method based on feature fusion of improved LDA according to a fourth embodiment of the present invention.
Description of the embodiments
So that the manner in which the above recited objects, features and advantages of the present invention can be understood in detail, a more particular description of the invention, briefly summarized above, may be had by reference to the embodiments, some of which are illustrated in the appended drawings. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments of the present invention without making any inventive effort, shall fall within the scope of the present invention.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, but the present invention may be practiced in other ways other than those described herein, and persons skilled in the art will readily appreciate that the present invention is not limited to the specific embodiments disclosed below.
Further, reference herein to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic can be included in at least one implementation of the invention. The appearances of the phrase "in one embodiment" in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments.
While the embodiments of the present invention have been illustrated and described in detail in the drawings, the cross-sectional view of the device structure is not to scale in the general sense for ease of illustration, and the drawings are merely exemplary and should not be construed as limiting the scope of the invention. In addition, the three-dimensional dimensions of length, width and depth should be included in actual fabrication.
Also in the description of the present invention, it should be noted that the orientation or positional relationship indicated by the terms "upper, lower, inner and outer", etc. are based on the orientation or positional relationship shown in the drawings, are merely for convenience of describing the present invention and simplifying the description, and do not indicate or imply that the apparatus or elements referred to must have a specific orientation, be constructed and operated in a specific orientation, and thus should not be construed as limiting the present invention. Furthermore, the terms "first, second, or third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
The terms "mounted, connected, and coupled" should be construed broadly in this disclosure unless otherwise specifically indicated and defined, such as: can be fixed connection, detachable connection or integral connection; it may also be a mechanical connection, an electrical connection, or a direct connection, or may be indirectly connected through an intermediate medium, or may be a communication between two elements. The specific meaning of the above terms in the present invention will be understood in specific cases by those of ordinary skill in the art.
Example 1
Referring to fig. 1, for one embodiment of the present invention, there is provided a user portrait construction method based on feature fusion of improved LDA, including:
Data related to the historical consumers of a certain product on a consumption platform are acquired and preprocessed.
The construction of basic attribute labels plays a basic role in building a user portrait. Although such data distinguish poorly between subtle differences among users, they are relatively easy to acquire, and for a new user entering for the first time, with no other interaction information, they work well for distinguishing user groups.
First, the base-layer attribute information of platform users is captured. Many mainstream platforms currently require users to fill in specific information such as gender, region and age at registration. These data are basic and easy to acquire, but because the network environment is complex and users pay increasing attention to personal privacy, users may enter false or overly simple information even though the platform guides them to use their real identities, so the data acquired in this part need to be preprocessed.
Second, dynamic information of the user on the platform is collected, including online active time, historical like and favorite data, and historical purchasing behavior data, so as to provide more diverse references for describing the user.
The preprocessing operation is as follows: the collected source data sets are generally mixed and irregular and usually contain missing and redundant data, so preprocessing is needed to extract the key content from the source data and convert the data into a uniform and identifiable structure.
First, data cleaning is carried out: noise and redundant data are removed, the data are screened and checked, the data are standardized, and missing values are filled in so that the data dimensions follow the same standard.
Data cleaning is performed in Python. Because the data being analyzed are network data, dedicated cleaning code is added to keep the code general.
Four kinds of cleaning are defined: reply-to-someone information that may appear in the text is removed; special characters are removed, since network language is rich in emoticons, kaomoji and other special symbols, and too many symbols harm word segmentation; redundant symbols in comments, such as extra spaces, are removed, since they are useless and slow down data processing; and the case of letters in the data is unified, because unlike academic writing, everyday network text often uses forms such as "FRIEND" and "friend" for the same meaning, and without case conversion the system would recognize them as two different words and classification would be disturbed.
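The four cleaning rules above can be sketched in Python as follows. This is a minimal illustration only: the patent does not disclose its cleaning code, so the regular expressions, the reply marker format ("@name:") and the function name clean_comment are assumptions chosen for the example.

```python
import re

# Assumed patterns; the patent does not specify the actual rules.
REPLY_PATTERN = re.compile(r"(?:回复|reply\s+to)?\s*@[\w\-]+[:：]?", re.IGNORECASE)
SPECIAL_CHARS = re.compile(r"[^\w\s]|_")   # punctuation, emoticon pieces, emoji
EXTRA_SPACE = re.compile(r"\s+")


def clean_comment(text):
    """Apply the four cleaning rules to one comment string."""
    text = REPLY_PATTERN.sub(" ", text)        # 1. drop reply-to-someone markers
    text = SPECIAL_CHARS.sub(" ", text)        # 2. drop special characters and symbols
    text = EXTRA_SPACE.sub(" ", text).strip()  # 3. collapse redundant whitespace
    return text.lower()                        # 4. unify letter case


print(clean_comment("Reply to @Tom: My FRIEND loves it!!! (^_^)   "))
```

Lower-casing is done last so that the earlier patterns see the original text; after cleaning, "FRIEND" and "friend" map to the same token.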
Word segmentation is then carried out: the text is divided into several words with independent meanings according to reasonable rules. By summarizing people's word-usage habits, machine recognition methods work well at recognizing new words.
Finally, stop words are removed: words that are meaningless for classification are filtered out, which improves retrieval efficiency and saves memory. The meaningless words are collected manually to form a stop-word list, which is continuously maintained and expanded during data processing.
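A minimal sketch of segmentation followed by stop-word filtering is given below. The tokenizer here simply splits on non-word characters and is only a stand-in: Chinese text would require a dedicated word segmenter, which the patent does not name. The stop-word entries and function names are likewise illustrative assumptions.

```python
import re

# Illustrative stop-word list; in the method it is collected manually
# and extended while the data are processed.
stop_words = {"the", "a", "is", "of", "and"}


def segment(text):
    """Stand-in tokenizer: lower-case and split on non-word characters."""
    return [tok for tok in re.split(r"\W+", text.lower()) if tok]


def remove_stop_words(tokens, stop_list):
    """Keep only tokens that are not in the stop-word list."""
    return [tok for tok in tokens if tok not in stop_list]


def extend_stop_words(stop_list, new_words):
    """Maintain the list by adding newly identified meaningless words."""
    stop_list.update(word.lower() for word in new_words)


tokens = segment("The phone is great and the price is fair")
print(remove_stop_words(tokens, stop_words))
```

Keeping the list as a mutable set makes lookups cheap and lets the list grow as new meaningless words are found.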
Basic attribute data of the user is extracted.
The basic attribute information of the user is extracted from the platform's underlying data set, including gender, region, active time, mobile terminal model and occupation.
Features under the basic attributes are usually structured data. Such data are quantized by mathematical statistics: the count of each value and its share of the whole are calculated.
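As an illustration of the count-and-share statistic just described, the following sketch tallies one attribute column. The function name value_shares and the sample values are hypothetical.

```python
from collections import Counter


def value_shares(values):
    """Return {value: (count, share of the whole)} for one attribute column."""
    counts = Counter(values)
    total = sum(counts.values())
    return {value: (count, count / total) for value, count in counts.items()}


print(value_shares(["male", "female", "male", "male"]))
```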
To normalize the representation of the results, each index is quantized into ranks.
Gender is male or female: 1 denotes male and 0 denotes female. Age is divided into several intervals; following the division of the United Nations World Health Organization into children, youth, middle-aged and elderly, ages 18 and below are coded 1, youth aged 19-23 are coded 2 and aged 24-35 are coded 3, middle-aged 36-59 are coded 4, and elderly 60 and above are coded 5.
Region labels are divided by city tier: first-tier cities are coded 1, second-tier cities 2, third- and fourth-tier cities 3, and fifth- and sixth-tier cities 4.
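The rank quantization above can be written down directly. The codes are taken from the text; the function and constant names, and the choice to pass the city tier as an integer from 1 to 6, are assumptions made for this sketch.

```python
GENDER_CODES = {"male": 1, "female": 0}
# City tier (1-6) -> region code: tiers 3-4 share code 3, tiers 5-6 share code 4.
CITY_TIER_CODES = {1: 1, 2: 2, 3: 3, 4: 3, 5: 4, 6: 4}


def encode_age(age):
    """Map an age in years to the rank code described in the text."""
    if age <= 18:
        return 1
    if age <= 23:
        return 2
    if age <= 35:
        return 3
    if age <= 59:
        return 4
    return 5


def encode_basic_attributes(gender, age, city_tier):
    """Return [gender code, age code, region code] for one user."""
    return [GENDER_CODES[gender], encode_age(age), CITY_TIER_CODES[city_tier]]


print(encode_basic_attributes("female", 30, 4))
```

Unknown gender strings or city tiers raise a KeyError here, which surfaces dirty records instead of silently coding them.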
User behavior and interest features are then extracted according to the basic attribute data of the user.
The user's historical purchase records and shopping-cart information are extracted to build the user's behavior set, and the user behavior features are extracted with the c-LDA model.
The interest and behavior data sets are imported separately and modeled with the c-LDA model (LDA: Latent Dirichlet Allocation) as follows:
for the entire data set, Nm ~ Poisson([Figure SMS_39]); for a single text, [Figure SMS_40] ~ Dirichlet([Figure SMS_41]); for an interest topic z, [Figure SMS_42] ~ Dirichlet([Figure SMS_43]). The n-th word w in the m-th text is generated as follows:
according to [Figure SMS_44], the topic of the n-th word w of text m is generated: zm,n ~ Multinomial([Figure SMS_45]); according to the generated topic, the word is generated: wm,n ~ Multinomial([Figure SMS_46]). The above process is repeated for every text in the corpus.
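For orientation, the generative process of standard LDA, on which the steps above are patterned, can be simulated as below. This is a sketch of textbook LDA rather than of the patent's c-LDA, whose formulas survive here only as image placeholders. The symbol names alpha, beta and xi, their values and the toy vocabulary are assumptions.

```python
import math
import random


def sample_poisson(rng, lam):
    """Knuth's multiplication method; adequate for small lam."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def sample_dirichlet(rng, concentration, size):
    """Symmetric Dirichlet draw via normalized Gamma variates."""
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(size)]
    total = sum(draws)
    return [d / total for d in draws]


def generate_corpus(num_texts, num_topics, vocab, alpha=0.5, beta=0.5, xi=8.0, seed=0):
    rng = random.Random(seed)
    # One word distribution per topic: phi_k ~ Dirichlet(beta).
    phi = [sample_dirichlet(rng, beta, len(vocab)) for _ in range(num_topics)]
    corpus = []
    for _ in range(num_texts):
        length = sample_poisson(rng, xi)                  # N_m ~ Poisson(xi)
        theta = sample_dirichlet(rng, alpha, num_topics)  # theta_m ~ Dirichlet(alpha)
        words = []
        for _ in range(length):
            z = rng.choices(range(num_topics), weights=theta)[0]  # z ~ Multinomial(theta_m)
            words.append(rng.choices(vocab, weights=phi[z])[0])   # w ~ Multinomial(phi_z)
        corpus.append(words)
    return corpus


for text in generate_corpus(3, 2, ["phone", "case", "laptop", "bag"]):
    print(text)
```

Fitting LDA to real data runs this process in reverse, inferring the topic and word distributions from the observed words.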
From the model used, the joint distribution of the variables is obtained:
Figure SMS_47
The classification according to crowd characteristics is expressed as:
Figure SMS_48
The distribution of individual words under each category over the whole corpus is expressed as:
Figure SMS_49
where the fixed value [Figure SMS_52] represents the total number of texts in the data set; [Figure SMS_55] represents a single text; [Figure SMS_59] represents the total number of words in the text; [Figure SMS_53] represents the topic; [Figure SMS_57] represents the word vector of the text; [Figure SMS_60] represents the topic distribution; [Figure SMS_62] is the hyperparameter of the Dirichlet distribution of [Figure SMS_50]; [Figure SMS_54] represents the word distribution; [Figure SMS_58] is the hyperparameter of the Dirichlet distribution of [Figure SMS_61]; [Figure SMS_51] represents the classification according to crowd characteristics; and [Figure SMS_56] represents the distribution of individual words under each category over the whole corpus.
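For reference, the joint distribution of standard LDA is commonly written as shown below. The patent's own formula is available here only as an image placeholder and may differ for the c-LDA variant; the symbols (theta for the per-text topic distribution, Phi for the per-topic word distributions, alpha and beta for the Dirichlet hyperparameters) follow the usual convention and are an assumption about the patent's notation.

```latex
p(\mathbf{w}_m, \mathbf{z}_m, \theta_m, \Phi \mid \alpha, \beta)
  = \prod_{n=1}^{N_m} p(w_{m,n} \mid \varphi_{z_{m,n}})\, p(z_{m,n} \mid \theta_m)
    \cdot p(\theta_m \mid \alpha) \cdot p(\Phi \mid \beta)
```

Each factor corresponds to one step of the generative process: drawing the word given its topic, drawing the topic given the text's topic distribution, and drawing the two distributions from their Dirichlet priors.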
And taking the historical consumers of the product as seed groups to extract the characteristics.
The extracted interesting behavior features are used to match and touch all users in the user pool.
The extracted feature topic dimension c of the LDA model is improved to ensure access to new user population, expressed as:
Figure SMS_63
wherein, the liquid crystal display device comprises a liquid crystal display device,
Figure SMS_64
for touching the people, the person is->
Figure SMS_65
For the image precision, < >>
Figure SMS_66
For time (I)>
Figure SMS_67
Is the dimension of the interesting behavior feature.
The number of people reached $e$ and the portrait precision $f$ are introduced to adjust the topic dimension $c$. Suppose the feature dimension is $c_0$ when the required number of people reached $e$ is just satisfied, and $c_f$ when the portrait precision is $f$; the feature dimension is then kept within the range $[c_0, c_f]$.
The topic dimension affects both the size of the targeted audience package and the precision of the user portrait: when the dimension is too small, the audience package that can be reached is small; when the dimension is too large, the precision of the user portrait is poor.
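The range constraint can be illustrated with a small helper. The text does not specify the function $f$, so the sketch below only shows the final step of keeping a candidate topic dimension inside $[c_0, c_f]$; the name `clamp_topic_dimension` and the example bounds are hypothetical.

```python
def clamp_topic_dimension(candidate_c, c_0, c_f):
    """Keep a candidate topic dimension inside the range [c_0, c_f].

    c_0: dimension at which the required reach is just satisfied (lower bound).
    c_f: dimension at which the required portrait precision is just satisfied
         (upper bound). Both bounds are assumed to be known in advance.
    """
    if c_0 > c_f:
        raise ValueError("empty range: c_0 must not exceed c_f")
    return max(c_0, min(candidate_c, c_f))
```

With bounds of 5 and 20, a candidate of 3 is raised to 5, a candidate of 50 is lowered to 20, and a candidate of 12 is returned unchanged.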
The basic attributes, behaviors and interests are fused to construct the final user portrait; the portrait is analyzed, and advertisement recommendation dimensions are provided.
Feature selection and cross fusion are performed on the user purchasing behavior model and the interest preference model to generate the final user portrait, expressed as:
Figure SMS_76
where $A$ denotes the interest preference features; $B$ denotes the purchasing behavior features; $a$ and $c$ denote features of different sets in the interest preference portrait; $y$ denotes a purchasing behavior feature; the subscript number is the index of each text data item; and $\times$ denotes the Cartesian product.
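A Cartesian product of two feature sets pairs every interest preference feature with every purchasing behavior feature. The sketch below shows only that crossing step; the function name `cross_features` and the sample feature labels are illustrative assumptions, and the feature selection that accompanies the fusion is not modeled.

```python
from itertools import product


def cross_features(interest_features, purchase_features):
    """Return the Cartesian product A x B as a list of (interest, purchase) pairs."""
    return list(product(interest_features, purchase_features))


# Hypothetical labels: a1 and c1 come from different interest feature sets,
# y1 and y2 are purchasing behavior features.
pairs = cross_features(["a1", "c1"], ["y1", "y2"])
```

Each resulting pair can then be treated as one crossed feature of the fused portrait.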
The final user portrait is constructed as:
$P = \{B, A, I, t, c, e\}$
where $B$ denotes the basic attribute features, including age, gender and region; $A$ denotes the purchasing behavior features; $I$ denotes the interest preference features; the parameter $t$ denotes the time period; $c$ denotes the portrait feature dimension; and $e$ denotes the number of people reached.
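One possible way to hold the six components of the portrait $P$ in code is a small record type. This is a sketch only: the class name `UserPortrait`, the field names and the chosen field types are assumptions, since the text defines the components but not their representation.

```python
from dataclasses import dataclass


@dataclass
class UserPortrait:
    """Container mirroring P = {B, A, I, t, c, e}; field names are hypothetical."""
    basic_attributes: dict      # B: e.g. age, gender, region
    purchase_behavior: list     # A: purchasing behavior features
    interest_preference: list   # I: interest preference features
    time_period: str            # t: time period the portrait refers to
    feature_dimension: int      # c: portrait feature (topic) dimension
    reach_count: int            # e: number of people reached

    def as_tuple(self):
        """Return the components in the order (B, A, I, t, c, e)."""
        return (self.basic_attributes, self.purchase_behavior,
                self.interest_preference, self.time_period,
                self.feature_dimension, self.reach_count)
```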
Example 2
Referring to FIG. 2, for one embodiment of the present invention, there is provided a user portrayal construction system based on feature fusion of improved LDA, comprising:
the system comprises a data acquisition module 100, a data preprocessing module 200, a data mining module 300 and a data analysis module 400;
the data acquisition module 100 is a device for acquiring user data, and is configured to extract a user behavior model and a user interest model, and transmit the acquired data to the data preprocessing module 200;
the data preprocessing module 200 is a device for processing missing and redundant data, and is used for extracting key content from the data acquisition module 100 and converting the data into a unified and identifiable structure;
the data mining module 300 is a device for extracting user behaviors and interest features based on an improved LDA model, and generates a final user portrait by performing weight measurement of basic attribute features, behavior tags and interest tags on the data extracted by the data preprocessing module 200, and performing feature selection and cross fusion;
the data analysis module 400 is a device for providing advertisement recommendation dimension by analyzing portraits, considers the influence of time factors on user portraits, and analyzes consumer information in time intervals according to the product ordering time.
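The time-interval analysis performed by the data analysis module can be illustrated by grouping orders by the time of day at which they were placed. The sketch below is an assumption about one simple way to do this: the function name `bucket_orders_by_period`, the six-hour default interval and the label format are not taken from the text.

```python
from collections import Counter
from datetime import datetime


def bucket_orders_by_period(order_times, period_hours=6):
    """Count orders per time-of-day interval of `period_hours` hours."""
    if period_hours <= 0 or 24 % period_hours != 0:
        raise ValueError("period_hours must be a positive divisor of 24")
    counts = Counter()
    for ts in order_times:
        # Start hour of the interval that contains this order time.
        start = (ts.hour // period_hours) * period_hours
        counts[f"{start:02d}:00-{start + period_hours:02d}:00"] += 1
    return dict(counts)


orders = [datetime(2023, 3, 10, 1, 30),
          datetime(2023, 3, 10, 5, 59),
          datetime(2023, 3, 10, 13, 0)]
summary = bucket_orders_by_period(orders)
```

The per-interval counts could then be joined with the consumer features of each interval before the portrait is analyzed.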
Example 3
One embodiment of the present invention, which is different from the first two embodiments, is:
If the functions are implemented in the form of software functional units and sold or used as a stand-alone product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part of it that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and comprises several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods according to the embodiments of the present invention. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.
Logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions for implementing logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, a processor-containing system, or another system that can fetch the instructions from the instruction execution system, apparatus, or device and execute them. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). In addition, the computer readable medium may even be paper or other suitable medium on which the program is printed, as the program may be electronically captured, via, for instance, optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner, if necessary, and then stored in a computer memory.
It is to be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, the various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, the steps or methods may be implemented using any one or a combination of the following techniques, which are well known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application-specific integrated circuits having suitable combinational logic gates, programmable gate arrays (PGAs), field-programmable gate arrays (FPGAs), and the like.
Example 4
Referring to FIGS. 3-4, one embodiment of the present invention provides a user portrait construction method based on feature fusion of improved LDA. To verify the beneficial effects of the invention, the method is evaluated through economic benefit calculation and simulation experiments.
In this embodiment, the method of the present invention is tested in use. Under identical preset experimental conditions, three groups of experiments are carried out on the existing conventional method and on the method of this embodiment. The completeness and accuracy of the user portrait under different conditions are used as the variables for evaluating portrait construction, and the economic benefit of the algorithm is measured. The experimental results are shown in Tables 1 and 2 below:
Table 1. Comparison of the redundancy-removal effect (build time)
Build time            Experiment 1   Experiment 2   Experiment 3
The present method    8 s            11 s           9 s
Conventional method   16 s           14 s           15 s

Table 2. Comparison of user portrait accuracy
Portrait accuracy     Experiment 1   Experiment 2   Experiment 3
The present method    97%            95%            96%
Conventional method   88%            90%            86%
The comparison experiments confirm that the method of the present invention builds user portraits markedly faster than the prior art: averaged over the three experiments, the build time falls from 15 s to about 9.3 s. At the same time, the method operates in real time and greatly reduces the error rate, with the average portrait accuracy rising from 88% to 96%.
In actual use, the method meets practical production and operation requirements, strengthens the management and application of user portrait construction, reduces operation, maintenance and control costs, improves service quality, reduces labor costs, and improves the quality and effect of dispatching and command. It achieves better results than the conventional method and ensures the accuracy of the constructed data.
Using natural language processing and user portrait techniques, new parameters are introduced to improve the probabilistic topic model, and the consumers' basic attributes, behavior data and interest data are cross-fused at the feature level. The invention improves the completeness and accuracy of user portrait construction and deepens merchants' understanding of consumers' purchasing behavior, so that merchants can optimize their products in a targeted way. It also helps merchants formulate accurate advertisement recommendation strategies for target groups, improving advertisement recommendation accuracy and conversion.
In constructing the user portrait, the influence of time factors is considered. In extracting the interest preference features and the purchasing behavior features, the probabilistic topic model is improved by taking the number of people reached into account. Finally, the features are cross-fused to construct a user portrait suited to e-commerce consumption platforms.
It should be noted that the above embodiments are only for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that the technical solution of the present invention may be modified or substituted without departing from the spirit and scope of the technical solution of the present invention, which is intended to be covered in the scope of the claims of the present invention.

Claims (10)

1. The user portrait construction method based on the feature fusion of the improved LDA is characterized by comprising the following steps:
acquiring user data of a product consumer and preprocessing the data;
extracting basic attribute data of a user, and extracting user behaviors and interest features according to the basic attribute data of the user;
the basic attributes, behaviors and interests are fused to construct an end user portrayal, the portrayal is analyzed, and an advertisement recommendation dimension is provided.
2. The improved LDA-based feature fusion user portrayal construction method of claim 1, wherein the obtaining product consumer user data comprises:
collecting base layer attribute information of a platform user, and collecting dynamic information of the user on the platform;
the information comprises gender, region, age, online active time, historical like and favorite data, and historical purchase behavior data;
the preprocessing comprises splitting and deduplicating the collected data related to the user behaviors, and extracting key content from the source data.
3. The user portrayal construction method based on feature fusion of improved LDA according to claim 1 or 2, wherein the preprocessing further comprises: performing data cleaning, word segmentation and stop-word removal;
the data cleaning comprises removing noise and redundant data, and screening and checking the data; and standardizing the data, namely filling in missing values so that all data dimensions follow the same standard;
the word segmentation comprises dividing a text into a plurality of words with independent meanings according to reasonable rules;
the stop-word removal comprises filtering out words that carry no meaning for classification, and continuously maintaining and expanding the stop-word list during data processing.
4. The user portrayal construction method based on feature fusion of improved LDA of claim 1, wherein the basic attribute data comprises: gender, region, active time, mobile terminal model, occupation;
extracting user behavior and interest characteristics comprises extracting user behavior characteristics and interest preference characteristics based on an improved LDA model;
the improved LDA model is represented as,
c=f(e,f,t)
wherein e is the number of people reached, f is the portrait precision, t is time, and c is the dimension of the interest and behavior features;
the feature dimension is c_0 when the required number of people reached e is just satisfied, and c_f when the portrait precision is f, so that the feature dimension is kept within the range [c_0, c_f].
5. The improved LDA-based feature fusion user portrayal construction method of claim 4, wherein the extracting user behavior and interest features is represented as:
Figure QLYQS_1
classification according to the characteristics of the population is expressed as,
Figure QLYQS_2
the distribution of the individual words under each category over the whole corpus is expressed as,
Figure QLYQS_3
wherein the fixed value M represents the total number of texts in the dataset; m represents a single text; N represents the total number of words in the text; z represents the topic; w represents the word vector of the text; θ represents the topic distribution, and α is the hyperparameter of the Dirichlet distribution of θ; φ represents the word distribution, and β is the hyperparameter of the Dirichlet distribution of φ; p(w_m | α, β) expresses the classification by crowd characteristics; and p(w | α, β) expresses the distribution of the individual words under each category over the whole corpus.
6. A user portrayal construction method based on feature fusion of improved LDA as recited in claim 1, wherein said constructing an end user portrayal comprises: selecting and fusing the user interest preference characteristics and the purchasing behavior characteristics to generate a user portrait model;
the feature fusion, denoted as,
Figure QLYQS_4
wherein A represents the interest preference features, B represents the purchasing behavior features, a and c represent features of different sets in the interest preference portrait, y represents a purchasing behavior feature, the subscript number is the index of each text data item, and × denotes the Cartesian product.
7. A user portrayal construction method based on feature fusion of improved LDA as claimed in claim 1 or 6, wherein said constructing an end user portrayal is expressed as:
P={B,A,I,t,c,e}
wherein B represents the basic attribute features, including age, gender and region; A represents the purchasing behavior features; I represents the interest preference features; the parameter t represents the time period; c represents the portrait feature dimension; and e represents the number of people reached.
8. A user portrayal construction system based on feature fusion of improved LDA, comprising:
a data acquisition module (100), a data preprocessing module (200), a data mining module (300), a data analysis module (400);
the data acquisition module (100) is a device for acquiring user data, and is used for extracting a user behavior model and a user interest model and transmitting the acquired data to the preprocessing module (200);
the data preprocessing module (200) is a device for processing missing and redundant data, and is used for extracting key contents from the data acquisition module (100) and converting the data into a unified and identifiable structure;
the data mining module (300) is a device for extracting user behaviors and interest features based on an improved LDA model, and generates a final user portrait by carrying out weight measurement on basic attribute features, behavior tags and interest tags on the data extracted by the data preprocessing module (200) and carrying out feature selection and cross fusion;
the data analysis module (400) is a device for providing advertisement recommendation dimension through analysis of portraits, and is used for analyzing consumer information in time intervals according to product ordering time in consideration of the influence of time factors on user portraits.
9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any of claims 1 to 7 when the computer program is executed.
10. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 7.
CN202310226593.2A 2023-03-10 2023-03-10 User portrait construction method and system based on feature fusion of improved LDA Pending CN116385037A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310226593.2A CN116385037A (en) 2023-03-10 2023-03-10 User portrait construction method and system based on feature fusion of improved LDA

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310226593.2A CN116385037A (en) 2023-03-10 2023-03-10 User portrait construction method and system based on feature fusion of improved LDA

Publications (1)

Publication Number Publication Date
CN116385037A true CN116385037A (en) 2023-07-04

Family

ID=86964802

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310226593.2A Pending CN116385037A (en) 2023-03-10 2023-03-10 User portrait construction method and system based on feature fusion of improved LDA

Country Status (1)

Country Link
CN (1) CN116385037A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117271905A (en) * 2023-11-21 2023-12-22 杭州小策科技有限公司 Crowd image-based lateral demand analysis method and system
CN117271905B (en) * 2023-11-21 2024-02-09 杭州小策科技有限公司 Crowd image-based lateral demand analysis method and system
CN117455555A (en) * 2023-12-25 2024-01-26 厦门理工学院 Big data-based electric business portrait analysis method and system
CN117455555B (en) * 2023-12-25 2024-03-08 厦门理工学院 Big data-based electric business portrait analysis method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination