CN113553501A - Method and device for user portrait prediction based on artificial intelligence - Google Patents

Method and device for user portrait prediction based on artificial intelligence

Info

Publication number
CN113553501A
CN113553501A (application CN202110744483.6A)
Authority
CN
China
Prior art keywords
user
training
gbdt
groups
sets
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202110744483.6A
Other languages
Chinese (zh)
Inventor
张阳 (Zhang Yang)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Heze Mingge Network Technology Co ltd
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual filed Critical Individual
Priority to CN202110744483.6A priority Critical patent/CN113553501A/en
Publication of CN113553501A publication Critical patent/CN113553501A/en
Withdrawn legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/953Querying, e.g. by the use of web search engines
    • G06F16/9535Search customisation based on user profiles and personalisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/06Buying, selling or leasing transactions
    • G06Q30/0601Electronic shopping [e-shopping]
    • G06Q30/0631Item recommendations

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Business, Economics & Management (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Accounting & Taxation (AREA)
  • General Engineering & Computer Science (AREA)
  • Finance (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application discloses a user portrait prediction method based on artificial intelligence, which comprises the following steps: collecting the operation behavior data stream of a user in the commodity purchasing process; extracting features from the operation behavior data stream to generate training samples; training the training samples based on a gradient boosting tree (GBDT) and logistic regression (LR) fusion model, and outputting trained portrait prediction parameters; and predicting the user portrait based on the portrait prediction parameters and the natural attribute parameters of the user.

Description

Method and device for user portrait prediction based on artificial intelligence
Technical Field
The application relates to the technical field of electronic commerce, in particular to a method and a device for user portrait prediction based on artificial intelligence.
Background
In recent years, with the continued development of the internet, traffic costs have risen, acquiring new customers has become more expensive, and the loyalty of existing customers has declined. Under these conditions, a brand can only keep growing by operating on each customer in a refined way. Refined operation in the e-commerce field means matching different services and content to users with different requirements through user grouping, so as to meet users' personalized needs. A user portrait describes user characteristics along multiple dimensions; different product types call for different portrait dimensions, and the faster and more accurate the portrait information is, the better it can help an e-commerce platform locate its actual user groups.
User portrait information is usually obtained by data mining and modeling over real-time operation streams and offline operation data, with models outputting different user tags to describe user characteristics. A commonly used data mining algorithm is K-means cluster analysis. However, K-means performs clustering over a large amount of historical data, so its user portrait prediction capability is weak, and it cannot effectively predict or recommend the user's next commodity purchasing behavior.
Disclosure of Invention
The embodiment of the application provides a method and a device for user portrait prediction based on artificial intelligence, which are used for solving the problem of poor user portrait prediction capability in the prior art.
The embodiment of the invention provides a user portrait prediction method based on artificial intelligence, which comprises the following steps:
collecting the operation behavior data stream of a user in the commodity purchasing process;
extracting features from the operation behavior data stream to generate training samples;
training the training samples based on a gradient boosting tree (GBDT) and logistic regression (LR) fusion model, and outputting trained portrait prediction parameters;
and predicting the user portrait based on the portrait prediction parameters and the natural attribute parameters of the user.
Optionally, the training of the training samples based on the gradient boosting tree GBDT and logistic regression LR fusion model and outputting the trained portrait prediction parameters includes:
dividing the training samples into N groups of training feature sets according to N commodity categories, wherein N is a positive integer greater than 1;
dividing each group of training feature sets into three training sets according to the user clicking operation behavior feature, the user purchasing operation behavior feature and the user and customer service conversation behavior feature, and respectively establishing corresponding GBDT trees;
respectively traversing GBDT trees corresponding to the three training sets, and outputting three groups of GBDT training sets;
and taking the three groups of GBDT training sets as input of the LR model, training the three groups of GBDT training sets by using the LR model, and outputting the trained portrait prediction parameters.
Optionally, the separately traversing the GBDT trees corresponding to the three training sets includes:
setting a training set x, a loss function L, the depth D of the GBDT tree, and the number of iterations M, and initializing a weak learner;
calculating the negative gradient r_im for each sample in the training set, i = 1, 2, ..., N;
using the negative gradient as a new sample, and using the data set (x_i, r_im) as training data of the GBDT tree to determine a second GBDT tree f_m(x), wherein the leaf node regions corresponding to the second GBDT tree are R_jm, j = 1, 2, ..., J, wherein J is the number of leaf nodes of the second GBDT tree;
calculating the best fitting value of the leaf node area;
and updating the strong learner, and obtaining the final learner after M rounds of iteration.
Optionally, the training of the three groups of GBDT training sets using an LR model includes:
setting a loss function L1, a step length a, a maximum number of iterations M_max, and an error limit t;
initializing the portrait prediction parameters c = {c_0, c_1, c_2, ..., c_k};
Inputting the three groups of GBDT training sets to carry out portrait prediction parameter iteration, respectively judging whether the error of each group of GBDT training sets is smaller than t in each iteration, terminating the training if the error is smaller than t, and updating the portrait prediction parameters c if the error is larger than or equal to t;
and outputting the final portrait prediction parameter c after the iteration is finished.
Optionally, the portrait prediction parameters c are used for predicting whether the user will click a commodity page, and/or whether the user will purchase commodities, and/or whether the user will communicate with customer service frequently.
Optionally, the extracting features from the operation behavior data stream includes:
acquiring a Pearson correlation coefficient between every two parameters;
and if the Pearson correlation coefficient exceeds a preset threshold value, deleting one of the two parameters corresponding to the Pearson correlation coefficient.
Optionally, the natural attributes of the user include a user gender, a user age, and a user interest, and predicting the user representation based on the representation prediction parameter and the natural attribute parameter of the user includes:
acquiring a user portrait template library, wherein the user portrait template library comprises a plurality of user portraits and corresponding characteristic values;
setting different weights for different portrait prediction parameters and natural attribute parameters of a user, and performing weighting operation to determine a user portrait characteristic value;
and traversing the characteristic values in the user portrait template library and determining the template characteristic value with the smallest difference from the user's portrait characteristic value, the user portrait corresponding to that characteristic value in the template library being the predicted user portrait.
The embodiment of the invention also provides a device for user portrait prediction based on artificial intelligence, which comprises:
the acquisition unit is used for acquiring the operation behavior data flow of the user in the commodity purchasing process;
the characteristic extraction unit is used for extracting characteristics from the operation behavior data stream to generate a training sample;
the training unit is used for training the training samples based on the gradient boosting tree (GBDT) and logistic regression (LR) fusion model and outputting trained portrait prediction parameters;
and a prediction unit for predicting the user portrait based on the portrait prediction parameters and the natural attribute parameters of the user.
Optionally, the training unit trains the training samples based on the gradient boosting tree GBDT and logistic regression LR fusion model and outputs the trained portrait prediction parameters, including:
dividing the training samples into N groups of training feature sets according to N commodity categories, wherein N is a positive integer greater than 1;
dividing each group of training feature sets into three training sets according to the user clicking operation behavior feature, the user purchasing operation behavior feature and the user and customer service conversation behavior feature, and respectively establishing corresponding GBDT trees;
respectively traversing GBDT trees corresponding to the three training sets, and outputting three groups of GBDT training sets;
and taking the three groups of GBDT training sets as input of the LR model, training the three groups of GBDT training sets by using the LR model, and outputting the trained portrait prediction parameters.
The embodiment of the invention also provides a device, which comprises a memory and a processor, wherein computer-executable instructions are stored in the memory, and the processor implements the above method when executing the computer-executable instructions stored in the memory.
According to the method provided by the embodiment of the invention, the user's operation behavior is analyzed through the GBDT and LR fusion model to output user portrait prediction parameters; the portrait prediction parameters are then analyzed in combination with the user's natural attribute parameters to finally obtain the user portrait prediction, improving both the prediction capability and the prediction accuracy of the user portrait.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings used in the description of the embodiments will be briefly introduced below.
FIG. 1 is a schematic flow diagram of artificial intelligence based user portrait prediction in one embodiment;
FIG. 2 is a schematic diagram of the GBDT + LR model structure in one embodiment;
FIG. 3 is a diagram of an artificial intelligence based user portrait prediction apparatus in one embodiment;
FIG. 4 is a diagram illustrating the hardware components of the apparatus according to one embodiment.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the present application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in the specification of the present application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
As used in this specification and the appended claims, the term "if" may be interpreted contextually as "when", "upon" or "in response to a determination" or "in response to a detection". Similarly, the phrase "if it is determined" or "if a [ described condition or event ] is detected" may be interpreted contextually to mean "upon determining" or "in response to determining" or "upon detecting [ described condition or event ]" or "in response to detecting [ described condition or event ]".
FIG. 1 is a flow chart of artificial intelligence based user portrait prediction according to an embodiment of the present invention. As shown in FIG. 1, the method comprises:
s101, collecting an operation behavior data stream of a user in a commodity purchasing process;
In the embodiment of the invention, the e-commerce platform is provided with a plurality of cloud servers, which collect operation behavior data streams from the user's clicking, operating, and purchasing processes on e-commerce web pages; the data streams include real-time data streams and offline data streams.
In the embodiment of the invention, a user portrait is targeted at each type of commodity, and commodities are recommended based on the user portrait. Therefore, taking a certain type of commodity as an example, data streams of the user's click, purchase, and customer-service conversation behavior for the commodity are obtained. The data streams contain many data types, such as the commodity number, commodity brand, commodity store, number of clicks, customer-service conversation duration, frequency, content data size, and the like.
S102, extracting features from the operation behavior data stream to generate a training sample;
For the data in the data stream, it is necessary to select relatively important and relatively independent data (with low correlation between features) as features to generate training samples.
In order to reduce feature redundancy, the embodiment of the present invention also prunes redundant data. For example, the Pearson correlation coefficient between every two parameters in the data stream is acquired; if the Pearson correlation coefficient exceeds a preset threshold, one of the two parameters corresponding to that coefficient is deleted. The Pearson correlation coefficient reflects the linear correlation between two parameters and takes values in [-1, 1], where 1 represents complete positive correlation, 0 represents no linear relationship at all, and -1 represents complete negative correlation, that is, one parameter increases while the other decreases. The closer the coefficient is to 0, the weaker the correlation. The calculation formula is as follows:
r = \frac{\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y})}{\sqrt{\sum_{i=1}^{n}(X_i-\bar{X})^2}\sqrt{\sum_{i=1}^{n}(Y_i-\bar{Y})^2}}
wherein X and Y represent a pair of continuous parameters.
The correlation determination is based on:
Generally, |r| > 0.8 indicates high correlation; 0.4 <= |r| < 0.8 indicates moderate correlation; |r| < 0.4 indicates low correlation. For two highly correlated parameters, the less important one can be removed to reduce data redundancy. For example, the duration of the user's customer-service conversation is typically positively correlated with its content size: the longer the conversation, the more the communication content, so feature extraction only needs to be performed on the duration.
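As an illustrative sketch outside the patent text, the pruning rule above can be implemented with NumPy's corrcoef; the 0.8 threshold mirrors the criterion above, while the feature names and data are hypothetical:

```python
import numpy as np

def prune_correlated(features, threshold=0.8):
    """Drop one parameter of every pair whose |Pearson r| exceeds the threshold."""
    names = list(features)
    dropped = set()
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if a in dropped or b in dropped:
                continue
            r = np.corrcoef(features[a], features[b])[0, 1]
            if abs(r) > threshold:
                dropped.add(b)  # keep the first-seen parameter of the pair
    return {k: v for k, v in features.items() if k not in dropped}

# Hypothetical data: conversation duration and content size move together,
# click count is independent of both.
rng = np.random.default_rng(0)
duration = rng.uniform(1, 60, 200)
feats = {
    "session_duration": duration,
    "content_size": duration * 3 + rng.normal(0, 1, 200),  # near-duplicate feature
    "click_count": rng.integers(0, 50, 200).astype(float),
}
kept = prune_correlated(feats)  # content_size is redundant and gets dropped
```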
S103, training the training sample based on a gradient lifting tree GBDT and a logistic regression LR fusion model, and outputting a trained portrait prediction parameter;
Because the features of the model involve operation behavior features and marketing-activity participation behavior features, and the feature dimension is very high, an LR (logistic regression) algorithm is adopted. However, because the learning capacity of the LR model is limited, a large amount of feature engineering would otherwise be required to extract effective features and feature combinations and improve the nonlinear learning capacity of the model. GBDT (Gradient Boosting Decision Tree) is an iterative decision tree algorithm belonging to the Boosting family of ensemble learning; it has high classification accuracy and good generalization capability, and is a commonly used nonlinear model. Based on the boosting idea in ensemble learning, each iteration builds a new decision tree in the gradient direction that reduces the residual, and multiple iterations generate multiple decision trees. GBDT can therefore discover various distinctive features and feature combinations, greatly saving the time and labor cost of feature engineering. Therefore, a fusion algorithm of GBDT and LR is selected for the portrait prediction model.
The basic idea of GBDT is: based on the forward stagewise algorithm, each iteration is computed to reduce the residual of the previous iteration. To eliminate the residual, a new model is built in the gradient direction in which the residual decreases. Thus, in gradient boosting, the goal of each new model is to reduce the previous model's residual along the gradient direction, which differs greatly from traditional Boost algorithms that reweight correctly and incorrectly classified samples. As a result, GBDT can achieve high accuracy with relatively little parameter tuning. In addition, GBDT can adopt robust loss functions, giving it strong robustness to abnormal data.
The GBDT and LR fusion model is composed of two parts, wherein GBDT is used for extracting features from a training set to serve as new training input data, and LR is used as a classifier of the new training input data.
FIG. 2 is a schematic diagram of the GBDT + LR model structure. Assume the GBDT has two weak classifiers, represented by a hollow part and a solid part respectively, where the hollow weak classifier has 3 leaf nodes and the solid weak classifier has 2 leaf nodes. Suppose a sample's prediction falls on the second leaf node of the hollow weak classifier and also on the second leaf node of the solid weak classifier; then the prediction result of the hollow weak classifier is recorded as [0 1 0], the prediction result of the solid weak classifier is recorded as [0 1], and the output of the GBDT is the concatenation of the weak classifiers, [0 1 0 0 1], a sparse vector (array).
After the new training data is constructed, the new training data is input into the LR classifier as input training set data for the training of the final classifier.
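A compact sketch of this two-stage fusion on synthetic data (scikit-learn is an assumed library choice, not named by the patent): the GBDT's apply method returns the leaf index each sample falls into per tree, which is one-hot encoded into the sparse vector described above and used to train the LR classifier:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Stage 1: GBDT learns distinctive feature combinations.
gbdt = GradientBoostingClassifier(n_estimators=20, max_depth=3, random_state=0)
gbdt.fit(X_train, y_train)
leaves_train = gbdt.apply(X_train)[:, :, 0]  # (n_samples, n_trees) leaf indices
leaves_test = gbdt.apply(X_test)[:, :, 0]

# Stage 2: one-hot encode the leaf indices (the sparse 0/1 vector per sample)
# and train LR on them as the final classifier.
enc = OneHotEncoder(handle_unknown="ignore")
lr = LogisticRegression(max_iter=1000)
lr.fit(enc.fit_transform(leaves_train), y_train)
acc = lr.score(enc.transform(leaves_test), y_test)
```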
Specifically, in the embodiment of the present invention, the specific steps of training the training samples may be:
s1031, dividing the training samples into N groups of training feature sets according to N commodity categories, wherein N is a positive integer larger than 1;
Since the finally predicted user portrait is strongly related to a certain type of commodity, the total training samples need to be divided into N different training feature sets according to N commodity categories (commodity category IDs), where each training feature set corresponds to one commodity category ID, and user portrait prediction is performed for each type of commodity.
S1032, dividing each group of training feature sets into three training sets according to the user clicking operation behavior feature, the user purchasing operation behavior feature and the user and customer service conversation behavior feature, and respectively establishing corresponding GBDT trees;
In the process of a user's commodity purchasing behavior, three operation behaviors are typical, and the user's overall operation behavior can be analyzed based on them: user click operation behavior features (such as the number of clicks, web page dwell time, number of recommendations, and the like), user purchase operation behavior features (purchase price, purchase time, and the like), and user-customer service conversation behavior features (conversation content, emotion, frequency, duration, and the like). The user's portrait can be obtained from these three typical operation behaviors; for example, the user is interested in the product (number of clicks) and has a payment intention (historical payment records), or the user is more critical of the product (customer-service conversation duration). Corresponding GBDT trees are established respectively.
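The grouping in S1031 and S1032 can be sketched as a two-level split of flat event records, first by commodity category ID and then by the three behavior types; the record layout and values below are hypothetical:

```python
from collections import defaultdict

# Hypothetical flat records: (commodity_category_id, behavior_type, feature_vector)
events = [
    (101, "click",    [3, 12.5]),    # click count, dwell seconds
    (101, "purchase", [59.0, 1]),    # price, quantity
    (101, "session",  [240, 2]),     # conversation duration, frequency
    (202, "click",    [1, 4.0]),
]

# S1031: one training feature set per commodity category;
# S1032: three behavior-specific training sets within each category.
by_category = defaultdict(lambda: {"click": [], "purchase": [], "session": []})
for category_id, behavior, vector in events:
    by_category[category_id][behavior].append(vector)
```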
S1033, respectively traversing GBDT trees corresponding to the three training sets, and outputting three groups of GBDT training sets;
setting a training set x (containing n samples), a loss function L, the depth D of the GBDT tree, and the number of iterations M, and initializing a weak learner;
wherein the weak learner is initialized to f_0(x):

f_0(x) = \arg\min_{c} \sum_{i=1}^{n} L(y_i, c)
For the ith sample in each training set (the training set contains n samples, i = 1, 2, ..., n), the negative gradient r_im (the residual) is calculated:

r_{im} = -\left[\frac{\partial L(y_i, f(x_i))}{\partial f(x_i)}\right]_{f(x)=f_{m-1}(x)}
using the negative gradient as a new sample, and using the data set (x_i, r_im) as training data of the GBDT tree to determine a second GBDT tree f_m(x), wherein the leaf node regions corresponding to the second GBDT tree are R_jm, j = 1, 2, ..., J, wherein J is the number of leaf nodes of the second GBDT tree;
calculating the best fitting value of each leaf node region:

c_{jm} = \arg\min_{c} \sum_{x_i \in R_{jm}} L\big(y_i, f_{m-1}(x_i) + c\big)
updating the strong learner:

f_m(x) = f_{m-1}(x) + \sum_{j=1}^{J} c_{jm} I(x \in R_{jm})
after M rounds of iteration, the final learner is obtained:

f(x) = f_M(x) = f_0(x) + \sum_{m=1}^{M} \sum_{j=1}^{J} c_{jm} I(x \in R_{jm})
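The iteration above can be illustrated for squared-error loss, where the negative gradient r_im reduces to the plain residual; this toy sketch (an assumption, not the patent's implementation) uses one-feature regression stumps as the GBDT trees, with each stump's two leaf means playing the role of the fitted values c_jm:

```python
import numpy as np

def fit_stump(x, residual):
    """Best single-split regression stump: the leaf means are the best
    squared-error fits c_jm over the two leaf regions R_jm."""
    best_err, best = np.inf, None
    for s in np.unique(x)[:-1]:
        left, right = residual[x <= s], residual[x > s]
        err = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if err < best_err:
            best_err, best = err, (s, left.mean(), right.mean())
    return best

def boost(x, y, M=100, lr=0.1):
    f0 = y.mean()                 # f_0(x): the constant minimizing squared loss
    pred = np.full_like(y, f0)
    stumps = []
    for _ in range(M):
        r = y - pred              # negative gradient = residual for squared loss
        s, cl, cr = fit_stump(x, r)
        pred = pred + lr * np.where(x <= s, cl, cr)  # update the strong learner
        stumps.append((s, cl, cr))
    return f0, stumps

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, 300)
y = np.sin(x) + rng.normal(0, 0.1, 300)
f0, stumps = boost(x, y)
pred = f0 + 0.1 * sum(np.where(x <= s, cl, cr) for s, cl, cr in stumps)
mse = float(np.mean((pred - y) ** 2))  # far below the raw variance of y
```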
s1034, the three groups of GBDT training sets are used as input of an LR model, the LR model is used for training the three groups of GBDT training sets, and the trained portrait prediction parameters are output.
Setting a loss function L1, a step length a, a maximum number of iterations M_max, and an error limit t;
let any sample in the three groups of GBDT training sets be defined as x_i, i = 1, 2, ..., n, where n is the number of samples;
initializing the portrait prediction parameters c = {c_0, c_1, c_2, ..., c_k};
The loss function uses the log-likelihood function. With the logistic regression hypothesis

h_c(x) = \frac{1}{1 + e^{-c^{T} x}},

the loss is

L_1(c) = -\sum_{i=1}^{n} \big[ y_i \log h_c(x_i) + (1 - y_i) \log\big(1 - h_c(x_i)\big) \big]
The three groups of GBDT training sets are input into the LR model for portrait prediction parameter iteration; in each iteration, whether the error of each group of GBDT training sets is smaller than t is judged; if the error is smaller than t, the training is terminated, and if the error is larger than or equal to t, the portrait prediction parameters c are updated. The updated c_j, j = 0, 1, ..., k, is as follows:

c_j := c_j - a \frac{\partial L_1(c)}{\partial c_j} = c_j - a \sum_{i=1}^{n} \big( h_c(x_i) - y_i \big) x_{i,j}
and outputting the final portrait prediction parameter c after the iteration is finished.
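A self-contained sketch of the LR loop above on synthetic data (the data and step sizes are hypothetical): the parameters c are updated along the log-likelihood gradient with step length a, and training stops early once the parameter change falls below the error limit t:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lr_train(X, y, a=0.1, max_iter=5000, t=1e-4):
    Xb = np.hstack([np.ones((len(X), 1)), X])  # bias column -> c = {c_0, c_1, ...}
    c = np.zeros(Xb.shape[1])                  # initialize portrait parameters
    for _ in range(max_iter):
        grad = Xb.T @ (y - sigmoid(Xb @ c)) / len(y)  # d(log-likelihood)/dc
        c_new = c + a * grad                   # ascent step (= descending L1)
        if np.max(np.abs(c_new - c)) < t:      # error below limit: terminate
            return c_new
        c = c_new
    return c

rng = np.random.default_rng(1)
X = rng.normal(size=(400, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)
c = lr_train(X, y)
Xb = np.hstack([np.ones((400, 1)), X])
acc = float(np.mean((sigmoid(Xb @ c) > 0.5) == y))
```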
In the embodiment of the present invention, the portrait prediction parameters c may be used to predict whether the user will click a commodity page, and/or whether the user will purchase commodities, and/or whether the user will communicate with customer service frequently.
S104, the user portrait is predicted based on the portrait prediction parameters and the natural attribute parameters of the user.
The natural attributes of the user may include user gender, user age, user interest, and the like. These may be obtained from the information filled in at the time of user registration.
Wherein, S104 may specifically be:
acquiring a user portrait template library, wherein the user portrait template library comprises a plurality of user portraits and corresponding characteristic values. The user portrait template library is a library of accurate user portraits mined and extracted by engineers from massive numbers of users; it contains the users' various natural attributes, interests and hobbies, click rates on similar/different commodities, historical purchase rates, complaint rates, and the like. With the template library, commodities can be accurately recommended according to a user's attributes or parameters, or potential users can be accurately recommended according to a commodity. In the user portrait template library, a characteristic value (namely, a correlation coefficient) is defined as an index quantifying whether a user portrait will purchase a certain commodity; its value lies in [0, 1], where 0 indicates that the user is not interested in the commodity and 1 indicates a very strong purchase intention. Different characteristic values therefore correspond to different user portraits.
Different weights (λ_1, λ_2, ..., λ_n) are set for the different portrait prediction parameters and natural attribute parameters of the user, and a weighting operation is performed to determine the user portrait characteristic value θ:

\theta = \sum_{i=1}^{j} \lambda_i b_i + \sum_{i=j+1}^{n} \lambda_i p_i

wherein b = {b_1, b_2, ..., b_j} are the user natural attribute parameters, p = {p_{j+1}, p_{j+2}, ..., p_n} are the portrait prediction parameters, and j < n.
The characteristic values in the user portrait template library are traversed, and the template characteristic value with the smallest difference from the user's portrait characteristic value is determined; the user portrait corresponding to that characteristic value in the template library is the predicted user portrait. For example, if the calculated characteristic value is 0.75 and the template library has characteristic values 0.7, 0.73, 0.74, and 0.78, the difference between 0.74 and 0.75 is minimal, and the user portrait corresponding to the calculated characteristic value is deemed consistent with the user portrait corresponding to the characteristic value 0.74 in the template library.
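Step S104 reduces to a weighted sum followed by a nearest-value lookup. In the sketch below, the weights, parameter values, and portrait labels are all hypothetical; the template values echo the example above:

```python
import numpy as np

def predict_portrait(natural_params, pred_params, weights, template_values, template_labels):
    """theta = weighted sum of natural-attribute and portrait prediction
    parameters; the template portrait with the nearest value is returned."""
    params = np.concatenate([natural_params, pred_params])  # (b, p)
    theta = float(np.dot(weights, params))
    idx = int(np.argmin(np.abs(np.asarray(template_values) - theta)))
    return theta, template_labels[idx]

weights = np.array([0.2, 0.3, 0.5])    # lambda_1 .. lambda_n
natural = np.array([0.5])              # e.g. a normalized interest score
pred = np.array([0.9, 0.7])            # click / purchase prediction parameters
templates = [0.70, 0.73, 0.74, 0.78]   # characteristic values in the library
labels = ["browser", "comparer", "intent buyer", "loyal buyer"]  # hypothetical
theta, portrait = predict_portrait(natural, pred, weights, templates, labels)
# theta = 0.2*0.5 + 0.3*0.9 + 0.5*0.7 = 0.72, nearest template value is 0.73
```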
According to the method provided by the embodiment of the invention, the user's operation behavior is analyzed through the GBDT and LR fusion model to output user portrait prediction parameters; the portrait prediction parameters are then analyzed in combination with the user's natural attribute parameters to finally obtain the user portrait prediction, improving both the prediction capability and the prediction accuracy of the user portrait.
As shown in FIG. 3, an embodiment of the present invention further provides an apparatus for user portrait prediction based on artificial intelligence, including:
the acquisition unit 31 is used for acquiring operation behavior data flow of a user in a commodity purchasing process;
a feature extraction unit 32, configured to perform feature extraction from the operation behavior data stream to generate a training sample;
the training unit 33 is used for training the training samples based on the gradient boosting tree (GBDT) and logistic regression (LR) fusion model and outputting trained portrait prediction parameters;
and a prediction unit 34 for predicting the user portrait based on the portrait prediction parameters and the natural attribute parameters of the user.
The training unit 33 trains the training samples based on the gradient boosting tree GBDT and logistic regression LR fusion model and outputs the trained portrait prediction parameters, specifically:
dividing the training samples into N groups of training feature sets according to N commodity categories, wherein N is a positive integer greater than 1;
dividing each group of training feature sets into three training sets according to the user clicking operation behavior feature, the user purchasing operation behavior feature and the user and customer service conversation behavior feature, and respectively establishing corresponding GBDT trees;
respectively traversing GBDT trees corresponding to the three training sets, and outputting three groups of GBDT training sets;
and taking the three groups of GBDT training sets as input of the LR model, training the three groups of GBDT training sets by using the LR model, and outputting the trained portrait prediction parameters.
The embodiment of the present invention further provides an apparatus comprising a memory and a processor, wherein the memory stores computer-executable instructions and the processor implements the above method when executing the computer-executable instructions on the memory.
Embodiments of the present invention also provide a computer-readable storage medium having stored thereon computer-executable instructions for performing the method in the foregoing embodiments.
FIG. 4 is a diagram illustrating the hardware components of the apparatus according to one embodiment. It will be appreciated that FIG. 4 shows only a simplified design of the apparatus. In practical applications, the apparatus may also include other necessary elements, including but not limited to any number of input/output systems, processors, controllers and memories, and all apparatuses that can implement the method of the embodiments of the present application fall within the protection scope of the present application.
The memory, which is used for storing instructions and data, includes but is not limited to random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM) and compact disc read-only memory (CD-ROM).
The input system is for inputting data and/or signals and the output system is for outputting data and/or signals. The output system and the input system may be separate devices or may be an integral device.
The processor may include one or more processors, for example, one or more Central Processing Units (CPUs), and in the case of one CPU, the CPU may be a single-core CPU or a multi-core CPU. The processor may also include one or more special purpose processors, which may include GPUs, FPGAs, etc., for accelerated processing.
The memory is used to store program codes and data of the network device.
The processor is used for calling the program codes and data in the memory and executing the steps in the method embodiment. Specifically, reference may be made to the description of the method embodiment, which is not repeated herein.
In the several embodiments provided in the present application, it should be understood that the disclosed system and method may be implemented in other ways. For example, the described division into units is only one kind of logical function division; other divisions are possible in practice: multiple units or components may be combined or integrated into another system, and some features may be omitted or not executed. The couplings, direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, systems or units, and may be electrical, mechanical or in other forms.
Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. The procedures or functions according to the embodiments of the present application are wholly or partially generated when the computer program instructions are loaded and executed on a computer. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable system. The computer instructions may be stored on or transmitted over a computer-readable storage medium. The computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, fiber optic, Digital Subscriber Line (DSL)), or wirelessly (e.g., infrared, wireless, microwave, etc.). The computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device, such as a server, a data center, etc., that includes one or more of the available media. The usable medium may be a read-only memory (ROM), or a Random Access Memory (RAM), or a magnetic medium, such as a floppy disk, a hard disk, a magnetic tape, a magnetic disk, or an optical medium, such as a Digital Versatile Disk (DVD), or a semiconductor medium, such as a Solid State Disk (SSD).
The above is only a specific embodiment of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily think of various equivalent modifications or substitutions within the technical scope of the present application, and these modifications or substitutions should be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A method for user portrait prediction based on artificial intelligence, comprising:
collecting the operation behavior data stream of a user in a commodity purchasing process;
extracting features from the operation behavior data stream to generate a training sample;
training the training sample based on a gradient boosting decision tree (GBDT) and logistic regression (LR) fusion model, and outputting trained portrait prediction parameters;
predicting the user portrait based on the portrait prediction parameters and natural attribute parameters of the user.
2. The method of claim 1, wherein training the training sample based on the gradient boosting decision tree (GBDT) and logistic regression (LR) fusion model and outputting the trained portrait prediction parameters comprises:
dividing the training samples into N groups of training feature sets according to N commodity categories, wherein N is a positive integer greater than 1;
dividing each group of training feature sets into three training sets according to user click operation behavior features, user purchase operation behavior features and user-customer-service conversation behavior features, and establishing a corresponding GBDT tree for each;
respectively traversing GBDT trees corresponding to the three training sets, and outputting three groups of GBDT training sets;
and taking the three groups of GBDT training sets as the input of an LR model, training them with the LR model, and outputting the trained portrait prediction parameters.
3. The method according to claim 2, wherein said separately traversing the GBDT trees corresponding to the three training sets comprises:
setting a training set x, a loss function L, the GBDT tree depth D and the number of iterations M, and initializing a weak learner;
calculating the negative gradient r_im for each sample in the training set, i = 1, 2, ..., n;
taking the negative gradients as new samples and using the data set (x_i, r_im) as the training data of the GBDT tree to determine a second GBDT tree f_m(x), wherein the leaf node regions of the second GBDT tree are R_jm, j = 1, 2, ..., J, and J is the number of leaf nodes of the second GBDT tree;
calculating the best fitting value of each leaf node region;
and updating the strong learner, obtaining the final learner after M rounds of iteration.
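The iteration in claim 3 (initialize a weak learner, fit each new tree to the negative gradient, update the strong learner over M rounds) can be sketched from scratch as follows. The data, the squared loss (whose negative gradient is the residual), and the shrinkage value are illustrative assumptions:

```python
# Minimal GBDT training loop: each tree f_m(x) is fit to the negative
# gradient r_im of the loss, then added to the strong learner.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(200, 3))
y = X[:, 0] ** 2 + X[:, 1]

M, D, shrink = 30, 2, 0.1            # iterations M, tree depth D, shrinkage
F = np.full_like(y, y.mean())        # initial weak learner f_0(x)
trees = []
for m in range(M):
    r = y - F                        # negative gradient of squared loss (residual)
    tree = DecisionTreeRegressor(max_depth=D).fit(X, r)  # f_m(x) on (x_i, r_im)
    trees.append(tree)
    F += shrink * tree.predict(X)    # update the strong learner

print(np.mean((y - F) ** 2))         # training error after M rounds
```

With squared loss, each tree's leaf mean is already the best fitting value for its leaf region; for other losses that value would be computed per leaf before the update.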
4. The method according to claim 2 or 3, wherein the training of the three groups of GBDT training sets using an LR model comprises:
setting a loss function L1, a step length a, a maximum iteration number M_max and an error limit t;
initializing the portrait prediction parameters c = {c_0, c_1, c_2, ..., c_k};
inputting the three groups of GBDT training sets for portrait prediction parameter iteration, and in each iteration judging whether the error of each of the three groups of GBDT training sets is smaller than t: if the error is smaller than t, terminating the training, and if the error is greater than or equal to t, updating the portrait prediction parameter c;
and outputting the final portrait prediction parameter c after the iteration is finished.
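The LR loop in claim 4 (step length a, maximum iteration count, error limit t as the stopping rule) can be sketched as plain gradient descent on the logistic loss. The data, the gradient-norm stopping criterion, and the parameter values are illustrative assumptions:

```python
# LR training with step length a, iteration cap M_max and error limit t.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_lr(X, y, a=0.5, M_max=5000, t=1e-3):
    X1 = np.hstack([np.ones((len(X), 1)), X])  # bias column for c_0
    c = np.zeros(X1.shape[1])                  # initialize c = {c_0, c_1, ..., c_k}
    for _ in range(M_max):
        p = sigmoid(X1 @ c)
        grad = X1.T @ (p - y) / len(y)         # gradient of the mean log loss
        if np.linalg.norm(grad) < t:           # error below the limit: stop
            break
        c -= a * grad                          # otherwise update the parameter c
    return c

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 2))
y = (X[:, 0] - X[:, 1] > 0).astype(float)
c = train_lr(X, y)
print(c)
```

In the embodiment this loop would be run on the three GBDT-transformed training sets rather than on raw features.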
5. The method of claim 4, wherein the portrait prediction parameter c predicts whether the user clicks on a commodity page, and/or whether the user purchases a commodity, and/or whether the user communicates frequently with customer service.
6. The method of claim 1, wherein the extracting features from the operational behavior data stream comprises:
acquiring a Pearson correlation coefficient between every two parameters;
and if the Pearson correlation coefficient exceeds a preset threshold value, deleting one of the two parameters corresponding to the Pearson correlation coefficient.
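The Pearson-based filter in claim 6 can be sketched as follows: compute the pairwise correlation matrix and, for every pair whose coefficient exceeds the threshold, delete one of the two parameters. The threshold value 0.9 and the synthetic data are illustrative assumptions:

```python
# Drop one feature of every highly correlated pair (Pearson filter).
import numpy as np

def drop_correlated(X, threshold=0.9):
    corr = np.corrcoef(X, rowvar=False)      # pairwise Pearson coefficients
    keep = list(range(X.shape[1]))
    for i in range(X.shape[1]):
        if i not in keep:
            continue
        for j in range(i + 1, X.shape[1]):
            if j in keep and abs(corr[i, j]) > threshold:
                keep.remove(j)               # delete one of the two parameters
    return X[:, keep], keep

rng = np.random.default_rng(3)
a = rng.normal(size=500)
X = np.column_stack([a, a + 0.01 * rng.normal(size=500), rng.normal(size=500)])
X_new, kept = drop_correlated(X)
print(kept)
```

Here columns 0 and 1 are nearly identical, so one of them is removed while the independent column 2 survives.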
7. The method of claim 1, wherein the natural attributes of the user include the user's gender, age and interests, and predicting the user portrait based on the portrait prediction parameters and the natural attribute parameters of the user comprises:
acquiring a user portrait template library, wherein the user portrait template library comprises a plurality of user portraits and corresponding characteristic values;
setting different weights for the different portrait prediction parameters and natural attribute parameters of the user, and performing a weighted operation to determine a user portrait characteristic value;
and traversing the characteristic values in the user portrait template library, and determining the template characteristic value with the smallest difference from the user portrait characteristic value, the user portrait corresponding to that characteristic value in the template library being the predicted user portrait.
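The matching step in claim 7 can be sketched as a weighted combination followed by a nearest-value lookup in the template library. The template names, characteristic values and weights below are all illustrative assumptions, not part of the embodiment:

```python
# Weighted feature value + minimum-difference lookup in a template library.
import numpy as np

# Hypothetical template library: portrait -> characteristic value.
templates = {"price-sensitive": 0.2, "impulse buyer": 0.55, "loyal customer": 0.8}

def predict_portrait(pred_params, natural_params, w_pred=0.6, w_nat=0.4):
    # Weighted operation producing the user portrait characteristic value.
    value = w_pred * np.mean(pred_params) + w_nat * np.mean(natural_params)
    # Traverse the template library for the minimum absolute difference.
    return min(templates, key=lambda k: abs(templates[k] - value))

print(predict_portrait([0.9, 0.7], [0.8, 0.6]))
```

For the inputs shown, the weighted value is 0.76, so the nearest template value (0.8) selects "loyal customer".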
8. An apparatus for artificial intelligence based user profile prediction, comprising:
the acquisition unit is used for acquiring the operation behavior data stream of a user in a commodity purchasing process;
the characteristic extraction unit is used for extracting characteristics from the operation behavior data stream to generate a training sample;
the training unit is used for training the training sample based on a gradient boosting decision tree (GBDT) and logistic regression (LR) fusion model and outputting trained portrait prediction parameters;
a prediction unit to predict the user portrait based on the portrait prediction parameters and the natural attribute parameters of the user.
9. The apparatus of claim 8, wherein the training unit is configured to train the training sample based on the gradient boosting decision tree (GBDT) and logistic regression (LR) fusion model and to output the trained portrait prediction parameters, specifically by:
dividing the training samples into N groups of training feature sets according to N commodity categories, wherein N is a positive integer greater than 1;
dividing each group of training feature sets into three training sets according to user click operation behavior features, user purchase operation behavior features and user-customer-service conversation behavior features, and establishing a corresponding GBDT tree for each;
respectively traversing GBDT trees corresponding to the three training sets, and outputting three groups of GBDT training sets;
and taking the three groups of GBDT training sets as the input of an LR model, training them with the LR model, and outputting the trained portrait prediction parameters.
10. An apparatus comprising a memory having computer-executable instructions stored thereon and a processor that, when executing the computer-executable instructions on the memory, implements the method of any of claims 1 to 7.
CN202110744483.6A 2021-07-01 2021-07-01 Method and device for user portrait prediction based on artificial intelligence Withdrawn CN113553501A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110744483.6A CN113553501A (en) 2021-07-01 2021-07-01 Method and device for user portrait prediction based on artificial intelligence

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110744483.6A CN113553501A (en) 2021-07-01 2021-07-01 Method and device for user portrait prediction based on artificial intelligence

Publications (1)

Publication Number Publication Date
CN113553501A true CN113553501A (en) 2021-10-26

Family

ID=78102679

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110744483.6A Withdrawn CN113553501A (en) 2021-07-01 2021-07-01 Method and device for user portrait prediction based on artificial intelligence

Country Status (1)

Country Link
CN (1) CN113553501A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115083337A (en) * 2022-07-08 2022-09-20 深圳市安信泰科技有限公司 LED display driving system and method


Similar Documents

Publication Publication Date Title
WO2022057658A1 (en) Method and apparatus for training recommendation model, and computer device and storage medium
CN113256367B (en) Commodity recommendation method, system, equipment and medium for user behavior history data
US20170249389A1 (en) Sentiment rating system and method
CN110598120A (en) Behavior data based financing recommendation method, device and equipment
CN111275205B (en) Virtual sample generation method, terminal equipment and storage medium
CN112380449B (en) Information recommendation method, model training method and related device
WO2023000491A1 (en) Application recommendation method, apparatus and device, and computer-readable storage medium
CN111429161B (en) Feature extraction method, feature extraction device, storage medium and electronic equipment
CN111125529A (en) Product matching method and device, computer equipment and storage medium
CN111881671A (en) Attribute word extraction method
CN108572984A (en) A kind of active user interest recognition methods and device
CN111754278A (en) Article recommendation method and device, computer storage medium and electronic equipment
CN111667024B (en) Content pushing method, device, computer equipment and storage medium
CN114511387A (en) Product recommendation method and device, electronic equipment and storage medium
CN113592593A (en) Training and application method, device, equipment and storage medium of sequence recommendation model
CN114997916A (en) Prediction method, system, electronic device and storage medium of potential user
CN111159481B (en) Edge prediction method and device for graph data and terminal equipment
CN113837843B (en) Product recommendation method and device, medium and electronic equipment
CN113495991A (en) Recommendation method and device
CN112528103A (en) Method and device for recommending objects
CN116109354A (en) Content recommendation method, apparatus, device, storage medium, and computer program product
CN112905885B (en) Method, apparatus, device, medium and program product for recommending resources to user
CN113722487A (en) User emotion analysis method, device and equipment and storage medium
CN113553501A (en) Method and device for user portrait prediction based on artificial intelligence
CN117573973A (en) Resource recommendation method, device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20220329

Address after: 518129 1c-912, building 1, jiayuhaoyuan, No. 698, Jihua Road, dafapu community, Bantian street, Longgang District, Shenzhen, Guangdong

Applicant after: Shenzhen tongerjia Education Consulting Co.,Ltd.

Address before: 518129 Bantian shangpinya garden, Longgang District, Shenzhen City, Guangdong Province

Applicant before: Zhang Yang

TA01 Transfer of patent application right

Effective date of registration: 20230103

Address after: 274000 No. 1388, Renmin Road, Heze City, Shandong Province

Applicant after: Heze Mingge Network Technology Co.,Ltd.

Address before: 518129 1c-912, building 1, jiayuhaoyuan, No. 698, Jihua Road, dafapu community, Bantian street, Longgang District, Shenzhen, Guangdong

Applicant before: Shenzhen tongerjia Education Consulting Co.,Ltd.

TA01 Transfer of patent application right
WW01 Invention patent application withdrawn after publication

Application publication date: 20211026

WW01 Invention patent application withdrawn after publication