WO2021189922A1 - Method, apparatus, device, and medium for generating a user portrait - Google Patents

Method, apparatus, device, and medium for generating a user portrait

Info

Publication number
WO2021189922A1
Authority
WO
WIPO (PCT)
Prior art keywords
behavior
time series
user
product
data
Prior art date
Application number
PCT/CN2020/132601
Other languages
English (en)
French (fr)
Inventor
夏婧
吴振宇
王建明
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2021189922A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 - Details of database functions independent of the retrieved data types
    • G06F16/95 - Retrieval from the web
    • G06F16/953 - Querying, e.g. by the use of web search engines
    • G06F16/9535 - Search customisation based on user profiles and personalisation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/29 - Graphical models, e.g. Bayesian networks
    • G06F18/295 - Markov models or related models, e.g. semi-Markov models; Markov random fields; Networks embedding Markov models
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06Q - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 - Commerce
    • G06Q30/02 - Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 - Market modelling; Market analysis; Collecting market data
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06Q - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 - Commerce
    • G06Q30/02 - Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 - Market modelling; Market analysis; Collecting market data
    • G06Q30/0202 - Market predictions or forecasting for commercial activities

Definitions

  • This application relates to the field of artificial intelligence technology, in particular to a method, device, equipment and medium for generating a user portrait.
  • A user portrait is a digital abstraction of a user role and a model for analyzing and mining user behavior. Building accurate user portraits can help enterprises expand the sales of emerging products and sell in a targeted way by understanding the user's environment and the products the user needs.
  • The inventors realized that traditional user-portrait models use herd models or persona models, can only analyze a user in a single scenario, and cannot follow changes in the user's life stage, life state, or consumption scenario; existing user portraits lack personalization and are relatively coarse-grained, making it difficult to meet the needs of multiple marketing scenarios, to satisfy multiple role-based requirements, and to track user behavior to cultivate long-term customers. Under these difficulties, the improvement that user portraits bring to precision marketing is limited: they can neither satisfy the needs of marketing-side business staff in real time nor distinguish, at fine granularity, the characteristic and demand differences of different user types.
  • The main purpose of this application is to provide a user portrait generation method, apparatus, device, and medium, aiming to solve the technical problems that prior-art user portraits provide only limited improvement to precision marketing, cannot satisfy the needs of marketing-side business staff in real time, and cannot distinguish the characteristic and demand differences of different user types at fine granularity.
  • To achieve the above objective, this application proposes a user portrait generation method, which includes: acquiring a state-feature time series and a purchase-behavior time series of a target user, the purchase-behavior time series carrying the product identifier of a product purchased by the target user; searching a preset model library for the behavior prediction model corresponding to the product identifier, where the behavior prediction model is a model obtained based on a Markov decision process and maximum-likelihood inverse reinforcement learning; inputting the state-feature time series and the purchase-behavior time series into the behavior prediction model corresponding to the product identifier for probabilistic prediction, to obtain behavior prediction data of the target user; and determining the portrait of the target user according to the behavior prediction data.
  • This application also proposes a device for generating a user portrait, which includes:
  • a data acquisition module, configured to acquire a state-feature time series and a purchase-behavior time series of a target user, the purchase-behavior time series carrying the product identifier of the product purchased by the target user;
  • a model acquisition module, configured to search a preset model library for the behavior prediction model corresponding to the product identifier, where the behavior prediction model is a model obtained based on a Markov decision process and maximum-likelihood inverse reinforcement learning;
  • a prediction module, configured to input the state-feature time series and the purchase-behavior time series into the behavior prediction model corresponding to the product identifier for probabilistic prediction, to obtain behavior prediction data of the target user; and
  • a portrait module, configured to determine the portrait of the target user according to the behavior prediction data.
  • This application also proposes a computer device, including a memory and a processor, the memory storing a computer program, where the processor, when executing the computer program, implements the steps of the above user portrait generation method.
  • This application also proposes a computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the above user portrait generation method.
  • The user portrait generation method, apparatus, device, and medium of this application describe the user's life stage, life state, and consumption scenario by acquiring the target user's state-feature time series and purchase-behavior time series, which facilitates building multi-perspective user portraits and meets the demand for user portraits in complex scenarios. Because a different purchase-behavior time series is used for each product and each behavior prediction model corresponds to one product, the granularity of the user portrait is refined. Because the behavior prediction model is a model obtained based on a Markov decision process and maximum-likelihood inverse reinforcement learning, the Markov decision process can fully mine user behavior when the life stage, life state, or consumption scenario changes, improving the accuracy of user portraits, while maximum-likelihood inverse reinforcement learning enables autonomous learning and improves generalization.
  • FIG. 1 is a schematic flowchart of a method for generating a user portrait according to an embodiment of this application
  • FIG. 2 is a schematic block diagram of the structure of an apparatus for generating a user portrait according to an embodiment of the application
  • FIG. 3 is a schematic block diagram of the structure of a computer device according to an embodiment of the application.
  • To solve the above technical problems, a method for generating a user portrait is proposed, applied in the field of artificial intelligence technology.
  • The method obtains a behavior prediction model based on a Markov decision process and maximum-likelihood inverse reinforcement learning, and then uses the behavior prediction model for probabilistic prediction.
  • The Markov decision process can fully mine user behavior when the life stage, life state, or consumption scenario changes, improving the accuracy of user portraits, while maximum-likelihood inverse reinforcement learning enables autonomous learning and improves generalization.
  • the method for generating a user portrait includes:
  • S1 Obtain a time series of state characteristics and a time series of purchase behavior of a target user, where the time series of purchase behavior carries a product identifier of a product purchased by the target user;
  • S2 Search for a behavior prediction model corresponding to the product identifier from a preset model library, where the behavior prediction model is a model obtained based on the Markov decision process and maximum likelihood inverse reinforcement learning;
  • S3 Input the state characteristic time series and the purchase behavior time series into the behavior prediction model corresponding to the product identification to perform probabilistic prediction to obtain the behavior prediction data of the target user;
  • S4 Determine the portrait of the target user according to the behavior prediction data.
  • By acquiring the target user's state-feature time series and purchase-behavior time series, this embodiment describes the user's life stage, life state, and consumption scenario, which facilitates building multi-perspective user portraits and meets the demand for user portraits in complex scenarios. Because a different purchase-behavior time series is used for each product and each behavior prediction model corresponds to one product, the granularity of the user portrait is refined. Because the behavior prediction model is a model obtained based on a Markov decision process and maximum-likelihood inverse reinforcement learning, the Markov decision process can fully mine user behavior when the life stage, life state, or consumption scenario changes, improving the accuracy of the user portrait, while maximum-likelihood inverse reinforcement learning enables autonomous learning and improves generalization.
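  • To make the S1 to S4 flow concrete, the following is a minimal Python sketch of the per-product prediction loop. The names (generate_portrait, BehaviorModel, model_library, predict) are illustrative assumptions, not identifiers from the filing, and the threshold step anticipates the S61/S62 comparison described later.

```python
# A minimal sketch of steps S1-S4, under assumed names; not the filing's code.
from typing import Dict, List

def generate_portrait(
    state_series: List[List[float]],            # S1: state-feature time series {x_1..x_n}
    behavior_series: Dict[str, List[int]],      # S1: one 0/1 purchase series per product id
    model_library: Dict[str, "BehaviorModel"],  # preset model library, one model per product
    threshold: float = 0.6,
) -> Dict[str, int]:
    portrait = {}
    for product_id, purchases in behavior_series.items():
        model = model_library[product_id]                 # S2: look up the model by product id
        prob = model.predict(state_series, purchases)     # S3: probabilistic prediction
        portrait[product_id] = 1 if prob > threshold else 0  # S4: threshold into the portrait
    return portrait
```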
  • the target user's state characteristic time series and purchase behavior time series can be obtained from the database.
  • the state feature time series and purchase behavior time series of the target user refer to the state feature time series and purchase behavior time series of the same user to be profiled.
  • the state feature time sequence refers to the time sequence of the state feature vector of the user to be portrayed.
  • Each state feature vector expresses multiple items of user information; that is, the state-feature time series includes a plurality of state feature vectors, arranged in chronological order.
  • User information includes but is not limited to: personal information, financial status, purchased product information, loan records, and information browsing records.
  • For example, the state-feature time series can be expressed as {x_1, x_2, x_3, ..., x_n}, where each state feature vector in {x_1, x_2, x_3, ..., x_n} includes six vector elements, which respectively represent the data-generation time, personal information, financial status, purchased-product information, loan records, and information-browsing records. That is, x_i includes six vector elements representing those items, and x_i is the i-th value in {x_1, x_2, x_3, ..., x_n} (that is, the state feature vector at the i-th time). This example is not a specific limitation.
  • The purchase-behavior time series refers to the time series of the purchase-behavior features of the user to be profiled for a certain product.
  • The purchase-behavior time series includes a plurality of purchase-behavior features, each of which takes a single value. For example, a purchase-behavior feature of 1 means the product was purchased, and a purchase-behavior feature of 0 means it was not. This example is not a specific limitation.
  • For example, the purchase-behavior time series can be expressed as {a_1, a_2, a_3, ..., a_n}, where {a_1, a_2, a_3, ..., a_n} records the purchase behavior for the same product. Each a_i takes a value of 0 or 1: consistently with the definition above, a_i = 1 means the product was purchased and a_i = 0 means it was not; a_i is the i-th value in {a_1, a_2, a_3, ..., a_n} (that is, the purchase-behavior feature at the i-th time). This example is not a specific limitation.
  • the number of state feature vectors in the state feature time series is the same as the number of purchase behavior features in the purchase behavior time series.
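  • As an illustration of the two inputs, the sketch below builds a state-feature time series of n six-element vectors and an aligned 0/1 purchase-behavior series; the numeric encoding of the six information items is an assumption made only for the example.

```python
# A minimal sketch, assuming illustrative numeric encodings of the six items.
import numpy as np

n = 4  # four time steps (illustrative)
# columns: data-generation time, personal info, financial status,
# purchased-product info, loan records, information-browsing records
state_series = np.random.rand(n, 6)

purchase_series = np.array([0, 0, 1, 1])   # a_i = 1 means purchased at time i
assert len(state_series) == len(purchase_series)  # the counts must match
```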
  • For S2, from the product identifiers in the preset model library, the identifier that is the same as the product identifier carried by the target user's purchase-behavior time series is found, and the behavior prediction model corresponding to the found identifier is taken as the behavior prediction model corresponding to the product identifier.
  • The preset model library includes at least one behavior prediction model, and each behavior prediction model carries a product identifier. The behavior prediction model is a model that probabilistically predicts the purchase behavior of the target user.
  • A behavior prediction model is obtained by modeling and autonomous learning based on a Markov decision process and maximum-likelihood inverse reinforcement learning, using the sample data of multiple typical users. That is, the product identifier carried by the behavior prediction model is the same as the product identifier of the sample data of the multiple typical users used in modeling and autonomous learning.
  • The behavior prediction data refers to the predicted probability of the target user's purchase behavior for one product.
  • Repeating steps S2 to S3 completes the probabilistic prediction for the state-feature time series and the multiple purchase-behavior time series; that is, each pass of steps S2 to S3 predicts the purchase probability of the target user for only one product.
  • the portrait of the target user is used to describe whether the target user purchases the product.
  • For example, the portrait of the target user can be expressed as [1 0 1 1], where the first vector element represents product 1, the second product 2, the third product 3, and the fourth product 4; a vector element value of 0 means no purchase and 1 means purchase. The portrait [1 0 1 1] therefore means the target user purchases products 1, 3, and 4 and does not purchase product 2. This example is not a specific limitation.
  • As another example, the portrait of the target user can also be expressed as the set {product 1: 1, product 2: 0, product 3: 1, product 4: 1}, where a set element value of 0 means no purchase and 1 means purchase; the portrait {product 1: 1, product 2: 0, product 3: 1, product 4: 1} indicates that the target user purchases products 1, 3, and 4 and does not purchase product 2. This example is not a specific limitation.
  • In an embodiment, before the step of searching the preset model library for the behavior prediction model corresponding to the product identifier, the method further includes:
  • S021 Acquire sample data of multiple typical users, where the sample data carries the product identifier of the product purchased by the typical user;
  • S022 Determine the utility function set of the sample data based on the Markov decision process
  • S023 Perform maximum likelihood inverse reinforcement learning on the utility function set to obtain the behavior prediction model, and the behavior prediction model carries the product identifier.
  • This embodiment uses the sample data of multiple typical users to determine the behavior prediction model based on the Markov decision process and maximum-likelihood inverse reinforcement learning. The Markov decision process can fully mine user behavior when the life stage, life state, or consumption scenario changes, improving the accuracy of user portraits, while maximum-likelihood inverse reinforcement learning enables autonomous learning and improves generalization.
  • sample data of multiple typical users can be obtained from the database.
  • The sample data of typical users refers to data of representative customers, determined from historical customer data.
  • A representative customer is a customer whose willingness and behavior to purchase products are at the average level of the customer category to which the customer belongs.
  • Customers with similar income levels, similar education levels, similar family composition, and similar work experience are classified into the same category of customers. Understandably, there are other ways to classify customers; for example, customers with similar education levels and similar family composition may be classified into the same category. This example is not a specific limitation. One possible realization is sketched below.
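  • In the sketch, customers are bucketed by coarsened income, education, family composition, and tenure, and the member closest to the bucket average is taken as the "typical" customer. The bucketing rule and field names are assumptions for illustration; the filing does not prescribe a concrete algorithm.

```python
# A minimal sketch of categorizing customers and picking a typical one per
# category; all fields and thresholds are illustrative assumptions.
import numpy as np
from collections import defaultdict

def categorize(customers):
    """customers: list of dicts with numeric income, education, family_size, tenure."""
    groups = defaultdict(list)
    for c in customers:
        key = (c["income"] // 10000, c["education"], c["family_size"], c["tenure"] // 5)
        groups[key].append(c)
    return groups

def typical_customer(group):
    feats = np.array([[c["income"], c["education"], c["family_size"], c["tenure"]]
                      for c in group], dtype=float)
    center = feats.mean(axis=0)                          # the category average
    idx = np.argmin(np.linalg.norm(feats - center, axis=1))
    return group[idx]                                    # the customer closest to average
```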
  • the sample data includes: a time series of state characteristics of a typical user and a time series of purchase behavior, and the time series of purchase behavior of the typical user carries a product identifier of a product purchased by the typical user.
  • the state feature time series of the typical user refers to the time series of the state feature vector of the typical user.
  • the time series of purchase behaviors of typical users refers to the time series of characteristics of typical users' purchase behaviors for a certain product.
  • the number of state feature vectors in the typical user's state feature time series is the same as the number of purchase behavior features in the typical user's purchase behavior time series.
  • For S022, based on the Markov decision process, the relationship among states, behaviors, and utility functions is established according to the state-feature time series of all typical users and the purchase-behavior time series of all typical users with the same product identifier. The utility functions are then solved by optimization, and the utility function set is determined from the optimization result: the utility functions are extracted from the optimization result and combined into a set, which is the utility function set.
  • the number of utility functions in the utility function set is the same as the number of state feature vectors in the state feature time series of the typical user.
  • the product identifier carried by the behavior prediction model is the same as the product identifier corresponding to the purchase behavior time series of the typical user in step S022.
  • the foregoing acquiring sample data of multiple typical users includes:
  • S0211 Acquire historical data of multiple typical users, the historical data including state-feature data of the typical users and purchase-behavior data of the typical users, the purchase-behavior data carrying the product identifiers of the products purchased by the typical users;
  • S0212 Perform time series construction on the state characteristic data of the typical user to obtain sample data of the state characteristic time series of the typical user;
  • S0213 Construct a time series of the typical user purchase behavior data according to the product identifier to obtain sample data of the typical user purchase behavior time series.
  • This embodiment constructs a time series from the typical users' state-feature data to obtain the sample data of the typical users' state-feature time series, and constructs a time series from the typical users' purchase-behavior data according to the product identifier to obtain the sample data of the typical users' purchase-behavior time series, so that the sample data of typical users describes the user's life stage, life state, and consumption scenario, which facilitates building multi-perspective user portraits and meets the demand for user portraits in complex scenarios.
  • For S0211, historical customer data to be processed is acquired, and typical-user feature extraction is performed on the historical customer data to be processed to obtain the historical data of the multiple typical users.
  • Each historical data of the typical user corresponds to a typical user.
  • the state characteristic data is a data set.
  • the number of status feature data in the status feature data of the typical user is the same as the number of purchase behavior data in the purchase behavior data of the typical user.
  • For S0212, state-feature data is extracted from the typical user's state-feature data, and the extracted state-feature data is constructed into a time series to obtain the sample data of the typical user's state-feature time series.
  • For S0213, purchase-behavior data is extracted from the typical user's purchase-behavior data according to the product identifier, and the extracted purchase-behavior data is constructed into a time series to obtain the sample data of the typical user's purchase-behavior time series. That is, each extraction yields the typical user's purchase-behavior time series for one product identifier; after multiple extractions, the multiple purchase-behavior time series corresponding to the same typical user can be determined.
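  • A minimal sketch of S0212/S0213, assuming hypothetical record fields (time, state, purchased): records are sorted by time, the state vectors form the state-feature series, and one 0/1 purchase series is emitted per product identifier.

```python
# A minimal sketch of time-series construction; record fields are assumed.
def build_series(records, product_ids):
    """records: list of dicts with 'time', 'state' (feature vector) and
    'purchased' (set of product ids bought at that time)."""
    records = sorted(records, key=lambda r: r["time"])
    state_series = [r["state"] for r in records]            # S0212
    behavior_series = {}
    for pid in product_ids:                                  # S0213: one series per product id
        behavior_series[pid] = [1 if pid in r["purchased"] else 0 for r in records]
    return state_series, behavior_series
```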
  • In an embodiment, the sample data includes a typical user's state-feature time series and purchase-behavior time series, the purchase-behavior time series of the typical user carrying the product identifier of the product purchased by the typical user; the step of determining the utility function set based on the Markov decision process, according to the state-feature time series of all typical users and the purchase-behavior time series of all typical users with the same product identifier, includes:
  • S0221 Acquire the maximum-value behavior calculation formula determined from the typical users' state-feature time series and purchase-behavior time series;
  • S0222 Use dynamic programming to iteratively optimize and solve the maximum-value behavior calculation formula, to obtain the target maximum-value behavior calculation formula;
  • S0223 Extract utility functions from the target maximum-value behavior calculation formula and combine the extracted utility functions into the utility function set.
  • This embodiment implements the use of sample data of multiple typical users to determine the utility function set based on the Markov decision process.
  • the Markov decision process can fully explore user behaviors when life stages, life states, and consumption scenarios change.
  • For S0221, in the maximum-value behavior calculation formula A (given as an image in the original filing), p(a|x) is the probability of taking action a in state x, and U(x, a) is the utility function; x is a value in the typical user's state-feature time series, expressed as {x_1, x_2, x_3, ..., x_n}; a is a value in the typical user's purchase-behavior time series, expressed as {a_1, a_2, a_3, ..., a_n}.
  • For S0222, dynamic programming is used to iteratively optimize and solve the maximum-value behavior calculation formula to obtain the target maximum-value behavior calculation formula.
  • The optimization seeks an optimal policy that lets the typical user, in the process of interacting with each state feature in the state-feature time series, always gain more than under any other policy; that is, the optimization maximizes the value of the maximum-value behavior formula, and the utility function U(x, a) extracted when that value is largest is the most valuable utility function.
  • This optimal policy can be denoted by π. Once the optimal policy π is found, the reinforcement-learning problem is solved. In general it is difficult to find an optimal policy, but a better policy, that is, a locally optimal solution, can be determined by comparing the merits of several different policies.
  • Preferably, the Bellman equation V is used to iteratively optimize and solve the maximum-value behavior calculation formula by dynamic programming (the equation is given as an image in the original filing), where V(x_t) denotes the expectation of the utility function U based on state x_t; U(x_t, a_t) denotes the value of the utility function at x_t and a_t (time t); β is the attenuation factor, taking a value from 0 to 1 (0 and 1 may be included); x is a value in the typical user's state-feature time series, and a is a value in the typical user's purchase-behavior time series.
  • Preferably, the attenuation factor is set to 0.9 to avoid excessive attenuation; t is the time; U is the utility function U(x, a).
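  • The following sketch shows the kind of dynamic-programming iteration this paragraph describes, written as standard value iteration over a finite utility table with the preferred attenuation factor β = 0.9. The transition model P is an assumption the filing does not spell out, so this is a generic illustration rather than the filing's exact formula.

```python
# A minimal sketch of value iteration under a stated transition-model assumption.
import numpy as np

def value_iteration(U, P, beta=0.9, tol=1e-6):
    """U: (S, A) utility table U(x, a); P: (S, A, S) transition probabilities."""
    S, A = U.shape
    V = np.zeros(S)
    while True:
        # Q(x, a) = U(x, a) + beta * sum over x' of P(x' | x, a) * V(x')
        Q = U + beta * P @ V
        V_new = Q.max(axis=1)            # act greedily: the maximum-value behavior
        if np.abs(V_new - V).max() < tol:
            return V_new, Q.argmax(axis=1)  # converged values and greedy policy
        V = V_new
```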
  • a utility function is extracted from the target maximum value behavior calculation formula, and the extracted utility function is put into the utility function set.
  • the foregoing step of performing maximum likelihood inverse reinforcement learning on the utility function set to obtain the behavior prediction model includes:
  • S0231 Perform linear superposition on the utility functions in the utility function set to obtain a personal utility function to be estimated;
  • S0232 Use the softmax function to normalize the personal utility function to be estimated to obtain a normalized personal utility function
  • S0233 Use a maximum entropy inverse reinforcement learning method to estimate the parameters of the normalized personal utility function to obtain the behavior prediction model.
  • This embodiment implements linear superposition and normalization processing to achieve maximum likelihood inverse reinforcement learning, and achieves autonomous learning through maximum likelihood inverse reinforcement learning, and improves generalization ability.
  • For S0231, the utility function set is expressed as {U_1, U_2, U_3, ..., U_n}, and the utility functions in the set are linearly superimposed to obtain the personal utility function to be estimated, U_agent, specifically expressed as:
  • U_agent = w_1·U_1 + w_2·U_2 + w_3·U_3 + ... + w_n·U_n
  • where w_1, w_2, w_3, ..., w_n are the parameters to be estimated.
  • For S0232, preferably, the personal utility function to be estimated is normalized through a softmax function.
  • The softmax function is a normalized exponential function that "compresses" a K-dimensional vector z containing arbitrary real numbers into another K-dimensional real vector σ(z), such that each element lies in the range (0, 1) and all elements sum to 1; in the standard form, σ(z)_j = e^(z_j) / Σ_i e^(z_i).
  • Here U(x,a)_j refers to the term w_j·U_j of U_agent in step S0231, and U(x,a)_i refers to the term w_i·U_i of U_agent in step S0231; e is the natural constant, a mathematical constant that is an infinite non-repeating decimal and a transcendental number, with a value of about 2.718281828459.
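  • A small numpy sketch of the S0232 normalization, using the numerically stable standard softmax form (the filing's own formula appears only as an image); the example inputs stand in for the weighted terms w_j·U_j of U_agent.

```python
# A minimal sketch of softmax normalization in its standard, stable form.
import numpy as np

def softmax(z):
    z = np.asarray(z, dtype=float)
    z = z - z.max()               # stabilize the exponentials
    e = np.exp(z)
    return e / e.sum()            # each element in (0, 1); all elements sum to 1

weighted_utilities = [0.8, 0.1, 0.5]   # illustrative w_j * U_j values
print(softmax(weighted_utilities))     # approx. [0.447, 0.222, 0.331]
```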
  • In an embodiment, the step of using the maximum-entropy inverse reinforcement learning method to estimate the parameters of the normalized personal utility function to obtain the behavior prediction model includes: assuming there exists a latent probability distribution under which the expert trajectories are generated (the known condition is given as an image in the original filing).
  • Here f denotes the feature expectation (the expected utility value each product brings to the customer, that is, the personal utility function U_agent to be estimated), and the expert feature expectation is the weighted utility value that multiple products bring to the customer, the weights being the probability of each product being selected (that is, w_1, w_2, w_3, ..., w_n in the personal utility function U_agent to be estimated). The problem is converted into standard form and becomes the optimization problem of maximizing the entropy, where p·log p denotes the entropy of a random variable and the constraints following s.t. (including Σw = 1) restrict the feasible set.
  • Solving by the Lagrange multiplier method and differentiating with respect to the probability w yields the maximum-entropy probability, where exp() is the exponential function with the natural constant e as base; the parameter λ_j corresponds to the Lagrange multiplier and can be solved by the maximum-likelihood method; f_j denotes the expected utility value that product j brings to the customer.
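  • The sketch below illustrates the shape of this maximum-entropy estimation under stated assumptions: the selection probabilities take the exponential form w_j proportional to exp(λ·f_j) implied by the Lagrangian, and λ is fit by gradient ascent so that the expectation under w matches the expert feature expectation. It is a generic one-parameter illustration, not the filing's derivation.

```python
# A minimal sketch of max-entropy fitting, assuming the exponential-family form.
import numpy as np

def fit_max_entropy(f, f_expert, lr=0.1, iters=2000):
    """f: per-product expected utilities f_j; f_expert: expert feature expectation."""
    lam = 0.0
    for _ in range(iters):
        w = np.exp(lam * f)
        w /= w.sum()                   # the max-entropy probabilities, sum(w) = 1
        grad = f_expert - w @ f        # likelihood gradient: expectation mismatch
        lam += lr * grad
    return lam, w

f = np.array([1.0, 2.0, 3.0])
lam, w = fit_max_entropy(f, f_expert=2.4)
print(lam, w, w @ f)                    # w @ f approaches the target 2.4
```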
  • In an embodiment, the step of determining the portrait of the target user according to the behavior prediction data includes:
  • S61 Compare the behavior prediction data with a preset threshold, and take the result of the comparison as the prediction result: when the behavior prediction data is higher than the preset threshold, the prediction result corresponding to the product identifier is determined to be purchase; otherwise it is determined to be no purchase;
  • S62 Combine the prediction results corresponding to the product identifiers into a vector as the portrait of the target user.
  • For S61, the preset threshold may be 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, or 0.8, which is not specifically limited in this example. A prediction result obtained with a high preset threshold is more accurate than one obtained with a low preset threshold, but the coverage is reduced, where reduced coverage means that the prediction results of some users with purchase intent are determined to be no purchase.
  • For S62, all the prediction results corresponding to the product identifiers may be combined into a vector, and the combined vector used as the portrait of the target user. Understandably, the prediction results may also be combined into a set, and the combined set used as the portrait.
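  • A short sketch of S61/S62 with assumed prediction values, showing the trade-off noted above: raising the preset threshold flips borderline products to "no purchase", shrinking coverage while increasing precision.

```python
# A minimal sketch of thresholding predictions into a portrait vector;
# the probability values are illustrative assumptions.
probs = {"product_1": 0.72, "product_2": 0.41, "product_3": 0.66, "product_4": 0.81}

for threshold in (0.5, 0.65, 0.8):
    portrait = [1 if p > threshold else 0 for p in probs.values()]
    print(threshold, portrait)
# 0.5  -> [1, 0, 1, 1]
# 0.65 -> [1, 0, 1, 1]
# 0.8  -> [0, 0, 0, 1]
```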
  • this application also proposes a user portrait generation device, the device includes:
  • the data acquisition module 100 is configured to acquire a time series of status characteristics and a time series of purchase behaviors of a target user, where the time series of purchase behaviors carry the product identification of the product purchased by the target user;
  • the model acquisition module 200 is used to search for a behavior prediction model corresponding to the product identifier from a preset model library, where the behavior prediction model is a model obtained based on the Markov decision process and maximum likelihood inverse reinforcement learning ;
  • the prediction module 300 is configured to input the state characteristic time series and the purchase behavior time series into the behavior prediction model corresponding to the product identification to perform probabilistic prediction to obtain the behavior prediction data of the target user;
  • the portrait module 400 is used to determine the portrait of the target user according to the behavior prediction data.
  • By acquiring the target user's state-feature time series and purchase-behavior time series, this embodiment describes the user's life stage, life state, and consumption scenario, which facilitates building multi-perspective user portraits and meets the demand for user portraits in complex scenarios. Because a different purchase-behavior time series is used for each product and each behavior prediction model corresponds to one product, the granularity of the user portrait is refined. Because the behavior prediction model is a model obtained based on a Markov decision process and maximum-likelihood inverse reinforcement learning, the Markov decision process can fully mine user behavior when the life stage, life state, or consumption scenario changes, improving the accuracy of the user portrait, while maximum-likelihood inverse reinforcement learning enables autonomous learning and improves generalization.
  • the device includes: a model training module
  • the model training module is configured to acquire sample data of multiple typical users, where the sample data carries the product identifiers of the products purchased by the typical users; determine the utility function set of the sample data based on the Markov decision process; and perform maximum-likelihood inverse reinforcement learning on the utility function set to obtain the behavior prediction model, the behavior prediction model carrying the product identifier.
  • the model training module includes: a sample acquisition sub-module
  • the sample acquisition sub-module is configured to acquire historical data of multiple typical users, the historical data including state-feature data of the typical users and purchase-behavior data of the typical users, the purchase-behavior data carrying the product identifiers of the products purchased by the typical users; to construct a time series from the state-feature data of the typical users to obtain the sample data of the typical users' state-feature time series; and to construct a time series from the purchase-behavior data of the typical users according to the product identifier, to obtain the sample data of the typical users' purchase-behavior time series.
  • the sample data includes: a time series of state characteristics of a typical user and a time series of purchase behavior, the time series of purchase behavior of the typical user carries a product identifier of the product purchased by the typical user;
  • the model training module further includes: a utility function determining sub-module;
  • the utility function determination sub-module is configured to acquire the maximum-value behavior calculation formula determined from the typical users' state-feature time series and purchase-behavior time series; to iteratively optimize and solve the maximum-value behavior calculation formula using dynamic programming, obtaining the target maximum-value behavior calculation formula; and to extract utility functions from the target maximum-value behavior calculation formula, combining the extracted utility functions into the utility function set.
  • the model training module further includes: a maximum likelihood inverse reinforcement learning sub-module;
  • the maximum-likelihood inverse reinforcement learning sub-module is configured to linearly superimpose the utility functions in the utility function set to obtain the personal utility function to be estimated; to normalize the personal utility function to be estimated using a softmax function, obtaining the normalized personal utility function; and to estimate the parameters of the normalized personal utility function using the maximum-entropy inverse reinforcement learning method, obtaining the behavior prediction model.
  • the maximum likelihood inverse reinforcement learning sub-module includes: a parameter estimation unit;
  • the parameter estimation unit is configured to assume that there exists a latent probability distribution under which the expert trajectories are generated, with the known condition given as an image in the original filing.
  • Here f denotes the feature expectation (the expected utility value each product brings to the customer, that is, the personal utility function U_agent to be estimated), and the expert feature expectation is the weighted utility value that multiple products bring to the customer, the weights being the probability of each product being selected (that is, w_1, w_2, w_3, ..., w_n in the personal utility function U_agent to be estimated). The problem is converted into standard form and becomes the optimization problem of maximizing the entropy, where p·log p denotes the entropy of a random variable and the constraints following s.t. (including Σw = 1) restrict the feasible set.
  • Solving by the Lagrange multiplier method and differentiating with respect to the probability w yields the maximum-entropy probability, where exp() is the exponential function with the natural constant e as base; the parameter λ_j corresponds to the Lagrange multiplier and can be solved by the maximum-likelihood method; f_j denotes the expected utility value that product j brings to the customer.
  • the portrait module 400 includes: a prediction result determination sub-module and a portrait determination sub-module;
  • the prediction result determination submodule is used to compare the target behavior prediction data with a preset threshold, and use the result of the comparison as the prediction result;
  • the portrait determination sub-module is used to combine the prediction result corresponding to the product identifier into a vector as the portrait of the target user.
  • an embodiment of the present application also provides a computer device.
  • the computer device may be a server, and its internal structure may be as shown in FIG. 3.
  • The computer device includes a processor, a memory, a network interface, and a database connected through a system bus, where the processor of the computer device is used to provide computing and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium stores an operating system, a computer program, and a database.
  • The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium.
  • The database of the computer device is used to store data such as the data used by the user portrait generation method.
  • the network interface of the computer device is used to communicate with an external terminal through a network connection.
  • the computer program is executed by the processor to realize a user portrait generation method.
  • The user portrait generation method includes: acquiring a state-feature time series and a purchase-behavior time series of a target user, the purchase-behavior time series carrying the product identifier of the product purchased by the target user; searching a preset model library for the behavior prediction model corresponding to the product identifier, where the behavior prediction model is a model obtained based on a Markov decision process and maximum-likelihood inverse reinforcement learning; inputting the state-feature time series and the purchase-behavior time series into the behavior prediction model corresponding to the product identifier for probabilistic prediction, to obtain the behavior prediction data of the target user; and determining the portrait of the target user according to the behavior prediction data.
  • By acquiring the target user's state-feature time series and purchase-behavior time series, this embodiment describes the user's life stage, life state, and consumption scenario, which facilitates building multi-perspective user portraits and meets the demand for user portraits in complex scenarios. Because a different purchase-behavior time series is used for each product and each behavior prediction model corresponds to one product, the granularity of the user portrait is refined. Because the behavior prediction model is a model obtained based on a Markov decision process and maximum-likelihood inverse reinforcement learning, the Markov decision process can fully mine user behavior when the life stage, life state, or consumption scenario changes, improving the accuracy of the user portrait, while maximum-likelihood inverse reinforcement learning enables autonomous learning and improves generalization.
  • An embodiment of the present application also provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program realizes a user portrait generation method including the steps of: acquiring a state-feature time series and a purchase-behavior time series of a target user, the purchase-behavior time series carrying the product identifier of the product purchased by the target user; searching a preset model library for the behavior prediction model corresponding to the product identifier, where the behavior prediction model is a model obtained based on a Markov decision process and maximum-likelihood inverse reinforcement learning; inputting the state-feature time series and the purchase-behavior time series into the behavior prediction model corresponding to the product identifier for probabilistic prediction, to obtain the behavior prediction data of the target user; and determining the portrait of the target user according to the behavior prediction data.
  • The user portrait generation method executed above describes the user's life stage, life state, and consumption scenario by acquiring the target user's state-feature time series and purchase-behavior time series, which facilitates building multi-perspective user portraits and meets the demand for user portraits in complex scenarios; because a different purchase-behavior time series is used for each product and each behavior prediction model corresponds to one product, the granularity of the user portrait is refined; because the behavior prediction model is a model obtained based on a Markov decision process and maximum-likelihood inverse reinforcement learning, the Markov decision process can fully mine user behavior when the life stage, life state, or consumption scenario changes, improving the accuracy of user portraits, while maximum-likelihood inverse reinforcement learning enables autonomous learning and improves generalization.
  • the computer storage medium may be non-volatile or volatile.
  • Non-volatile memory may include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory may include random access memory (RAM) or external cache memory.
  • RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link (Synchlink) DRAM (SLDRAM), memory bus (Rambus) direct RAM (RDRAM), direct memory bus dynamic RAM (DRDRAM), and memory bus dynamic RAM (RDRAM), etc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Data Mining & Analysis (AREA)
  • Strategic Management (AREA)
  • Finance (AREA)
  • Development Economics (AREA)
  • Accounting & Taxation (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Game Theory and Decision Science (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Artificial Intelligence (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

This application relates to the field of artificial intelligence technology and discloses a user portrait generation method, apparatus, device, and medium. The method includes: acquiring a state-feature time series and a purchase-behavior time series of a target user, the purchase-behavior time series carrying the product identifier of a product purchased by the target user; searching a preset model library for the behavior prediction model corresponding to the product identifier, where the behavior prediction model is a model obtained based on a Markov decision process and maximum-likelihood inverse reinforcement learning; inputting the state-feature time series and the purchase-behavior time series into the behavior prediction model corresponding to the product identifier for probabilistic prediction, to obtain behavior prediction data of the target user; and determining the portrait of the target user according to the behavior prediction data. User behavior is fully mined when the life stage, life state, or consumption scenario changes, improving the accuracy of user portraits and refining their granularity.

Description

Method, apparatus, device, and medium for generating a user portrait
This application claims priority to the Chinese patent application with application number 202011118110X, entitled "用户画像生成方法、装置、设备及介质" (User portrait generation method, apparatus, device, and medium), filed with the China Patent Office on October 19, 2020, the entire contents of which are incorporated in this application by reference.
Technical Field
This application relates to the field of artificial intelligence technology, and in particular to a user portrait generation method, apparatus, device, and medium.
Background
A user portrait is a digital abstraction of a user role and a model for analyzing and mining user behavior. Building accurate user portraits can help enterprises expand the sales of emerging products and sell in a targeted way by understanding the user's environment and needed products. The inventors realized that traditional user-portrait models use herd models or persona models, can only analyze a user in a single scenario, and cannot follow changes in the user's life stage, life state, consumption scenario, and so on; existing user-portrait descriptions lack personalization and are relatively coarse-grained, making it difficult to meet the needs of multiple marketing scenarios, to satisfy multiple role-based requirements, and to track user behavior to cultivate long-term customers. Under these many difficulties, the improvement that user portraits bring to precision marketing is limited: they can neither satisfy the needs of marketing-side business staff in real time nor distinguish, at fine granularity, the characteristic and demand differences of different user types.
Technical Problem
This application aims to solve the technical problems that prior-art user portraits provide only limited improvement to precision marketing, cannot satisfy the needs of marketing-side business staff in real time, and cannot distinguish the characteristic and demand differences of different user types at fine granularity.
Technical Solution
The main purpose of this application is to provide a user portrait generation method, apparatus, device, and medium, aiming to solve the technical problems that prior-art user portraits provide only limited improvement to precision marketing, cannot satisfy the needs of marketing-side business staff in real time, and cannot distinguish the characteristic and demand differences of different user types at fine granularity.
To achieve the above objective of the invention, this application proposes a user portrait generation method, the method including:
acquiring a state-feature time series and a purchase-behavior time series of a target user, the purchase-behavior time series carrying the product identifier of a product purchased by the target user;
searching a preset model library for the behavior prediction model corresponding to the product identifier, where the behavior prediction model is a model obtained based on a Markov decision process and maximum-likelihood inverse reinforcement learning;
inputting the state-feature time series and the purchase-behavior time series into the behavior prediction model corresponding to the product identifier for probabilistic prediction, to obtain behavior prediction data of the target user; and determining the portrait of the target user according to the behavior prediction data.
This application also proposes a user portrait generation apparatus, the apparatus including:
a data acquisition module, configured to acquire a state-feature time series and a purchase-behavior time series of a target user, the purchase-behavior time series carrying the product identifier of a product purchased by the target user;
a model acquisition module, configured to search a preset model library for the behavior prediction model corresponding to the product identifier, where the behavior prediction model is a model obtained based on a Markov decision process and maximum-likelihood inverse reinforcement learning;
a prediction module, configured to input the state-feature time series and the purchase-behavior time series into the behavior prediction model corresponding to the product identifier for probabilistic prediction, to obtain behavior prediction data of the target user;
a portrait module, configured to determine the portrait of the target user according to the behavior prediction data.
This application also proposes a computer device, including a memory and a processor, the memory storing a computer program, where the processor, when executing the computer program, implements the following method steps:
acquiring a state-feature time series and a purchase-behavior time series of a target user, the purchase-behavior time series carrying the product identifier of a product purchased by the target user;
searching a preset model library for the behavior prediction model corresponding to the product identifier, where the behavior prediction model is a model obtained based on a Markov decision process and maximum-likelihood inverse reinforcement learning;
inputting the state-feature time series and the purchase-behavior time series into the behavior prediction model corresponding to the product identifier for probabilistic prediction, to obtain behavior prediction data of the target user; and determining the portrait of the target user according to the behavior prediction data.
This application also proposes a computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, implements the following method steps:
acquiring a state-feature time series and a purchase-behavior time series of a target user, the purchase-behavior time series carrying the product identifier of a product purchased by the target user;
searching a preset model library for the behavior prediction model corresponding to the product identifier, where the behavior prediction model is a model obtained based on a Markov decision process and maximum-likelihood inverse reinforcement learning;
inputting the state-feature time series and the purchase-behavior time series into the behavior prediction model corresponding to the product identifier for probabilistic prediction, to obtain behavior prediction data of the target user; and determining the portrait of the target user according to the behavior prediction data.
Beneficial Effects
The user portrait generation method, apparatus, device, and medium of this application describe the user's life stage, life state, and consumption scenario by acquiring the target user's state-feature time series and purchase-behavior time series, which facilitates building multi-perspective user portraits and meets the demand for user portraits in complex scenarios. Because a different purchase-behavior time series is used for each product and each behavior prediction model corresponds to one product, the granularity of the user portrait is refined. Because the behavior prediction model is a model obtained based on a Markov decision process and maximum-likelihood inverse reinforcement learning, the Markov decision process can fully mine user behavior when the life stage, life state, or consumption scenario changes, improving the accuracy of the user portrait, while maximum-likelihood inverse reinforcement learning enables autonomous learning and improves generalization.
Brief Description of the Drawings
FIG. 1 is a schematic flowchart of a user portrait generation method according to an embodiment of this application;
FIG. 2 is a schematic structural block diagram of a user portrait generation apparatus according to an embodiment of this application;
FIG. 3 is a schematic structural block diagram of a computer device according to an embodiment of this application.
The realization of the objectives, functional features, and advantages of this application will be further described with reference to the accompanying drawings in conjunction with the embodiments.
Embodiments of the Invention
To make the objectives, technical solutions, and advantages of this application clearer, this application is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain this application and are not used to limit it.
To solve the technical problems that prior-art user portraits provide only limited improvement to precision marketing, cannot satisfy the needs of marketing-side business staff in real time, and cannot distinguish the characteristic and demand differences of different user types at fine granularity, a user portrait generation method is proposed, applied in the field of artificial intelligence technology. The method obtains a behavior prediction model based on a Markov decision process and maximum-likelihood inverse reinforcement learning, and then uses the behavior prediction model for probabilistic prediction; the Markov decision process can fully mine user behavior when the life stage, life state, or consumption scenario changes, improving the accuracy of user portraits, while maximum-likelihood inverse reinforcement learning enables autonomous learning and improves generalization.
Referring to FIG. 1, the user portrait generation method includes:
S1: Acquire a state-feature time series and a purchase-behavior time series of a target user, the purchase-behavior time series carrying the product identifier of a product purchased by the target user;
S2: Search a preset model library for the behavior prediction model corresponding to the product identifier, where the behavior prediction model is a model obtained based on a Markov decision process and maximum-likelihood inverse reinforcement learning;
S3: Input the state-feature time series and the purchase-behavior time series into the behavior prediction model corresponding to the product identifier for probabilistic prediction, to obtain behavior prediction data of the target user;
S4: Determine the portrait of the target user according to the behavior prediction data.
By acquiring the target user's state-feature time series and purchase-behavior time series, this embodiment describes the user's life stage, life state, and consumption scenario, which facilitates building multi-perspective user portraits and meets the demand for user portraits in complex scenarios; because a different purchase-behavior time series is used for each product and each behavior prediction model corresponds to one product, the granularity of the user portrait is refined; because the behavior prediction model is a model obtained based on a Markov decision process and maximum-likelihood inverse reinforcement learning, the Markov decision process can fully mine user behavior when the life stage, life state, or consumption scenario changes, improving the accuracy of the user portrait, while maximum-likelihood inverse reinforcement learning enables autonomous learning and improves generalization.
For S1, the target user's state-feature time series and purchase-behavior time series can be obtained from a database.
The target user's state-feature time series and purchase-behavior time series refer to the state-feature time series and purchase-behavior time series of the same user to be profiled.
The state-feature time series refers to the time series of state feature vectors of the user to be profiled. Each state feature vector expresses multiple items of user information; that is, the state-feature time series includes a plurality of state feature vectors arranged in chronological order. The user information includes but is not limited to: personal information, financial status, purchased-product information, loan records, and information-browsing records. For example, the state-feature time series can be expressed as {x_1, x_2, x_3, ..., x_n}, where each state feature vector in {x_1, x_2, x_3, ..., x_n} includes six vector elements respectively representing the data-generation time, personal information, financial status, purchased-product information, loan records, and information-browsing records; that is, x_i includes six vector elements representing those items, and x_i is the i-th value in {x_1, x_2, x_3, ..., x_n} (that is, the state feature vector at the i-th time). This example is not a specific limitation.
The purchase-behavior time series refers to the time series of the purchase-behavior features of the user to be profiled for a certain product. The purchase-behavior time series includes a plurality of purchase-behavior features, each of which takes a single value; for example, a purchase-behavior feature of 1 means the product was purchased and 0 means it was not. For example, the purchase-behavior time series can be expressed as {a_1, a_2, a_3, ..., a_n}, where {a_1, a_2, a_3, ..., a_n} records the purchase behavior for the same product; each a_i takes a value of 0 or 1, with a_i = 1 meaning the product was purchased and a_i = 0 meaning it was not, and a_i is the i-th value in {a_1, a_2, a_3, ..., a_n} (that is, the purchase-behavior feature at the i-th time). This example is not a specific limitation.
Preferably, the number of state feature vectors in the state-feature time series is the same as the number of purchase-behavior features in the purchase-behavior time series.
For S2, from the product identifiers in the preset model library, the identifier that is the same as the product identifier of the product purchased by the target user carried in the purchase-behavior time series is found, and the behavior prediction model corresponding to the found product identifier is taken as the behavior prediction model corresponding to the product identifier.
The preset model library includes at least one behavior prediction model, and each behavior prediction model carries a product identifier. The behavior prediction model is a model that probabilistically predicts the purchase behavior of the target user.
A behavior prediction model is obtained by modeling and autonomous learning based on a Markov decision process and maximum-likelihood inverse reinforcement learning, using the sample data of multiple typical users. That is, the product identifier carried by the behavior prediction model is the same as the product identifier of the sample data of the multiple typical users used in modeling and autonomous learning.
For S3, the state-feature time series and the purchase-behavior time series are input into the behavior prediction model corresponding to the product identifier carried by the input purchase-behavior time series for probabilistic prediction, and the behavior prediction data of the target user output by that model is acquired; that is, the product identifier corresponding to the behavior prediction data is the same as the product identifier carried by the purchase-behavior time series used for prediction.
The behavior prediction data refers to the predicted probability of the target user's purchase behavior for one product.
Repeating steps S2 to S3 completes the probabilistic prediction for the state-feature time series and the multiple purchase-behavior time series; that is, each pass of steps S2 to S3 predicts the purchase probability of the target user for only one product.
For S4, the portrait of the target user is used to describe whether the target user purchases each product.
For example, the portrait of the target user can be expressed as [1 0 1 1], where the first vector element represents product 1, the second product 2, the third product 3, and the fourth product 4; a vector element value of 0 means no purchase and 1 means purchase, so the portrait [1 0 1 1] indicates that the target user purchases products 1, 3, and 4 and does not purchase product 2. This example is not a specific limitation.
As another example, the portrait of the target user can also be expressed as {product 1: 1, product 2: 0, product 3: 1, product 4: 1}, where a set element value of 0 means no purchase and 1 means purchase; the portrait {product 1: 1, product 2: 0, product 3: 1, product 4: 1} indicates that the target user purchases products 1, 3, and 4 and does not purchase product 2. This example is not a specific limitation.
In an embodiment, before the step of searching the preset model library for the behavior prediction model corresponding to the product identifier, the method further includes:
S021: Acquire sample data of multiple typical users, where the sample data carries the product identifiers of the products purchased by the typical users;
S022: Determine the utility function set of the sample data based on the Markov decision process;
S023: Perform maximum-likelihood inverse reinforcement learning on the utility function set to obtain the behavior prediction model, the behavior prediction model carrying the product identifier.
This embodiment uses the sample data of multiple typical users to determine the behavior prediction model based on the Markov decision process and maximum-likelihood inverse reinforcement learning; the Markov decision process can fully mine user behavior when the life stage, life state, or consumption scenario changes, improving the accuracy of user portraits, while maximum-likelihood inverse reinforcement learning enables autonomous learning and improves generalization.
For S021, the sample data of multiple typical users can be obtained from a database.
The sample data of typical users refers to data of representative customers, determined from historical customer data. A representative customer is a customer whose willingness and behavior to purchase products are at the average level of the customer category to which the customer belongs. Customers with similar income levels, similar education levels, similar family composition, and similar work experience are classified into the same category of customers. Understandably, there are other ways to classify customers; for example, customers with similar education levels and similar family composition may be classified into the same category. This example is not a specific limitation.
The sample data includes a typical user's state-feature time series and purchase-behavior time series, the purchase-behavior time series of the typical user carrying the product identifier of the product purchased by the typical user.
The state-feature time series of a typical user refers to the time series of the typical user's state feature vectors.
The purchase-behavior time series of a typical user refers to the time series of the typical user's purchase-behavior features for a certain product.
Preferably, the number of state feature vectors in a typical user's state-feature time series is the same as the number of purchase-behavior features in the typical user's purchase-behavior time series.
For S022, based on the Markov decision process, the relationship among states, behaviors, and utility functions is established according to the state-feature time series of all typical users and the purchase-behavior time series of all typical users with the same product identifier. The utility functions are then solved by optimization, and the utility function set is determined from the optimization result: the utility functions are extracted from the optimization result and combined into a set, which is the utility function set.
Preferably, the number of utility functions in the utility function set is the same as the number of state feature vectors in a typical user's state-feature time series.
For S023, when maximum-likelihood inverse reinforcement learning is performed on the utility function set, the utility functions in the set are integrated by linear superposition, and maximum-entropy inverse reinforcement learning is used to estimate the parameters of the integrated result; when parameter estimation is complete, the behavior prediction model is obtained, thereby fitting the personal utility function and the purchase-behavior features.
The product identifier carried by the behavior prediction model is the same as the product identifier corresponding to the purchase-behavior time series of the typical users in step S022.
In an embodiment, the above acquiring sample data of multiple typical users includes:
S0211: Acquire historical data of multiple typical users, the historical data including state-feature data of the typical users and purchase-behavior data of the typical users, the purchase-behavior data carrying the product identifiers of the products purchased by the typical users;
S0212: Construct a time series from the state-feature data of the typical users to obtain the sample data of the typical users' state-feature time series;
S0213: Construct a time series from the purchase-behavior data of the typical users according to the product identifier, to obtain the sample data of the typical users' purchase-behavior time series.
This embodiment constructs a time series from the typical users' state-feature data to obtain the sample data of the typical users' state-feature time series, and constructs a time series from the typical users' purchase-behavior data according to the product identifier to obtain the sample data of the typical users' purchase-behavior time series, so that the sample data of typical users describes the user's life stage, life state, and consumption scenario, which facilitates building multi-perspective user portraits and meets the demand for user portraits in complex scenarios.
For S0211, historical customer data to be processed is acquired, and typical-user feature extraction is performed on the historical customer data to be processed to obtain the historical data of the multiple typical users.
Each typical user's historical data corresponds to one typical user.
The state-feature data is a data set.
Preferably, the number of items in a typical user's state-feature data is the same as the number of items in the typical user's purchase-behavior data.
For S0212, state-feature data is extracted from the typical user's state-feature data, and the extracted state-feature data is constructed into a time series to obtain the sample data of the typical user's state-feature time series.
For S0213, purchase-behavior data is extracted from the typical user's purchase-behavior data according to the product identifier, and the extracted purchase-behavior data is constructed into a time series to obtain the sample data of the typical user's purchase-behavior time series. That is, each extraction yields the typical user's purchase-behavior time series for one product identifier; after multiple extractions, the multiple purchase-behavior time series corresponding to the same typical user can be determined.
In an embodiment, the sample data includes a typical user's state-feature time series and purchase-behavior time series, the purchase-behavior time series of the typical user carrying the product identifier of the product purchased by the typical user; the step of determining the utility function set based on the Markov decision process, according to the state-feature time series of all typical users and the purchase-behavior time series of all typical users with the same product identifier, includes:
S0221: Acquire the maximum-value behavior calculation formula determined from the typical users' state-feature time series and purchase-behavior time series;
S0222: Use dynamic programming to iteratively optimize and solve the maximum-value behavior calculation formula, to obtain the target maximum-value behavior calculation formula;
S0223: Extract utility functions from the target maximum-value behavior calculation formula and combine the extracted utility functions into the utility function set.
This embodiment uses the sample data of multiple typical users to determine the utility function set based on the Markov decision process; the Markov decision process can fully mine user behavior when the life stage, life state, or consumption scenario changes.
For S0221, the maximum-value behavior calculation formula A is expressed as follows:
[formula image PCTCN2020132601-appb-000001]
where p(a|x) is the probability of taking action a in state x, and U(x, a) is the utility function; x is a value in the typical user's state-feature time series, expressed as {x_1, x_2, x_3, ..., x_n}; a is a value in the typical user's purchase-behavior time series, expressed as {a_1, a_2, a_3, ..., a_n}.
For S0222, the maximum-value behavior calculation formula is iteratively optimized and solved by dynamic programming to obtain the target maximum-value behavior calculation formula.
The optimization seeks an optimal policy that lets the typical user, in the process of interacting with each state feature in the state-feature time series, always gain more than under any other policy; that is, the optimization maximizes the value of [formula image PCTCN2020132601-appb-000002], and the utility function U(x, a) extracted when the value of [formula image PCTCN2020132601-appb-000003] is largest is the most valuable utility function.
This means finding an optimal policy that lets the individual always gain more in interacting with the environment than under any other policy; this optimal policy can be denoted by π. Once the optimal policy π is found, the reinforcement-learning problem is solved. In general it is difficult to find an optimal policy, but a better policy, that is, a locally optimal solution, can be determined by comparing the merits of several different policies.
Preferably, the Bellman equation V is used to iteratively optimize and solve the maximum-value behavior calculation formula by dynamic programming:
[formula image PCTCN2020132601-appb-000004]
where V(x_t) denotes the expectation of the utility function U based on state x_t; U(x_t, a_t) denotes the value of the utility function at x_t and a_t (time t); β is the attenuation factor, taking a value from 0 to 1 (0 and 1 may be included); x is a value in the typical user's state-feature time series, and a is a value in the typical user's purchase-behavior time series.
Preferably, the attenuation factor is set to 0.9 to avoid excessive attenuation; t is the time; U is the utility function U(x, a).
For S0223, utility functions are extracted from the target maximum-value behavior calculation formula, and the extracted utility functions are put into the utility function set.
In an embodiment, the step of performing maximum-likelihood inverse reinforcement learning on the utility function set to obtain the behavior prediction model includes:
S0231: Linearly superimpose the utility functions in the utility function set to obtain the personal utility function to be estimated;
S0232: Normalize the personal utility function to be estimated using a softmax function, to obtain the normalized personal utility function;
S0233: Estimate the parameters of the normalized personal utility function using the maximum-entropy inverse reinforcement learning method, to obtain the behavior prediction model.
This embodiment performs linear superposition and normalization to realize maximum-likelihood inverse reinforcement learning; maximum-likelihood inverse reinforcement learning enables autonomous learning and improves generalization.
For S0231, the utility function set is expressed as {U_1, U_2, U_3, ..., U_n}, and the utility functions in the set are linearly superimposed to obtain the personal utility function to be estimated, U_agent, specifically expressed as:
U_agent = w_1·U_1 + w_2·U_2 + w_3·U_3 + ... + w_n·U_n
where w_1, w_2, w_3, ..., w_n are the parameters to be estimated.
For S0232, preferably, the personal utility function to be estimated is normalized by a softmax function.
The softmax function is a normalized exponential function that "compresses" a K-dimensional vector z containing arbitrary real numbers into another K-dimensional real vector σ(z), such that each element lies in the range (0, 1) and all elements sum to 1:
[formula image PCTCN2020132601-appb-000005]
where U(x,a)_j refers to the term w_j·U_j of U_agent in step S0231; U(x,a)_i refers to the term w_i·U_i of U_agent in step S0231; e is the natural constant, a mathematical constant that is an infinite non-repeating decimal and a transcendental number, with a value of about 2.718281828459.
In an embodiment, the step of estimating the parameters of the normalized personal utility function using the maximum-entropy inverse reinforcement learning method to obtain the behavior prediction model includes:
Assume there exists a latent probability distribution under which the expert trajectories are generated, with the known condition:
[formula image PCTCN2020132601-appb-000006]
where f denotes the feature expectation (here, the expected utility value each product brings to the customer, that is, the personal utility function to be estimated, U_agent), and [formula image PCTCN2020132601-appb-000007] is the expert feature expectation (the weighted utility value that multiple products bring to the customer), the weights being the probability of each product being selected (that is, w_1, w_2, w_3, ..., w_n in the personal utility function to be estimated, U_agent). The problem is converted into standard form and becomes the optimization problem of maximizing the entropy:
[formula image PCTCN2020132601-appb-000008]
[formula image PCTCN2020132601-appb-000009]
s.t. ∑w = 1
where p·log p denotes the entropy of a random variable; [formula image PCTCN2020132601-appb-000010] takes the maximum; what follows s.t. are the constraints for computing [formula image PCTCN2020132601-appb-000011].
By the Lagrange multiplier method:
[formula image PCTCN2020132601-appb-000012]
After solving, the probability w is differentiated to obtain the maximum-entropy probability:
[formula image PCTCN2020132601-appb-000013]
where exp() is the exponential function with the natural constant e as base in advanced mathematics; the parameter λ_j corresponds to the Lagrange multiplier and can be solved by the maximum-likelihood method; f_j denotes the expected utility value that product j brings to the customer.
In an embodiment, the step of determining the portrait of the target user according to the behavior prediction data includes:
S61: Compare the behavior prediction data with a preset threshold, and take the result of the comparison as the prediction result;
when the behavior prediction data is higher than the preset threshold, the prediction result corresponding to the product identifier is determined to be purchase; otherwise, the prediction result corresponding to the product identifier is determined to be no purchase;
S62: Combine the prediction results corresponding to the product identifiers into a vector as the portrait of the target user.
For S61, the preset threshold may be 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, or 0.8, which is not specifically limited in this example. A prediction result obtained with a high preset threshold is more accurate than one obtained with a low preset threshold, but the coverage is reduced, where reduced coverage means that the prediction results of some users with purchase intent are determined to be no purchase.
For S62, all the prediction results corresponding to the product identifiers may be combined into a vector, and the combined vector used as the portrait of the target user.
Understandably, all the prediction results corresponding to the product identifiers may also be combined into a set, and the combined set used as the portrait of the target user.
Referring to FIG. 2, this application also proposes a user portrait generation apparatus, the apparatus including:
a data acquisition module 100, configured to acquire a state-feature time series and a purchase-behavior time series of a target user, the purchase-behavior time series carrying the product identifier of a product purchased by the target user;
a model acquisition module 200, configured to search a preset model library for the behavior prediction model corresponding to the product identifier, where the behavior prediction model is a model obtained based on a Markov decision process and maximum-likelihood inverse reinforcement learning;
a prediction module 300, configured to input the state-feature time series and the purchase-behavior time series into the behavior prediction model corresponding to the product identifier for probabilistic prediction, to obtain behavior prediction data of the target user;
a portrait module 400, configured to determine the portrait of the target user according to the behavior prediction data.
By acquiring the target user's state-feature time series and purchase-behavior time series, this embodiment describes the user's life stage, life state, and consumption scenario, which facilitates building multi-perspective user portraits and meets the demand for user portraits in complex scenarios; because a different purchase-behavior time series is used for each product and each behavior prediction model corresponds to one product, the granularity of the user portrait is refined; because the behavior prediction model is a model obtained based on a Markov decision process and maximum-likelihood inverse reinforcement learning, the Markov decision process can fully mine user behavior when the life stage, life state, or consumption scenario changes, improving the accuracy of the user portrait, while maximum-likelihood inverse reinforcement learning enables autonomous learning and improves generalization.
In an embodiment, the apparatus includes a model training module;
the model training module is configured to acquire sample data of multiple typical users, where the sample data carries the product identifiers of the products purchased by the typical users; determine the utility function set of the sample data based on the Markov decision process; and perform maximum-likelihood inverse reinforcement learning on the utility function set to obtain the behavior prediction model, the behavior prediction model carrying the product identifier.
In an embodiment, the model training module includes a sample acquisition sub-module;
the sample acquisition sub-module is configured to acquire historical data of multiple typical users, the historical data including state-feature data of the typical users and purchase-behavior data of the typical users, the purchase-behavior data carrying the product identifiers of the products purchased by the typical users; construct a time series from the state-feature data of the typical users to obtain the sample data of the typical users' state-feature time series; and construct a time series from the purchase-behavior data of the typical users according to the product identifier, to obtain the sample data of the typical users' purchase-behavior time series.
In an embodiment, the sample data includes a typical user's state-feature time series and purchase-behavior time series, the purchase-behavior time series of the typical user carrying the product identifier of the product purchased by the typical user;
the model training module further includes a utility function determination sub-module;
the utility function determination sub-module is configured to acquire the maximum-value behavior calculation formula determined from the typical users' state-feature time series and purchase-behavior time series; iteratively optimize and solve the maximum-value behavior calculation formula using dynamic programming, to obtain the target maximum-value behavior calculation formula; and extract utility functions from the target maximum-value behavior calculation formula, combining the extracted utility functions into the utility function set.
In an embodiment, the model training module further includes a maximum-likelihood inverse reinforcement learning sub-module;
the maximum-likelihood inverse reinforcement learning sub-module is configured to linearly superimpose the utility functions in the utility function set to obtain the personal utility function to be estimated; normalize the personal utility function to be estimated using a softmax function, to obtain the normalized personal utility function; and estimate the parameters of the normalized personal utility function using the maximum-entropy inverse reinforcement learning method, to obtain the behavior prediction model.
In an embodiment, the maximum-likelihood inverse reinforcement learning sub-module includes a parameter estimation unit;
the parameter estimation unit is configured to assume there exists a latent probability distribution under which the expert trajectories are generated, with the known condition:
[formula image PCTCN2020132601-appb-000014]
where f denotes the feature expectation (here, the expected utility value each product brings to the customer, that is, the personal utility function to be estimated, U_agent), and [formula image PCTCN2020132601-appb-000015] is the expert feature expectation (the weighted utility value that multiple products bring to the customer), the weights being the probability of each product being selected (that is, w_1, w_2, w_3, ..., w_n in the personal utility function to be estimated, U_agent); the problem is converted into standard form and becomes the optimization problem of maximizing the entropy:
[formula image PCTCN2020132601-appb-000016]
[formula image PCTCN2020132601-appb-000017]
s.t. ∑w = 1
where p·log p denotes the entropy of a random variable; [formula image PCTCN2020132601-appb-000018] takes the maximum; what follows s.t. are the constraints for computing [formula image PCTCN2020132601-appb-000019];
by the Lagrange multiplier method:
[formula image PCTCN2020132601-appb-000020]
after solving, the probability w is differentiated to obtain the maximum-entropy probability:
[formula image PCTCN2020132601-appb-000021]
where exp() is the exponential function with the natural constant e as base in advanced mathematics; the parameter λ_j corresponds to the Lagrange multiplier and can be solved by the maximum-likelihood method; f_j denotes the expected utility value that product j brings to the customer.
In an embodiment, the portrait module 400 includes a prediction result determination sub-module and a portrait determination sub-module;
the prediction result determination sub-module is configured to compare the target behavior prediction data with a preset threshold, and take the result of the comparison as the prediction result;
the portrait determination sub-module is configured to combine the prediction results corresponding to the product identifiers into a vector as the portrait of the target user.
Referring to FIG. 3, an embodiment of this application also provides a computer device; the computer device may be a server, and its internal structure may be as shown in FIG. 3. The computer device includes a processor, a memory, a network interface, and a database connected through a system bus, where the processor of the computer device is used to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device is used to store data such as the data used by the user portrait generation method. The network interface of the computer device is used to communicate with external terminals through a network connection. When executed by the processor, the computer program implements a user portrait generation method. The user portrait generation method includes: acquiring a state-feature time series and a purchase-behavior time series of a target user, the purchase-behavior time series carrying the product identifier of a product purchased by the target user; searching a preset model library for the behavior prediction model corresponding to the product identifier, where the behavior prediction model is a model obtained based on a Markov decision process and maximum-likelihood inverse reinforcement learning; inputting the state-feature time series and the purchase-behavior time series into the behavior prediction model corresponding to the product identifier for probabilistic prediction, to obtain behavior prediction data of the target user; and determining the portrait of the target user according to the behavior prediction data.
By acquiring the target user's state-feature time series and purchase-behavior time series, this embodiment describes the user's life stage, life state, and consumption scenario, which facilitates building multi-perspective user portraits and meets the demand for user portraits in complex scenarios; because a different purchase-behavior time series is used for each product and each behavior prediction model corresponds to one product, the granularity of the user portrait is refined; because the behavior prediction model is a model obtained based on a Markov decision process and maximum-likelihood inverse reinforcement learning, the Markov decision process can fully mine user behavior when the life stage, life state, or consumption scenario changes, improving the accuracy of the user portrait, while maximum-likelihood inverse reinforcement learning enables autonomous learning and improves generalization.
An embodiment of this application also provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements a user portrait generation method, including the steps of: acquiring a state-feature time series and a purchase-behavior time series of a target user, the purchase-behavior time series carrying the product identifier of a product purchased by the target user; searching a preset model library for the behavior prediction model corresponding to the product identifier, where the behavior prediction model is a model obtained based on a Markov decision process and maximum-likelihood inverse reinforcement learning; inputting the state-feature time series and the purchase-behavior time series into the behavior prediction model corresponding to the product identifier for probabilistic prediction, to obtain behavior prediction data of the target user; and determining the portrait of the target user according to the behavior prediction data.
The user portrait generation method executed above describes the user's life stage, life state, and consumption scenario by acquiring the target user's state-feature time series and purchase-behavior time series, which facilitates building multi-perspective user portraits and meets the demand for user portraits in complex scenarios; because a different purchase-behavior time series is used for each product and each behavior prediction model corresponds to one product, the granularity of the user portrait is refined; because the behavior prediction model is a model obtained based on a Markov decision process and maximum-likelihood inverse reinforcement learning, the Markov decision process can fully mine user behavior when the life stage, life state, or consumption scenario changes, improving the accuracy of user portraits, while maximum-likelihood inverse reinforcement learning enables autonomous learning and improves generalization.
The computer storage medium may be non-volatile or volatile.
A person of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be completed by instructing the relevant hardware through a computer program; the computer program can be stored in a non-volatile computer-readable storage medium, and when executed, the computer program may include the processes of the embodiments of the above methods. Any reference to memory, storage, database, or other media provided in this application and used in the embodiments may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link (Synchlink) DRAM (SLDRAM), memory bus (Rambus) direct RAM (RDRAM), direct memory bus dynamic RAM (DRDRAM), and memory bus dynamic RAM (RDRAM), etc.
It should be noted that, in this document, the terms "include", "comprise", or any other variants thereof are intended to cover non-exclusive inclusion, so that a process, apparatus, article, or method that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, apparatus, article, or method. Without further limitation, an element defined by the statement "including a ..." does not exclude the existence of other identical elements in the process, apparatus, article, or method that includes the element.
The above are only preferred embodiments of this application and do not therefore limit the patent scope of this application; any equivalent structural or flow transformation made using the contents of the specification and drawings of this application, or any direct or indirect application in other related technical fields, is likewise included within the patent protection scope of this application.

Claims (20)

  1. A user portrait generation method, wherein the method includes:
    acquiring a state-feature time series and a purchase-behavior time series of a target user, the purchase-behavior time series carrying the product identifier of a product purchased by the target user;
    searching a preset model library for the behavior prediction model corresponding to the product identifier, where the behavior prediction model is a model obtained based on a Markov decision process and maximum-likelihood inverse reinforcement learning;
    inputting the state-feature time series and the purchase-behavior time series into the behavior prediction model corresponding to the product identifier for probabilistic prediction, to obtain behavior prediction data of the target user; and determining the portrait of the target user according to the behavior prediction data.
  2. The user portrait generation method according to claim 1, wherein before the step of searching the preset model library for the behavior prediction model corresponding to the product identifier, the method further includes:
    acquiring sample data of multiple typical users, where the sample data carries the product identifiers of the products purchased by the typical users;
    determining the utility function set of the sample data based on the Markov decision process; and
    performing maximum-likelihood inverse reinforcement learning on the utility function set to obtain the behavior prediction model, the behavior prediction model carrying the product identifier.
  3. The user portrait generation method according to claim 2, wherein the acquiring sample data of multiple typical users includes:
    acquiring historical data of multiple typical users, the historical data including state-feature data of the typical users and purchase-behavior data of the typical users, the purchase-behavior data carrying the product identifiers of the products purchased by the typical users;
    constructing a time series from the state-feature data of the typical users to obtain the sample data of the typical users' state-feature time series; and
    constructing a time series from the purchase-behavior data of the typical users according to the product identifier, to obtain the sample data of the typical users' purchase-behavior time series.
  4. The user portrait generation method according to claim 2, wherein the sample data includes a typical user's state-feature time series and purchase-behavior time series, the purchase-behavior time series of the typical user carrying the product identifier of the product purchased by the typical user; and the step of determining the utility function set of the sample data based on the Markov decision process includes:
    acquiring the maximum-value behavior calculation formula determined from the typical users' state-feature time series and purchase-behavior time series;
    iteratively optimizing and solving the maximum-value behavior calculation formula using dynamic programming, to obtain the target maximum-value behavior calculation formula; and
    extracting utility functions from the target maximum-value behavior calculation formula and combining the extracted utility functions into the utility function set.
  5. The user portrait generation method according to claim 2, wherein the step of performing maximum-likelihood inverse reinforcement learning on the utility function set to obtain the behavior prediction model includes:
    linearly superimposing the utility functions in the utility function set to obtain the personal utility function to be estimated;
    normalizing the personal utility function to be estimated using a softmax function, to obtain the normalized personal utility function; and
    estimating the parameters of the normalized personal utility function using the maximum-entropy inverse reinforcement learning method, to obtain the behavior prediction model.
  6. The user portrait generation method according to claim 5, wherein the step of performing parameter estimation on the normalized individual utility function with the maximum entropy inverse reinforcement learning method to obtain the behavior prediction model comprises:
    assuming that there exists a latent probability distribution under which the expert trajectories are generated, with the known condition:

$$\sum_{i} w_i f_i = \tilde{f}$$

    where $f$ denotes the feature expectation (here, the expected utility that each product brings to the customer, i.e., the individual utility function to be estimated $U_{agent}$), $\tilde{f}$ denotes the expert feature expectation (the weighted utility that the products bring to the customer), and $w$ is the probability of each product being chosen (i.e., $w_1, w_2, w_3, \ldots, w_n$ in the individual utility function to be estimated $U_{agent}$); converting the problem into standard form, becoming an optimization problem that maximizes the entropy:

$$\max_{w}\; -\sum_{i} w_i \log w_i$$

$$\text{s.t.}\quad \sum_{i} w_i f_i = \tilde{f}, \qquad \sum_{i} w_i = 1$$

    where $-\sum w \log w$ is the entropy of a random variable, $\max$ denotes taking the maximum, and the expressions after s.t. are the constraints under which the entropy is maximized; applying the method of Lagrange multipliers:

$$L(w,\lambda) = -\sum_{i} w_i \log w_i + \sum_{j} \lambda_j\Big(\sum_{i} w_i f_j(i) - \tilde{f}_j\Big) + \lambda_0\Big(\sum_{i} w_i - 1\Big)$$

    and, after solving, differentiating with respect to the probability $w$ to obtain the maximum-entropy probability:

$$w_i = \frac{\exp\big(\sum_{j}\lambda_j f_j(i)\big)}{\sum_{i'}\exp\big(\sum_{j}\lambda_j f_j(i')\big)}$$

    where $\exp(\cdot)$ is the exponential function with the natural constant $e$ as its base; each parameter $\lambda_j$ corresponds to a Lagrange multiplier and can be solved by the maximum likelihood method; and $f_j$ denotes the expected utility that product $j$ brings to the customer.
  7. The user portrait generation method according to claim 1, wherein the step of determining the portrait of the target user according to the behavior prediction data comprises:
    comparing the behavior prediction data with a preset threshold, and taking the result of the comparison as a prediction result;
    combining the prediction results corresponding to the product identifiers into a vector as the portrait of the target user.
  8. A user portrait generation apparatus, wherein the apparatus comprises:
    a data acquisition module, configured to acquire a state feature time series and a purchase behavior time series of a target user, the purchase behavior time series carrying product identifiers of products purchased by the target user;
    a model acquisition module, configured to look up, in a preset model library, a behavior prediction model corresponding to the product identifier, wherein the behavior prediction model is a model obtained based on a Markov decision process and maximum likelihood inverse reinforcement learning;
    a prediction module, configured to input the state feature time series and the purchase behavior time series into the behavior prediction model corresponding to the product identifier for probability prediction to obtain behavior prediction data of the target user;
    a portrait module, configured to determine a portrait of the target user according to the behavior prediction data.
  9. A computer device, comprising a memory and a processor, the memory storing a computer program, wherein the processor, when executing the computer program, implements the following method steps:
    acquiring a state feature time series and a purchase behavior time series of a target user, the purchase behavior time series carrying product identifiers of products purchased by the target user;
    looking up, in a preset model library, a behavior prediction model corresponding to the product identifier, wherein the behavior prediction model is a model obtained based on a Markov decision process and maximum likelihood inverse reinforcement learning;
    inputting the state feature time series and the purchase behavior time series into the behavior prediction model corresponding to the product identifier for probability prediction to obtain behavior prediction data of the target user; and determining a portrait of the target user according to the behavior prediction data.
  10. The computer device according to claim 9, wherein before the step of looking up, in the preset model library, the behavior prediction model corresponding to the product identifier, the method further comprises:
    acquiring sample data of a plurality of typical users, wherein the sample data carries product identifiers of products purchased by the typical users;
    determining a utility function set of the sample data based on a Markov decision process;
    performing maximum likelihood inverse reinforcement learning on the utility function set to obtain the behavior prediction model, the behavior prediction model carrying the product identifier.
  11. The computer device according to claim 10, wherein the acquiring sample data of a plurality of typical users comprises:
    acquiring historical data of a plurality of typical users, the historical data comprising state feature data of the typical users and purchase behavior data of the typical users, the purchase behavior data of the typical users carrying product identifiers of products purchased by the typical users;
    constructing a time series from the state feature data of the typical users to obtain sample data of the typical users' state feature time series;
    constructing a time series from the purchase behavior data of the typical users according to the product identifiers to obtain sample data of the typical users' purchase behavior time series.
  12. The computer device according to claim 10, wherein the sample data comprises: state feature time series and purchase behavior time series of the typical users, the purchase behavior time series of the typical users carrying product identifiers of products purchased by the typical users; and the step of determining the utility function set of the sample data based on the Markov decision process comprises:
    acquiring a maximum-value behavior calculation formula determined from the state feature time series and purchase behavior time series of the typical users;
    iteratively optimizing and solving the maximum-value behavior calculation formula by a dynamic programming method to obtain a target maximum-value behavior calculation formula;
    extracting utility functions from the target maximum-value behavior calculation formula and combining the extracted utility functions into the utility function set.
  13. The computer device according to claim 10, wherein the step of performing maximum likelihood inverse reinforcement learning on the utility function set to obtain the behavior prediction model comprises:
    linearly superimposing the utility functions in the utility function set to obtain an individual utility function to be estimated;
    normalizing the individual utility function to be estimated with a softmax function to obtain a normalized individual utility function;
    performing parameter estimation on the normalized individual utility function with a maximum entropy inverse reinforcement learning method to obtain the behavior prediction model.
  14. The computer device according to claim 13, wherein the step of performing parameter estimation on the normalized individual utility function with the maximum entropy inverse reinforcement learning method to obtain the behavior prediction model comprises:
    assuming that there exists a latent probability distribution under which the expert trajectories are generated, with the known condition:

$$\sum_{i} w_i f_i = \tilde{f}$$

    where $f$ denotes the feature expectation (here, the expected utility that each product brings to the customer, i.e., the individual utility function to be estimated $U_{agent}$), $\tilde{f}$ denotes the expert feature expectation (the weighted utility that the products bring to the customer), and $w$ is the probability of each product being chosen (i.e., $w_1, w_2, w_3, \ldots, w_n$ in the individual utility function to be estimated $U_{agent}$); converting the problem into standard form, becoming an optimization problem that maximizes the entropy:

$$\max_{w}\; -\sum_{i} w_i \log w_i$$

$$\text{s.t.}\quad \sum_{i} w_i f_i = \tilde{f}, \qquad \sum_{i} w_i = 1$$

    where $-\sum w \log w$ is the entropy of a random variable, $\max$ denotes taking the maximum, and the expressions after s.t. are the constraints under which the entropy is maximized; applying the method of Lagrange multipliers:

$$L(w,\lambda) = -\sum_{i} w_i \log w_i + \sum_{j} \lambda_j\Big(\sum_{i} w_i f_j(i) - \tilde{f}_j\Big) + \lambda_0\Big(\sum_{i} w_i - 1\Big)$$

    and, after solving, differentiating with respect to the probability $w$ to obtain the maximum-entropy probability:

$$w_i = \frac{\exp\big(\sum_{j}\lambda_j f_j(i)\big)}{\sum_{i'}\exp\big(\sum_{j}\lambda_j f_j(i')\big)}$$

    where $\exp(\cdot)$ is the exponential function with the natural constant $e$ as its base; each parameter $\lambda_j$ corresponds to a Lagrange multiplier and can be solved by the maximum likelihood method; and $f_j$ denotes the expected utility that product $j$ brings to the customer.
  15. A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the following method steps:
    acquiring a state feature time series and a purchase behavior time series of a target user, the purchase behavior time series carrying product identifiers of products purchased by the target user;
    looking up, in a preset model library, a behavior prediction model corresponding to the product identifier, wherein the behavior prediction model is a model obtained based on a Markov decision process and maximum likelihood inverse reinforcement learning;
    inputting the state feature time series and the purchase behavior time series into the behavior prediction model corresponding to the product identifier for probability prediction to obtain behavior prediction data of the target user; and determining a portrait of the target user according to the behavior prediction data.
  16. The computer-readable storage medium according to claim 15, wherein before the step of looking up, in the preset model library, the behavior prediction model corresponding to the product identifier, the method further comprises:
    acquiring sample data of a plurality of typical users, wherein the sample data carries product identifiers of products purchased by the typical users;
    determining a utility function set of the sample data based on a Markov decision process;
    performing maximum likelihood inverse reinforcement learning on the utility function set to obtain the behavior prediction model, the behavior prediction model carrying the product identifier.
  17. The computer-readable storage medium according to claim 16, wherein the acquiring sample data of a plurality of typical users comprises:
    acquiring historical data of a plurality of typical users, the historical data comprising state feature data of the typical users and purchase behavior data of the typical users, the purchase behavior data of the typical users carrying product identifiers of products purchased by the typical users;
    constructing a time series from the state feature data of the typical users to obtain sample data of the typical users' state feature time series;
    constructing a time series from the purchase behavior data of the typical users according to the product identifiers to obtain sample data of the typical users' purchase behavior time series.
  18. The computer-readable storage medium according to claim 16, wherein the sample data comprises: state feature time series and purchase behavior time series of the typical users, the purchase behavior time series of the typical users carrying product identifiers of products purchased by the typical users; and the step of determining the utility function set of the sample data based on the Markov decision process comprises:
    acquiring a maximum-value behavior calculation formula determined from the state feature time series and purchase behavior time series of the typical users;
    iteratively optimizing and solving the maximum-value behavior calculation formula by a dynamic programming method to obtain a target maximum-value behavior calculation formula;
    extracting utility functions from the target maximum-value behavior calculation formula and combining the extracted utility functions into the utility function set.
  19. The computer-readable storage medium according to claim 16, wherein the step of performing maximum likelihood inverse reinforcement learning on the utility function set to obtain the behavior prediction model comprises:
    linearly superimposing the utility functions in the utility function set to obtain an individual utility function to be estimated;
    normalizing the individual utility function to be estimated with a softmax function to obtain a normalized individual utility function;
    performing parameter estimation on the normalized individual utility function with a maximum entropy inverse reinforcement learning method to obtain the behavior prediction model.
  20. The computer-readable storage medium according to claim 19, wherein the step of performing parameter estimation on the normalized individual utility function with the maximum entropy inverse reinforcement learning method to obtain the behavior prediction model comprises:
    assuming that there exists a latent probability distribution under which the expert trajectories are generated, with the known condition:

$$\sum_{i} w_i f_i = \tilde{f}$$

    where $f$ denotes the feature expectation (here, the expected utility that each product brings to the customer, i.e., the individual utility function to be estimated $U_{agent}$), $\tilde{f}$ denotes the expert feature expectation (the weighted utility that the products bring to the customer), and $w$ is the probability of each product being chosen (i.e., $w_1, w_2, w_3, \ldots, w_n$ in the individual utility function to be estimated $U_{agent}$); converting the problem into standard form, becoming an optimization problem that maximizes the entropy:

$$\max_{w}\; -\sum_{i} w_i \log w_i$$

$$\text{s.t.}\quad \sum_{i} w_i f_i = \tilde{f}, \qquad \sum_{i} w_i = 1$$

    where $-\sum w \log w$ is the entropy of a random variable, $\max$ denotes taking the maximum, and the expressions after s.t. are the constraints under which the entropy is maximized; applying the method of Lagrange multipliers:

$$L(w,\lambda) = -\sum_{i} w_i \log w_i + \sum_{j} \lambda_j\Big(\sum_{i} w_i f_j(i) - \tilde{f}_j\Big) + \lambda_0\Big(\sum_{i} w_i - 1\Big)$$

    and, after solving, differentiating with respect to the probability $w$ to obtain the maximum-entropy probability:

$$w_i = \frac{\exp\big(\sum_{j}\lambda_j f_j(i)\big)}{\sum_{i'}\exp\big(\sum_{j}\lambda_j f_j(i')\big)}$$

    where $\exp(\cdot)$ is the exponential function with the natural constant $e$ as its base; each parameter $\lambda_j$ corresponds to a Lagrange multiplier and can be solved by the maximum likelihood method; and $f_j$ denotes the expected utility that product $j$ brings to the customer.
PCT/CN2020/132601 2020-10-19 2020-11-30 User portrait generation method, apparatus, device, and medium WO2021189922A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011118110.XA CN112256961B (zh) 2020-10-19 2020-10-19 User portrait generation method, apparatus, device, and medium
CN202011118110.X 2020-10-19

Publications (1)

Publication Number Publication Date
WO2021189922A1 (zh) 2021-09-30

Family ID=74243980

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/132601 WO2021189922A1 (zh) User portrait generation method, apparatus, device, and medium

Country Status (2)

Country Link
CN (1) CN112256961B (zh)
WO (1) WO2021189922A1 (zh)


Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113516533A (zh) * 2021-06-24 2021-10-19 平安科技(深圳)有限公司 Product recommendation method, apparatus, device, and medium based on an improved BERT model
CN113592551A (zh) * 2021-07-31 2021-11-02 广州小鹏汽车科技有限公司 Method, apparatus, and device for analyzing and processing car-purchasing users' behavior data
CN115098931B (zh) * 2022-07-20 2022-12-16 江苏艾佳家居用品有限公司 Small-sample analysis method for mining users' personalized interior design needs


Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7835936B2 (en) * 2004-06-05 2010-11-16 Sap Ag System and method for modeling customer response using data observable from customer buying decisions
JP5722276B2 (ja) * 2012-05-16 2015-05-20 日本電信電話株式会社 First-purchase estimation apparatus, method, and program
CN106228314A (zh) * 2016-08-11 2016-12-14 电子科技大学 Workflow scheduling method based on deep reinforcement learning
CN108594638B (zh) * 2018-03-27 2020-07-24 南京航空航天大学 On-orbit reconfiguration method for spacecraft ACS oriented to multi-task, multi-index optimization constraints
CN110570279A (zh) * 2019-09-04 2019-12-13 深圳创新奇智科技有限公司 Strategy-based recommendation method and apparatus based on users' real-time behavior
CN111159534A (zh) * 2019-12-03 2020-05-15 泰康保险集团股份有限公司 User-portrait-based auxiliary decision-making method, apparatus, device, and medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105761102A * 2016-02-04 2016-07-13 杭州朗和科技有限公司 Method and apparatus for predicting users' product purchase behavior
KR101813805B1 (ko) * 2016-09-28 2017-12-29 한양대학교 산학협력단 Method and apparatus for predicting a user's purchase probability using machine learning
CN109242524A (zh) * 2017-07-10 2019-01-18 Sk普兰尼特有限公司 Method and device for predicting purchase probability based on a user's behavior sequence
CN107705155A (zh) * 2017-10-11 2018-02-16 北京三快在线科技有限公司 Consumption capability prediction method, apparatus, electronic device, and readable storage medium
CN108492138A (zh) * 2018-03-19 2018-09-04 平安科技(深圳)有限公司 Product purchase prediction method, server, and storage medium

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220398607A1 (en) * 2021-06-14 2022-12-15 Fujitsu Limited Method for inverse reinforcement learning and information processing apparatus
CN113988070A (zh) * 2021-10-09 2022-01-28 广州快决测信息科技有限公司 Survey question generation method, apparatus, computer device, and storage medium
CN113988070B (zh) * 2021-10-09 2023-05-05 广州快决测信息科技有限公司 Survey question generation method, apparatus, computer device, and storage medium
CN114331512A (zh) * 2021-12-22 2022-04-12 重庆汇博利农科技有限公司 Method for visual data modeling and big-data portraits
CN114331512B (zh) * 2021-12-22 2023-08-25 重庆汇博利农科技有限公司 Method for visual data modeling and big-data portraits
CN113988727A (zh) * 2021-12-28 2022-01-28 青岛海尔工业智能研究院有限公司 Resource scheduling method and system
CN113988727B (zh) * 2021-12-28 2022-05-10 卡奥斯工业智能研究院(青岛)有限公司 Resource scheduling method and system
CN114663132A (zh) * 2022-03-02 2022-06-24 厦门文杉信息科技有限公司 Intelligent marketing method and apparatus based on real-time user portraits
CN117271905A (zh) * 2023-11-21 2023-12-22 杭州小策科技有限公司 Lateral demand analysis method and system based on crowd portraits
CN117271905B (zh) * 2023-11-21 2024-02-09 杭州小策科技有限公司 Lateral demand analysis method and system based on crowd portraits

Also Published As

Publication number Publication date
CN112256961A (zh) 2021-01-22
CN112256961B (zh) 2024-04-09

Similar Documents

Publication Publication Date Title
WO2021189922A1 (zh) 2021-09-30 User portrait generation method, apparatus, device, and medium
Ranganath et al. Operator variational inference
Burnap et al. Design and evaluation of product aesthetics: A human-machine hybrid approach
JP2022525702A (ja) モデル公平性のためのシステムおよび方法
CN109766557B (zh) 一种情感分析方法、装置、存储介质及终端设备
CN113705772A (zh) 一种模型训练方法、装置、设备及可读存储介质
Ciliberto et al. A general framework for consistent structured prediction with implicit loss embeddings
CN111209398A (zh) 一种基于图卷积神经网络的文本分类方法、系统
Liu et al. PHD: A probabilistic model of hybrid deep collaborative filtering for recommender systems
Foong et al. Pathologies of factorised gaussian and mc dropout posteriors in bayesian neural networks
US20230105547A1 (en) Machine learning model fairness and explainability
He et al. Uniform-pac bounds for reinforcement learning with linear function approximation
Lataniotis Data-driven uncertainty quantification for high-dimensional engineering problems
US11144938B2 (en) Method and system for predictive modeling of consumer profiles
Li et al. High-dimensional interaction detection with false sign rate control
Ahan et al. Social network analysis using data segmentation and neural networks
Richard et al. Link discovery using graph feature tracking
Veras et al. A sparse linear regression model for incomplete datasets
Kei et al. Change point detection on a separable model for dynamic networks
CN114238798A (zh) 基于神经网络的搜索排序方法、系统、设备及存储介质
Zhang et al. Recommendation based on collaborative filtering by convolution deep learning model based on label weight nearest neighbor
CN112884028A (zh) 一种系统资源调整方法、装置及设备
Sumalatha et al. Rough set based decision rule generation to find behavioural patterns of customers
Guo et al. A semi-supervised label distribution learning model with label correlations and data manifold exploration
Gray-Davies et al. Scalable Bayesian nonparametric regression via a Plackett-Luce model for conditional ranks

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20927729

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20927729

Country of ref document: EP

Kind code of ref document: A1