CN112256961A - User portrait generation method, device, equipment and medium - Google Patents

User portrait generation method, device, equipment and medium

Info

Publication number
CN112256961A
CN112256961A (application CN202011118110.XA)
Authority
CN
China
Prior art keywords
user
behavior
time sequence
product
data
Prior art date
Legal status
Granted
Application number
CN202011118110.XA
Other languages
Chinese (zh)
Other versions
CN112256961B (en)
Inventor
夏婧
吴振宇
王建明
Current Assignee
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN202011118110.XA priority Critical patent/CN112256961B/en
Priority to PCT/CN2020/132601 priority patent/WO2021189922A1/en
Publication of CN112256961A publication Critical patent/CN112256961A/en
Application granted granted Critical
Publication of CN112256961B publication Critical patent/CN112256961B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G06F16/9535 — Information retrieval: search customisation based on user profiles and personalisation
    • G06F18/214 — Pattern recognition: generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F18/295 — Pattern recognition: Markov models or related models, e.g. semi-Markov models; Markov random fields; networks embedding Markov models
    • G06Q30/0201 — Commerce/marketing: market modelling; market analysis; collecting market data
    • G06Q30/0202 — Commerce/marketing: market predictions or forecasting for commercial activities

Abstract

The application relates to the technical field of artificial intelligence, and discloses a user portrait generation method, device, equipment and medium, wherein the method comprises the following steps: acquiring a state characteristic time sequence and a purchasing behavior time sequence of a target user, wherein the purchasing behavior time sequence carries the product identifier of a product purchased by the target user; searching a preset model library for the behavior prediction model corresponding to the product identifier, wherein the behavior prediction model is obtained based on a Markov decision process and maximum likelihood inverse reinforcement learning; inputting the state characteristic time sequence and the purchasing behavior time sequence into the behavior prediction model corresponding to the product identifier for probability prediction, obtaining behavior prediction data of the target user; and determining a portrait of the target user based on the behavior prediction data. User behaviors are thus fully mined as the life stage, life state and consumption scene change, improving both the accuracy of the user portrait and the fineness of its granularity.

Description

User portrait generation method, device, equipment and medium
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a method, an apparatus, a device, and a medium for generating a user portrait.
Background
The user portrait is a digital abstraction of a user role and a model for analyzing and mining user behavior. Constructing an accurate user portrait can help an enterprise expand sales of emerging products: knowing a user's circumstances and the products the user needs allows targeted selling. The traditional user portrait model adopts a population-group model or a persona model, can only analyze a user in a single scene, and cannot follow changes in the user's life stage, life state, consumption scene and the like; existing user portrait descriptions lack personalization, the granularity of the user portrait is coarse, and it is difficult to meet the requirements of multiple marketing scenes and varied personalities, or to track user behavior so as to cultivate long-term customers. Under these difficulties, the improvement that user portraits bring to precision marketing is limited, the requirements of marketing-side business personnel cannot be met in real time, and the feature differences and demand differences of different types of users cannot be distinguished at fine granularity.
Disclosure of Invention
The application mainly aims to provide a user portrait generation method, device, equipment and medium, so as to solve the technical problems in the prior art that the improvement user portraits bring to precision marketing is limited, that the requirements of marketing-side business personnel cannot be met in real time, and that the feature differences and demand differences of different types of users cannot be distinguished at fine granularity.
In order to achieve the above object, the present application provides a method for generating a user portrait, the method comprising:
acquiring a state characteristic time sequence and a purchasing behavior time sequence of a target user, wherein the purchasing behavior time sequence carries a product identifier of a product purchased by the target user;
searching a behavior prediction model corresponding to the product identification from a preset model library, wherein the behavior prediction model is obtained based on a Markov decision process and maximum likelihood inverse reinforcement learning;
inputting the state characteristic time sequence and the purchasing behavior time sequence into the behavior prediction model corresponding to the product identification for probability prediction to obtain behavior prediction data of the target user; determining the portrait of the target user according to the behavior prediction data.
Further, before the step of searching the behavior prediction model corresponding to the product identifier from the preset model library, the method further includes:
acquiring sample data of a plurality of typical users, wherein the sample data carries product identifiers of products purchased by the typical users;
determining a utility function set of the sample data based on a Markov decision process;
and carrying out maximum likelihood inverse reinforcement learning on the utility function set to obtain the behavior prediction model, wherein the behavior prediction model carries the product identification.
Further, the obtaining sample data of a plurality of typical users includes:
acquiring historical data of a plurality of typical users, wherein the historical data comprises: the system comprises state characteristic data of a typical user and purchasing behavior data of the typical user, wherein the purchasing behavior data of the typical user carries a product identifier of a product purchased by the typical user;
time series construction is carried out on the state characteristic data of the typical user to obtain sample data of the state characteristic time series of the typical user;
and constructing a time sequence of the typical user purchasing behavior data according to the product identification to obtain sample data of the typical user purchasing behavior time sequence.
Further, the sample data comprises: the method comprises the steps that a state characteristic time sequence and a purchasing behavior time sequence of a typical user are obtained, and the purchasing behavior time sequence of the typical user carries a product identifier of a product purchased by the typical user; the step of determining a set of utility functions for the sample data based on a markov decision process comprises:
acquiring a maximum value behavior calculation formula determined by the state characteristic time sequence and the purchasing behavior time sequence of the typical user;
performing iteration on the maximum value behavior calculation formula by adopting a dynamic programming method to perform optimization solution to obtain a target maximum value behavior calculation formula;
and extracting a utility function from the target maximum value behavior calculation formula and combining the extracted plurality of utility functions into the utility function set.
Further, the step of performing maximum likelihood inverse reinforcement learning on the utility function set to obtain the behavior prediction model includes:
linearly superposing utility functions in the utility function set to obtain a personal utility function to be estimated;
carrying out normalization processing on the personal utility function to be estimated by adopting a softmax function to obtain a normalized personal utility function;
and performing parameter estimation on the normalized personal utility function by adopting a maximum entropy inverse reinforcement learning method to obtain the behavior prediction model.
Further, the step of performing parameter estimation on the normalized personal utility function by using a maximum entropy inverse reinforcement learning method to obtain the behavior prediction model includes:
assuming that there is a potential probability distribution under which the expert trajectories are generated, the known condition is:

\[ \sum_{i} w_i f_i = \tilde{f} \]

wherein f represents the feature expectation (here, the expected utility value each product brings to the customer, i.e. the personal utility function U_agent to be estimated), and \( \tilde{f} \) represents the expert feature expectation (the weighted utility value brought to customers by the various products). The probability of each product being selected (namely w_1, w_2, w_3, …, w_n in the personal utility function U_agent to be estimated) is then obtained by converting the problem into the standard form of a maximum entropy optimization problem:

\[ \max_{w} \; -\sum_{i} w_i \log w_i \]
\[ \text{s.t.} \quad \sum_{i} w_i f_i = \tilde{f}, \qquad \sum_{i} w_i = 1 \]

wherein \( -\sum w \log w \) represents the entropy of the random variable, max denotes taking the maximum value, and the expressions after s.t. are the constraints of the maximization.

By the Lagrange multiplier method:

\[ L(w, \lambda) = -\sum_{i} w_i \log w_i + \sum_{j} \lambda_j \Big( \sum_{i} w_i f_{ij} - \tilde{f}_j \Big) + \lambda_0 \Big( \sum_{i} w_i - 1 \Big) \]

After solving, differentiating with respect to the probability w and setting the derivative to zero gives the maximum entropy probability:

\[ w_i = \frac{\exp\big(\sum_{j} \lambda_j f_{ij}\big)}{\sum_{i'} \exp\big(\sum_{j} \lambda_j f_{i'j}\big)} \]

wherein exp(·) is the exponential function with the natural constant e as its base; the parameters \( \lambda_j \) correspond to the Lagrange multipliers and can be solved by the maximum likelihood method; \( f_{ij} \) refers to the expected utility value that product i brings to the customer on the j-th utility dimension.
Further, the step of determining the representation of the target user based on the behavior prediction data comprises:
comparing the behavior prediction data with a preset threshold value, and taking the comparison result as a prediction result;
and combining the prediction results corresponding to the product identification into a vector to serve as the portrait of the target user.
The present application further provides a user representation generation apparatus, the apparatus comprising:
the data acquisition module is used for acquiring a state characteristic time sequence and a purchasing behavior time sequence of a target user, wherein the purchasing behavior time sequence carries a product identifier of a product purchased by the target user;
the model acquisition module is used for searching a behavior prediction model corresponding to the product identifier from a preset model library, wherein the behavior prediction model is obtained based on a Markov decision process and maximum likelihood inverse reinforcement learning;
the prediction module is used for inputting the state characteristic time sequence and the purchasing behavior time sequence into the behavior prediction model corresponding to the product identifier to perform probability prediction to obtain behavior prediction data of the target user;
and the portrait module is used for determining the portrait of the target user according to the behavior prediction data.
The present application further proposes a computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the steps of any of the above methods when executing the computer program.
The present application also proposes a computer-readable storage medium having stored thereon a computer program which, when being executed by a processor, carries out the steps of the method of any of the above.
According to the user portrait generation method, device, equipment and medium, description of the life stage, the life state and the consumption scene of the user is achieved by acquiring the state characteristic time sequence and the purchasing behavior time sequence of the target user, so that the construction of a multi-view user portrait is facilitated, and the user portrait requirement of a complex scene is met; because different purchasing behavior time sequences are adopted for different products, each behavior prediction model corresponds to one product, and the fineness of the granularity of the portrait of the user is improved; because the behavior prediction model is a model obtained based on a Markov decision process and maximum likelihood inverse reinforcement learning, the Markov decision process can fully mine user behaviors when the life stage, the life state and the consumption scene change, the accuracy of the user portrait is improved, the autonomous learning is realized through the maximum likelihood inverse reinforcement learning, and the generalization capability is improved.
Drawings
FIG. 1 is a schematic flow chart illustrating a user representation generation method according to an embodiment of the present application;
FIG. 2 is a block diagram of a user representation generation apparatus according to an embodiment of the present application;
fig. 3 is a block diagram illustrating a structure of a computer device according to an embodiment of the present application.
The implementation, functional features and advantages of the objectives of the present application will be further explained with reference to the accompanying drawings.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
In order to solve the technical problems in the prior art that the improvement user portraits bring to precision marketing is limited, that the requirements of marketing-side business personnel cannot be met in real time, and that the feature differences and demand differences of different types of users cannot be distinguished at fine granularity, a user portrait generation method is provided, applied to the technical field of artificial intelligence. The method obtains a behavior prediction model based on a Markov decision process and maximum likelihood inverse reinforcement learning and uses that model for probability prediction; the Markov decision process can fully mine user behaviors as the life stage, life state and consumption scene change, improving the accuracy of the user portrait, while maximum likelihood inverse reinforcement learning realizes autonomous learning and improves generalization capability.
Referring to fig. 1, the user representation generation method includes:
s1: acquiring a state characteristic time sequence and a purchasing behavior time sequence of a target user, wherein the purchasing behavior time sequence carries a product identifier of a product purchased by the target user;
s2: searching a behavior prediction model corresponding to the product identification from a preset model library, wherein the behavior prediction model is obtained based on a Markov decision process and maximum likelihood inverse reinforcement learning;
s3: inputting the state characteristic time sequence and the purchasing behavior time sequence into the behavior prediction model corresponding to the product identification for probability prediction to obtain behavior prediction data of the target user;
s4: determining the portrait of the target user according to the behavior prediction data.
In the embodiment, the description of the life stage, the life state and the consumption scene of the user is realized by acquiring the state characteristic time sequence and the purchasing behavior time sequence of the target user, so that the construction of a multi-view user portrait is facilitated, and the user portrait requirement of a complex scene is met; because different purchasing behavior time sequences are adopted for different products, each behavior prediction model corresponds to one product, and the fineness of the granularity of the portrait of the user is improved; because the behavior prediction model is a model obtained based on a Markov decision process and maximum likelihood inverse reinforcement learning, the Markov decision process can fully mine user behaviors when the life stage, the life state and the consumption scene change, the accuracy of the user portrait is improved, the autonomous learning is realized through the maximum likelihood inverse reinforcement learning, and the generalization capability is improved.
For S1, the state feature time series and the purchasing behavior time series of the target user can be obtained from a database.
The state feature time series and the purchasing behavior time series of the target user refer to the state feature time series and the purchasing behavior time series of the same user to be portrayed.
The state feature time series refers to a time series of state feature vectors of the user to be portrayed. Each state feature vector represents a plurality of items of user information; that is, the state feature time series includes a plurality of state feature vectors arranged in time order. User information includes, but is not limited to: personal information, financial status, purchased product information, loan records, and information browsing records. For example, a state feature time series may be expressed as {x_1, x_2, x_3, …, x_n}, where x_i is the i-th value in the series (i.e. the state feature vector at the i-th time) and comprises 6 vector elements, respectively representing the data generation time, personal information, financial status, purchased product information, loan record and information browsing record. This example is not intended to be limiting.
The purchasing behavior time series refers to a time series of the purchasing behavior features of the user to be portrayed for a product. The purchasing behavior time series includes a plurality of purchasing behavior features, each of which takes a value; for example, a purchasing behavior feature of 1 indicates that the product was purchased and 0 indicates that it was not. For example, the purchasing behavior time series may be expressed as {a_1, a_2, a_3, …, a_n}, all values referring to the purchase of the same product, where a_i is the i-th value in the series (i.e. the purchasing behavior feature at the i-th time) and takes the value 0 or 1: a_i = 1 indicates that the product was purchased and a_i = 0 indicates that it was not. This example is not intended to be limiting.
Preferably, the number of the state feature vectors in the state feature time series is the same as the number of the purchasing behavior features in the purchasing behavior time series.
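The two aligned series described above might be constructed roughly as follows; the record layout and field names are assumptions for illustration, not the patent's data format:

```python
def build_time_series(records, product_id):
    """Build aligned state-feature and purchasing-behavior time series
    for one user and one product. The record layout below is a
    hypothetical illustration, not the patent's storage format."""
    records = sorted(records, key=lambda r: r["time"])
    # One state feature vector per time step (6 elements, as in the text)
    state_series = [
        (r["time"], r["personal"], r["finance"],
         tuple(sorted(r["purchased_products"])), r["loans"], r["browsing"])
        for r in records
    ]
    # 1 = the product was purchased at that time step, 0 = it was not
    behavior_series = [
        1 if product_id in r["purchased_products"] else 0
        for r in records
    ]
    return state_series, behavior_series

records = [
    {"time": 2, "personal": "p", "finance": "f",
     "purchased_products": {"A"}, "loans": (), "browsing": ()},
    {"time": 1, "personal": "p", "finance": "f",
     "purchased_products": set(), "loans": (), "browsing": ()},
]
states, behaviors = build_time_series(records, "A")
```

Sorting by time first keeps the two series aligned, matching the preference above that the series have the same number of elements.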
For S2, the identifier identical to the product identifier carried by the purchasing behavior time series (the identifier of the product purchased by the target user) is found among the product identifiers in the preset model library, and the behavior prediction model corresponding to the found product identifier is taken as the behavior prediction model corresponding to the product identifier.
The preset model library comprises at least one behavior prediction model, and each behavior prediction model carries a product identifier. The behavior prediction model is a model for performing probability prediction on purchasing behavior for a target.
And modeling and autonomous learning are carried out on the basis of a Markov decision process and maximum likelihood inverse reinforcement learning by adopting sample data of a plurality of typical users to obtain a behavior prediction model. That is, the behavior prediction model carries the same product identification as that of sample data of a plurality of typical users employed for modeling and autonomous learning.
For S3, the state feature time series and the purchasing behavior time series are input into the behavior prediction model corresponding to the product identifier carried in the purchasing behavior time series for probability prediction, obtaining the behavior prediction data of the target user output by that model; that is, the product identifier corresponding to the behavior prediction data is the same as the product identifier carried in the purchasing behavior time series used for prediction.
The behavior prediction data refers to a probability prediction value of a purchasing behavior of a target user on a product.
Repeating steps S2 to S3 can complete the probability prediction of the state feature time series and the plurality of purchasing behavior time series. That is, steps S2 to S3 predict only the probability prediction value of the purchase behavior of the target user for one product at a time.
For S4, the representation of the target user is used to describe whether the target user purchased the product.
For example, the portrait of the target user may be expressed as [1 0 1 1], where the first vector element represents product one, the second product two, the third product three and the fourth product four; a vector element value of 0 represents no purchase and 1 represents a purchase. The portrait [1 0 1 1] thus represents that the target user purchased product one, product three and product four, but not product two. This example is not intended to be limiting.
For another example, the portrait of the target user can be expressed as {product one: 1, product two: 0, product three: 1, product four: 1}, where a set element value of 0 represents no purchase and 1 represents a purchase; this portrait represents that the target user purchased product one, product three and product four, but not product two. This example is not intended to be limiting.
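Putting steps S2 to S4 together, a portrait could be assembled by running one per-product model and thresholding each probability; the sketch below uses trivial stand-ins for the trained behavior prediction models, and the threshold is an assumed preset value:

```python
def generate_portrait(models, state_series, behavior_series_by_product,
                      threshold=0.5):
    """Assemble a user portrait with one 0/1 entry per product identifier.
    `models` maps product id -> a callable returning a purchase
    probability; these are stand-ins for the trained behavior
    prediction models described above."""
    portrait = {}
    for pid, behavior_series in behavior_series_by_product.items():
        prob = models[pid](state_series, behavior_series)   # steps S2-S3
        portrait[pid] = 1 if prob >= threshold else 0       # step S4
    return portrait

models = {
    "product one": lambda s, b: 0.9,   # hypothetical prediction outputs
    "product two": lambda s, b: 0.1,
}
behavior_by_product = {"product one": [0, 1], "product two": [0, 0]}
portrait = generate_portrait(models, [("t1",), ("t2",)], behavior_by_product)
```

This mirrors the note that S2 to S3 predict one product at a time: the loop repeats the prediction once per product identifier before the results are combined.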
In an embodiment, before the step of searching the behavior prediction model corresponding to the product identifier from the preset model library, the method further includes:
s021: acquiring sample data of a plurality of typical users, wherein the sample data carries product identifiers of products purchased by the typical users;
s022: determining a utility function set of the sample data based on a Markov decision process;
s023: and carrying out maximum likelihood inverse reinforcement learning on the utility function set to obtain the behavior prediction model, wherein the behavior prediction model carries the product identification.
According to the embodiment, the behavior prediction model is determined by adopting the sample data of a plurality of typical users based on the Markov decision process and the maximum likelihood inverse reinforcement learning, the Markov decision process can fully mine the user behaviors when the life stage, the life state and the consumption scene change, the accuracy of the user portrait is improved, the autonomous learning is realized through the maximum likelihood inverse reinforcement learning, and the generalization capability is improved.
For S021, sample data of a plurality of typical users may be acquired from the database.
The sample data of a typical user refers to the data of a representative customer and is determined from historical customer data. A representative customer is one whose willingness and behavior to purchase products are average within a certain class of customers, where customers with similar income level, education degree, family member composition and work experience are divided into the same class. It can be understood that customers may be classified in other ways; for example, customers with similar education degree and similar family member composition may be classified into the same class. This example is not intended to be limiting.
The sample data includes: the method comprises the steps of a state characteristic time sequence and a purchasing behavior time sequence of a typical user, wherein the purchasing behavior time sequence of the typical user carries a product identifier of a product purchased by the typical user.
The state feature time sequence of the typical user refers to a time sequence of a state feature vector of the typical user.
The purchasing behavior time sequence of the typical user refers to the time sequence of the purchasing behavior characteristics of the typical user on a certain product.
Preferably, the number of the state feature vectors in the state feature time series of the typical user is the same as the number of the purchasing behavior features in the purchasing behavior time series of the typical user.
And for S022, establishing a relation among states, behaviors and utility functions based on a Markov decision process according to the state feature time series of all the typical users and the purchasing behavior time series of all the typical users with the same product identification. And then carrying out optimization solution on the utility function, and determining the utility function set according to an optimization solution result. And extracting utility functions from the optimization solution result, combining the extracted utility functions into a set, wherein the set is the utility function set.
Preferably, the number of utility functions in the utility function set is the same as the number of state feature vectors in the state feature time series of the typical user.
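The dynamic-programming optimization described above can be sketched as generic value iteration over a finite Markov decision process; the MDP below is a toy assumption for illustration, not the patent's actual state or action space:

```python
def value_iteration(states, actions, transition, utility,
                    gamma=0.9, tol=1e-6):
    """Generic value-iteration sketch of the maximum value behavior
    calculation: V(s) = max_a [ U(s,a) + gamma * sum_s' P(s'|s,a) V(s') ].
    `transition[(s, a)]` maps next states to probabilities; `utility`
    stands in for the utility functions being optimized."""
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            best = max(
                utility(s, a)
                + gamma * sum(p * V[s2]
                              for s2, p in transition[(s, a)].items())
                for a in actions
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V

# One-state toy MDP: always returning to the same state, "buy" pays 1
V = value_iteration(
    states=["s"],
    actions=["buy", "skip"],
    transition={("s", "buy"): {"s": 1.0}, ("s", "skip"): {"s": 1.0}},
    utility=lambda s, a: 1.0 if a == "buy" else 0.0,
)
```

For this toy MDP the fixed point is V(s) = 1/(1 - gamma) = 10, which the iteration approaches geometrically.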
And S023, when performing maximum likelihood inverse reinforcement learning according to the utility function set, integrating the utility functions in the utility function set in a linear superposition mode, performing parameter estimation on an integrated result by adopting maximum entropy inverse reinforcement learning, and completing parameter estimation to obtain the behavior prediction model, thereby fitting out the personal utility function and the purchasing behavior characteristics.
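The linear superposition of the utility function set followed by softmax normalization might look like the following sketch; the utility values and weights are illustrative, with the weights being the quantities that maximum entropy inverse reinforcement learning would later estimate:

```python
import math

def normalized_personal_utility(utility_values, weights):
    """Linearly superpose the utility functions in the set
    (U_agent = sum_k w_k * U_k) and normalize with softmax.
    utility_values[k][i] is the value of the k-th utility function
    for product i; all numbers here are illustrative."""
    u_agent = [sum(w * u for w, u in zip(weights, per_product))
               for per_product in zip(*utility_values)]
    z = sum(math.exp(u) for u in u_agent)   # softmax normalization
    return [math.exp(u) / z for u in u_agent]

U = [[1.0, 0.0, 2.0],    # hypothetical utility function U_1 over 3 products
     [0.5, 0.5, 0.5]]    # hypothetical utility function U_2
p = normalized_personal_utility(U, weights=[0.6, 0.4])
```

The softmax step turns the superposed utilities into a proper probability distribution over products, which is what the subsequent parameter estimation operates on.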
The product identifiers carried by the behavior prediction model are the same as the product identifiers corresponding to the time series of purchasing behaviors of the typical user in step S022.
In an embodiment, the obtaining sample data of a plurality of typical users includes:
s0211: acquiring historical data of a plurality of typical users, wherein the historical data comprises: the system comprises state characteristic data of a typical user and purchasing behavior data of the typical user, wherein the purchasing behavior data of the typical user carries a product identifier of a product purchased by the typical user;
s0212: time series construction is carried out on the state characteristic data of the typical user to obtain sample data of the state characteristic time series of the typical user;
s0213: and constructing a time sequence of the typical user purchasing behavior data according to the product identification to obtain sample data of the typical user purchasing behavior time sequence.
According to the method and the device, the time sequence construction of the state characteristic data of the typical user is carried out to obtain the sample data of the state characteristic time sequence of the typical user, the time sequence construction of the purchasing behavior data of the typical user is carried out according to the product identification, and the sample data of the purchasing behavior time sequence of the typical user is obtained, so that the description of the life stage, the life state and the consumption scene of the user is realized through the sample data of the typical user, the construction of a multi-view-angle user portrait is facilitated, and the user portrait requirement of a complex scene is met.
For S0211, acquiring historical client data to be processed; and extracting typical user characteristics according to the historical client data to be processed to obtain historical data of the plurality of typical users.
And each typical user's historical data corresponds to a typical user.
The state characteristic data is a data set.
Preferably, the number of state feature data items in the state characteristic data of the typical user is the same as the number of purchasing behavior data items in the purchasing behavior data of the typical user.
For S0212, extracting state feature data from the state feature data of the typical user; and constructing a time sequence of the extracted state characteristic data to obtain sample data of the typical user state characteristic time sequence.
And S0213, extracting purchasing behavior data from the purchasing behavior data of the typical user according to the product identifier, and performing time series construction on the extracted purchasing behavior data to obtain sample data of the typical user's purchasing behavior time series. That is, each extraction yields the purchasing behavior time series for one product identifier, and repeating the extraction determines all the purchasing behavior time series corresponding to the same typical user.
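The per-product time series construction in S0213 can be sketched as follows; the record format (timestamp, product identifier, action) is an assumption for illustration, not the patent's actual data schema:

```python
from collections import defaultdict

def build_purchase_time_series(purchase_records):
    """Group purchase records by product identifier and order each group
    chronologically, yielding one purchasing-behavior time series per product.

    Assumed record format: (timestamp, product_id, action)."""
    series = defaultdict(list)
    for timestamp, product_id, action in purchase_records:
        series[product_id].append((timestamp, action))
    # Sort each product's events by timestamp and keep only the actions.
    return {pid: [a for _, a in sorted(events)] for pid, events in series.items()}

records = [(3, "P001", 1), (1, "P001", 0), (2, "P002", 1), (4, "P002", 0)]
print(build_purchase_time_series(records))  # {'P001': [0, 1], 'P002': [1, 0]}
```

Running the extraction once per product identifier, as above, produces one time series per product for the same typical user.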
In one embodiment, the sample data includes: a state characteristic time sequence of a typical user and a purchasing behavior time sequence of the typical user, wherein the purchasing behavior time sequence of the typical user carries a product identifier of a product purchased by the typical user; the step of determining a utility function set according to the state feature time series of all the typical users and the purchasing behavior time series of all the typical users with the same product identification based on the Markov decision process comprises the following steps:
s0221: acquiring a maximum value behavior calculation formula determined by the state characteristic time sequence and the purchasing behavior time sequence of the typical user;
s0222: performing iteration on the maximum value behavior calculation formula by adopting a dynamic programming method to perform optimization solution to obtain a target maximum value behavior calculation formula;
s0223: and extracting a utility function from the target maximum value behavior calculation formula and combining the extracted plurality of utility functions into the utility function set.
According to the method and the device, the utility function set is determined by adopting the sample data of a plurality of typical users based on the Markov decision process, and the Markov decision process can fully mine user behaviors when the life stage, the life state and the consumption scene change.
For S0221, the maximum value behavior calculation formula A is expressed as follows:

A(x) = argmax_a p(a|x), where p(a|x) = e^{U(x,a)} / Σ_{a'} e^{U(x,a')}

where p(a|x) is the probability of taking action a at state x, and U(x, a) is the utility function; x is a value in the state feature time series of the typical user, expressed as {x_1, x_2, x_3, ..., x_n}; a is a value in the purchasing behavior time series of the typical user, expressed as {a_1, a_2, a_3, ..., a_n}.
And for S0222, performing iterative optimization solution on the maximum value behavior calculation formula by adopting a dynamic programming method to obtain the target maximum value behavior calculation formula.
The optimization solution is to find an optimal strategy that allows a typical user to always obtain more gains than any other strategy in the interaction process with each state feature in the state feature time series. That is, the optimization maximizes the expected discounted utility

Σ_t β^t · U(x_t, a_t)

and the utility function U(x, a) extracted when this value is maximal is the most valuable utility function.
This means seeking an optimal strategy that makes an individual always obtain more gains than any other strategy in the process of interacting with the environment; the optimal strategy can be denoted by π*. Once the optimal strategy π* is found, the reinforcement learning problem is solved. In general, it is difficult to find a globally optimal strategy, but a better strategy, i.e. a locally optimal solution, can be determined by comparing the merits of several different strategies.
Preferably, the Bellman equation V is adopted to iteratively perform the optimization solution on the maximum value behavior calculation formula by dynamic programming:

V(x_t) = U(x_t, a_t) + β · V(x_{t+1})

where V(x_t) represents the expectation of the utility function U based on state x_t; U(x_t, a_t) is the utility function value at time t for x_t and a_t; β is an attenuation (discount) factor with a value between 0 and 1 (the endpoints 0 and 1 may be included); x is a value in the state feature time series of the typical user, and a is a value in the purchasing behavior time series of the typical user. Preferably, the attenuation factor is 0.9, which avoids excessive attenuation; t is the time index; U is the utility function U(x, a).
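The dynamic-programming iteration of S0222 can be illustrated with a backward pass over one trajectory, using the recursion V(x_t) = U(x_t, a_t) + β·V(x_{t+1}) with β = 0.9; the toy utility function here is hypothetical:

```python
def evaluate_trajectory(states, actions, utility, beta=0.9):
    """Backward dynamic-programming pass over one (state, action) trajectory:
    V(x_t) = U(x_t, a_t) + beta * V(x_{t+1}), with V = 0 beyond the horizon."""
    values = [0.0] * len(states)
    v = 0.0
    for t in range(len(states) - 1, -1, -1):
        v = utility(states[t], actions[t]) + beta * v
        values[t] = v
    return values

# Toy utility: 1.0 whenever the action "matches" the state, else 0.0.
vals = evaluate_trajectory([0, 1, 1], [0, 1, 0], lambda x, a: 1.0 if x == a else 0.0)
print(vals)  # [1.9, 1.0, 0.0]
```

With β = 0.9, later utilities are attenuated but not discarded, matching the preference stated above.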
And for S0223, extracting a utility function from the target maximum value behavior calculation formula, and putting the extracted utility function into the utility function set.
In an embodiment, the step of performing maximum likelihood inverse reinforcement learning on the utility function set to obtain the behavior prediction model includes:
s0231: linearly superposing utility functions in the utility function set to obtain a personal utility function to be estimated;
s0232: carrying out normalization processing on the personal utility function to be estimated by adopting a softmax function to obtain a normalized personal utility function;
s0233: and performing parameter estimation on the normalized personal utility function by adopting a maximum entropy inverse reinforcement learning method to obtain the behavior prediction model.
This embodiment implements maximum likelihood inverse reinforcement learning through linear superposition and normalization; the maximum likelihood inverse reinforcement learning enables autonomous learning and improves generalization capability.
For S0231, the utility function set is expressed as {U_1, U_2, U_3, ..., U_n}. Linearly superposing the utility functions in the utility function set gives the personal utility function to be estimated, U_agent, expressed as:

U_agent = w_1·U_1 + w_2·U_2 + w_3·U_3 + ... + w_n·U_n

where w_1, w_2, w_3, ..., w_n are the parameters that need to be estimated.
For S0232, preferably, the personal utility function to be estimated is normalized by a softmax function.
The softmax function is a normalized exponential function: it "compresses" a K-dimensional vector z of arbitrary real numbers into another K-dimensional real vector σ(z) such that each element lies in the range (0, 1) and all elements sum to 1:

σ(z)_j = e^{z_j} / Σ_{i=1}^{K} e^{z_i}

where U(x, a)_j refers to the term w_j·U_j of U_agent in step S0231, and U(x, a)_i refers to the term w_i·U_i; e is the natural constant, an irrational transcendental number with a value of approximately 2.718281828459.
In an embodiment, the step of obtaining the behavior prediction model by performing parameter estimation on the normalized personal utility function by using a maximum entropy inverse reinforcement learning method includes:
assuming that there is a potential probability distribution under which the expert trajectories are generated, the known condition is:

Σ_j w_j·f_j = f̃

where f denotes the feature expectation (here, the expected utility value each product brings to the client, i.e. the personal utility function U_agent to be estimated), f̃ is the expert feature expectation (the weighted utility value the products bring to the customer), and w is the probability of selecting each product (i.e. w_1, w_2, w_3, ..., w_n in the personal utility function U_agent to be estimated). The problem is converted into a standard form and becomes the optimization problem of maximizing the entropy:

max_w  -Σ_j w_j·log(w_j)
s.t.  Σ_j w_j·f_j = f̃
s.t.  Σ_j w_j = 1
where -Σ w·log(w) is the entropy of the random variable; max denotes taking the maximum value; the expressions following s.t. are the constraints of the maximization.
By the Lagrange multiplier method, the Lagrangian is:

L(w, λ, μ) = -Σ_j w_j·log(w_j) + Σ_j λ_j·(w_j·f_j - f̃_j) + μ·(Σ_j w_j - 1)

After solving, differentiating with respect to the probability w gives the maximum entropy probability:

w_j = e^{λ_j·f_j} / Σ_i e^{λ_i·f_i}

where exp() is the exponential function with the natural constant e as its base; the parameter λ_j corresponds to the Lagrange multiplier and can be solved by the maximum likelihood method; f_j is the expected utility value that product j brings to the customer.
In one embodiment, the step of determining a representation of the target user based on the behavior prediction data comprises:
s61: comparing the behavior prediction data with a preset threshold value, and taking the comparison result as a prediction result;
when the behavior prediction data is higher than the preset threshold value, determining that the prediction result corresponding to the product identifier is purchasing, otherwise, determining that the prediction result corresponding to the product identifier is not purchasing;
s62: and combining the prediction results corresponding to the product identification into a vector to serve as the portrait of the target user.
For S61, the preset threshold may be selected from 0.5, 0.55, 0.6, 0.65, 0.7, 0.75 and 0.8, and the examples herein are not limiting. A high preset threshold yields a prediction result with higher accuracy than a low preset threshold, but it narrows the range: some users who do have purchase intentions are predicted not to purchase.
For S62, all of the predictors corresponding to the product identifiers may be combined into a vector, and the combined vector may be used as the representation of the target user.
It is understood that all of the predicted results corresponding to the product identifiers may be combined into a set, and the combined set may be used as the representation of the target user.
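Steps S61 and S62 can be sketched as follows; the threshold of 0.6 and the product identifiers are illustrative choices:

```python
def build_user_portrait(behavior_predictions, threshold=0.6):
    """Compare each product's predicted purchase probability with a preset
    threshold and combine the per-product results into a portrait vector
    (1 = predicted to purchase, 0 = predicted not to purchase)."""
    product_ids = sorted(behavior_predictions)  # fixed order so the vector is stable
    return [1 if behavior_predictions[pid] > threshold else 0 for pid in product_ids]

predictions = {"P001": 0.82, "P002": 0.41, "P003": 0.66}
print(build_user_portrait(predictions))  # [1, 0, 1]
```

Raising the threshold shrinks the set of products predicted as purchases, mirroring the accuracy/coverage trade-off described for S61.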
With reference to fig. 2, the present application also proposes a user representation generation device, said device comprising:
the data acquisition module 100 is configured to acquire a state feature time sequence and a purchasing behavior time sequence of a target user, where the purchasing behavior time sequence carries a product identifier of a product purchased by the target user;
the model obtaining module 200 is configured to search a behavior prediction model corresponding to the product identifier from a preset model library, where the behavior prediction model is a model obtained based on a markov decision process and maximum likelihood inverse reinforcement learning;
the predicting module 300 is configured to input the state feature time series and the purchasing behavior time series into the behavior predicting model corresponding to the product identifier to perform probability prediction to obtain behavior predicting data of the target user;
a representation module 400 for determining a representation of the target user based on the behavior prediction data.
In the embodiment, the description of the life stage, the life state and the consumption scene of the user is realized by acquiring the state characteristic time sequence and the purchasing behavior time sequence of the target user, so that the construction of a multi-view user portrait is facilitated, and the user portrait requirement of a complex scene is met; because different purchasing behavior time sequences are adopted for different products, each behavior prediction model corresponds to one product, and the fineness of the granularity of the portrait of the user is improved; because the behavior prediction model is a model obtained based on a Markov decision process and maximum likelihood inverse reinforcement learning, the Markov decision process can fully mine user behaviors when the life stage, the life state and the consumption scene change, the accuracy of the user portrait is improved, the autonomous learning is realized through the maximum likelihood inverse reinforcement learning, and the generalization capability is improved.
In one embodiment, the apparatus comprises: a model training module;
the model training module is used for acquiring sample data of a plurality of typical users, wherein the sample data carries product identifiers of products purchased by the typical users; determining a utility function set of the sample data based on a Markov decision process; and carrying out maximum likelihood inverse reinforcement learning on the utility function set to obtain the behavior prediction model, wherein the behavior prediction model carries the product identification.
In one embodiment, the model training module comprises: a sample acquisition submodule;
the sample acquisition submodule is used for acquiring historical data of a plurality of typical users, wherein the historical data comprises: state characteristic data of a typical user and purchasing behavior data of the typical user, and the purchasing behavior data of the typical user carries a product identifier of a product purchased by the typical user; time series construction is carried out on the state characteristic data of the typical user to obtain sample data of the typical user's state characteristic time series; and a time series of the typical user's purchasing behavior data is constructed according to the product identification to obtain sample data of the typical user's purchasing behavior time series.
In one embodiment, the sample data comprises: a state characteristic time sequence of a typical user and a purchasing behavior time sequence of the typical user, wherein the purchasing behavior time sequence of the typical user carries a product identifier of a product purchased by the typical user;
the model training module further comprises: a utility function determination submodule;
the utility function determining submodule is used for acquiring a maximum value behavior calculation formula determined by the state characteristic time sequence and the purchasing behavior time sequence of the typical user; performing iteration on the maximum value behavior calculation formula by adopting a dynamic programming method to perform optimization solution to obtain a target maximum value behavior calculation formula; and extracting a utility function from the target maximum value behavior calculation formula and combining the extracted plurality of utility functions into the utility function set.
In one embodiment, the model training module further comprises: a maximum likelihood inverse reinforcement learning submodule;
the maximum likelihood inverse reinforcement learning submodule is used for carrying out linear superposition on the utility functions in the utility function set to obtain a personal utility function to be estimated; carrying out normalization processing on the personal utility function to be estimated by adopting a softmax function to obtain a normalized personal utility function; and performing parameter estimation on the normalized personal utility function by adopting a maximum entropy inverse reinforcement learning method to obtain the behavior prediction model.
In one embodiment, the maximum likelihood inverse reinforcement learning sub-module comprises: a parameter estimation unit;
the parameter estimation unit is configured to assume that there is a potential probability distribution under which the expert trajectories are generated, with the known condition:

Σ_j w_j·f_j = f̃

where f denotes the feature expectation (here, the expected utility value each product brings to the client, i.e. the personal utility function U_agent to be estimated), f̃ is the expert feature expectation (the weighted utility value the products bring to the customer), and w is the probability of selecting each product (i.e. w_1, w_2, w_3, ..., w_n in the personal utility function U_agent to be estimated); the problem is converted into a standard form and becomes the optimization problem of maximizing the entropy:

max_w  -Σ_j w_j·log(w_j)
s.t.  Σ_j w_j·f_j = f̃
s.t.  Σ_j w_j = 1
where -Σ w·log(w) is the entropy of the random variable; max denotes taking the maximum value; the expressions following s.t. are the constraints of the maximization.
By the Lagrange multiplier method, the Lagrangian is:

L(w, λ, μ) = -Σ_j w_j·log(w_j) + Σ_j λ_j·(w_j·f_j - f̃_j) + μ·(Σ_j w_j - 1)

After solving, differentiating with respect to the probability w gives the maximum entropy probability:

w_j = e^{λ_j·f_j} / Σ_i e^{λ_i·f_i}

where exp() is the exponential function with the natural constant e as its base; the parameter λ_j corresponds to the Lagrange multiplier and can be solved by the maximum likelihood method; f_j is the expected utility value that product j brings to the customer.
In one embodiment, the representation module 400 includes: a prediction result determining submodule and an image determining submodule;
the prediction result determining submodule is used for comparing the target behavior prediction data with a preset threshold value and taking the comparison result as a prediction result;
and the portrait determining submodule is used for combining the prediction results corresponding to the product identifications into a vector to be used as the portrait of the target user.
Referring to fig. 3, a computer device, which may be a server and whose internal structure may be as shown in fig. 3, is also provided in the embodiment of the present application. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The nonvolatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of the operating system and the computer program in the nonvolatile storage medium. The database of the computer device is used for storing the data involved in the user portrait generation method. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a user portrait generation method. The user portrait generation method comprises the following steps: acquiring a state characteristic time sequence and a purchasing behavior time sequence of a target user, wherein the purchasing behavior time sequence carries a product identifier of a product purchased by the target user; searching a behavior prediction model corresponding to the product identification from a preset model library, wherein the behavior prediction model is obtained based on a Markov decision process and maximum likelihood inverse reinforcement learning; inputting the state characteristic time sequence and the purchasing behavior time sequence into the behavior prediction model corresponding to the product identification for probability prediction to obtain behavior prediction data of the target user; determining the portrait of the target user according to the behavior prediction data.
In the embodiment, the description of the life stage, the life state and the consumption scene of the user is realized by acquiring the state characteristic time sequence and the purchasing behavior time sequence of the target user, so that the construction of a multi-view user portrait is facilitated, and the user portrait requirement of a complex scene is met; because different purchasing behavior time sequences are adopted for different products, each behavior prediction model corresponds to one product, and the fineness of the granularity of the portrait of the user is improved; because the behavior prediction model is a model obtained based on a Markov decision process and maximum likelihood inverse reinforcement learning, the Markov decision process can fully mine user behaviors when the life stage, the life state and the consumption scene change, the accuracy of the user portrait is improved, the autonomous learning is realized through the maximum likelihood inverse reinforcement learning, and the generalization capability is improved.
An embodiment of the present application further provides a computer-readable storage medium, on which a computer program is stored, the computer program, when executed by a processor, implementing a user representation generation method, including the steps of: acquiring a state characteristic time sequence and a purchasing behavior time sequence of a target user, wherein the purchasing behavior time sequence carries a product identifier of a product purchased by the target user; searching a behavior prediction model corresponding to the product identification from a preset model library, wherein the behavior prediction model is obtained based on a Markov decision process and maximum likelihood inverse reinforcement learning; inputting the state characteristic time sequence and the purchasing behavior time sequence into the behavior prediction model corresponding to the product identification for probability prediction to obtain behavior prediction data of the target user; determining the portrait of the target user according to the behavior prediction data.
According to the executed user portrait generation method, the description of the life stage, the life state and the consumption scene of the user is realized by acquiring the state characteristic time sequence and the purchasing behavior time sequence of the target user, so that the construction of a multi-view user portrait is facilitated, and the user portrait requirement of a complex scene is met; because different purchasing behavior time sequences are adopted for different products, each behavior prediction model corresponds to one product, and the fineness of the granularity of the portrait of the user is improved; because the behavior prediction model is a model obtained based on a Markov decision process and maximum likelihood inverse reinforcement learning, the Markov decision process can fully mine user behaviors when the life stage, the life state and the consumption scene change, the accuracy of the user portrait is improved, the autonomous learning is realized through the maximum likelihood inverse reinforcement learning, and the generalization capability is improved.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing related hardware; the computer program can be stored in a non-volatile computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium provided herein and used in the examples may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, apparatus, article, or method that includes the element.
The above description is only a preferred embodiment of the present application, and not intended to limit the scope of the present application, and all modifications of equivalent structures and equivalent processes, which are made by the contents of the specification and the drawings of the present application, or which are directly or indirectly applied to other related technical fields, are also included in the scope of the present application.

Claims (10)

1. A method of user representation generation, the method comprising:
acquiring a state characteristic time sequence and a purchasing behavior time sequence of a target user, wherein the purchasing behavior time sequence carries a product identifier of a product purchased by the target user;
searching a behavior prediction model corresponding to the product identification from a preset model library, wherein the behavior prediction model is obtained based on a Markov decision process and maximum likelihood inverse reinforcement learning;
inputting the state characteristic time sequence and the purchasing behavior time sequence into the behavior prediction model corresponding to the product identification for probability prediction to obtain behavior prediction data of the target user; determining the portrait of the target user according to the behavior prediction data.
2. The user representation generation method of claim 1, wherein said step of searching for the behavior prediction model corresponding to the product identifier from a predetermined model library further comprises:
acquiring sample data of a plurality of typical users, wherein the sample data carries product identifiers of products purchased by the typical users;
determining a utility function set of the sample data based on a Markov decision process;
and carrying out maximum likelihood inverse reinforcement learning on the utility function set to obtain the behavior prediction model, wherein the behavior prediction model carries the product identification.
3. The user representation generation method of claim 2, wherein said obtaining sample data for a plurality of representative users comprises:
acquiring historical data of a plurality of typical users, wherein the historical data comprises: state characteristic data of a typical user and purchasing behavior data of the typical user, and the purchasing behavior data of the typical user carries a product identifier of a product purchased by the typical user;
time series construction is carried out on the state characteristic data of the typical user to obtain sample data of the state characteristic time series of the typical user;
and constructing a time sequence of the typical user purchasing behavior data according to the product identification to obtain sample data of the typical user purchasing behavior time sequence.
4. The user representation generation method of claim 2, wherein the sample data comprises: a state characteristic time sequence of a typical user and a purchasing behavior time sequence of the typical user, wherein the purchasing behavior time sequence of the typical user carries a product identifier of a product purchased by the typical user; the step of determining a set of utility functions for the sample data based on a Markov decision process comprises:
acquiring a maximum value behavior calculation formula determined by the state characteristic time sequence and the purchasing behavior time sequence of the typical user;
performing iteration on the maximum value behavior calculation formula by adopting a dynamic programming method to perform optimization solution to obtain a target maximum value behavior calculation formula;
and extracting a utility function from the target maximum value behavior calculation formula and combining the extracted plurality of utility functions into the utility function set.
5. The user representation generation method of claim 2, wherein said step of performing maximum likelihood inverse reinforcement learning on said set of utility functions to obtain said behavior prediction model comprises:
linearly superposing utility functions in the utility function set to obtain a personal utility function to be estimated;
carrying out normalization processing on the personal utility function to be estimated by adopting a softmax function to obtain a normalized personal utility function;
and performing parameter estimation on the normalized personal utility function by adopting a maximum entropy inverse reinforcement learning method to obtain the behavior prediction model.
6. The method of generating a user representation as claimed in claim 5, wherein said step of performing a parameter estimation on said normalized personal utility function using a maximum entropy inverse reinforcement learning method to obtain said behavior prediction model comprises:
assuming that there is a potential probability distribution under which the expert trajectories are generated, the known condition is:

Σ_j w_j·f_j = f̃

where f denotes the feature expectation (here, the expected utility value each product brings to the client, i.e. the personal utility function U_agent to be estimated), f̃ is the expert feature expectation (the weighted utility value the products bring to the customer), and w is the probability of selecting each product (i.e. w_1, w_2, w_3, ..., w_n in the personal utility function U_agent to be estimated); the problem is converted into a standard form and becomes the optimization problem of maximizing the entropy:

max_w  -Σ_j w_j·log(w_j)
s.t.  Σ_j w_j·f_j = f̃
s.t.  Σ_j w_j = 1
where -Σ w·log(w) is the entropy of the random variable; max denotes taking the maximum value; the expressions following s.t. are the constraints of the maximization.
By the Lagrange multiplier method, the Lagrangian is:

L(w, λ, μ) = -Σ_j w_j·log(w_j) + Σ_j λ_j·(w_j·f_j - f̃_j) + μ·(Σ_j w_j - 1)

After solving, differentiating with respect to the probability w gives the maximum entropy probability:

w_j = e^{λ_j·f_j} / Σ_i e^{λ_i·f_i}

where exp() is the exponential function with the natural constant e as its base; the parameter λ_j corresponds to the Lagrange multiplier and can be solved by the maximum likelihood method; f_j is the expected utility value that product j brings to the customer.
7. A user representation generation method as claimed in claim 1 wherein said step of determining a representation of said target user from said behavioural prediction data comprises:
comparing the behavior prediction data with a preset threshold value, and taking the comparison result as a prediction result;
and combining the prediction results corresponding to the product identification into a vector to serve as the portrait of the target user.
8. A user portrait generation apparatus, the apparatus comprising:
a data acquisition module, configured to acquire a state characteristic time sequence and a purchasing behavior time sequence of a target user, wherein the purchasing behavior time sequence carries the product identifiers of the products purchased by the target user;
a model acquisition module, configured to search a preset model library for the behavior prediction model corresponding to the product identifier, wherein the behavior prediction model is obtained based on a Markov decision process and maximum likelihood inverse reinforcement learning;
a prediction module, configured to input the state characteristic time sequence and the purchasing behavior time sequence into the behavior prediction model corresponding to the product identifier for probability prediction, to obtain behavior prediction data of the target user;
and a portrait module, configured to determine the portrait of the target user according to the behavior prediction data.
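The four modules of claim 8 can be connected as in the following sketch; the class layout, the shape of the user record, and the callable stand-ins for the per-product behavior prediction models are all assumptions made for illustration, not the patented apparatus:

```python
class UserPortraitApparatus:
    def __init__(self, model_library, threshold=0.5):
        # model_library: {product_id: behavior prediction model}; here a model is
        # any callable mapping (state series, purchase series) -> probability.
        self.model_library = model_library
        self.threshold = threshold

    def acquire_data(self, user):
        # Data acquisition module: the state characteristic time sequence and the
        # purchasing behavior time sequence carrying the product identifiers.
        return user["state_series"], user["purchase_series"], user["product_ids"]

    def generate_portrait(self, user):
        states, purchases, product_ids = self.acquire_data(user)
        predictions = {}
        for pid in product_ids:
            # Model acquisition module: look up the model for this product.
            model = self.model_library[pid]
            # Prediction module: probability prediction from the two sequences.
            predictions[pid] = model(states, purchases)
        # Portrait module: threshold each prediction into the portrait vector.
        return [int(predictions[pid] >= self.threshold)
                for pid in sorted(predictions)]
```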
9. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the method of any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.
CN202011118110.XA 2020-10-19 2020-10-19 User portrait generation method, device, equipment and medium Active CN112256961B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202011118110.XA CN112256961B (en) 2020-10-19 2020-10-19 User portrait generation method, device, equipment and medium
PCT/CN2020/132601 WO2021189922A1 (en) 2020-10-19 2020-11-30 Method and apparatus for generating user portrait, and device and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011118110.XA CN112256961B (en) 2020-10-19 2020-10-19 User portrait generation method, device, equipment and medium

Publications (2)

Publication Number Publication Date
CN112256961A true CN112256961A (en) 2021-01-22
CN112256961B CN112256961B (en) 2024-04-09

Family

ID=74243980

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011118110.XA Active CN112256961B (en) 2020-10-19 2020-10-19 User portrait generation method, device, equipment and medium

Country Status (2)

Country Link
CN (1) CN112256961B (en)
WO (1) WO2021189922A1 (en)


Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2022190454A (en) * 2021-06-14 2022-12-26 富士通株式会社 Inverse reinforcement learning program, inverse reinforcement learning method, and information processor
CN113988070B (en) * 2021-10-09 2023-05-05 广州快决测信息科技有限公司 Investigation problem generation method, investigation problem generation device, computer equipment and storage medium
CN114331512B (en) * 2021-12-22 2023-08-25 重庆汇博利农科技有限公司 Visual data modeling and big data portrayal method
CN113988727B (en) * 2021-12-28 2022-05-10 卡奥斯工业智能研究院(青岛)有限公司 Resource scheduling method and system
CN117271905B (en) * 2023-11-21 2024-02-09 杭州小策科技有限公司 Crowd image-based lateral demand analysis method and system

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050273377A1 (en) * 2004-06-05 2005-12-08 Ouimet Kenneth J System and method for modeling customer response using data observable from customer buying decisions
JP2013239065A (en) * 2012-05-16 2013-11-28 Nippon Telegr & Teleph Corp <Ntt> Initial purchase estimation device, initial purchase estimation method and initial purchase estimation program
CN106228314A (en) * 2016-08-11 2016-12-14 电子科技大学 The workflow schedule method of study is strengthened based on the degree of depth
CN108594638A (en) * 2018-03-27 2018-09-28 南京航空航天大学 The in-orbit reconstructing methods of spacecraft ACS towards the constraint of multitask multi-index optimization
CN110570279A (en) * 2019-09-04 2019-12-13 深圳创新奇智科技有限公司 Strategic recommendation method and device based on real-time user behavior
CN111159534A (en) * 2019-12-03 2020-05-15 泰康保险集团股份有限公司 User portrait based aid decision making method and device, equipment and medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105761102B (en) * 2016-02-04 2021-05-11 杭州朗和科技有限公司 Method and device for predicting commodity purchasing behavior of user
KR101813805B1 (en) * 2016-09-28 2017-12-29 한양대학교 산학협력단 Method and Apparatus for purchase probability prediction of user using machine learning
KR102408476B1 (en) * 2017-07-10 2022-06-14 십일번가 주식회사 Method for predicing purchase probability based on behavior sequence of user and apparatus therefor
CN107705155A (en) * 2017-10-11 2018-02-16 北京三快在线科技有限公司 A kind of consuming capacity Forecasting Methodology, device, electronic equipment and readable storage medium storing program for executing
CN108492138B (en) * 2018-03-19 2020-03-24 平安科技(深圳)有限公司 Product purchase prediction method, server and storage medium


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113516533A (en) * 2021-06-24 2021-10-19 平安科技(深圳)有限公司 Product recommendation method, device, equipment and medium based on improved BERT model
CN113592551A (en) * 2021-07-31 2021-11-02 广州小鹏汽车科技有限公司 Method, device and equipment for analyzing and processing behavior data of vehicle purchasing user
CN115098931A (en) * 2022-07-20 2022-09-23 江苏艾佳家居用品有限公司 Small sample analysis method for mining personalized requirements of indoor design of user
CN115098931B (en) * 2022-07-20 2022-12-16 江苏艾佳家居用品有限公司 Small sample analysis method for mining personalized requirements of indoor design of user

Also Published As

Publication number Publication date
CN112256961B (en) 2024-04-09
WO2021189922A1 (en) 2021-09-30

Similar Documents

Publication Publication Date Title
CN112256961A (en) User portrait generation method, device, equipment and medium
CN109345302B (en) Machine learning model training method and device, storage medium and computer equipment
JP7276757B2 (en) Systems and methods for model fairness
CN108648049B (en) Sequence recommendation method based on user behavior difference modeling
Idris et al. Intelligent churn prediction for telecom using GP-AdaBoost learning and PSO undersampling
CN114372573B (en) User portrait information recognition method and device, computer equipment and storage medium
CN112905876B (en) Information pushing method and device based on deep learning and computer equipment
Plasse et al. Handling delayed labels in temporally evolving data streams
CN114419501A (en) Video recommendation method and device, computer equipment and storage medium
CN112395331A (en) User portrayal method, apparatus, device and medium for credit card client
JP6971514B1 (en) Information processing equipment, information processing methods and programs
KR102456148B1 (en) Skill word evaluation method and device, electronic device, and computer readable medium
Nasiri et al. Increasing prediction accuracy in collaborative filtering with initialized factor matrices
CN113449011A (en) Big data prediction-based information push updating method and big data prediction system
Sahu et al. Ensemble deep neural network based quality of service prediction for cloud service recommendation
CN114387477A (en) Label classification model training method, label classification method, device and equipment
Koutsourelakis et al. Scalable Bayesian reduced-order models for simulating high-dimensional multiscale dynamical systems
Ghoshal et al. Estimating uncertainty in deep learning for reporting confidence: An application on cell type prediction in testes based on proteomics
CN112712695B (en) Traffic flow prediction method, device and storage medium
Hu et al. Learning mixed multinomial logits with provable guarantees
CN113408665A (en) Object identification method, device, equipment and medium
CN112884028A (en) System resource adjusting method, device and equipment
Óskarsdóttir et al. Inductive representation learning on feature rich complex networks for churn prediction in telco
Belacel et al. Scalable collaborative filtering based on splitting-merging clustering algorithm
Adamov Analysis of feature selection techniques for classification problems

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant