CN115760200B - User portrait construction method based on financial transaction data
- Publication number: CN115760200B
- Application number: CN202310015416.XA
- Authority: CN (China)
- Prior art keywords: attribute, category, user, value, liveness
- Legal status: Active (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention relates to the technical field of data processing, and in particular to a user portrait construction method based on financial transaction data, comprising the following steps: constructing a basic attribute vector and a behavior vector for each user; dividing users into a plurality of attribute categories according to one attribute, acquiring an attribute representative value and a liveness scalar for each attribute category, and thereby identifying independent attributes and non-independent attributes; obtaining matching category pairs from an undirected graph constructed from the attribute categories of two non-independent attributes; acquiring a plurality of combined attributes according to the attribute 2-tuples and liveness 2-tuples of all matching category pairs; dividing users into a plurality of user categories according to all independent attributes and combined attributes, and acquiring the liveness of each user category from the behavior vectors of its users; acquiring an attribute value mean vector for each user category; the attribute value mean vector and the liveness constitute the user portrait of each user category. The invention facilitates accurate marketing recommendations.
Description
Technical Field
The invention relates to the technical field of data processing, in particular to a user portrait construction method based on financial transaction data.
Background
A user portrait is obtained by cluster analysis of a class of users having common characteristics and therefore does not describe any particular individual. Existing user portrait methods classify users by analyzing their transaction behaviors and derive a user portrait from each user category. They analyze only transaction behaviors and do not take into account the basic attributes of users, such as age and income, so the resulting user classification is limited to the perspective of transaction data. When a user portrait derived from such a classification is used for marketing, recommendation and the like, the effect is often not ideal. On this basis, a user portrait construction method based on financial transaction data is provided: the independent attributes and combined attributes that are relatively strongly correlated with liveness are computed and combined with the liveness of users to obtain the user portrait, which facilitates accurate marketing recommendation.
Disclosure of Invention
The invention provides a user portrait construction method based on financial transaction data, which aims to solve the existing problems.
The user portrait construction method based on the financial transaction data adopts the following technical scheme:
One embodiment of the invention provides a user portrait construction method based on financial transaction data, which comprises the following steps:
S1: constructing a basic attribute vector and a behavior vector of each user;
S2: dividing users into a plurality of attribute categories according to one attribute, and obtaining an attribute representative value of each attribute category; performing dimension reduction on the behavior vectors of all users in each attribute category to obtain a category representative behavior vector of each attribute category; acquiring a liveness scalar of each attribute category according to the category representative behavior vectors of all attribute categories; determining whether the attribute is an independent attribute or a non-independent attribute according to the attribute representative values and the liveness scalars of all attribute categories; forming all non-independent attributes into a non-independent attribute sequence;
S3: performing a combined attribute acquisition operation on the first two non-independent attributes in the non-independent attribute sequence, including:
constructing an undirected graph from the attribute categories of the two non-independent attributes, performing minimum-weight optimal matching on the undirected graph using the KM matching method to obtain the matching relation between the attribute categories of the two non-independent attributes, and taking each two matched attribute categories as a matching category pair;
updating the attribute categories in each matching category pair, taking the updated attribute representative values of the attribute categories in the matching category pair as the attribute 2-tuple of the matching category pair, and taking the liveness scalars of those attribute categories before updating as the liveness 2-tuple of the matching category pair; obtaining a combined attribute or a basic unit according to the attribute 2-tuples and the liveness 2-tuples of all matching category pairs; when a combined attribute is obtained, deleting the combined attribute from the non-independent attribute sequence; when a basic unit is obtained, using the basic unit as the first non-independent attribute in the non-independent attribute sequence;
S4: repeating step S3 until the length of the non-independent attribute sequence is less than or equal to 1, at which point the iteration stops;
S5: classifying all users according to all independent attributes and combined attributes to obtain a plurality of user categories; acquiring the liveness of each user category according to the behavior vector of each user; acquiring an attribute value mean vector of each user category; the attribute value mean vector and the liveness of each user category constitute the user portrait of that user category.
Preferably, dividing users into a plurality of attribute categories according to one attribute and obtaining an attribute representative value of each attribute category includes the following specific steps:
acquiring all distinct attribute values of the same attribute over all users, and counting the frequency of each attribute value; arranging all attribute values in ascending order to obtain an attribute value sequence, dividing the attribute value sequence into a plurality of categories by a multi-threshold segmentation method according to the frequency of each attribute value in the sequence, and taking the users corresponding to all attribute values in each category as one attribute category; and taking the mean of the attribute values of that attribute over all users in each attribute category as the attribute representative value of the attribute category.
Preferably, acquiring the liveness scalar of each attribute category according to the category representative behavior vectors of all attribute categories includes the following specific steps:
let the projection vector be $\vec{w}$, and let the category representative behavior vectors of the attribute categories corresponding to one attribute be denoted $\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_Q$, wherein Q represents the number of category representative behavior vectors of the attribute categories corresponding to the attribute; a scalar calculation model is constructed:
$$s_i = \vec{v}_i \cdot \vec{w}, \quad i = 1, 2, \ldots, Q$$
in the calculation model, $\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_Q$ are the category representative behavior vectors of the attribute categories corresponding to one attribute; $\vec{w}$ is the projection vector; $\cdot$ is the dot product operator; $s_1, s_2, \ldots, s_Q$ are the scalars converted from $\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_Q$ respectively;
the condition model that the scalars are required to satisfy is:
$$\frac{s_i}{s_j} = \frac{\lVert \vec{v}_i \rVert}{\lVert \vec{v}_j \rVert}, \quad i, j = 1, 2, \ldots, Q$$
in the condition model, $\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_Q$ are the category representative behavior vectors of the attribute categories corresponding to one attribute; $\lVert \vec{v}_1 \rVert, \lVert \vec{v}_2 \rVert, \ldots, \lVert \vec{v}_Q \rVert$ are the moduli of $\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_Q$ respectively; $s_1, s_2, \ldots, s_Q$ are the scalars converted from $\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_Q$ respectively;
solving the scalar calculation model and the condition model by linear algebra to obtain the projection vector $\vec{w}$ and the scalars $s_1, s_2, \ldots, s_Q$ converted from the category representative behavior vectors of the attribute categories; the resulting scalars are used as the liveness scalars of the attribute categories.
Preferably, obtaining the independent attribute or the non-independent attribute according to the attribute representative values and the liveness scalars of all attribute categories includes the following specific steps:
sorting the attribute representative values of all attribute categories in ascending order to obtain a category attribute scalar sequence, and forming the liveness scalars of the attribute categories corresponding to the attribute representative values in the category attribute scalar sequence into a liveness scalar sequence; calculating the Pearson correlation coefficient of the category attribute scalar sequence and the liveness scalar sequence; if the result is greater than a first preset threshold, taking the attribute corresponding to the attribute categories as an independent attribute, and if the result is less than or equal to the first preset threshold, taking the attribute corresponding to the attribute categories as a non-independent attribute.
Preferably, constructing the undirected graph from the attribute categories of the two non-independent attributes includes the following specific steps:
taking each attribute category of the two non-independent attributes as a node, taking the attribute representative value of each attribute category as the node value, taking the ratio of the attribute representative values of two attribute categories belonging to the two different non-independent attributes as the weight of the edge between the two corresponding nodes, and constructing the undirected graph from the nodes, the node values and the edge weights.
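The graph construction and matching can be illustrated with a minimal sketch. It assumes the edge weight is the representative value of a category of the first attribute divided by that of a category of the second attribute (the text does not fix the direction of the ratio), and it replaces the KM matching method with an exhaustive search that is only practical for a handful of categories. The function names and the sample values are hypothetical.

```python
from itertools import permutations


def edge_weights(values_a, values_b):
    """Weight of edge (i, j): ratio of the two representative values.

    Assumption: the ratio is taken as value_a / value_b.
    """
    return [[a / b for b in values_b] for a in values_a]


def min_weight_matching(weights):
    """Exhaustive minimum-weight matching, a stand-in for the KM method.

    Requires no more rows than columns; every row gets one distinct column.
    """
    rows, cols = len(weights), len(weights[0])
    if rows > cols:
        raise ValueError("put the attribute with fewer categories first")
    best = min(
        permutations(range(cols), rows),
        key=lambda perm: sum(weights[i][j] for i, j in enumerate(perm)),
    )
    return list(enumerate(best))


# Representative values of the categories of two non-independent attributes.
education_values = [0.2, 0.6]
industry_values = [0.3, 0.5, 0.9]
weights = edge_weights(education_values, industry_values)
matching_pairs = min_weight_matching(weights)
```

Each element of `matching_pairs` is a pair of category indices, one per attribute, corresponding to a matching category pair.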
Preferably, updating the attribute categories in the matching category pair includes the following specific steps:
deleting from each attribute category in the matching category pair the users that do not belong to both attribute categories of the pair, thereby updating the attribute categories in the matching category pair.
Preferably, obtaining the combined attribute or the basic unit according to the attribute 2-tuples and the liveness 2-tuples of all matching category pairs includes the following specific steps:
taking the L2 norm of the attribute 2-tuple as the size of the attribute 2-tuple, recorded as the 2-tuple attribute value; taking the L2 norm of the liveness 2-tuple as the size of the liveness 2-tuple, recorded as the 2-tuple liveness value; sorting the 2-tuple attribute values of all matching category pairs in ascending order to obtain a 2-tuple attribute value sequence, and obtaining the corresponding 2-tuple liveness value sequence; calculating the cosine similarity of the 2-tuple attribute value sequence and the 2-tuple liveness value sequence; if the cosine similarity is greater than a second preset threshold, the two non-independent attributes corresponding to the matching category pairs form a combined attribute, and if the cosine similarity is less than or equal to the second preset threshold, the two non-independent attributes corresponding to the matching category pairs form a basic unit.
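As a rough illustration of this decision, the sketch below computes the L2 norms, orders the pairs by attribute size and compares the two sequences by cosine similarity. The threshold value `SECOND_THRESHOLD = 0.9` is an assumption, since the text only calls it a second preset threshold, and the function names and sample pairs are made up.

```python
import math

# Hypothetical value; the text only refers to "a second preset threshold".
SECOND_THRESHOLD = 0.9


def cosine_similarity(xs, ys):
    """Cosine of the angle between two equally long sequences."""
    dot = sum(x * y for x, y in zip(xs, ys))
    return dot / (math.hypot(*xs) * math.hypot(*ys))


def combined_or_basic(pairs, threshold=SECOND_THRESHOLD):
    """pairs: one (attribute pair, liveness pair) per matching category pair."""
    # L2 norm of each pair, sorted by the attribute size in ascending order.
    sizes = sorted((math.hypot(*attr), math.hypot(*live)) for attr, live in pairs)
    attribute_sizes = [a for a, _ in sizes]
    liveness_sizes = [s for _, s in sizes]
    similarity = cosine_similarity(attribute_sizes, liveness_sizes)
    return "combined attribute" if similarity > threshold else "basic unit"


# Liveness grows with the attribute values: the attributes combine well.
aligned = [((0.3, 0.4), (3.0, 4.0)), ((0.6, 0.8), (6.0, 8.0))]
# Liveness falls while the attribute values grow: no useful combination.
opposed = [((0.3, 0.4), (6.0, 8.0)), ((0.6, 0.8), (0.3, 0.4))]
aligned_result = combined_or_basic(aligned)
opposed_result = combined_or_basic(opposed)
```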
Preferably, classifying all users according to all independent attributes and combined attributes to obtain a plurality of user categories includes the following specific steps:
taking the ratio of the attribute values of the same independent attribute of any two users as the similarity of the two users on that independent attribute; taking the attribute values of the attributes making up one combined attribute of a user as that user's vector for the combined attribute, and taking the cosine similarity between the vectors of any two users for the same combined attribute as the similarity of the two users on that combined attribute; taking the sum of the similarities on all independent attributes and on all combined attributes as the similarity of the two users, placing users whose similarity is greater than a similarity threshold in the same user category, and updating all user categories according to the user categories to which each user belongs;
The updating of all user categories according to the user categories to which each user belongs comprises the following steps:
if a user belongs to a plurality of user categories, those user categories are called the preliminary user categories of the user, and the user is deleted from all of them, which is the first update of the user categories; the mean similarity between the user and all users in each preliminary user category is calculated as the similarity between the user and that preliminary user category, the preliminary user category with the greatest similarity is taken as the actual user category of the user, and the user is added to the actual user category, which is the second update of the user categories; in the same way, the user categories are updated repeatedly for all users belonging to a plurality of user categories, so that each user finally belongs to only one user category.
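The pairwise similarity used in this classification can be sketched as follows. Two points are assumptions rather than statements of the text: the ratio is taken as the smaller value over the larger one so that it lies in (0, 1], and the value of `SIMILARITY_THRESHOLD` is invented for the example. Attribute values are assumed to be positive, as they are after normalization by the maximum. The attribute names are hypothetical.

```python
import math

# Hypothetical value; the text only refers to "a similarity threshold".
SIMILARITY_THRESHOLD = 1.2


def cosine_similarity(xs, ys):
    """Cosine of the angle between two equally long vectors."""
    dot = sum(x * y for x, y in zip(xs, ys))
    return dot / (math.hypot(*xs) * math.hypot(*ys))


def user_similarity(user_a, user_b, independent, combined):
    """Sum of per-attribute similarities between two users."""
    total = 0.0
    for name in independent:
        # Assumption: smaller value divided by larger value.
        low, high = sorted((user_a[name], user_b[name]))
        total += low / high
    for names in combined:
        vec_a = [user_a[n] for n in names]
        vec_b = [user_b[n] for n in names]
        total += cosine_similarity(vec_a, vec_b)
    return total


alice = {"income": 0.5, "education": 0.5, "industry": 0.5}
bob = {"income": 1.0, "education": 1.0, "industry": 1.0}
similarity = user_similarity(
    alice, bob, independent=["income"], combined=[("education", "industry")]
)
same_category = similarity > SIMILARITY_THRESHOLD
```

Here the income ratio contributes 0.5 and the combined attribute contributes a cosine similarity of about 1, so the two users fall in the same category under the assumed threshold.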
Preferably, acquiring the liveness of each user category according to the behavior vector of each user includes the following specific steps:
taking the mean of all element values in the behavior vector of each user as the liveness of that user, and taking the mean of the liveness of all users in each user category as the liveness of that user category.
Preferably, acquiring the attribute value mean vector of each user category includes the following specific steps:
forming the attribute values of the independent attributes of each user and the attribute value of each attribute in the combined attributes into a one-dimensional vector, and taking the one-dimensional vector as the attribute value vector of that user; taking the mean of the attribute values of the same attribute in the attribute value vectors of all users in one user category as the attribute value mean of that attribute, and forming the attribute value means of all attributes into the attribute value mean vector of the user category.
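The last two steps, category liveness and the attribute value mean vector, amount to simple averaging. The sketch below assembles both into a portrait for one user category; the dictionary keys and the sample data are hypothetical.

```python
def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


def user_portrait(category):
    """category: one (attribute value vector, behavior vector) per user."""
    # Liveness of a user is the mean of the behavior vector elements;
    # liveness of the category is the mean over its users.
    liveness = mean(mean(behavior) for _, behavior in category)
    # Column-wise mean over the attribute value vectors of all users.
    attribute_means = [mean(column) for column in zip(*(a for a, _ in category))]
    return {"attribute_means": attribute_means, "liveness": liveness}


category = [
    ([0.5, 1.0], [0.25, 0.75]),
    ([1.0, 0.5], [0.5, 1.0]),
]
portrait = user_portrait(category)
```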
The technical scheme of the invention has the following beneficial effects: a basic attribute vector and a behavior vector are constructed for each user; users are divided into a plurality of attribute categories according to one attribute, an attribute representative value and a liveness scalar are acquired for each attribute category, and independent attributes and non-independent attributes are thereby identified; matching category pairs are obtained from the undirected graph constructed from the attribute categories of two non-independent attributes; a plurality of combined attributes are acquired according to the attribute 2-tuples and liveness 2-tuples of all matching category pairs; users are divided into a plurality of user categories according to all independent attributes and combined attributes, and the liveness of each user category is acquired from the behavior vectors of its users; an attribute value mean vector is acquired for each user category; the attribute value mean vector and the liveness constitute the user portrait of each user category. Compared with conventional user portrait methods, which consider only the transaction behavior of users, the method also considers the basic attributes of users: it mines the correlation between attributes and user liveness to obtain independent attributes and combined attributes, and classifies and portrays users according to them, so that the user classification and the user portraits are more accurate and accurate marketing recommendation is facilitated.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of the steps of the user portrait construction method based on financial transaction data of the present invention;
FIG. 2 is a schematic diagram of the undirected graph of the present invention.
Detailed Description
In order to further describe the technical means and effects adopted by the invention to achieve the preset aim, the following detailed description is given below of the specific implementation, structure, characteristics and effects of the user portrait construction method based on financial transaction data according to the present invention with reference to the accompanying drawings and preferred embodiments. In the following description, different "one embodiment" or "another embodiment" means that the embodiments are not necessarily the same. Furthermore, the particular features, structures, or characteristics of one or more embodiments may be combined in any suitable manner.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
The following specifically describes a specific scheme of the user portrait construction method based on financial transaction data provided by the invention with reference to the accompanying drawings.
Referring to fig. 1, a flowchart of steps of a user portrait construction method based on financial transaction data according to an embodiment of the present invention is shown, where the method includes the following steps:
S001, constructing a basic attribute vector and a behavior vector of each user.
It should be noted that the present invention aims to construct a user portrait by combining the attributes of users with their transaction behaviors, so a basic attribute vector and a behavior vector must first be constructed for each user.
In this embodiment, the attributes of a user include age, education, income level, consumption level, industry and similar data. To facilitate subsequent calculation, each attribute is first digitized and normalized. For example, age data is already numeric and needs no digitization; each age is divided by the maximum of all ages to normalize it. Education data is not numeric, so each education level is assigned a different value, increasing from the lowest level to the highest, which digitizes it; the digitized value is then divided by the maximum digitized education value to normalize it. For industry, each industry is assigned a value in increasing order of industry popularity, with industries that overlap more being given closer values, which digitizes it; the digitized value is then divided by the maximum digitized industry value to normalize it.
The value obtained by digitizing and normalizing each attribute of a user is taken as the attribute value of that attribute. The attribute values of all attributes of a user are formed into a one-dimensional vector, recorded as the basic attribute vector.
The transaction behaviors of a user include data such as consumption amount and consumption frequency, and each transaction behavior is normalized to facilitate subsequent calculation. The normalized transaction behaviors of the user in one time period form a one-dimensional vector; the one-dimensional vectors for the different time periods of the user are concatenated into a high-dimensional vector, recorded as the behavior vector. The behavior vector represents the transaction behavior of the user.
It should be noted that the attributes and transaction behaviors of users can be obtained from the back-end data of each financial transaction website.
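Step S001 can be sketched as follows for two attributes, age and education. The education ranking `EDUCATION_RANK`, the field names and the sample users are hypothetical; the embodiment only requires values that increase from lower to higher education levels and division by the maximum.

```python
# Hypothetical ranking; the embodiment only requires increasing values.
EDUCATION_RANK = {"high school": 1, "bachelor": 2, "master": 3, "doctorate": 4}


def normalize_by_max(values):
    """Divide every value by the maximum, as the embodiment does for age."""
    top = max(values)
    return [v / top for v in values]


def build_attribute_vectors(users):
    """Return one basic attribute vector [age, education] per user."""
    ages = normalize_by_max([u["age"] for u in users])
    levels = normalize_by_max([EDUCATION_RANK[u["education"]] for u in users])
    return [[age, level] for age, level in zip(ages, levels)]


def build_behavior_vector(periods):
    """Concatenate the per-period behavior vectors into one behavior vector."""
    return [value for period in periods for value in period]


users = [
    {"age": 20, "education": "bachelor"},
    {"age": 40, "education": "doctorate"},
]
attribute_vectors = build_attribute_vectors(users)
# Two time periods, each holding normalized amount and frequency.
behavior_vector = build_behavior_vector([[0.2, 0.5], [0.4, 0.1]])
```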
S002, dividing the users into a plurality of attribute categories according to each attribute, and obtaining the attribute representative value of each attribute category.
It should be noted that S001 yields a basic attribute vector for each user, in which each element represents one attribute of the user. Users whose values of the same attribute fall in different ranges often differ in their transaction behavior preferences; for example, users at different ages often trade differently, as do users at different income levels. Users are therefore first classified according to the distribution of the attribute values of each attribute.
In this embodiment, the specific procedure for classifying users according to the distribution of the attribute values of each attribute is as follows:
First, all distinct attribute values of the same attribute over all users are obtained, and the frequency of each attribute value is counted (for example, the frequency of each age among all users). All attribute values are arranged in ascending order to obtain an attribute value sequence. According to the frequency of each attribute value in the sequence, the sequence is divided into a plurality of categories using the Otsu multi-threshold segmentation method, and the users corresponding to all attribute values in each category are grouped together as one attribute category. A plurality of attribute categories is thus obtained, all corresponding to the same attribute. The mean of the attribute values of that attribute over all users in each attribute category is taken as the attribute representative value of the attribute category.
At this point, users have been divided into a plurality of attribute categories according to one attribute, and the attribute representative value of each attribute category has been acquired.
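A minimal sketch of the segmentation follows. It applies the Otsu criterion (maximizing the between-class variance) by trying every possible set of cut points, which is only practical for short sequences. The number of classes `n_classes` is an assumption, since the embodiment does not say how it is chosen, and the age data is made up.

```python
from itertools import combinations


def weighted_mean(values, freqs):
    """Frequency-weighted mean of the attribute values."""
    return sum(v * f for v, f in zip(values, freqs)) / sum(freqs)


def between_class_variance(values, freqs, cuts):
    """Between-class variance of the segments produced by the cut indices."""
    total = sum(freqs)
    overall = weighted_mean(values, freqs)
    bounds = [0, *cuts, len(values)]
    score = 0.0
    for lo, hi in zip(bounds, bounds[1:]):
        weight = sum(freqs[lo:hi])
        segment_mean = weighted_mean(values[lo:hi], freqs[lo:hi])
        score += weight / total * (segment_mean - overall) ** 2
    return score


def otsu_split(values, freqs, n_classes):
    """Try every set of cuts and keep the one with the largest variance."""
    candidates = combinations(range(1, len(values)), n_classes - 1)
    best = max(candidates, key=lambda c: between_class_variance(values, freqs, c))
    bounds = [0, *best, len(values)]
    return list(zip(bounds, bounds[1:]))


# Sorted distinct ages and the number of users of each age (made-up data).
ages = [20, 21, 22, 40, 41, 60, 61]
counts = [5, 6, 5, 4, 4, 3, 3]
segments = otsu_split(ages, counts, n_classes=3)
categories = [ages[lo:hi] for lo, hi in segments]
representatives = [weighted_mean(ages[lo:hi], counts[lo:hi]) for lo, hi in segments]
```

The frequency-weighted mean of each segment is the same as the mean over the users in that attribute category, so `representatives` holds the attribute representative values.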
S003, obtaining the liveness scalar of each attribute category.
It should be noted that each attribute category of an attribute contains a plurality of users, each with a behavior vector; the behavior vectors of all users in an attribute category are taken as the behavior vectors of that attribute category. The element values of a behavior vector reflect the liveness of the user: the larger the user's consumption amount and consumption frequency, the larger the element values of the user's behavior vector and the more active the user. The relationship between an attribute and user liveness can therefore be analyzed through the correlation between the attribute values of the attribute categories of that attribute and their behavior vectors. If larger attribute values correspond to higher user liveness in the behavior vectors, the attribute value and user liveness are positively correlated, and the attribute can be used on its own as an independent attribute in the financial transaction scenario to reflect the liveness of users on the transaction platform. In stock trading, for example, higher income often goes with larger investment on the stock market platform and more frequent transactions. To analyze this correlation, the behavior vectors of each attribute category need to be converted into a scalar.
In this embodiment, the specific method for converting the behavior vectors of each attribute category into a scalar is as follows:
Each attribute category has a plurality of behavior vectors of the same size, one per user, and each behavior vector is high-dimensional. First, the behavior vectors of each attribute category are reduced using the PCA dimension reduction method: all behavior vectors of one attribute category are taken as the input of PCA, and PCA reduces them to a single one-dimensional vector. This one-dimensional vector is taken as the category representative behavior vector of the attribute category.
Similarly, a category representative behavior vector can be obtained for each attribute category corresponding to one attribute.
It should be noted that, once the category representative behavior vector of each attribute category is obtained, it must be converted into a scalar. PCA dimension reduction cannot be used for this step, because PCA aims to maximize the variance of the reduced data and does not guarantee that the magnitude relation between the data is preserved. In this embodiment the scalars converted from the category representative behavior vectors of all attribute categories are compared with one another, so the magnitude relation between the data must remain consistent after the conversion. The dot product of two vectors is equivalent to a projection followed by a multiplication, so the category representative behavior vector of each attribute category corresponding to one attribute is converted into a scalar by taking its dot product with one common vector, called the projection vector. At the same time, the magnitude relation between the resulting scalars must be consistent with the magnitude relation between the category representative behavior vectors.
In this embodiment, the specific method for converting the category representative behavior vector of each attribute category corresponding to one attribute into a scalar is as follows:
Let the projection vector be $\vec{w}$, and let the category representative behavior vectors of the attribute categories corresponding to one attribute be denoted $\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_Q$, where Q is the number of category representative behavior vectors of the attribute categories corresponding to the attribute. A scalar calculation model is constructed:
$$s_i = \vec{v}_i \cdot \vec{w}, \quad i = 1, 2, \ldots, Q$$
In the calculation model, $\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_Q$ are the category representative behavior vectors of the attribute categories corresponding to one attribute; $\vec{w}$ is the projection vector; $\cdot$ is the dot product operator; $s_1, s_2, \ldots, s_Q$ are the scalars converted from $\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_Q$ respectively.
To ensure that the magnitude relation between the resulting scalars is consistent with the magnitude relation between the category representative behavior vectors, the ratio between the moduli of any two category representative behavior vectors is used as the consistency constraint. The condition model that the scalars must satisfy is:
$$\frac{s_i}{s_j} = \frac{\lVert \vec{v}_i \rVert}{\lVert \vec{v}_j \rVert}, \quad i, j = 1, 2, \ldots, Q$$
In the condition model, $\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_Q$ are the category representative behavior vectors of the attribute categories corresponding to one attribute; $\lVert \vec{v}_1 \rVert, \lVert \vec{v}_2 \rVert, \ldots, \lVert \vec{v}_Q \rVert$ are the moduli of $\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_Q$ respectively; $s_1, s_2, \ldots, s_Q$ are the scalars converted from $\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_Q$ respectively.
The scalar calculation model and the condition model are solved by linear algebra to obtain the projection vector $\vec{w}$ and the scalars $s_1, s_2, \ldots, s_Q$ converted from the category representative behavior vectors of the attribute categories. The projection vector $\vec{w}$ has the same size as the category representative behavior vectors; for example, if a category representative behavior vector has size $1 \times n$, the projection vector $\vec{w}$ also has size $1 \times n$. Solving the two models by linear algebra can yield several sets of solutions. This does not affect the result, because the scalars are subsequently used only to compute a Pearson correlation coefficient between the attribute values and the behavior of the attribute categories. For example, if (1, 2, 3) and (2, 4, 6) are two of the solutions, the proportional relation within each solution is the same, and both give the same result when the Pearson correlation coefficient is computed against any other vector.
The scalar converted from the category representative behavior vector of each attribute category corresponding to one attribute is called the liveness scalar of that attribute category.
At this point, the liveness scalar of each attribute category has been obtained.
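One way to obtain a solution of the two models can be sketched as follows. The sketch fixes the free scale so that each scalar equals the modulus of its vector and picks the minimum-norm projection vector through the Gram matrix; both choices are assumptions, as the embodiment only says the models are solved by linear algebra. It also assumes the category representative behavior vectors are linearly independent. The helper names and the sample vectors are hypothetical.

```python
import math


def dot(a, b):
    """Dot product of two equally long vectors."""
    return sum(x * y for x, y in zip(a, b))


def solve_linear_system(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def projection_vector(vectors):
    """Minimum-norm w with v_i . w = |v_i| for every vector v_i.

    Assumption: the vectors are linearly independent.
    """
    norms = [math.sqrt(dot(v, v)) for v in vectors]
    gram = [[dot(a, b) for b in vectors] for a in vectors]
    coeffs = solve_linear_system(gram, norms)
    dim = len(vectors[0])
    return [sum(c * v[k] for c, v in zip(coeffs, vectors)) for k in range(dim)]


# Category representative behavior vectors of two attribute categories.
vectors = [[3.0, 4.0, 0.0], [0.0, 6.0, 8.0]]
w = projection_vector(vectors)
liveness_scalars = [dot(v, w) for v in vectors]
```

With this choice of scale the liveness scalars reproduce the moduli 5 and 10 of the two vectors, so their ratio satisfies the condition model.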
S004, obtaining independent attributes and dependent (non-independent) attributes according to the attribute representative values and the liveness scalars.
Each attribute category has an attribute representative value and a liveness scalar. The attribute representative values of all attribute categories of one attribute are first sorted in ascending order to obtain a category attribute scalar sequence, and the liveness scalars are arranged in the same category order to obtain the corresponding liveness scalar sequence. Elements at the same position in the two sequences correspond to the same attribute category; for example, the third element of the category attribute scalar sequence and the third element of the liveness scalar sequence belong to the same attribute category.
After the category attribute scalar sequence and the liveness scalar sequence of an attribute are obtained, the Pearson correlation coefficient of the two sequences is calculated as the correlation between the attribute and transaction behavior. If the Pearson correlation coefficient is greater than a first preset threshold m, the attribute is strongly correlated with liveness, that is, the attribute alone can reflect the liveness of the corresponding users, and the attribute is called an independent attribute. If the Pearson correlation coefficient is less than or equal to the first preset threshold m, the correlation is considered weak, and the attribute is called a dependent attribute. In this embodiment the first preset threshold m = 0.8; in other embodiments the value of m may be set according to the actual situation.
Similarly, each attribute is judged to be an independent attribute or a dependent attribute according to the Pearson correlation coefficient of its category attribute scalar sequence and liveness scalar sequence, so that all attributes are divided into independent attributes and dependent attributes.
Thus, independent attributes and dependent attributes are obtained.
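The classification in step S004 can be sketched in Python as follows. This is a minimal sketch, assuming each attribute category is given as a pair (attribute representative value, liveness scalar); the function name `classify_attribute` and the sample numbers are hypothetical, and only the threshold m = 0.8 comes from the embodiment.

```python
import math

FIRST_THRESHOLD_M = 0.8  # value of m used in this embodiment


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def classify_attribute(categories, m=FIRST_THRESHOLD_M):
    """categories: list of (attribute_representative_value, liveness_scalar)."""
    # Sort by representative value; each liveness scalar stays with its category.
    ordered = sorted(categories, key=lambda c: c[0])
    attribute_sequence = [c[0] for c in ordered]
    liveness_sequence = [c[1] for c in ordered]
    r = pearson(attribute_sequence, liveness_sequence)
    label = "independent" if r > m else "dependent"
    return label, r


# Hypothetical attribute whose liveness rises steadily with the attribute value.
income_label, income_r = classify_attribute(
    [(8000.0, 0.5), (3000.0, 0.2), (20000.0, 0.9)]
)

# Hypothetical attribute with no clear trend in liveness.
education_label, education_r = classify_attribute(
    [(1.0, 0.5), (2.0, 0.3), (3.0, 0.6), (4.0, 0.4)]
)

print(income_label, education_label)
```

Since the comparison is strict, an attribute whose coefficient equals m exactly is treated as dependent, matching the "less than or equal to" branch in the text.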
S005, matching the dependent attributes according to all of their attribute categories to obtain combined attributes.
It should be noted that dependent attributes can be combined into a combined attribute so that the resulting combined attribute correlates relatively strongly with liveness. For example, neither education level alone nor industry alone correlates particularly strongly with user liveness, but users with a high education level who work in the financial industry trade relatively frequently, that is, their liveness is relatively high.
In this embodiment, combined attributes are acquired as follows:
1. First, all dependent attributes are sorted in descending order of the Pearson correlation coefficient between each attribute's category attribute scalar sequence and liveness scalar sequence, to obtain a dependent attribute sequence.
2. The matching relationship between the first two dependent attributes in the dependent attribute sequence is calculated:
The first two dependent attributes in the dependent attribute sequence are denoted D and E. Suppose, for example, that dependent attribute D has three attribute categories and dependent attribute E has two attribute categories. Each of these attribute categories is regarded as a node, and an undirected graph is constructed as shown in fig. 2: there is an edge between each node of D and each node of E, and there are no edges between two nodes of D or between two nodes of E. The attribute representative value of each attribute category is taken as the node value of the corresponding node. For each edge in the undirected graph, the ratio of the attribute representative values of its two nodes (the larger attribute representative value divided by the smaller one) is taken as the edge weight. For example, if the attribute representative values of the two nodes of an edge are $x$ and $y$, the edge weight is $x/y$ when $x > y$ and $y/x$ when $x < y$; when $x = y$ the two ratios coincide and the edge weight is 1. Equivalently, the edge weight is $\max(x, y)/\min(x, y)$.
The KM matching method is then used to perform an optimal minimum matching on the undirected graph, which gives the matching relationship between the attribute categories of dependent attribute D and the attribute categories of dependent attribute E. Two attribute categories that are matched to each other are called a matching category pair. For example, if an attribute category of D is matched with an attribute category of E, the two form one matching category pair, and every other matched pair of categories likewise forms a matching category pair.
It should be noted that the matching relationship between the attribute categories of the two dependent attributes is obtained so that categories of comparable attribute value correspond as far as possible: medium-valued categories of one attribute with medium-valued categories of the other, and large-valued categories with large-valued categories. Only then is it meaningful to calculate the correlation from the two-tuple attribute value sequence and the corresponding two-tuple liveness value sequence. For this reason the edge weight in the undirected graph is the larger attribute representative value divided by the smaller one, and the KM matching is carried out as an optimal minimum matching.
3. A combined attribute is acquired from the matching relationship between the first two dependent attributes in the dependent attribute sequence:
In this embodiment, the overlapping users of each matching category pair, that is, the users who belong to both attribute categories of the pair, are acquired. The overlapping users are retained and the non-overlapping users are deleted from the two attribute categories, which updates the two attribute categories, and the attribute representative value of each updated attribute category is then obtained. The updated attribute representative values of the two attribute categories in a matching category pair form the attribute two-tuple of the pair, and the liveness scalars of the two attribute categories before the update form the liveness two-tuple of the pair. The L2 norm of the attribute two-tuple is taken as its size and recorded as the two-tuple attribute value; the L2 norm of the liveness two-tuple is taken as its size and recorded as the two-tuple liveness value.
The two-tuple attribute values of all matching category pairs between the two dependent attributes D and E are sorted in ascending order to obtain a two-tuple attribute value sequence, and the corresponding two-tuple liveness value sequence is obtained in the same order. Elements at the same position in the two sequences correspond to the same matching category pair; for example, the third element of the two-tuple attribute value sequence and the third element of the two-tuple liveness value sequence belong to the same matching category pair.
The cosine similarity of the two-tuple attribute value sequence and the two-tuple liveness value sequence is calculated. If the cosine similarity is greater than a second preset threshold n, the dependent attributes D and E are combined into a combined attribute, and the combined attribute can reflect the liveness of the user. If the cosine similarity is less than or equal to the second preset threshold n, the combination of D and E cannot reflect the liveness of the user, and the combination of D and E is taken as a basic unit. In this embodiment the second preset threshold n = 0.7; in other embodiments the value of n may be set according to actual requirements.
When the dependent attributes D and E form a combined attribute, D and E are deleted from the dependent attribute sequence, which updates the sequence. When D and E form a basic unit, the basic unit is treated as a single dependent attribute, that is, the basic unit formed by D and E becomes the first dependent attribute in the dependent attribute sequence.
4. Steps 2 and 3 are repeated until the length of the dependent attribute sequence is less than or equal to 1, at which point the iteration stops.
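The edge weights and the minimum matching of step 2 can be sketched in Python as follows. This is a small sketch under stated assumptions: attribute representative values are assumed to be positive, and an exhaustive search over assignments is swapped in for the KM matching method, which is only practical for a handful of categories. The function names and sample values are hypothetical, and leaving surplus categories of the larger attribute unmatched is an assumption, not something the embodiment states.

```python
from itertools import permutations


def edge_weight(x, y):
    """Larger representative value divided by the smaller one (always >= 1)."""
    return max(x, y) / min(x, y)


def minimum_matching(values_d, values_e):
    """Brute-force minimum-weight matching between categories of D and E.

    Returns (pairs, cost) where pairs is a sorted list of
    (index_in_D, index_in_E). Every category of the attribute with fewer
    categories is matched; surplus categories on the other side stay unmatched.
    """
    swap = len(values_d) < len(values_e)
    big, small = (values_e, values_d) if swap else (values_d, values_e)
    best_cost, best_pairs = float("inf"), []
    for chosen in permutations(range(len(big)), len(small)):
        cost = sum(edge_weight(big[i], small[j]) for j, i in enumerate(chosen))
        if cost < best_cost:
            best_cost = cost
            best_pairs = [(i, j) for j, i in enumerate(chosen)]
    if swap:
        # Restore the (index_in_D, index_in_E) orientation.
        best_pairs = [(j, i) for i, j in best_pairs]
    return sorted(best_pairs), best_cost


# Hypothetical representative values: D has three categories, E has two.
values_d = [20.0, 35.0, 50.0]
values_e = [22.0, 48.0]

pairs, cost = minimum_matching(values_d, values_e)
print(pairs)  # [(0, 0), (2, 1)]
```

Minimizing the sum of ratios favors pairs whose representative values are close to each other, which is the stated purpose of combining this edge weight with a minimum matching.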
When the undirected graph is constructed from a basic unit and one dependent attribute, each matching category pair in the basic unit is used as a node, and the two-tuple attribute value of each matching category pair is used as the node value of the corresponding node.
In this way, a plurality of combined attributes can be obtained. It should be noted that a basic unit that remains cannot reflect the liveness of the user, so the dependent attributes it contains are not considered further.
Thus, the combined attributes are acquired.
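The decision in step 3 between a combined attribute and a basic unit can be sketched in Python as follows. The sketch assumes that the attribute two-tuples (after the overlap update) and the liveness two-tuples (before the update) have already been computed for every matching category pair; the function names and sample numbers are hypothetical, and only the threshold n = 0.7 comes from the embodiment.

```python
import math

SECOND_THRESHOLD_N = 0.7  # value of n used in this embodiment


def l2_norm(values):
    """Euclidean (L2) norm of a sequence of numbers."""
    return math.sqrt(sum(v * v for v in values))


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length, non-zero sequences."""
    return sum(a * b for a, b in zip(u, v)) / (l2_norm(u) * l2_norm(v))


def combine_or_basic_unit(matching_pairs, n=SECOND_THRESHOLD_N):
    """matching_pairs: list of (attribute_two_tuple, liveness_two_tuple)."""
    # Size of each two-tuple, kept together per matching category pair.
    sized = [(l2_norm(attr), l2_norm(live)) for attr, live in matching_pairs]
    # Sort by two-tuple attribute value; liveness values follow the same order.
    sized.sort(key=lambda p: p[0])
    attribute_sequence = [p[0] for p in sized]
    liveness_sequence = [p[1] for p in sized]
    similarity = cosine_similarity(attribute_sequence, liveness_sequence)
    label = "combined attribute" if similarity > n else "basic unit"
    return label, similarity


# Hypothetical pairs whose liveness grows in step with the attribute values.
aligned = [((3.0, 4.0), (0.6, 0.8)), ((6.0, 8.0), (1.2, 1.6))]
aligned_label, aligned_similarity = combine_or_basic_unit(aligned)

# Hypothetical pairs whose liveness moves against the attribute values.
opposed = [((0.6, 0.8), (6.0, 8.0)), ((6.0, 8.0), (0.6, 0.8))]
opposed_label, opposed_similarity = combine_or_basic_unit(opposed)

print(aligned_label, opposed_label)
```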
S006, classifying the users according to the independent attribute and the combined attribute to obtain a plurality of user categories.
All independent attributes and combined attributes are acquired through steps S004 and S005, and together they form an attribute group vector.
The similarity of each independent attribute of any two users is calculated as follows: the ratio of the two users' attribute values of the same independent attribute (the smaller attribute value divided by the larger one) is taken as the similarity of that independent attribute for the two users.
The similarity of each combined attribute of any two users is calculated as follows: the attribute values of the several attributes that make up one combined attribute of a user are taken as the user's vector for that combined attribute, and the cosine similarity between the vectors of the same combined attribute of any two users is taken as the similarity of that combined attribute for the two users.
The sum of the similarities of all independent attributes and all combined attributes of any two users is taken as the similarity of the two users. Users whose similarity is greater than the similarity threshold are grouped into one category, called a user category; in this way a plurality of user categories is obtained.
The similarity threshold is obtained as follows: each independent attribute in the attribute group vector counts as one element and each combined attribute counts as one element; the total number of elements is counted and multiplied by the second preset threshold n, and the result is taken as the similarity threshold.
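The user similarity and the similarity threshold described above can be sketched in Python as follows. The dictionary layout of a user (a list of independent attribute values and a list of vectors, one per combined attribute), the function names, and the sample users are hypothetical; attribute values are assumed to be non-negative.

```python
import math

SECOND_THRESHOLD_N = 0.7  # value of n used in this embodiment


def ratio_similarity(a, b):
    """Smaller value divided by the larger one; 1.0 when the values are equal."""
    if a == b:
        return 1.0
    return min(a, b) / max(a, b)


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)


def user_similarity(user_a, user_b):
    """Sum of per-attribute similarities over independent and combined attributes."""
    total = sum(
        ratio_similarity(x, y)
        for x, y in zip(user_a["independent"], user_b["independent"])
    )
    total += sum(
        cosine_similarity(u, v)
        for u, v in zip(user_a["combined"], user_b["combined"])
    )
    return total


def similarity_threshold(num_independent, num_combined, n=SECOND_THRESHOLD_N):
    """Number of elements in the attribute group vector multiplied by n."""
    return (num_independent + num_combined) * n


# Two hypothetical users: two independent attributes and one combined attribute.
alice = {"independent": [5000.0, 30.0], "combined": [[4.0, 2.0]]}
bob = {"independent": [4000.0, 40.0], "combined": [[2.0, 1.0]]}

sim = user_similarity(alice, bob)
threshold = similarity_threshold(num_independent=2, num_combined=1)
same_category = sim > threshold
print(sim, threshold, same_category)
```

Each per-attribute similarity is at most 1, so scaling the threshold by the number of elements amounts to requiring an average per-attribute similarity above n.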
If a user belongs to a plurality of user categories, the user categories to which the user belongs are called preliminary user categories, and the user is deleted from all of the preliminary user categories, which is the first update of the user categories. The average of the similarities between the user and all users in each preliminary user category is then calculated and taken as the similarity between the user and that preliminary user category; the preliminary user category with the largest similarity is taken as the actual user category of the user, and the user is added to the actual user category, which is the second update of the user categories. In the same way, the user categories are updated repeatedly for every user that belongs to a plurality of user categories, so that each user finally belongs to only one user category.
Thus, the classification of the users is completed, and a plurality of user categories are obtained.
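The two-stage update for users that fall into several user categories can be sketched in Python as follows. This is a minimal sketch, assuming user categories are held as sets of user identifiers and that a pairwise similarity function is available; the function name `resolve_memberships`, the processing order of users, and the toy similarity are hypothetical.

```python
def resolve_memberships(categories, similarity):
    """Make every user belong to exactly one user category.

    categories: dict mapping category name -> set of user ids (may overlap).
    similarity: function (user_id, user_id) -> float.
    Returns a new dict; the input is not modified.
    """
    result = {name: set(members) for name, members in categories.items()}
    all_users = sorted(set().union(*result.values()))
    for user in all_users:
        preliminary = [name for name, members in result.items() if user in members]
        if len(preliminary) <= 1:
            continue
        # First update: delete the user from all preliminary user categories.
        for name in preliminary:
            result[name].discard(user)

        def mean_similarity(name):
            members = result[name]
            if not members:
                return float("-inf")
            return sum(similarity(user, other) for other in members) / len(members)

        # Second update: add the user to the most similar preliminary category.
        best = max(preliminary, key=mean_similarity)
        result[best].add(user)
    return result


# Toy similarity: ratio of hypothetical per-user scores (smaller over larger).
scores = {"u1": 1.0, "u2": 1.2, "u3": 1.1, "u4": 5.0}


def toy_similarity(a, b):
    return min(scores[a], scores[b]) / max(scores[a], scores[b])


overlapping = {"A": {"u1", "u2", "u3"}, "B": {"u3", "u4"}}
resolved = resolve_memberships(overlapping, toy_similarity)
print(resolved)
```

Here user `u3` starts in both categories and ends up only in category `A`, whose members are much closer to it than `u4`.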
S007, constructing a user portrait for each user category, and making marketing recommendations according to the user portraits.
The average of all element values in the behavior vector of each user is taken as the liveness of that user, and the average of the liveness of all users in each user category is taken as the liveness of that user category.
The attribute values of the independent attributes of each user and the attribute value of each attribute in the combined attributes form a one-dimensional vector, which is taken as the attribute value vector of that user. The average of the attribute values of the same attribute across the attribute value vectors of all users in one user category is taken as the attribute value mean of that attribute, and the attribute value means of all attributes form the attribute value mean vector of the user category.
The attribute value mean vector and liveness of each user category constitute a user representation of that user category.
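Step S007 can be sketched in Python as follows. The sketch assumes each user of one user category is given as a dictionary with a behavior vector and an attribute value vector of equal length across users; the field names, the function names, and the sample values are hypothetical.

```python
def user_liveness(behavior_vector):
    """Average of all element values in one user's behavior vector."""
    return sum(behavior_vector) / len(behavior_vector)


def build_portrait(users):
    """Portrait of one user category: attribute value mean vector plus liveness.

    users: non-empty list of dicts with keys 'behavior' and 'attributes'.
    """
    count = len(users)
    liveness = sum(user_liveness(u["behavior"]) for u in users) / count
    width = len(users[0]["attributes"])
    mean_vector = [
        sum(u["attributes"][k] for u in users) / count for k in range(width)
    ]
    return {"attribute_mean_vector": mean_vector, "liveness": liveness}


# Hypothetical user category with two users.
category_users = [
    {"behavior": [2.0, 4.0], "attributes": [30.0, 5000.0]},
    {"behavior": [6.0, 8.0], "attributes": [40.0, 7000.0]},
]

portrait = build_portrait(category_users)
print(portrait)  # {'attribute_mean_vector': [35.0, 6000.0], 'liveness': 5.0}
```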
When a new campaign is launched, the user categories that the campaign targets are selected according to their user portraits and targeted recommendations are made, which improves the precision of campaign delivery and user targeting.
Through the steps, the user portrait is completed.
The embodiment of the invention constructs the basic attribute vector and the behavior vector of each user; divides the users into a plurality of attribute categories according to one attribute, acquires the attribute representative value and the liveness scalar of each attribute category, and thereby obtains independent attributes and dependent attributes; obtains matching category pairs from the undirected graph constructed from the attribute categories of two dependent attributes; acquires a plurality of combined attributes according to the attribute two-tuples and liveness two-tuples of all matching category pairs; divides the users into a plurality of user categories according to all independent attributes and combined attributes, and acquires the liveness of each user category from the behavior vectors of its users; and acquires the attribute value mean vector of each user category. The attribute value mean vector and the liveness constitute the user portrait of each user category. Compared with conventional user portrait methods, the method considers not only the transaction behavior of the user but also the basic attributes of the user. It obtains independent attributes and combined attributes by mining the correlation between the attributes and the liveness of the user, and classifies and portrays the users according to these attributes, so that the results of user classification and user portraits are more accurate and accurate marketing recommendation is facilitated.
The foregoing description of the preferred embodiments of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, alternatives, and improvements that fall within the spirit and scope of the invention.
Claims (8)
1. The user portrait construction method based on the financial transaction data is characterized by comprising the following steps:
S1: constructing a basic attribute vector and a behavior vector of each user;
S2: dividing users into a plurality of attribute categories according to one attribute, and obtaining the attribute representative value of each attribute category; performing dimension reduction on the behavior vectors of all users in each attribute category to obtain the category representative behavior vector of each attribute category; acquiring the liveness scalar of each attribute category according to the category representative behavior vectors of all attribute categories; obtaining independent attributes or non-independent attributes according to the attribute representative values of all attribute categories and the liveness scalars; forming all the non-independent attributes into a non-independent attribute sequence;
S3: performing a combined attribute acquisition operation on the first two non-independent attributes in the non-independent attribute sequence, including:
constructing an undirected graph according to each attribute category of the two non-independent attributes, carrying out optimal minimum matching on the undirected graph by using a KM matching method to obtain a matching relation between the attribute categories of the two non-independent attributes, and taking the two matched attribute categories as a matching category pair;
updating the attribute categories in the matching category pair, taking the attribute representative values updated by all the attribute categories in the matching category pair as the attribute binary group of the matching category pair, and taking the liveness scalars before updating of all the attribute categories in the matching category pair as the liveness binary group of the matching category pair; acquiring a combined attribute or a basic unit according to the attribute binary groups and the liveness binary groups of all the matching category pairs; when a combined attribute is obtained, deleting the combined attribute from the non-independent attribute sequence; when a basic unit is obtained, using the basic unit as the first non-independent attribute in the non-independent attribute sequence;
S4: repeating the step S3 until the length of the non-independent attribute sequence is less than or equal to 1, and stopping iteration;
S5: classifying all users according to all independent attributes and combined attributes to obtain a plurality of user categories; acquiring the liveness of each user category according to the behavior vector of each user; acquiring the attribute value mean vector of each user category; the attribute value mean vector and the liveness of each user category form the user portrait of the user category;
the method for acquiring the independent attribute or the non-independent attribute according to the attribute representative values of all attribute categories and the liveness scalar comprises the following specific steps:
sorting the attribute representative values of all the attribute categories in order from small to large to obtain a category attribute scalar sequence, and forming the liveness scalars of the attribute categories corresponding to each attribute representative value in the category attribute scalar sequence into a liveness scalar sequence; calculating the Pearson correlation coefficient of the category attribute scalar sequence and the liveness scalar sequence, taking the attribute corresponding to the attribute categories as an independent attribute if the obtained result is larger than a first preset threshold value, and taking the attribute corresponding to the attribute categories as a non-independent attribute if the obtained result is smaller than or equal to the first preset threshold value;
the method for acquiring the combined attribute or the basic unit according to the attribute binary groups and the liveness binary groups of all the matching category pairs comprises the following specific steps:
taking the L2 norm of the attribute binary group as the size of the attribute binary group, and recording the size as a binary group attribute value; taking the L2 norm of the liveness binary group as the size of the liveness binary group, and recording the size as a binary group liveness value; sorting the binary group attribute values of all the matching category pairs in order from small to large to obtain a binary group attribute value sequence, and obtaining the binary group liveness value sequence corresponding to the binary group attribute value sequence; and calculating the cosine similarity of the binary group attribute value sequence and the binary group liveness value sequence, wherein if the cosine similarity is larger than a second preset threshold, the two non-independent attributes corresponding to the matching category pairs form a combined attribute, and if the cosine similarity is smaller than or equal to the second preset threshold, the two non-independent attributes corresponding to the matching category pairs form a basic unit.
2. The method for constructing a user portrait based on financial transaction data according to claim 1, wherein said classifying users into a plurality of attribute categories according to one attribute, obtaining attribute representative values of each attribute category includes the specific steps of:
acquiring all different attribute values of the same attribute of all users, and counting the frequency of each attribute value; arranging all attribute values in order from small to large to obtain an attribute value sequence, dividing the attribute value sequence into a plurality of categories by using a multi-threshold dividing method according to the frequency of each attribute value in the attribute value sequence, and taking the users corresponding to all attribute values in each category as one attribute category; and taking the average of the values of the attribute over all users in each attribute category as the attribute representative value of the attribute category.
3. The method for constructing a user portrait based on financial transaction data according to claim 1, wherein said obtaining a liveness scalar for each attribute category according to category representative behavior vectors of all attribute categories includes the following specific steps:
let the projection vector be $w$, and let the category representative behavior vectors of the attribute categories corresponding to one attribute be denoted $v_1, v_2, \ldots, v_Q$, wherein Q represents the number of category representative behavior vectors of the attribute categories corresponding to the attribute, and a scalar calculation model is constructed:
$$a_q = w \cdot v_q, \quad q = 1, 2, \ldots, Q$$
in the calculation model, $v_1, v_2, \ldots, v_Q$ are the category representative behavior vectors of the attribute categories corresponding to one attribute; $w$ is the projection vector; $\cdot$ is the dot product operator; $a_1, a_2, \ldots, a_Q$ are the scalars converted from $v_1, v_2, \ldots, v_Q$ respectively; a scalar condition model is constructed: [formula missing in source]; in the condition model, $v_1, v_2, \ldots, v_Q$ are the category representative behavior vectors of the attribute categories corresponding to one attribute; $|v_1|, |v_2|, \ldots, |v_Q|$ are the moduli of $v_1, v_2, \ldots, v_Q$ respectively; $a_1, a_2, \ldots, a_Q$ are the scalars converted from $v_1, v_2, \ldots, v_Q$ respectively;
4. The method for constructing a user representation based on financial transaction data according to claim 1, wherein the constructing an undirected graph according to each attribute category of two non-independent attributes comprises the following specific steps:
and taking each attribute category of the two non-independent attributes as a node, taking the attribute representative value of each attribute category as a node value, taking the ratio of the attribute representative values of the attribute categories of the two non-independent attributes as an edge weight value between the two nodes, and constructing the undirected graph according to the nodes, the node values and the edge weight value.
5. The method for constructing a user portrait based on financial transaction data according to claim 1, wherein said updating the attribute category in the matching category pair includes the specific steps of:
and deleting the users which are not overlapped in the matching category pair from each attribute category in the matching category pair, and updating the attribute category in the matching category pair.
6. The method for constructing a user portrait based on financial transaction data according to claim 1, wherein said classifying all users according to all independent attributes and combined attributes to obtain a plurality of user categories includes the following specific steps:
taking the ratio of attribute values of the same independent attribute of any two users as the similarity of the same independent attribute of the two users; taking attribute values of a plurality of attributes corresponding to one combined attribute of one user as the vector of the one combined attribute of the one user, and taking cosine similarity between vectors of the same combined attribute of any two users as the similarity of the same combined attribute of the two users; taking the sum of the similarities of all independent attributes and the similarities of all combined attributes of any two users as the similarity of the two users, dividing the users with the similarity larger than a similarity threshold value into user categories, and updating all the user categories according to the user category to which each user belongs;
The updating of all the user categories according to the user category to which each user belongs comprises the following steps:
if one user belongs to a plurality of user categories, the user categories to which the user belongs are called preliminary user categories, and the user is deleted from all the preliminary user categories, so that the first update of the user categories is realized; calculating the average value of the similarity between the user and all the users in each preliminary user category as the similarity between the user and that preliminary user category, taking the preliminary user category with the largest similarity as the actual user category of the user, and adding the user into the actual user category to realize the second update of the user categories; and similarly, updating the user categories a plurality of times according to all the users belonging to a plurality of user categories, so that each user finally belongs to only one user category.
7. The method for constructing a user portrait based on financial transaction data according to claim 1, wherein the step of obtaining the liveness of each user category according to the behavior vector of each user includes the following specific steps:
taking the average value of all element values in the behavior vector of each user as the liveness of each user, and taking the average value of the liveness of all users in each user category as the liveness of each user category.
8. The method for constructing a user portrait based on financial transaction data according to claim 1, wherein said obtaining attribute value mean vectors of each user category includes the following specific steps:
the attribute value of the independent attribute of each user and the attribute value of each attribute in the combined attribute form a one-dimensional vector, and the one-dimensional vector is used as the attribute value vector of each user; taking the attribute value average value of the same attribute in the attribute value vectors of all users in one user category as the attribute value average value of the attribute, and forming the attribute value average value vector of the user category by the attribute value average value of all the attributes in the attribute value vectors of all the users in one user category.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310015416.XA CN115760200B (en) | 2023-01-06 | 2023-01-06 | User portrait construction method based on financial transaction data |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115760200A (en) | 2023-03-07 |
CN115760200B (en) | 2023-07-04 |
Family
ID=85348255
Family Applications (1)
Application Number | Status | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
CN202310015416.XA | Active | CN115760200B (en) | 2023-01-06 | 2023-01-06 | User portrait construction method based on financial transaction data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115760200B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2021068608A1 (en) * | 2019-10-11 | 2021-04-15 | 深圳壹账通智能科技有限公司 | Method and apparatus for extracting user portrait, and computer device and storage medium |
CN114549035A (en) * | 2021-12-28 | 2022-05-27 | 天翼电子商务有限公司 | Construction method of financial user accurate customer acquisition label based on telecommunication big data |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104915879B (en) * | 2014-03-10 | 2019-08-13 | 华为技术有限公司 | The method and device that social relationships based on finance data are excavated |
US10817804B1 (en) * | 2019-05-07 | 2020-10-27 | Capital One Services, Llc | Using machine learning to predict user profile affinity based on behavioral data analytics |
CN110909222B (en) * | 2019-10-12 | 2023-07-25 | 中国平安人寿保险股份有限公司 | User portrait establishing method and device based on clustering, medium and electronic equipment |
CN113158077B (en) * | 2021-04-08 | 2022-11-08 | 南京邮电大学 | Academic resource recommendation method based on user portrait |
CN114491205A (en) * | 2021-12-31 | 2022-05-13 | 北京五八信息技术有限公司 | User portrait generation method and device, electronic equipment and readable medium |
CN114596031A (en) * | 2022-03-10 | 2022-06-07 | 南京邮电大学 | Express terminal user portrait model based on full life cycle data |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||