CN112396428B - User portrait data-based customer group classification management method and device - Google Patents

User portrait data-based customer group classification management method and device

Info

Publication number
CN112396428B
CN112396428B CN202011225923.9A CN202011225923A CN112396428B CN 112396428 B CN112396428 B CN 112396428B CN 202011225923 A CN202011225923 A CN 202011225923A CN 112396428 B CN112396428 B CN 112396428B
Authority
CN
China
Prior art keywords
data
user
user portrait
behavior
attribute
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011225923.9A
Other languages
Chinese (zh)
Other versions
CN112396428A (en
Inventor
于扬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Analysys Digital Intelligence Technology Co ltd
Original Assignee
Beijing Analysys Think Tank Network Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Analysys Think Tank Network Technology Co ltd
Priority to CN202011225923.9A
Publication of CN112396428A
Application granted
Publication of CN112396428B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/01 Customer relationship services
    • G06Q30/015 Providing customer assistance, e.g. assisting a customer within a business location or via helpdesk
    • G06Q30/016 After-sales
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Business, Economics & Management (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Finance (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the invention provides a customer group division method and device based on user portrait data. The method comprises: acquiring user portrait data stored in Kudu, HDFS, or Hive storage; calculating on the behavior data, attribute data, and tag data according to logical operation conditions and factor operation conditions to obtain target users; associating the target user IDs with the user portrait data according to a preset time period, then completing and normalizing the user portrait data to obtain feature data in a preset format; and matching the feature data against a pre-established feature library, then dividing the target users into the corresponding customer groups. The invention integrates the behavior, attribute, and tag data related to each user by user ID and stores them according to the characteristics of Kudu, HDFS, and Hive, thereby providing efficient data query performance. This solves the problem that customer group division in current scenarios relies on a single portrait dimension, which makes it difficult to improve division accuracy.

Description

User portrait data-based customer group classification management method and device
Technical Field
The embodiment of the invention relates to the technical field of data classification, in particular to a customer group classification management method and device based on user portrait data.
Background
With the rapid development of the internet, the number of users has grown significantly and their needs have become more diverse and complex. To provide better-matched products, services, and content to users with different characteristics, users must be effectively grouped and analyzed. In the current market, user groups are divided mainly by configuring rules over collected customer data, with users divided by manually selecting different dimensions and indicators. Such approaches are limited by the business skill of the operator and cannot segment users accurately at a finer level or from insights that are hard to obtain manually. The user group division scenario therefore needs a more intelligent and simpler way to provide this service.
At present, however, this problem is addressed mainly by business personnel who manually configure rules and divide users based on their understanding of the users and the collected user attributes. This approach has the following defects, which make it difficult to divide users effectively. First, it depends too heavily on the business experience of the operator, and repeated attempts are needed to determine the final division rules. Second, manual division can only divide users at a coarse granularity, and it is difficult to find finer-grained differences between users on which to base customer group division. Third, it is limited by the operator's understanding of the data: user information cannot be fully utilized, and hidden factors that could distinguish users are not included in the rules.
Disclosure of Invention
In view of the defects of customer group division systems in the prior art, the embodiment of the invention provides a customer group division method and device based on user portrait data. Customers can be divided in different ways to suit users of different skill levels, and users are helped to quickly understand the differences and characteristics among the divided customer groups. In terms of customer data, the system supports the use of behavior records generated by customers, collected customer information, customer tags, and the like. In terms of customer group division, the system supports configuring custom behavior and attribute rules, supports automatically dividing the selected target customer group at different levels by using supervised and unsupervised algorithms, and shows the differences and commonalities among users of different customer groups for reference. The specific technical scheme is as follows:
The embodiment of the invention provides a customer group division method based on user portrait data, which comprises the following steps: acquiring user portrait data stored in Kudu, HDFS, or Hive storage, wherein the user portrait data includes behavior data, attribute data, and tag data, and the behavior data includes a user ID, a behavior occurrence time, and behavior content;
taking the behavior data, the attribute data, and the tag data as input conditions, and calculating on the user portrait data according to logical operation conditions and factor operation conditions to obtain target users; associating the target user IDs with the user portrait data according to a preset time period, and performing completion and normalization operations on the user portrait data to obtain feature data in a preset format; wherein the factor operation conditions include numeric-type factors, character-type factors, and time-type factors, and the feature data includes behavior feature data, attribute feature data, and tag feature data;
and performing a matching operation between the feature data and a pre-established feature library, and then dividing the target users into the corresponding customer groups.
Further, the method also comprises the step of scoring the principal components of the target user in different customer groups by adopting a principal component analysis algorithm, and finishing the evaluation of the customer groups according to the scores.
Furthermore, partitions are created for the user portrait data according to behavior occurrence time, and dynamic bucketed storage is applied when the number of behaviors in a daily partition exceeds a preset number.
Further, the step of taking the behavior data, the attribute data, and the tag data as input conditions, calculating on the user portrait data according to logical operation conditions and factor operation conditions to obtain target users, associating the target user IDs with the user portrait data according to a preset time period, and performing completion and normalization operations on the user portrait data to obtain feature data in a preset format specifically comprises the following steps:
taking the behavior data, the attribute data, and the tag data as input conditions, and logically screening the user portrait data according to the logical operation conditions, following a minimum screening principle;
applying numeric-type, character-type, and time-type factor operations respectively to the logically screened user portrait data of the target users, to obtain user portrait data screened by factor operations;
associating the user portrait data screened by factor operations with the target user IDs according to a time period;
and performing completion and normalization operations on the associated data fields to obtain feature data in the preset format.
Further, in the missing-value processing part, a KNN filling algorithm is used for data completion; a linear function normalization algorithm is used for field normalization, linearly converting the user portrait data into the range [0,1], after which distance measurement and covariance calculation are performed; when the data do not follow a normal distribution, normalization is performed through mean absolute deviation standardization, logarithmic transformation, decimal scaling, and sigmoid functions.
Further, the matching operation between the feature data and a pre-established feature library comprises the following steps:
when the behavior feature data of the target user is matched against the behavior features in the feature library, the match is judged successful if the extracted behavior features contain the features in the feature library, and unsuccessful otherwise; when the extracted user attribute features are matched against the attribute features in the feature library, the match is judged successful if the extracted attribute features contain the features in the feature library, and unsuccessful otherwise; when the extracted user tag features are matched against the tag features in the feature library, the match is judged successful if the extracted tag features contain the features in the feature library, and unsuccessful otherwise.
Another aspect of the present application provides a customer group classification apparatus based on user portrait data, including:
a data integration module for obtaining user portrait data stored in Kudu, HDFS, or Hive storage, wherein the user portrait data includes behavior data, attribute data, and tag data, and the behavior data includes a user ID, a behavior occurrence time, and behavior content;
a feature extraction module for taking the behavior data, the attribute data, and the tag data as input conditions, and calculating on the user portrait data according to logical operation conditions and factor operation conditions to obtain target users; associating the target user IDs with the user portrait data according to a preset time period, and performing completion and normalization operations on the user portrait data to obtain feature data in a preset format; wherein the factor operation conditions include numeric-type factors, character-type factors, and time-type factors, and the feature data includes behavior feature data, attribute feature data, and tag feature data;
and a customer group division module for performing a matching operation between the feature data and a pre-established feature library and then dividing the target users into the corresponding customer groups.
Further, the apparatus also includes a customer group evaluation module for scoring the principal components of the target users in different customer groups by using a principal component analysis algorithm and completing the customer group evaluation according to the scores.
Further, the feature extraction module further includes:
a logic screening module for taking the behavior data, the attribute data, and the tag data as input conditions and logically screening the user portrait data according to the logical operation conditions, following a minimum screening principle;
a factor screening module for applying numeric-type, character-type, and time-type factor operations respectively to the logically screened user portrait data of the target users, to obtain user portrait data screened by factor operations;
an association module for associating the user portrait data screened by factor operations with the target user IDs according to a time period;
and a completion and normalization module for performing completion and normalization operations on the associated data fields to obtain feature data in the preset format.
Further, in the missing-value processing part, a KNN filling algorithm is used for data completion; a linear function normalization algorithm is used for field normalization, linearly converting the user portrait data into the range [0,1], after which distance measurement and covariance calculation are performed; when the data do not follow a normal distribution, normalization is performed through mean absolute deviation standardization, logarithmic transformation, decimal scaling, and sigmoid functions.
The embodiment of the invention provides a customer group division method and device based on user portrait data, which comprises: acquiring user portrait data stored in Kudu, HDFS, or Hive storage; taking the behavior data, the attribute data, and the tag data as input conditions, and calculating on the user portrait data according to logical operation conditions and factor operation conditions to obtain target users; associating the target user IDs with the user portrait data according to a preset time period, and performing completion and normalization operations on the user portrait data to obtain feature data in a preset format; and performing a matching operation between the feature data and a pre-established feature library, then dividing the target users into the corresponding customer groups. The invention integrates the behavior, attribute, and tag data related to each user by user ID, stores them separately according to the characteristics of Kudu, HDFS, and Hive, and provides efficient data query performance through reasonable partitioning and bucketing strategies. This solves the problem that customer group division in current scenarios relies on a single portrait dimension, which makes it difficult to improve division accuracy.
Furthermore, through feature extraction and customer group division, the invention performs missing-value processing and normalization on the screened target customer group and the integrated portrait data such as behaviors, attributes, and tags, and programmatically divides customer groups by using a classification model algorithm combined with preset customer group feature rules. This solves the problem that current customer group division depends mainly on personal experience and can hardly use complete customer portrait data for in-depth division.
Furthermore, the invention uses the completed customer group division results, combined with the integrated user portrait data, to identify the features of and differences between customer groups. Principal component analysis is used to quickly identify significant differences and to score and quantify the evaluation of each customer group. This solves the problem that, after manual division, customer groups cannot be quantitatively evaluated and their features and differences therefore cannot be described accurately.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. It should be apparent that the drawings in the following description are merely exemplary, and that other embodiments can be derived from the drawings provided by those of ordinary skill in the art without inventive effort.
The structures, proportions, sizes, and the like shown in this specification are only intended to accompany the content disclosed in the specification so that those skilled in the art can understand and read it; they do not limit the conditions under which the present invention can be implemented and therefore have no substantive technical significance. Any modification of structure, change of proportion, or adjustment of size that does not affect the efficacy and achievable purpose of the present invention shall still fall within the scope covered by the technical content disclosed in the present invention.
FIG. 1 is a flowchart of an embodiment of a customer group division method based on user portrait data according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a customer group division apparatus based on user portrait data according to an embodiment of the present invention.
Detailed Description
The present invention is described below through particular embodiments, and other advantages and features of the invention will become apparent to those skilled in the art from the following disclosure. The described embodiments are merely exemplary of the invention and are not intended to limit the invention to the particular embodiments disclosed. All other embodiments that a person skilled in the art can derive from the embodiments given herein without creative effort shall fall within the protection scope of the present invention.
Referring to FIG. 1, a flowchart of a preferred embodiment of a customer group division method based on user portrait data according to an embodiment of the present application is shown. The method includes the steps of: acquiring user portrait data stored in Kudu, HDFS, or Hive storage, wherein the user portrait data includes behavior data, attribute data, and tag data, and the behavior data includes a user ID, a behavior occurrence time, and behavior content;
taking the behavior data, the attribute data, and the tag data as input conditions, and calculating on the user portrait data according to logical operation conditions and factor operation conditions to obtain target users; associating the target user IDs with the user portrait data according to a preset time period, and performing completion and normalization operations on the user portrait data to obtain feature data in a preset format; wherein the factor operation conditions include numeric-type factors, character-type factors, and time-type factors, and the feature data includes behavior feature data, attribute feature data, and tag feature data;
and performing a matching operation between the feature data and a pre-established feature library, and then dividing the target users into the corresponding customer groups.
The method further comprises scoring the principal components of the target users in different customer groups by using a principal component analysis algorithm, and completing the customer group evaluation according to the scores. In this technical scheme, behavior, attribute, and tag features are extracted from the behavior data, attribute data, and tag data generated by users, so that user data can be used more comprehensively and efficiently to divide customer groups, which greatly improves the accuracy of customer group division. At the same time, the corresponding rules no longer need to be configured manually, which reduces the cost of manual participation. Finally, a brief insight evaluation is provided for the divided customer groups, helping users understand the features of and differences among the customer groups more quickly and intuitively, reducing the difficulty of subsequent marketing and operation work, and improving the final effect.
In a specific implementation of the invention, the method further comprises creating partitions for the user portrait data according to behavior occurrence time, and applying dynamic bucketed storage when the number of behaviors in a daily partition exceeds a preset number. User data is stored in Kudu, HDFS, and Hive. User behavior data is stored in columnar form and consists of three elements: user ID, behavior occurrence time, and behavior content. Because different behaviors have different attribute fields, the system stores behavior data in Kudu, whose fields are easier to extend. To improve the efficiency of association queries and feature extraction over behavior data, a partition is created each day according to behavior occurrence time, and a dynamic bucketing design is needed when a daily partition holds more than 1 billion behaviors. The design works as follows: two tables (such as orders and order amounts) are bucketed on the same behavior field (such as the commodity ID) with the same number of buckets. When the tables are joined on the commodity ID, rows with the same commodity ID fall into buckets with the same bucket ID in both tables, so the join and the aggregation can be performed independently for each bucket (compare the partition step of MapReduce). In this way, the memory occupied by a bucket can be released as soon as its computation finishes, so memory usage can be bounded by controlling the number of buckets processed in parallel. The theoretical memory footprint is: optimized memory footprint = original memory footprint / number of buckets in the table × number of buckets processed in parallel.
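The bucketed join described above can be sketched in a few lines of Python. This is a minimal illustration and not the patented implementation: the function names, the `item_id`/`order_id`/`amount` fields, and the use of CRC32 as the bucketing hash are all assumptions made for the example, and the memory estimate reads the formula as the original footprint divided by the bucket count, multiplied by the number of buckets processed in parallel.

```python
import zlib
from collections import defaultdict

def bucket_of(key, num_buckets):
    """Map a join key to a bucket deterministically (CRC32 is an illustrative choice)."""
    return zlib.crc32(str(key).encode("utf-8")) % num_buckets

def split_into_buckets(rows, key_field, num_buckets):
    """Group rows by the bucket of their join key."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[bucket_of(row[key_field], num_buckets)].append(row)
    return buckets

def bucketed_join_sum(orders, amounts, num_buckets):
    """Join two tables on item_id one bucket at a time and sum the amount per item.

    Both tables use the same key and the same bucket count, so rows that can
    join always share a bucket ID. A real engine would read each bucket from
    storage on demand; here everything is in memory purely for illustration.
    """
    order_buckets = split_into_buckets(orders, "item_id", num_buckets)
    amount_buckets = split_into_buckets(amounts, "item_id", num_buckets)
    totals = {}
    for b in range(num_buckets):
        # Lookup table built from this bucket only; it can be discarded afterwards.
        amount_by_order = {
            (r["item_id"], r["order_id"]): r["amount"]
            for r in amount_buckets.get(b, [])
        }
        for o in order_buckets.get(b, []):
            amt = amount_by_order.get((o["item_id"], o["order_id"]))
            if amt is not None:
                totals[o["item_id"]] = totals.get(o["item_id"], 0) + amt
    return totals

def estimated_memory(original_footprint, num_buckets, parallel_buckets):
    """Estimated footprint when only `parallel_buckets` buckets are resident at once."""
    return original_footprint / num_buckets * parallel_buckets
```

Under this reading, a table that would need 64 GB unbucketed, split into 16 buckets with 4 processed in parallel, would be estimated at 16 GB.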
Data storage uses the user ID as the unique primary key, through which the user's behavior, attribute, and tag data are finally associated for modeling applications.
In the specific implementation manner of the invention, when the characteristics of the user portrait data are extracted, the behavior data, the attribute data and the tag data are used as input conditions, and the user portrait data are logically screened according to logical operation conditions and by adopting a minimum screening principle; respectively carrying out the operation of a numerical value type factor, a character type factor and a time type factor on the user portrait data of the target user subjected to the logic screening to obtain the user portrait data subjected to the factor operation screening; associating the user portrait data subjected to factor operation screening with the target user ID according to a time period; and performing completion and normalization operation on the associated data fields to obtain characteristic data meeting the preset format.
Specifically, target users are screened according to specified rules; the user behavior, attribute, and tag data in the data integration module are associated by user ID and used as features; after missing-value processing, normalization, and similar steps, feature training is performed; the features are matched against the existing customer group rule base; and the result is output to the customer group division module to complete the customer group division model. First, user behaviors, attributes, and tags are taken as input conditions, operations are performed according to the configured operation conditions, and the target population to be divided into customer groups is selected. In this implementation, these set operations are divided into two kinds, logical conditions and factors. Logical conditions support AND and NOT relations as well as unlimited nesting, so logical screening can be performed through combinations among any groups. Note that the implementation should follow a minimum screening principle, that is, AND relations are supported within a group and AND and NOT relations are supported between groups, so that the range of target users narrows progressively as logical relations are added during screening, which keeps the program usable.
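As a rough illustration of the minimum screening principle, the sketch below combines predicates with AND inside a group and combines groups with AND or AND NOT, so each added group can only shrink the result. The data layout, field names, and the `(negate, predicates)` representation are hypothetical and chosen only for the example.

```python
def matches_group(user, predicates):
    """Within a group, every predicate must hold (AND relation)."""
    return all(p(user) for p in predicates)

def screen_users(users, groups):
    """Screen users with condition groups combined by AND / AND NOT.

    `groups` is a list of (negate, predicates) pairs. A user is kept only if
    every non-negated group matches and every negated group does not match,
    so adding a group can never enlarge the result.
    """
    selected = []
    for user in users:
        keep = True
        for negate, predicates in groups:
            hit = matches_group(user, predicates)
            if hit == negate:  # a required group missed, or an excluded group matched
                keep = False
                break
        if keep:
            selected.append(user["user_id"])
    return selected

# Hypothetical sample data for illustration only.
users = [
    {"user_id": "u1", "age": 30, "city": "Beijing", "purchases": 5},
    {"user_id": "u2", "age": 22, "city": "Beijing", "purchases": 1},
    {"user_id": "u3", "age": 35, "city": "Shanghai", "purchases": 8},
]
groups = [
    (False, [lambda u: u["age"] >= 25, lambda u: u["purchases"] >= 3]),
    (True, [lambda u: u["city"] == "Shanghai"]),
]
```

With only the first group the sample yields `u1` and `u3`; adding the negated second group narrows it to `u1`.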
In the factor part, operations and comparisons between factors can be performed for the data types stored by the data integration module. Numeric-type factors support calculation logic such as greater than, less than, greater than or equal to, less than or equal to, not equal to, open interval, closed interval, half-closed interval, has value, and no value. Character-type factors support calculation logic such as equal to, not equal to, contains, does not contain, length, and deduplicated count. Time-type factors support calculation logic for absolute time, relative time, and the like. If the factors are mainly of non-numeric and time types, the data can be stored as bitmaps, which further improves calculation and comparison efficiency.
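One way to organize these factor operations is a small operator table per data type. The sketch below is an assumption-laden illustration: the operator names, the `apply_factor` dispatcher, and the choice of a left-closed, right-open half-closed interval are not taken from the patent.

```python
from datetime import datetime, timedelta

# Numeric-type factor operators (names are illustrative).
NUMERIC_OPS = {
    "gt": lambda v, x: v > x,
    "lt": lambda v, x: v < x,
    "ge": lambda v, x: v >= x,
    "le": lambda v, x: v <= x,
    "ne": lambda v, x: v != x,
    "open_interval": lambda v, lo, hi: lo < v < hi,
    "closed_interval": lambda v, lo, hi: lo <= v <= hi,
    # Assumption: closed on the left, open on the right.
    "half_closed_interval": lambda v, lo, hi: lo <= v < hi,
    "has_value": lambda v: v is not None,
    "no_value": lambda v: v is None,
}

# Character-type factor operators.
STRING_OPS = {
    "eq": lambda v, s: v == s,
    "ne": lambda v, s: v != s,
    "contains": lambda v, s: s in v,
    "not_contains": lambda v, s: s not in v,
    "length_eq": lambda v, n: len(v) == n,
}

# Time-type factor operators: absolute window and window relative to "now".
TIME_OPS = {
    "absolute_between": lambda ts, start, end: start <= ts <= end,
    "relative_within_days": lambda ts, now, days: now - timedelta(days=days) <= ts <= now,
}

OPS_BY_KIND = {"numeric": NUMERIC_OPS, "string": STRING_OPS, "time": TIME_OPS}

def apply_factor(kind, op, value, *args):
    """Evaluate one factor condition of the given kind against a value."""
    return OPS_BY_KIND[kind][op](value, *args)

def distinct_count(values):
    """Deduplicated count over a collection of character values."""
    return len(set(values))

# Example: did the behavior happen within 7 days before a reference time?
recent = apply_factor("time", "relative_within_days",
                      datetime(2020, 11, 1), datetime(2020, 11, 5), 7)
```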
After the target population has been screened, the system needs to associate user information by target user ID. Features are extracted in full, without restriction: using the user IDs in the screening result, all behavior records, attribute data, and tag data related to those users in the data integration module are associated according to the specified time period. Completion and normalization operations are then performed on the associated data fields. In the missing-value processing part, a KNN filling algorithm is used for data completion, that is, nearest-neighbor filling: KNN finds the k nearest data points and the missing value is filled with the mean of those k points. By default, dimensions whose missing ratio exceeds 80% are deleted column-wise. The normalization part by default uses a linear function normalization algorithm, which linearly maps the original data into the range [0,1]. When distance measurement or covariance calculation is involved, or when the data do not follow a normal distribution, normalization can instead be performed by mean absolute deviation standardization, logarithmic transformation, decimal scaling, or a sigmoid function.
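The completion and normalization steps can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions: rows are lists of numbers with `None` for missing values, the neighbor distance is a root-mean-square difference over the columns both rows have, and the function names are hypothetical.

```python
import math

def drop_sparse_columns(rows, max_missing_ratio=0.8):
    """Drop columns whose share of missing values exceeds the threshold."""
    n = len(rows)
    keep = [j for j in range(len(rows[0]))
            if sum(r[j] is None for r in rows) / n <= max_missing_ratio]
    return [[r[j] for j in keep] for r in rows], keep

def _distance(a, b):
    """RMS difference over the columns present in both rows (an assumed metric)."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

def knn_fill(rows, k=2):
    """Fill each missing value with the mean of that column over the k nearest rows."""
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            donors = sorted(
                ((_distance(row, other), other[j])
                 for m, other in enumerate(rows)
                 if m != i and other[j] is not None),
                key=lambda t: t[0],
            )[:k]
            if donors:
                filled[i][j] = sum(v for _, v in donors) / len(donors)
    return filled

def min_max_normalize(rows):
    """Linearly map every column into [0, 1]; constant columns become 0.0.

    Assumes all missing values have already been filled.
    """
    out_cols = []
    for col in zip(*rows):
        lo, hi = min(col), max(col)
        span = hi - lo
        out_cols.append([0.0 if span == 0 else (v - lo) / span for v in col])
    return [list(r) for r in zip(*out_cols)]
```

A typical order of use would be `drop_sparse_columns`, then `knn_fill`, then `min_max_normalize`, so that normalization never sees a missing value.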
Users are then classified into the corresponding categories based on the extracted features. Specifically, each user's extracted behavior, attribute, and tag features are matched against the features recorded in a pre-built feature rule base, the final matching degree is obtained by combining the input weight coefficients of the behavior, attribute, and tag features, and the user is assigned to the designated category. The feature rule base should hold a record for each type of customer group, including its behavior, attribute, and tag feature rules.
For example, in the embodiment of the present invention, the feature rule base includes different user groups such as white-collar users, "tall, rich, and handsome" users, family users, two-dimensional (ACG) users, and student users. All of these groups consist of users accumulated in actual business, and the rule for each group is generated from the behaviors, attributes, and tag features associated with it. By default the system weights behaviors, attributes, and tags 1:1:1 in the weighted calculation, and the matching algorithm also supports custom input weights. When the extracted user behavior features are matched against the behavior features in the feature library, the match is judged successful if the extracted behavior features contain the features in the feature library, and unsuccessful otherwise; when the extracted user attribute features are matched against the attribute features in the feature library, the match is judged successful if the extracted attribute features contain the features in the feature library, and unsuccessful otherwise; when the extracted user tag features are matched against the tag features in the feature library, the match is judged successful if the extracted tag features contain the features in the feature library, and unsuccessful otherwise. During feature matching, if a user's behavior, attribute, and tag features have no matching relation with any existing feature in the feature rule base, the system by default divides such users into three groups by unsupervised means and adds the features of those groups to the feature rule base as rules.
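The containment-based matching with 1:1:1 default weights might look like the sketch below. The group names, feature strings, and the rule that a user goes to the group with the highest weighted matching degree are assumptions for illustration; the text does not specify how ties or partial matches are resolved.

```python
DIMENSIONS = ("behavior", "attribute", "tag")

def matching_degree(user_features, rule, weights=(1, 1, 1)):
    """Weighted share of dimensions whose rule features are all present in the user's features.

    A dimension matches when the rule's feature set is a subset of the
    user's extracted feature set (the user's features "contain" the rule's).
    """
    total = sum(weights)
    matched = sum(
        w for dim, w in zip(DIMENSIONS, weights)
        if rule[dim] <= user_features[dim]
    )
    return matched / total

def assign_group(user_features, feature_library, weights=(1, 1, 1)):
    """Return (group, degree) for the best-matching group, or (None, 0.0) if nothing matches.

    A None result marks users who would fall through to the unsupervised step.
    """
    best_group, best_score = None, 0.0
    for group, rule in feature_library.items():
        score = matching_degree(user_features, rule, weights)
        if score > best_score:
            best_group, best_score = group, score
    return best_group, best_score

# Hypothetical feature rule base and user, for illustration only.
feature_library = {
    "student": {"behavior": {"browse_textbooks"}, "attribute": {"age_18_24"}, "tag": {"campus"}},
    "white_collar": {"behavior": {"order_coffee"}, "attribute": {"age_25_35"}, "tag": {"office"}},
}
user = {
    "behavior": {"browse_textbooks", "order_coffee"},
    "attribute": {"age_18_24"},
    "tag": {"campus", "night_owl"},
}
```

Passing different `weights`, for example `(2, 1, 1)` to emphasize behavior, corresponds to the custom weight input mentioned above.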
The invention also provides a better implementation mode, and the three characteristics are directly fused and summarized when the behavior, attribute and label characteristics are extracted, so that the final weighted characteristic value of the user is obtained. And inputting the obtained features into a classification model obtained by pre-training to directly classify the user. The method greatly reduces the complexity of feature calculation, and the calculation logic is clearer. And combining the steps, dividing the target customer group users into corresponding classifications, and storing classification results into the hive database for further application and analysis.
For the divided customer groups, each group is evaluated on the portrait dimensions that show significant differences, so that the characteristics of, and differences among, the customer groups produced by the model can be understood more intuitively. The specific implementation steps are as follows:
After the system receives a request, it obtains from Hive the user ID details of the specified customer group or groups according to the input customer group ID. For example, if in this embodiment a request is received to compare the family customer group with the ACG ("two-dimensional") customer group, the system loads the user ID details for the obtained customer group IDs into memory and matches the behavior, attribute, and tag data in Kudu and Hive by user ID.
The matched behavior, attribute, and tag data are first filtered by principal component analysis: with a default proportion threshold of 90%, factors that are not main influences are excluded. The principal component analysis results of the two customer groups are then compared, and factors common to both groups whose difference is within 10% are removed in a second filtering pass; the remaining factors are kept as the final factor result.
Principal component analysis is implemented as follows: first, the matched behavior, attribute, and tag data are standardized, and the correlation matrix (or covariance matrix) is computed; next, the eigenvalues and eigenvectors of that matrix are computed; then the cumulative contribution rate is calculated (a cumulative contribution rate above 85% is generally required); finally, the principal component coefficients are examined, and the score of each sample on each principal component is computed from the standardized sample data.
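The principal component analysis steps listed above can be sketched with NumPy. This is a minimal illustration under stated assumptions: the function name `pca_scores`, the use of the correlation matrix rather than the covariance matrix, and the 0.85 default threshold (taken from the "generally required" figure in the text) are choices made for the example, and the input is assumed to contain no constant columns.

```python
import numpy as np


def pca_scores(X, min_cumulative=0.85):
    """PCA via the correlation matrix.

    Returns (scores, contribution): scores has one row per sample and one
    column per retained component; contribution holds the contribution
    rate of each retained component.
    """
    X = np.asarray(X, dtype=float)
    # 1. Standardize each column to zero mean and unit variance.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # 2. Correlation matrix of the data (equals the covariance of Z).
    R = np.corrcoef(X, rowvar=False)
    # 3. Eigenvalues and eigenvectors; eigh suits symmetric matrices.
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]  # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # 4. Contribution rate per component and the cumulative rate.
    contribution = eigvals / eigvals.sum()
    cumulative = np.cumsum(contribution)
    # Smallest number of components whose cumulative rate reaches the threshold.
    k = int(np.searchsorted(cumulative, min_cumulative) + 1)
    # 5. Scores: project the standardized samples onto the top-k eigenvectors.
    scores = Z @ eigvecs[:, :k]
    return scores, contribution[:k]
```

Each column of `scores` gives one principal component score per user, which is the kind of per-group quantity the evaluation step would then compare and output.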
After the scores are calculated, the results are output to the system's front-end interface, completing the customer group evaluation.
The embodiment of the invention provides a method and device for dividing customer groups based on user portrait data. The method comprises: acquiring user portrait data stored in Kudu, HDFS, or Hive; taking the behavior data, attribute data, and tag data as input conditions and calculating the user portrait data according to logical operation conditions and factor operation conditions to obtain target users; associating the target user IDs with the user portrait data according to a preset time period, and performing completion and normalization on the user portrait data to obtain feature data in a preset format; and matching the feature data against a pre-built feature library, then dividing the target users into the corresponding customer groups. The invention integrates the behavior, attribute, and tag data related to each user by user ID, stores the behavior, attribute, and tag data separately according to the characteristics of Kudu, HDFS, and Hive, and uses appropriate partitioning and bucketing strategies to provide efficient data queries. This solves the problem that customer group division in existing scenarios relies on a single portrait dimension, which makes it difficult to improve division accuracy.
Furthermore, through feature extraction and customer group division, the invention applies missing-value handling and normalization to the screened target customer group and the integrated portrait data (behaviors, attributes, and tags), and divides customer groups programmatically by combining a classification model algorithm with preset customer group feature rules. This solves the problem that customer group division currently depends mainly on personal experience and can hardly exploit complete customer portrait data for in-depth division.
Furthermore, the invention uses the completed customer group division results together with the integrated user portrait data to identify the characteristics of, and differences among, customer groups. Principal component analysis quickly identifies the significant differences, and the customer groups are scored and evaluated quantitatively. This solves the problem that, after manual division, customer groups cannot be evaluated quantitatively, so their characteristics and differences cannot be described accurately.
Referring to Fig. 2, a schematic structural diagram of a device for dividing customer groups based on user portrait data according to an embodiment of the invention, the device includes:
a data integration module, configured to acquire user portrait data stored in Kudu, HDFS, or Hive; wherein the user portrait data comprises behavior data, attribute data, and tag data; the behavior data includes: user ID, behavior occurrence time, and behavior content;
a feature extraction module, configured to take the behavior data, the attribute data, and the tag data as input conditions and calculate the user portrait data according to logical operation conditions and factor operation conditions to obtain target users; to associate the target user IDs with the user portrait data according to a preset time period; and to perform completion and normalization on the user portrait data to obtain feature data in a preset format; the factor operation conditions comprise numeric type factors, character type factors, and time type factors; the feature data comprises behavior feature data, attribute feature data, and tag feature data;
and a customer group division module, configured to match the feature data against a pre-built feature library and then divide the target users into the corresponding customer groups.
Further, the device includes a customer group evaluation module, configured to score the principal components of the target users in different customer groups using a principal component analysis algorithm and to complete the customer group evaluation according to the scores.
Further, the feature extraction module further includes:
the logic screening module is used for taking the behavior data, the attribute data and the tag data as input conditions and carrying out logic screening on the user portrait data according to logic operation conditions and by adopting a minimum screening principle;
the factor screening module is used for respectively carrying out the operation of a numerical value type factor, a character type factor and a time type factor on the user portrait data of the target user subjected to the logic screening to obtain the user portrait data subjected to the factor operation screening;
the association module is used for associating the user portrait data subjected to factor operation screening with the target user ID according to a time period;
and the completion and normalization module is used for performing completion and normalization operations on the associated data fields to obtain the characteristic data meeting the preset format.
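The factor screening step can be illustrated with a minimal Python sketch. The text does not specify the operators behind the numeric, character, and time type factors, so the range, membership, and time-window tests below, along with every function and field name (`numeric_factor`, `amount`, `channel`, `occurred_at`), are assumptions made for illustration.

```python
from datetime import datetime


def numeric_factor(field, low, high):
    """Keep records whose numeric field lies in [low, high]."""
    return lambda rec: low <= rec[field] <= high


def character_factor(field, allowed):
    """Keep records whose string field is one of the allowed values."""
    allowed = set(allowed)
    return lambda rec: rec[field] in allowed


def time_factor(field, start, end):
    """Keep records whose timestamp lies in [start, end)."""
    return lambda rec: start <= rec[field] < end


def screen(records, factors):
    """Apply every factor condition; a record must satisfy all of them."""
    return [rec for rec in records if all(f(rec) for f in factors)]


def target_user_ids(records, factors):
    """Distinct user IDs among the records that survive screening."""
    return sorted({rec["user_id"] for rec in screen(records, factors)})


# Hypothetical behavior records: user ID, occurrence time, and content fields.
records = [
    {"user_id": "u1", "amount": 120.0, "channel": "app",
     "occurred_at": datetime(2020, 11, 1, 9)},
    {"user_id": "u2", "amount": 15.0, "channel": "app",
     "occurred_at": datetime(2020, 11, 2, 9)},
    {"user_id": "u3", "amount": 300.0, "channel": "web",
     "occurred_at": datetime(2020, 11, 3, 9)},
]
factors = [
    numeric_factor("amount", 100, 500),
    character_factor("channel", ["app"]),
    time_factor("occurred_at", datetime(2020, 11, 1), datetime(2020, 12, 1)),
]
selected = target_user_ids(records, factors)
```

Combining the conditions with a logical AND is only one possible reading; the logical screening module described in the text may combine conditions differently.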
Further, in the missing-value handling step, a KNN imputation algorithm is used to complete the data. Fields are normalized with linear-function (min-max) normalization, which maps the user portrait data linearly into the range [0,1], after which distance measurement and covariance calculation are performed. When the data do not follow a normal distribution, normalization is instead performed by methods such as mean absolute deviation standardization, logarithmic transformation, decimal scaling, and the sigmoid function.
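The completion and normalization operations can be sketched in plain Python. This is a minimal illustration, not the patented implementation: the function names, the representation of a missing value as `None`, the neighbor count `k`, and the distance (root mean square difference over the columns both rows have) are assumptions.

```python
import math


def min_max(values):
    """Linear normalization of a list of numbers into [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: nothing to scale
    return [(v - lo) / (hi - lo) for v in values]


def decimal_scaling(values):
    """Divide by the smallest power of ten that brings every |v| below 1."""
    peak = max(abs(v) for v in values)
    digits = len(str(int(peak))) if peak >= 1 else 0
    return [v / 10 ** digits for v in values]


def knn_fill(rows, k=2):
    """Fill None cells with the column mean over the k nearest rows.

    Distance between two rows is measured only on columns where both
    rows have a value; the input rows are left unmodified.
    """
    def distance(a, b):
        shared = [(x, y) for x, y in zip(a, b)
                  if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, cell in enumerate(row):
            if cell is not None:
                continue
            # Candidate donors: other rows that have a value in column j.
            donors = [r for n, r in enumerate(rows)
                      if n != i and r[j] is not None]
            donors.sort(key=lambda r: distance(row, r))
            nearest = donors[:k]
            if nearest:
                filled[i][j] = sum(r[j] for r in nearest) / len(nearest)
    return filled
```

In a pipeline following the text, `knn_fill` would run before `min_max`, so that every field is complete before it is scaled into [0,1].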
Although the invention has been described in detail above with reference to a general description and specific examples, it will be apparent to one skilled in the art that modifications or improvements may be made thereto based on the invention. Accordingly, such modifications and improvements are intended to be within the scope of the invention as claimed.

Claims (3)

1. A method for dividing customer groups based on user portrait data, characterized by comprising the following steps: acquiring user portrait data stored in Kudu, HDFS, or Hive; wherein the user portrait data comprises behavior data, attribute data, and tag data; the behavior data includes: user ID, behavior occurrence time, and behavior content;
taking the behavior data, the attribute data and the label data as input conditions, and calculating the user portrait data according to logical operation conditions and factor operation conditions to obtain a target user; associating the target user ID with the user portrait data according to a preset time period, and performing completion and normalization operation on the user portrait data to obtain feature data meeting a preset format; the factor operation condition comprises a numerical value type factor, a character type factor and a time type factor; the characteristic data comprises behavior characteristic data, attribute characteristic data and label characteristic data;
after a matching operation is carried out between the feature data and a pre-built feature library, the target users are divided into corresponding customer groups;
the method further comprises scoring the principal components of the target users in different customer groups using a principal component analysis algorithm, and completing customer group evaluation according to the scores;
creating partitions for the user portrait data according to behavior occurrence time, and performing dynamic bucketed storage when the number of behaviors in the current day's partition is greater than a preset number;
calculating the user portrait data according to a logic operation condition and a numerical operation condition by taking the behavior data, the attribute data and the tag data as input conditions to obtain a target user; associating the target user ID with the user portrait data according to a preset time period, and performing completion and normalization operation on the user portrait data to obtain feature data meeting a preset format; the method specifically comprises the following steps: taking the behavior data, the attribute data and the tag data as input conditions, and logically screening the user portrait data according to logical operation conditions and by adopting a minimum screening principle; respectively performing the operation of a numerical value type factor, a character type factor and a time type factor on the user portrait data of the target user subjected to the logic screening to obtain the user portrait data subjected to the factor operation screening; associating the user portrait data subjected to factor operation screening with the target user ID according to a time period; completing and normalizing the associated data fields to obtain characteristic data meeting a preset format;
in the missing-value handling step, a KNN imputation algorithm is used for data completion; fields are normalized with linear-function (min-max) normalization, which maps the user portrait data linearly into the range [0,1], after which distance measurement and covariance calculation are performed; when the data do not follow a normal distribution, normalization is performed by methods such as mean absolute deviation standardization, logarithmic transformation, decimal scaling, and the sigmoid function.
2. The method of claim 1, wherein matching the feature data with a pre-built feature library comprises:
when the behavior feature data of the target user is matched with the behavior features in the feature library, if the extracted behavior features contain the features of the feature library, the matching can be judged to be successful; otherwise, judging that the matching is unsuccessful; when the extracted user attribute features are matched with the attribute features in the feature library, if the extracted attribute features contain the features in the feature library, the matching can be judged to be successful; otherwise, judging that the matching is unsuccessful; when the extracted user tag features are matched with the tag features in the feature library, if the extracted tag features contain the features in the feature library, the matching can be judged to be successful; otherwise, the matching is judged to be unsuccessful.
3. An apparatus for dividing customer groups based on user portrait data, comprising:
a data integration module, configured to acquire user portrait data stored in Kudu, HDFS, or Hive; wherein the user portrait data comprises behavior data, attribute data, and tag data; the behavior data includes: user ID, behavior occurrence time, and behavior content;
the characteristic extraction module is used for calculating the user portrait data according to a logic operation condition and a factor operation condition by taking the behavior data, the attribute data and the label data as input conditions to obtain a target user; associating the target user ID with the user portrait data according to a preset time period, and performing completion and normalization operation on the user portrait data to obtain feature data meeting a preset format; the factor operation condition comprises a numerical value type factor, a character type factor and a time type factor; the characteristic data comprises behavior characteristic data, attribute characteristic data and label characteristic data;
the customer group division module is configured to divide the target users into corresponding customer groups after a matching operation is carried out between the feature data and a pre-built feature library;
the customer group evaluation module is configured to score the principal components of the target users in different customer groups using a principal component analysis algorithm and to complete customer group evaluation according to the scores;
partitions are created for the user portrait data according to behavior occurrence time, and dynamic bucketed storage is performed when the number of behaviors in the current day's partition is greater than a preset number;
the feature extraction module further comprises:
the logic screening module is used for taking the behavior data, the attribute data and the tag data as input conditions and carrying out logic screening on the user portrait data according to logic operation conditions and by adopting a minimum screening principle;
the factor screening module is used for respectively carrying out the operation of a numerical value type factor, a character type factor and a time type factor on the user portrait data of the target user subjected to the logic screening to obtain the user portrait data subjected to the factor operation screening;
the association module is used for associating the user portrait data subjected to factor operation screening with the target user ID according to a time period;
the completion and normalization module is used for performing completion and normalization operations on the associated data fields to obtain characteristic data meeting a preset format;
in the missing-value handling step, a KNN imputation algorithm is used for data completion; fields are normalized with linear-function (min-max) normalization, which maps the user portrait data linearly into the range [0,1], after which distance measurement and covariance calculation are performed; when the data do not follow a normal distribution, normalization is performed by methods such as mean absolute deviation standardization, logarithmic transformation, decimal scaling, and the sigmoid function.
CN202011225923.9A 2020-11-05 2020-11-05 User portrait data-based customer group classification management method and device Active CN112396428B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011225923.9A CN112396428B (en) 2020-11-05 2020-11-05 User portrait data-based customer group classification management method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011225923.9A CN112396428B (en) 2020-11-05 2020-11-05 User portrait data-based customer group classification management method and device

Publications (2)

Publication Number Publication Date
CN112396428A CN112396428A (en) 2021-02-23
CN112396428B true CN112396428B (en) 2023-04-07

Family

ID=74598226

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011225923.9A Active CN112396428B (en) 2020-11-05 2020-11-05 User portrait data-based customer group classification management method and device

Country Status (1)

Country Link
CN (1) CN112396428B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114139657B (en) * 2022-02-07 2022-04-26 深圳索信达数据技术有限公司 Guest group portrait generation method and device, electronic equipment and storage medium
CN115545791B (en) * 2022-10-19 2024-06-25 中电金信软件有限公司 Customer group portrait generation method and device, electronic equipment and storage medium
CN116010693B (en) * 2022-12-28 2023-11-07 广州市玄武无线科技股份有限公司 Information pushing method, device and equipment based on guest group and computer storage medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106504099A (en) * 2015-09-07 2017-03-15 国家计算机网络与信息安全管理中心 A kind of system for building user's portrait
CN106484764A (en) * 2016-08-30 2017-03-08 江苏名通信息科技有限公司 User's similarity calculating method based on crowd portrayal technology
CN108388572A (en) * 2018-01-10 2018-08-10 链家网(北京)科技有限公司 A kind of user's portrait access method
CN108510321A (en) * 2018-03-23 2018-09-07 北京焦点新干线信息技术有限公司 A kind of construction method and device of house property user portrait
CN109978608A (en) * 2019-03-05 2019-07-05 广州海晟科技有限公司 The marketing label analysis extracting method and system of target user's portrait

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
熊友君.七步成就智能商务时代数字化智能营销.《智能商务》.2020,227-241. *

Also Published As

Publication number Publication date
CN112396428A (en) 2021-02-23

Similar Documents

Publication Publication Date Title
CN112396428B (en) User portrait data-based customer group classification management method and device
CN105786860B (en) Data processing method and device in data modeling
CN110363387A (en) Portrait analysis method, device, computer equipment and storage medium based on big data
CN111143685A (en) Recommendation system construction method and device
CN107622326A (en) User's classification, available resources Forecasting Methodology, device and equipment
CN107704883A (en) A kind of sorting technique and system of the grade of magnesite ore
WO2019223104A1 (en) Method and apparatus for determining event influencing factors, terminal device, and readable storage medium
CN113256409A (en) Bank retail customer attrition prediction method based on machine learning
CN112232944B (en) Method and device for creating scoring card and electronic equipment
CN114782761B (en) Intelligent storage material identification method and system based on deep learning
CN113052577A (en) Method and system for estimating category of virtual address of block chain digital currency
CN112685635A (en) Item recommendation method, device, server and storage medium based on classification label
CN113920366A (en) Comprehensive weighted main data identification method based on machine learning
CN113177642A (en) Automatic modeling system for data imbalance
CN112801784A (en) Bit currency address mining method and device for digital currency exchange
CN115510331B (en) Shared resource matching method based on idle amount aggregation
US20240152818A1 (en) Methods for mitigation of algorithmic bias discrimination, proxy discrimination and disparate impact
CN114021716A (en) Model training method and system and electronic equipment
Kulothungan Loan Forecast by Using Machine Learning
CN109308565B (en) Crowd performance grade identification method and device, storage medium and computer equipment
CN114154548A (en) Sales data sequence classification method and device, computer equipment and storage medium
CN112884028A (en) System resource adjusting method, device and equipment
CN113538020B (en) Method and device for acquiring association degree of group of people features, storage medium and electronic device
CN117055818B (en) Client information storage management method and system based on block chain
CN114281994B (en) Text clustering integration method and system based on three-layer weighting model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP03 Change of name, title or address

Address after: Room 305, 3rd Floor, Building 25, No. 10 Jiuxianqiao Road, Chaoyang District, Beijing, 100016

Patentee after: Beijing Analysys Digital Intelligence Technology Co.,Ltd.

Address before: 100020 Room 305, 3rd floor, building 25, 10 Jiuxianqiao Road, Chaoyang District, Beijing

Patentee before: BEIJING ANALYSYS THINK TANK NETWORK TECHNOLOGY Co.,Ltd.