CN117455549A - Consumer ability assessment method based on urban sign indexes - Google Patents

Info

Publication number
CN117455549A
Authority
CN
China
Prior art keywords
consumption
data
model
features
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311488106.6A
Other languages
Chinese (zh)
Inventor
陈曦
张静
王鹏亮
林晓玉
周昌盛
胡伟龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Richstone Technology Co ltd
Original Assignee
Guangzhou Richstone Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Richstone Technology Co ltd
Priority to CN202311488106.6A
Publication of CN117455549A

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/02Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201Market modelling; Market analysis; Collecting market data
    • G06Q30/0202Market predictions or forecasting for commercial activities
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/251Fusion techniques of input or preprocessed data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/02Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201Market modelling; Market analysis; Collecting market data

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Strategic Management (AREA)
  • Finance (AREA)
  • Development Economics (AREA)
  • Accounting & Taxation (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Economics (AREA)
  • Software Systems (AREA)
  • Game Theory and Decision Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a consumption capability assessment method based on urban sign indexes, relating to the technical field of smart cities. The method comprises the following steps: data source preparation, in which the data sources at least comprise Unionpay consumption data and operator user data; data source fusion, in which the intersection between the data sources is found and an intersection data set is established; feature engineering processing, in which features relevant to the modeling targets are extracted; and evaluation of consumption capability and consumption portraits using a federated learning method. By integrating the Unionpay consumption data and the operator user data and applying privacy computing technology and federated learning, the evaluation method creates a resident consumption capability model and a consumption portrait model, thereby providing decision support for urban management and business development. The method enables a better understanding of residents' consumption behaviors and demands and supports business decision making, policy making, city planning and similar fields, thereby promoting the sustainable development and prosperity of cities.

Description

Consumer ability assessment method based on urban sign indexes
Technical Field
The invention relates to the technical field of smart cities, in particular to a consumption capability assessment method based on urban sign indexes.
Background
With the continued acceleration of urbanization, urban planning and management are becoming increasingly complex and critical. Urban sign indicators are key factors in assessing urban development and quality, including but not limited to population structure, air quality, traffic congestion and resident consumption. A city can be regarded as a living organism that faces various "urban illness" problems: just as the human body needs regular physical examinations, a city also needs regular examinations to find problems, diagnose their causes and take effective measures. Urban sign index data is an indispensable tool in this process.
Taking resident consumption as an example, under the influence of factors such as the external environment, weak consumption has become a prominent problem, and an effective consumption stimulation strategy is particularly important for driving urban economic development. However, traditional analysis of resident consumption behavior and consumption capability depends on consumption data from a single financial system, lacks comprehensive data support, cannot fully reflect the multidimensional characteristics of resident consumption, and makes it difficult to form comprehensive resident consumption portraits. In addition, conventional methods often fail to fully consider the fusion relationships among the various urban sign indexes. For example, Chinese patent publication No. CN109829763A discloses a consumption capability evaluation method and apparatus, an electronic device, and a storage medium, where the method includes: acquiring historical consumption data of a target user; in response to a transaction category being missing from the historical consumption data of the target user, performing multiple imputation on the missing data related to the missing transaction category to obtain a plurality of complete data sets; performing data analysis on the plurality of complete data sets to obtain characteristic parameters corresponding to each complete data set; and obtaining a target value representing the consumption capability from the characteristic parameters, and evaluating the consumption capability of the target user according to the target value.
Therefore, a new data model construction method is needed to solve the problem of multi-dimensional, in-depth mining and analysis of consumption data, so as to better support decision making for urban economic growth and sustainable development.
Disclosure of Invention
Aiming at the problems existing in the prior art, the invention provides a consumption capability assessment method based on urban sign indexes.
In order to achieve the above purpose, the technical scheme adopted by the invention is as follows:
a consumption capability assessment method based on urban sign indexes comprises the following steps:
step S1: data source preparation: the data source at least comprises Unionpay consumption data and operator user data;
step S2: data source fusion: searching for intersections between the data sources;
step S3: carrying out feature engineering processing on the fused data sources;
step S4: predicting and evaluating the consumption capability and the consumption portraits by adopting a federated learning method.
Based on the above technical solution, in step S1, basic information and consumption information of residents are obtained through the Unionpay consumption data packet interface, wherein the Unionpay consumption data includes basic information and consumption information of users, and the consumption information of users is stored in a database. The basic information of a user includes, for example, name, identification card number and bank card number; the consumption information of a user includes, for example, consumption amount, consumption date, consumption behavior (consumption type) and consumption location.
Based on the above technical solution, in step S1, because the operator user data involves private user data, joint modeling is carried out inside the operator database by a privacy computing method. The operator user data comprises user portrait data such as mobile phone number, gender, age, occupation, education level, marital status and number of children.
Based on the above technical solution, in step S2, a private set intersection (PSI) method is adopted to fuse the two data sources: encrypted matching is performed on a common key attribute shared by the two data sources, the common information present in both data sources is determined, and the common information is stored in the established intersection data set. Encryption is needed in this process to protect data privacy, so that no plaintext sensitive information is revealed during data fusion.
Based on the above technical solution, in step S2, the data source fusion process includes the following steps: step S21: hashing the two data sources with a hash function; step S22: each party sends the hash values of its processed data source to the other party, so that the two parties exchange hash values; step S23: each party locally compares its own hash values with the hash values received from the other party, finds the common hash values, and uses them to locate the common intersection data set stored in its own database.
Based on the above technical solution, in step S3, the feature engineering process further includes a process for feature construction, where the feature construction is to select useful features from the original data source, and combine the useful features into a new subset. For example: raw feature data set: age, income, marital status, consumption behavior, number of children, etc.; some new features are constructed from these original features.
Based on the above technical solution, in step S3, the feature engineering process includes a process of deriving features, in which new features are constructed according to business knowledge or the relationships between features. For example, business district economy is calculated from data such as consumption behavior, consumption location and consumption time, forming business district consumption features and business district night economy features, and the business districts as a whole are analyzed and ranked.
Based on the above technical solution, in step S3, the feature engineering process includes a process of selecting features, as follows: first, a machine learning method is used to recursively screen for features that have predictive power for the target variable, where the target variable refers to the output or label of the model being constructed; then, features are selected by building a model and progressively removing the features that contribute little to the model's predictive ability; finally, the features are sorted by importance score, and the features with the lowest importance scores are eliminated until the specified number of features is reached or the model performance no longer improves.
Based on the above technical solution, in step S3, the feature engineering process includes a process of feature combination, in which different features are combined to form new features.
Based on the above technical solution, in step S4, the evaluation process performed by using the federated learning method includes the following steps: step S41: selecting a federated learning model; step S42: constructing a consumption capability model and evaluating it; step S43: training the consumption portrait model according to the selected federated learning model; step S44: predicting with and evaluating the trained consumption portrait model. Specifically, the consumption portraits are obtained by joint modeling on the Unionpay and operator data: the portion of the participants' data in which the users are the same but the features are not identical is extracted and used for federated learning model training.
Based on the above technical solution, in step S42, the consumption amount of each cardholder within the set period is counted from the local Unionpay consumption data, and consumption capability gears are divided according to the percentage calculated for each cardholder's transaction amount, and are set to include a significantly low consumption capability gear, a significantly high consumption capability gear and a high consumption gear.
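The gear division in step S42 can be sketched as follows. This is only a minimal illustration: it assumes that the "percentage" is read as each cardholder's percentile rank within the period, and the cut-offs (20/40/60/80) and gear labels are hypothetical placeholders, since the text does not state the actual thresholds.

```python
from bisect import bisect_right

# Hypothetical percentile cut-offs and gear labels (assumptions, not from the patent).
CUTOFFS = [20, 40, 60, 80]
GEARS = ["gear 1 (lowest)", "gear 2", "gear 3", "gear 4", "gear 5 (highest)"]


def percentile_rank(amounts):
    """For each amount, the percentage of cardholders spending at most that amount."""
    ordered = sorted(amounts)
    n = len(ordered)
    return [100.0 * bisect_right(ordered, a) / n for a in amounts]


def assign_gears(amounts):
    """Map each cardholder's total spend in the period to a gear by percentile rank."""
    gears = []
    for pct in percentile_rank(amounts):
        # Count how many cut-offs the rank exceeds; a rank of exactly 20 stays in gear 1.
        idx = sum(pct > c for c in CUTOFFS)
        gears.append(GEARS[idx])
    return gears


# Made-up monthly totals per cardholder.
monthly_spend = [120.0, 560.0, 980.0, 2300.0, 4100.0, 7600.0, 150.0, 880.0, 3200.0, 15000.0]
for amount, gear in zip(monthly_spend, assign_gears(monthly_spend)):
    print(f"{amount:>10.2f} -> {gear}")
```

Ranking by percentile rather than by fixed amounts keeps the gears comparable across months, because the boundaries move with the distribution of spending.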
Based on the above technical solution, the constructed consumption capability model and the trained consumption portrait model are further applied in combination to scenarios that at least include: urban business district economic analysis, real estate policy regulation, resident consumption voucher distribution, urban night economy analysis, traffic planning and urban development.
Compared with the prior art, the invention has the following beneficial effects:
(1) The evaluation method provided by the invention creates a resident consumption capability model and a resident consumption portrait model by integrating the Unionpay consumption data and the operator user data and applying privacy computing technology and federated learning. Its main objectives include providing an accurate assessment of consumption capability and providing decision support for urban management and business development. The method enables a better understanding of residents' consumption behavior and demands and supports business decision making, policy making, city planning and similar fields, thereby promoting the sustainable development and prosperity of cities.
(2) Data fusion and comprehensiveness: the invention fuses the Unionpay consumption data with the operator user data, so that city managers can obtain more comprehensive information, which helps to evaluate and analyze the consumption capability and behavior of urban residents more accurately.
(3) Privacy protection: by using private set intersection (PSI), the invention protects user privacy, ensures that sensitive user information is not revealed, and achieves secure cross-matching of the data.
(4) Joint model training: federated learning allows multiple parties to train models on their local data, avoiding the privacy risks of data sharing and transmission; at the same time, a stronger global model is built by sharing model parameters, which improves the dimensionality and accuracy of the model.
(5) Decision support and city management: by constructing data models such as consumption capacity, consumption portraits and the like, the invention provides a powerful tool for city managers, can be used for city planning and management decision, helps cities to better meet resident demands, improves resident life quality and promotes urban economic development.
(6) Custom analysis: different models, such as urban night economy analysis, urban business district ranking, resident consumption voucher distribution and real estate policy regulation, can be flexibly customized according to the specific requirements of city managers, supporting in-depth analysis at various levels and in various fields and providing more targeted data and insights for different kinds of decisions.
Drawings
FIG. 1 is a flow chart of the evaluation method of the present invention.
Detailed Description
The invention is further illustrated and described below with reference to the drawings and detailed description. The technical features of the embodiments of the invention can be combined correspondingly on the premise of no mutual conflict.
In order that the above objects, features and advantages of the invention may be readily understood, the invention is described in more detail below with reference to the accompanying drawings. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention. The present invention may, however, be embodied in many forms other than those described herein, and those skilled in the art may make similar modifications without departing from the spirit of the invention; the invention is therefore not limited to the specific embodiments disclosed below.
Example 1
This embodiment provides a consumption capability assessment method based on urban sign indexes, which comprises data fusion, privacy protection, joint model training and algorithm selection. Specifically, different data sources are combined by fusing the Unionpay consumption data with the operator user data, which increases the dimensionality of the model data. User privacy protection is emphasized: encrypted matching of sensitive user information is performed by private set intersection (PSI), so that private data is not revealed. Federated learning allows multiple parties to train models on their local data, avoiding the transmission of sensitive data, while model parameters are shared to build a stronger global model. Algorithms such as logistic regression and linear regression are selected to construct data models such as consumption capability and consumption portraits, providing city managers with tools for decision support and in-depth analysis. Combining these technologies yields a method for mining the consumption capability and consumption portraits of residents, so that urban conditions can be evaluated more accurately and comprehensively, problems can be found and countermeasures can be made. The method can play a key role in city planning and management: it provides comprehensive data support for city development, protects resident privacy, and provides a more scientific basis for decision making and resource allocation.
As shown in FIG. 1, the method comprises the following steps: step S1: data source preparation: the data sources at least comprise Unionpay consumption data and operator user data; specifically, resident basic information and consumption information are obtained through a Unionpay consumption data packet interface, wherein the Unionpay consumption data comprises basic information and consumption information of users, and the consumption information of users is stored in a Unionpay consumption database. The basic information of a user includes, for example, name, identification card number and bank card number; the consumption information of a user includes, for example, consumption amount, consumption date, consumption behavior (consumption type) and consumption location.
The specific data structure of the Unionpay consumption data is shown in the following table 1:
TABLE 1
In the preparation of the operator user data, data security requirements apply: the operator user data involves private user data that cannot be taken out of the operator's database, so joint modeling is carried out inside the operator database through privacy computing. The operator user data comprises user portrait data such as mobile phone number, gender, age, occupation, education level, marital status and number of children. The specific data structure provided by the operator is shown in Table 2 below:
TABLE 2
Step S2: data source fusion: searching for intersections between the data sources; specifically, private set intersection (PSI) is adopted to fuse the two data sources, so that the intersection between the two data sources is found without revealing any other information. Encrypted matching is performed on a common key attribute shared by the two data sources, the common information present in both data sources is determined, and the common information is stored in the established intersection data set. That is, for the Unionpay consumption data and the operator user data, encrypted matching may be performed on a common key attribute (e.g., the mobile phone number) to determine the users present in both data sources. An intersection data set is then created that contains the intersection of the data held by both parties. This process uses encryption to protect data privacy and to ensure that no plaintext sensitive information is revealed during data fusion. Encryption is the main and most commonly used security measure in electronic commerce: important data is turned into ciphertext for transmission and is restored (decrypted) by the same or a different means after reaching its destination.
Further, the data source fusion process includes the steps of:
step S21: hashing the two data sources with a hash function; specifically, the two data sources, the Unionpay consumption data (held by party A below) and the operator user data (held by party B below), are data source X and data source Y respectively. The goal is to find the intersection of these two data sources while preserving data privacy. The mobile phone number in each data source serves as the common key attribute. The MD5 hash function is selected, and each party hashes its mobile phone numbers using MD5, as exemplified in Table 3 below:
TABLE 3
Step S22: each party sends the hash values of its processed data source to the other party, so that the two parties exchange hash values; specifically, after parties A and B each hash their own data source, a hash value is generated for each data element. Party A calculates the hash values of data source X, stores them and sends them to party B. Party B calculates the hash values of data source Y, stores them and sends them to party A.
Step S23: each party locally compares its own hash values with the hash values received from the other party, finds the common hash values, and uses them to locate the common intersection data set stored in its own database. Specifically, taking the secure exchange of mobile phone numbers as an example, both parties use the 32-character lowercase MD5 digest of the user's 11-digit mobile phone number as the matching field. After private set intersection (PSI) is completed, each party's data still remains in its own database. Private set intersection is a privacy-preserving technique that aims to find the intersection between two data sets without revealing the details of the data, which means that each party's data remains private.
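Steps S21 to S23 can be sketched as follows. This is a minimal single-process illustration of the hash-and-compare matching described above, not a full PSI protocol; the phone numbers are made up and the helper names are assumptions introduced for the example.

```python
import hashlib


def md5_hex(phone: str) -> str:
    """32-character lowercase MD5 hex digest of an 11-digit phone number."""
    return hashlib.md5(phone.encode("utf-8")).hexdigest()


def hash_table(phones):
    """Step S21: hash every key and remember which local record it belongs to."""
    return {md5_hex(p): p for p in phones}


# Made-up phone numbers for illustration only.
party_a_phones = ["13800000001", "13800000002", "13800000003"]  # Unionpay side (data source X)
party_b_phones = ["13800000002", "13800000003", "13800000004"]  # operator side (data source Y)

table_a = hash_table(party_a_phones)
table_b = hash_table(party_b_phones)

# Step S22: only the hash values are exchanged, never the plaintext numbers.
hashes_from_a = set(table_a)
hashes_from_b = set(table_b)

# Step S23: each party compares locally and looks up its own matching records.
common_hashes = hashes_from_a & hashes_from_b
intersection_at_a = sorted(table_a[h] for h in common_hashes)
intersection_at_b = sorted(table_b[h] for h in common_hashes)

print(len(common_hashes), "common users")
```

Because the space of 11-digit numbers is small enough to enumerate, production PSI protocols typically use keyed or blinded hashing rather than a plain digest; that hardening is outside what the text describes.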
Step S3: carrying out feature engineering processing on the fused data sources, and extracting features related to the modeling targets; in particular, the feature engineering process includes a feature construction process, in which useful features are selected from the original data source and combined into a new subset. Features can be acquired from the Unionpay consumption data and the operator data and then transformed, combined and so on. For example, given the original feature data source of age, income, marital status, consumption behavior, number of children, etc., new features are constructed from these original features. As a further example, a data source containing the original features is first created, and several feature constructions are then demonstrated: 1. A new feature "household income" is created, representing the total income of the household, by multiplying the "income" feature by the "quantity" feature. 2. A new feature "high-income occupation" is created by processing the "occupation" feature to judge whether the occupation is a high-income one. 3. A new feature "middle-aged and married" is created by applying conditions to judge whether a person is middle-aged and married.
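The three feature constructions above can be sketched with pandas. The column names, the list of high-income occupations, the age range taken as "middle-aged", and the reading of "quantity" as the number of income earners are all assumptions made for illustration; the text does not define them.

```python
import pandas as pd

# Toy records; column names, occupations and thresholds are illustrative assumptions.
df = pd.DataFrame(
    {
        "age": [28, 42, 51, 36],
        "income": [8000, 15000, 12000, 30000],
        "quantity": [1, 2, 2, 1],  # assumed: number of income earners in the household
        "occupation": ["teacher", "doctor", "clerk", "lawyer"],
        "marital_status": ["single", "married", "married", "single"],
    }
)

# 1. Household income: the "income" feature multiplied by the "quantity" feature.
df["household_income"] = df["income"] * df["quantity"]

# 2. High-income occupation flag, derived from the "occupation" feature.
HIGH_INCOME_OCCUPATIONS = {"doctor", "lawyer", "engineer"}  # hypothetical list
df["high_income_occupation"] = df["occupation"].isin(HIGH_INCOME_OCCUPATIONS)

# 3. Middle-aged and married flag, built from a condition on two features.
#    The 35-55 range (inclusive) is an assumed definition of "middle-aged".
df["middle_aged_married"] = df["age"].between(35, 55) & (df["marital_status"] == "married")

print(df[["household_income", "high_income_occupation", "middle_aged_married"]])
```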
The feature engineering processing comprises feature derivation processing, in which new features are constructed according to business knowledge or the relationships between features. For example, business district economy is calculated from data such as consumption behavior, consumption location and consumption time, forming business district consumption features and business district night economy features, and the business districts as a whole are analyzed and ranked. Procedure example: three data tables are first defined, where behavior_data represents consumption behavior data, location_data represents consumption location data, and time_data represents consumption time data. The three tables are then merged on the user_id field using the merge function to obtain combined_data. The consumption is grouped by location using the groupby function, and the total consumption of each location is calculated to obtain economic_feature. Finally, the economic features are ranked using the rank function to obtain economic_ranking.
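The procedure above maps directly onto pandas. The sketch below uses the table and variable names from the text; the column names, the toy values, the one-record-per-user simplification and the night-time window (20:00 to 06:00) are assumptions added for illustration.

```python
import pandas as pd

# Toy tables, one record per user_id for simplicity (real data would be per transaction).
behavior_data = pd.DataFrame(
    {
        "user_id": [1, 2, 3, 4],
        "consumption_type": ["dining", "retail", "dining", "cinema"],
        "amount": [120.0, 300.0, 80.0, 60.0],
    }
)
location_data = pd.DataFrame(
    {"user_id": [1, 2, 3, 4], "location": ["District A", "District B", "District A", "District C"]}
)
time_data = pd.DataFrame({"user_id": [1, 2, 3, 4], "hour": [12, 21, 22, 15]})

# Merge the three tables on user_id.
combined_data = behavior_data.merge(location_data, on="user_id").merge(time_data, on="user_id")

# Business district consumption feature: total consumption per location.
economic_feature = combined_data.groupby("location")["amount"].sum()

# Business district night economy feature; the 20:00-06:00 window is an assumed definition.
is_night = (combined_data["hour"] >= 20) | (combined_data["hour"] < 6)
night_feature = combined_data[is_night].groupby("location")["amount"].sum()

# Rank business districts by total consumption (1 = highest).
economic_ranking = economic_feature.rank(ascending=False, method="min").astype(int)

print(economic_ranking.sort_values())
```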
The feature engineering processing comprises feature selection processing, as follows. First, a machine learning method is used to recursively screen for features that have predictive power for the target variable, preferably features with stronger predictive power. Here "stronger" generally means that the feature's predictive power for the target variable is significant, that is, there is a clear correlation between the feature and the target variable. This can be measured by statistical indicators such as the correlation coefficient, mutual information or analysis of variance: the higher the correlation between a feature and the target variable, the more likely the feature is to be considered "stronger". The target variable refers to the output or label of the model being constructed. At each iteration, the model is trained using the remaining features and an importance score is calculated for each feature.
Then, features are selected by building a model and progressively removing the features that contribute little to the model's predictive ability; specifically, a business-related target model is built using the machine learning method Recursive Feature Elimination (RFE). Features that contribute less are the preferred candidates for removal. "Less" means that a feature does not significantly improve the predictive performance of the model: even if the feature is removed during model training, the performance of the model changes little, which can be determined by cross-validation or other performance assessment methods. Finally, the features are sorted by importance score, and the features with the lowest importance scores are eliminated until the specified number of features is reached or the model performance no longer improves.
Recursive feature elimination example: the number of selected features is adjusted automatically through cross-validation, and the cross-validation score of the classifier is preferably plotted against the number of features, from which it can be seen that RFECV automatically selects an effective number of features suitable for classification. The idea of the process is as follows. Recursive Feature Elimination (RFE) starts with all features: a model is first trained using all features, and each feature is then ranked according to its importance or weight. In each iteration, RFE deletes the one or more features with the lowest scores and then retrains the model. Model performance is measured by accuracy, F1 score or R-squared value. Features with lower scores are considered to contribute less to the performance of the model and are culled progressively to improve it. This process is repeated until a stop condition (e.g., a specified number of features or a target performance level) is reached.
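Recursive feature elimination with cross-validation can be sketched with scikit-learn's RFECV. The synthetic data set, the logistic regression estimator and the accuracy scoring are assumptions chosen for the example; the text does not say which estimator or data were used.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the fused feature table: 10 features, 3 of them informative.
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=3, n_redundant=0, random_state=0
)

# RFECV repeatedly drops the lowest-ranked feature (step=1) and uses 5-fold
# cross-validated accuracy to decide how many features to keep.
selector = RFECV(
    estimator=LogisticRegression(max_iter=1000),
    step=1,
    cv=5,
    scoring="accuracy",
    min_features_to_select=1,
)
selector.fit(X, y)

print("features kept:", selector.n_features_)
print("support mask:", selector.support_)
print("ranking (1 = kept):", selector.ranking_)
```

The linear estimator exposes coefficients, which RFECV uses as the feature importance when ranking; a tree-based estimator would supply feature importances instead.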
The feature engineering processing comprises feature combination processing, in which different features are combined to form new features. For example, marital status and number of children may be combined into a family status feature; people who habitually spend at gyms may be combined into a sports enthusiast feature; and occupations such as doctor, lawyer and engineer may be combined with education level into a highly educated professional feature. Example of combining gender and consumption behavior: binary logistic regression may be used to predict consumption behavior, e.g., gender is split into male and female, consumption behavior is split into high and low, and consumption behavior is then predicted. Besides binary logistic regression, polynomial features or feature crossing can also be used to combine features into new features for further prediction of consumption behavior. Polynomial features generate higher-order feature combinations by polynomial expansion of the original features; for example, two features a and b are combined into terms such as ab, a squared and b squared, which can be done with PolynomialFeatures in a machine learning library such as scikit-learn. Feature crossing refers to combining the values of different features with each other, e.g., multiplying or dividing the values of feature a and feature b, which can capture interactions between features.
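The polynomial expansion and feature crossing described above can be sketched as follows, using scikit-learn's PolynomialFeatures for the expansion and plain NumPy for the manual crosses. The two-column toy matrix is an assumption for illustration.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Two toy samples with two features each: columns are a and b.
X = np.array([[2.0, 3.0], [1.0, 4.0]])

# Degree-2 expansion without the constant term.
# Output column order: a, b, a^2, a*b, b^2.
poly = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly.fit_transform(X)
print(X_poly)

# Manual feature crossing: product and ratio of the two features.
a, b = X[:, 0], X[:, 1]
product = a * b
ratio = a / b
print(product, ratio)
```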
Step S4: assessing consumption capability using a federated learning method. The assessment process using the federated learning method comprises the following steps:
Step S41: selecting a federated learning model. Specifically, federated learning falls into three types: horizontal federated learning, vertical federated learning, and federated transfer learning. Take two institutions in a city as an example: institution A is UnionPay, which holds users' consumption records, and institution B is an operator that holds user data. The two institutions share many users but record different features, and they want to jointly train a stronger model by encrypting and aggregating the different features of these users, so vertical federated learning is the most suitable choice. Vertical federated learning first requires sample alignment, i.e., finding the samples the participants have in common, also known as "database hits". Vertical federation increases the feature dimension of the training samples. In this example, consumption capability is tiered by the absolute amount of consumption, and the consumption data contains no portrait information such as the user's occupation, marital status, children, or education level. Questions such as the following therefore cannot be answered from it alone: what portrait features does the low-consumption group have? What is its education level? Are its members single or married? Do the occupations of the high-consumption group have particular characteristics, and in which areas is that group concentrated? Once the encrypted primary keys (mobile phone numbers) of the two sides are matched and the operator's user portrait data is joined, each consumption tier can be interpreted more specifically, enabling analysis and mining in more dimensions and at greater depth.
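The sample alignment step can be sketched as a hash-based intersection over the primary keys. This is a simplified illustration under stated assumptions: the SHA-256 hash, the shared salt, the function names, and the phone numbers are all hypothetical, and both parties are simulated in one process.

```python
import hashlib

def hash_key(phone: str, salt: str) -> str:
    """Hash a primary key (here a phone number) with a shared salt."""
    return hashlib.sha256((salt + phone).encode("utf-8")).hexdigest()

def align_samples(keys_a, keys_b, salt):
    """Return the keys of party A whose hashes also appear at party B."""
    hashed_a = {hash_key(k, salt): k for k in keys_a}  # stays at party A
    hashed_b = {hash_key(k, salt) for k in keys_b}     # what B would send to A
    common = hashed_a.keys() & hashed_b                # compared locally by A
    return sorted(hashed_a[h] for h in common)

# Hypothetical phone numbers held by the two institutions.
unionpay_users = ["13800000001", "13800000002", "13800000003"]
operator_users = ["13800000002", "13800000003", "13800000004"]
shared = align_samples(unionpay_users, operator_users, salt="demo-salt")
print(shared)
```

Because phone numbers form a small key space, salted hashes can be brute-forced by whoever knows the salt; a deployment would more likely rely on a private set intersection protocol, which this sketch does not implement.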
Step S42: constructing a consumption capability model. Specifically, the consumption amount of resident cardholders in the local city is computed from local UnionPay consumption data, and consumption capability tiers are assigned according to the percentile of each cardholder's transaction amount. The tiers are set as a significantly low consumption capability tier, a relatively equivalent consumption capability tier, a high consumption capability tier, a significantly high consumption capability tier, and a high consumption capability consumption tier. The consumption amount thresholds are based on the monthly transactions of local cardholders: a logistic regression model is used to fit the distribution, and the thresholds are taken from the boundaries of that distribution. Table 4 below illustrates this:
Table 4
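The tiering in step S42 can be sketched as a percentile ranking followed by a lookup against cut points. The cut points (20, 40, 60, 80), the numeric tier labels 1 to 5, and the sample amounts below are hypothetical placeholders and do not reproduce the thresholds of Table 4.

```python
from bisect import bisect_right

def percentile_rank(amounts):
    """Percentile rank (0-100] of each amount within the cohort."""
    ordered = sorted(amounts)
    n = len(ordered)
    return [100.0 * bisect_right(ordered, a) / n for a in amounts]

def assign_tiers(amounts, cut_points=(20, 40, 60, 80)):
    """Map each cardholder to tier 1 (lowest) .. 5 (highest).

    cut_points are hypothetical percentile boundaries, not the Table 4 values.
    """
    tiers = []
    for rank in percentile_rank(amounts):
        # Count how many boundaries the cardholder's rank exceeds.
        tiers.append(1 + sum(rank > c for c in cut_points))
    return tiers

# Hypothetical monthly transaction amounts of ten cardholders.
monthly_amounts = [120, 560, 980, 1500, 2300, 3100, 4800, 7600, 12000, 30000]
tiers = assign_tiers(monthly_amounts)
print(tiers)  # [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
```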
Step S43: training the consumption portrait model according to the selected federated learning model. Specifically, a vertical federated learning model is selected for training. For example, company A holds the consumption data of a city's residents, and company B holds portrait data of residents of the same city. The two parties match on users' mobile phone numbers and take the part of the participants' data in which the users are the same but the features are not identical for joint training. For reasons of user privacy and data security, party A and party B cannot exchange data directly, so a third-party coordinator C is needed to keep the data confidential during training. The specific training process is as follows, with separate symbols denoting the local data of company A and company B and the local models trained by company A and company B, respectively.
The objective function is:
This can be further simplified by letting
After b is merged in, the loss function becomes:
Let
The encrypted loss function is:
Let
Then
Let
Then the gradients of the loss function with respect to the two parties' model parameters are, respectively:
Because the private key is held only by coordinator C, parties A and B cannot decrypt the values they exchange, so nothing is revealed in the exchange. The encrypted gradients sent to C can be masked with random numbers known only to A and B, so the gradients are not directly exposed to C.
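The arithmetic behind a training step of this kind can be checked numerically. The sketch below assumes the commonly used vertical federated linear regression formulation with a ridge penalty, which may differ from the formulas of this embodiment. It also assumes that party B holds the labels and works entirely in plaintext, whereas the exchanged quantities would be encrypted (for example with an additively homomorphic scheme) in the actual protocol. All variable names and sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
X_a = rng.normal(size=(n, 3))   # features held by party A (no labels)
X_b = rng.normal(size=(n, 2))   # features held by party B
y = rng.normal(size=n)          # labels, assumed to be held by party B
theta_a = rng.normal(size=3)    # local model of party A
theta_b = rng.normal(size=2)    # local model of party B
lam = 0.1                       # hypothetical regularisation strength

# Each party computes its intermediate result locally.
u_a = X_a @ theta_a
u_b = X_b @ theta_b

# Loss split into an A-only term, a B-only term and a cross term.
loss_a = np.sum(u_a ** 2) + lam / 2 * theta_a @ theta_a
loss_b = np.sum((u_b - y) ** 2) + lam / 2 * theta_b @ theta_b
loss_ab = 2 * np.sum(u_a * (u_b - y))
loss_split = loss_a + loss_b + loss_ab

# Shared residual d_i = u_a_i + (u_b_i - y_i): each gradient needs only d
# and the party's own features.
d = u_a + (u_b - y)
grad_a = 2 * X_a.T @ d + lam * theta_a
grad_b = 2 * X_b.T @ d + lam * theta_b

# Centralised reference on the concatenated features, for comparison only.
X = np.hstack([X_a, X_b])
theta = np.concatenate([theta_a, theta_b])
loss_joint = np.sum((X @ theta - y) ** 2) + lam / 2 * theta @ theta
grad_joint = 2 * X.T @ (X @ theta - y) + lam * theta

print(np.isclose(loss_split, loss_joint))
print(np.allclose(np.concatenate([grad_a, grad_b]), grad_joint))
```

The comparison illustrates why the split is useful: neither party needs the other's raw features to obtain its own gradient, only the shared residual.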
Step S44: predicting with and evaluating the trained consumption portrait model. The consumption portrait is obtained by jointly modeling UnionPay and operator data, i.e., by federated learning training on the part of the participants' data in which the users are the same but the features are not identical. Specifically, for this case, the prediction process of the consumption portrait model is as follows. Step one: party A: none; party B: none; party C: sends i (the user's mobile phone number) to party A and party B. Step two: party A computes its local result and sends it to party C; party B computes its local result and sends it to party C; party C computes the combined result.
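The two-step prediction flow can be sketched as follows. The sketch assumes a linear model whose score is the sum of the two parties' partial scores and simulates the exchange in plaintext within one process; the weights, feature values, and function names are hypothetical.

```python
def local_score(weights, features):
    """Partial linear score computed by one party on its own features."""
    return sum(w * x for w, x in zip(weights, features))

# Hypothetical per-party stores keyed by mobile phone number.
party_a = {"weights": [0.5, -1.0], "data": {"13800000002": [2.0, 1.0]}}
party_b = {"weights": [2.0], "data": {"13800000002": [0.25]}}

def coordinator_predict(user_id):
    # Step one: C sends the user id i to party A and party B.
    # Step two: each party returns only its partial score; C adds them.
    u_a = local_score(party_a["weights"], party_a["data"][user_id])
    u_b = local_score(party_b["weights"], party_b["data"][user_id])
    return u_a + u_b

prediction = coordinator_predict("13800000002")
print(prediction)  # 0.5
```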
The evaluation results of the consumption portrait model are shown in Table 5 below:
TABLE 5
The KS (Kolmogorov-Smirnov) statistic measures the maximum difference between the true positive rate and the false positive rate of a classification model across thresholds; KS typically ranges from 0 to 1, and a larger value indicates better model performance. F1 score combines the precision and recall of a classification model as their harmonic mean and measures how accurately the model classifies the positive class; it ranges from 0 to 1, and a value closer to 1 indicates better performance. AUC (Area Under the Curve) is the area under the ROC curve and measures the performance of a classification model; the ROC curve plots the false positive rate on the abscissa against the true positive rate on the ordinate as the classification threshold varies; AUC ranges from 0 to 1, and a value closer to 1 indicates better performance. AUC, KS, and F1 score are all common indicators for evaluating model performance.
As Table 5 shows, the model achieves an AUC of 0.8 and a KS of 0.57, indicating that the model effect is stable.
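For reference, the three indicators can be computed from labels and scores as in the following sketch. The four-sample data set is a toy example unrelated to Table 5, and the 0.5 decision threshold for the F1 score is an assumption.

```python
def confusion(y_true, y_score, threshold):
    """Counts of (TP, FP, FN, TN) when scores >= threshold are predicted positive."""
    tp = sum(1 for t, s in zip(y_true, y_score) if t == 1 and s >= threshold)
    fp = sum(1 for t, s in zip(y_true, y_score) if t == 0 and s >= threshold)
    fn = sum(1 for t, s in zip(y_true, y_score) if t == 1 and s < threshold)
    tn = sum(1 for t, s in zip(y_true, y_score) if t == 0 and s < threshold)
    return tp, fp, fn, tn

def ks_statistic(y_true, y_score):
    """Maximum |TPR - FPR| over all thresholds taken from the scores."""
    best = 0.0
    for thr in sorted(set(y_score)):
        tp, fp, fn, tn = confusion(y_true, y_score, thr)
        best = max(best, abs(tp / (tp + fn) - fp / (fp + tn)))
    return best

def auc(y_true, y_score):
    """Probability that a random positive is scored above a random negative."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def f1_score(y_true, y_score, threshold=0.5):
    """Harmonic mean of precision and recall at the given threshold."""
    tp, fp, fn, _ = confusion(y_true, y_score, threshold)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(ks_statistic(labels, scores), auc(labels, scores), f1_score(labels, scores))
```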
Further, the scenarios to which the constructed consumption capability model and the trained consumption portrait model are jointly applied at least comprise: urban business district economic analysis, real estate policy regulation, resident consumption coupon issuance, urban night-time economy analysis, and traffic planning and urban development. Specifically, each application scenario is analyzed along the two dimensions of consumption capability and consumption portrait, for example:
1) Urban business district economic analysis: based on these models, city managers can gain in-depth knowledge of the consumption capability and behavior habits of residents in different business districts. This supports business district planning and operation and helps determine suitable business development strategies, pricing strategies, and store positioning, so as to best meet residents' demand and improve the economic vitality of the business district.
2) Real estate policy regulation: the consumption capability and cardholder value models may be used to evaluate housing demand and purchasing power in different areas. Governments may adjust real estate policies, such as purchase restriction policies and loan policies, based on these models to maintain stable and sustainable development of the real estate market.
3) Resident consumption coupon issuance: by knowing the consumption capability and consumption freedom of residents, city managers can formulate consumption incentive policies, such as the issuance of consumption coupons, more precisely. This encourages consumption, promotes economic growth, and minimizes waste.
4) Urban night economic analysis: through the consumption portraits and the consumption degree of freedom model, city managers can know night consumption preference and behavior of residents. This is very important for developing the night economy and night service industries of cities, contributing to improving the safety and attractiveness of cities.
5) Traffic planning and city development: these models can also be used for traffic planning and infrastructure construction of cities. Knowing the travel habits and consumption capacities of different residents helps to optimize traffic networks and public transportation systems and ensure sustainable development of cities.
Finally, it should be noted that the above description merely illustrates the technical solution of the present invention and does not limit its scope; those skilled in the art may make simple modifications to, or equivalent substitutions of, the technical solution without departing from its spirit and scope.

Claims (10)

1. A consumption capability assessment method based on urban sign indexes, characterized by comprising the following steps:
step S1: data source preparation: the data sources at least comprise UnionPay consumption data and operator user data;
step S2: data source fusion: finding the intersection between the data sources;
step S3: performing feature engineering processing on the fused data sources;
step S4: evaluating consumption capability and consumption portraits using a federated learning method.
2. The consumption capability assessment method based on urban sign indexes according to claim 1, wherein in step S1, the basic information and consumption information of residents are obtained through a UnionPay consumption data packet interface, the UnionPay consumption data comprises the basic information and consumption information of users, and the consumption information of users is stored in a UnionPay consumption database.
3. The consumption capability assessment method based on urban sign indexes according to claim 1, wherein in step S1, the private user data involved in the operator user data is jointly modeled within the operator database by means of privacy computing.
4. The consumption capability assessment method based on urban sign indexes according to claim 1, wherein in step S2, the data source fusion process comprises the following steps:
step S21: hashing the two data sources with a hash function;
step S22: each party sending the hash values of its processed data source to the other party, so that the two parties exchange hash values;
step S23: each party locally comparing the hash values received from the other party with its own hash values, finding the common hash values, and using the common hash values to find the common intersection data set stored in the respective databases.
5. The consumption capability assessment method based on urban sign indexes according to claim 1, wherein in step S3, the feature engineering processing comprises feature construction processing, in which useful features are selected and constructed from the original data sources and combined into a new subset.
6. The consumption capability assessment method based on urban sign indexes according to claim 1, wherein in step S3, the feature engineering processing comprises feature derivation processing, in which new features are constructed according to business knowledge or the relationships between features.
7. The consumption capability assessment method based on urban sign indexes according to claim 1, wherein in step S3, the feature engineering processing comprises feature selection processing, which proceeds as follows: firstly, using a machine learning method to recursively eliminate features so as to retain the features with predictive capability for the target variable; then selecting features by building a model and gradually removing the features that contribute little to the model's predictive capability, wherein the target variable is the output or label of the model being built; and finally, ranking the features by importance score and eliminating the features with the lowest importance scores until the specified number of features is reached or the model performance no longer improves.
8. The consumption capability assessment method based on urban sign indexes according to claim 1, wherein in step S3, the feature engineering processing comprises feature combination processing, in which different features are combined to form new features.
9. The consumption capability assessment method based on urban sign indexes according to claim 1, wherein in step S4, the evaluation process comprises the following steps:
step S41: selecting a federated learning model;
step S42: constructing and evaluating a consumption capability model;
step S43: training the consumption portrait model according to the selected federated learning model;
step S44: predicting with and evaluating the trained consumption portrait model.
10. The consumption capability assessment method based on urban sign indexes according to claim 9, wherein in step S42, the consumption amount of cardholders within a set period is computed from local UnionPay consumption data, and consumption capability tiers are assigned according to the percentile of each cardholder's transaction amount, the tiers being set as a significantly low consumption capability tier, a relatively high consumption capability tier, a significantly high consumption capability tier and a high consumption capability tier.
CN202311488106.6A 2023-11-08 2023-11-08 Consumer ability assessment method based on urban sign indexes Pending CN117455549A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311488106.6A CN117455549A (en) 2023-11-08 2023-11-08 Consumer ability assessment method based on urban sign indexes

Publications (1)

Publication Number Publication Date
CN117455549A true CN117455549A (en) 2024-01-26

Family

ID=89585211

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311488106.6A Pending CN117455549A (en) 2023-11-08 2023-11-08 Consumer ability assessment method based on urban sign indexes

Country Status (1)

Country Link
CN (1) CN117455549A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112163979A (en) * 2020-10-19 2021-01-01 科技谷(厦门)信息技术有限公司 Urban traffic trip data analysis method based on federal learning
CN113240509A (en) * 2021-05-18 2021-08-10 重庆邮电大学 Loan risk assessment method based on multi-source data federal learning
CN113313538A (en) * 2021-06-30 2021-08-27 上海浦东发展银行股份有限公司 User consumption capacity prediction method and device, electronic equipment and storage medium
CN113393357A (en) * 2021-06-03 2021-09-14 八维通科技有限公司 Data center station system suitable for urban traffic trip data service

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination