CN117455549A - A consumption power assessment method based on urban physical indicators - Google Patents

Info

Publication number
CN117455549A
Authority
CN
China
Prior art keywords
consumption
data
features
model
feature
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311488106.6A
Other languages
Chinese (zh)
Inventor
陈曦
张静
王鹏亮
林晓玉
周昌盛
胡伟龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Richstone Technology Co ltd
Original Assignee
Guangzhou Richstone Technology Co ltd
Application filed by Guangzhou Richstone Technology Co ltd
Priority to CN202311488106.6A
Publication of CN117455549A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G06Q30/0202 Market predictions or forecasting for commercial activities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/251 Fusion techniques of input or preprocessed data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Strategic Management (AREA)
  • Finance (AREA)
  • Development Economics (AREA)
  • Accounting & Taxation (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Economics (AREA)
  • Software Systems (AREA)
  • Game Theory and Decision Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a consumption capacity assessment method based on urban physical indicators, relating to the technical field of smart cities. The method comprises the following steps: data source preparation, in which the data sources at least comprise UnionPay consumption data and operator user data; data source fusion, in which the intersection between the data sources is found and an intersection data set is established; feature engineering, in which features relevant to the modeling target are extracted; and assessment of consumption capacity and consumption portraits using a federated learning method. By integrating UnionPay consumption data and operator user data and applying privacy computing and federated learning, the method builds a resident consumption capacity model and a consumption portrait model, providing decision support for urban management and business development. It helps in understanding residents' consumption behavior and needs, supports business decision-making, policy formulation, urban planning and related fields, and thereby promotes the sustainable development and prosperity of cities.

Description

A consumption power assessment method based on urban physical indicators

Technical field

The invention relates to the technical field of smart cities, and specifically to a consumption capacity assessment method based on urban physical indicators.

Background art

As urbanization continues to accelerate, urban planning and management have become increasingly complex and critical. Urban physical indicators, including but not limited to population structure, air quality, traffic congestion and resident consumption, are key factors in assessing urban development and quality. A city can be regarded as an organic living body that suffers from various "urban diseases". Just as the human body needs regular physical examinations, a city also needs regular examinations to discover problems, diagnose their causes and take effective measures. Urban physical indicator data has become an indispensable tool in this process.

Taking resident consumption as an example, weak consumption has become a significant problem under the influence of the external environment and other factors, so effective consumption stimulation strategies that drive urban economic development are particularly important. However, traditional analysis of residents' consumption behavior and consumption power relies on consumption data from a single financial system. It lacks comprehensive data support, cannot fully reflect the multi-dimensional characteristics of resident consumption, and struggles to form a comprehensive portrait of resident consumption. In addition, traditional methods often fail to fully consider the relationships among the various urban physical indicators. For example, the Chinese patent with publication number CN109829763A discloses a consumption power assessment method and device, electronic equipment and storage medium. That method includes: obtaining the historical consumption data of a target user; in response to missing transaction categories in the target user's historical consumption data, performing multiple imputation on the missing data associated with those categories to obtain multiple complete data sets; analyzing the multiple complete data sets to obtain the characteristic parameters corresponding to each complete data set; and obtaining, from the characteristic parameters, a target value representing consumption power, and assessing the target user's consumption power according to the target value.

Therefore, a new data model construction method is urgently needed to solve the problem of multi-dimensional, in-depth consumption data mining and analysis, so as to better support decision-making for urban economic growth and sustainable development.

Summary of the invention

In view of the problems in the prior art, the present invention provides a consumption power assessment method based on urban physical indicators.

To achieve the above objective, the present invention adopts the following technical solution:

A consumption capacity assessment method based on urban physical indicators, comprising the following steps:

Step S1: Data source preparation: the data sources at least include UnionPay consumption data and operator user data;

Step S2: Data source fusion: find the intersection between the data sources;

Step S3: Perform feature engineering on the fused data;

Step S4: Use a federated learning method to predict and assess consumption power and consumption portraits.

Further, in step S1, basic resident information and consumption information are obtained through the UnionPay consumption data package interface. The UnionPay consumption data includes the user's basic information, such as name, ID number and bank card number, and the user's consumption information, such as consumption amount, consumption date, consumption behavior (consumption type) and consumption location; the consumption information is stored in a database.

Further, in step S1, because operator user data contains users' private data, it is jointly modeled inside the operator's database by means of privacy computing. The operator user data includes user portrait data such as mobile phone number, gender, age, occupation, education level, marital status and number of children.

Further, in step S2, a private set intersection (PSI) method is used to fuse the two data sources: records are matched in encrypted form on the key attributes common to both data sources, the information shared by both data sources is identified, and this shared information is stored in an established intersection data set. This process relies on cryptographic techniques to protect data privacy and ensure that no sensitive plaintext is leaked during data fusion.

Further, in step S2, the data source fusion process includes the following steps: Step S21: hash each of the two data sources with a hash function; Step S22: send the resulting hash values to the other party, so that the two parties exchange hash values; Step S23: each party locally compares the other party's hash values with its own, finds the common hash values, and uses them to locate the common intersection data set stored in its own database.

Further, in step S3, feature engineering includes feature construction, in which useful features are selected from the original data and combined into new subsets. For example, from an original feature set of age, income, marital status, consumption behavior, number of children and so on, several new features can be constructed.

Further, in step S3, feature engineering includes feature derivation, in which new features are constructed based on business knowledge or on relationships between features. For example, business district economics are computed from data such as consumption behavior, consumption location and consumption time, forming business district consumption features and business district night-economy features, which are used to analyze and rank the business districts across the city.

Further, in step S3, feature engineering includes feature selection, performed as follows: first, recursive feature elimination (RFE), a machine learning method, is used to identify features with predictive power for the target variable, where the target variable is the output or label of the model being built; then features are selected by building a model and gradually eliminating those that contribute little to its predictive ability; finally, features are ranked by importance score and the lowest-scoring feature is eliminated, repeatedly, until the specified number of features is reached or model performance no longer improves.

Further, in step S3, feature engineering includes feature combination, in which different features are combined to form new features.

Further, in step S4, assessment using the federated learning method includes the following steps: Step S41: select a federated learning model; Step S42: build and assess a consumption power model; Step S43: train the consumption portrait model with the selected federated learning model; Step S44: perform prediction and assessment with the trained consumption portrait model. Specifically, the consumption portrait is obtained by jointly modeling the UnionPay and operator data and then using the portion of the participants' data that covers the same users but not entirely the same features for federated learning model training.
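The "same users, different features" arrangement described above is what is usually called vertical federated learning. Because logistic regression is among the algorithms this document names, the sketch below trains a two-party vertical logistic regression: each party keeps its own feature columns and weights, and only partial logits and residuals cross the boundary. It is a simplified illustration under stated assumptions, not the patent's protocol: the label-holding ("active") party is assumed to be the UnionPay side, rows are assumed to be already aligned by the PSI step, and the exchanged intermediate values are sent in plaintext, whereas real systems protect them (for example with homomorphic encryption or a trusted coordinator). The `Party` class and the synthetic data are hypothetical.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class Party:
    """A participant holding its own feature columns for the shared users."""

    def __init__(self, rows):
        self.rows = rows                      # aligned by the PSI step (assumed)
        self.weights = [0.0] * len(rows[0])   # weights never leave this party

    def partial_logits(self):
        return [sum(w * x for w, x in zip(self.weights, row)) for row in self.rows]

    def update(self, residuals, lr):
        # Gradient of the mean log-loss w.r.t. this party's own weights only.
        n = len(residuals)
        for j in range(len(self.weights)):
            grad = sum(d * row[j] for d, row in zip(residuals, self.rows)) / n
            self.weights[j] -= lr * grad


def train_vertical_lr(active, passive, labels, lr=0.5, epochs=300):
    """Active party holds the labels; only logits and residuals are exchanged."""
    bias = 0.0
    for _ in range(epochs):
        z_passive = passive.partial_logits()            # passive -> active
        z = [za + zp + bias for za, zp in zip(active.partial_logits(), z_passive)]
        residuals = [sigmoid(v) - y for v, y in zip(z, labels)]
        active.update(residuals, lr)
        passive.update(residuals, lr)                   # active -> passive
        bias -= lr * sum(residuals) / len(residuals)
    return bias


def predict(active, passive, bias):
    z = [za + zp + bias for za, zp in zip(active.partial_logits(), passive.partial_logits())]
    return [1 if v > 0 else 0 for v in z]


# Synthetic aligned users: one feature per party, label depends on both.
rng = random.Random(42)
users = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(200)]
labels = [1 if a + b > 0 else 0 for a, b in users]
party_a = Party([[a] for a, _ in users])   # e.g. card-side feature (holds labels)
party_b = Party([[b] for _, b in users])   # e.g. operator-side feature
bias = train_vertical_lr(party_a, party_b, labels)
preds = predict(party_a, party_b, bias)
accuracy = sum(p == y for p, y in zip(preds, labels)) / len(labels)
```

Since the model is linear, the sum of the two partial logits equals the logit a centralized model would compute on the concatenated features, which is why the parties can train jointly without pooling raw columns.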

Further, in step S42, the consumption amount of each cardholder within a set time period is computed from the local UnionPay consumption data, and, according to the percentile in which each cardholder's transaction amount falls, consumption power is divided into six tiers: significantly low, low, comparable, high, significantly high, and high-amount consumption.
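One way to read step S42 is sketched below: total each cardholder's spend inside a date window, compute the cardholder's percentile rank among all cardholders, and map that rank onto the six tiers named above. The percentile cut points (20/40/60/80/95) and the `(card_id, date, amount)` record layout are hypothetical; the source does not state them.

```python
from bisect import bisect_left
from collections import defaultdict
from datetime import date

# Hypothetical percentile upper bounds (exclusive) for the six S42 tiers.
TIERS = [
    (20, "significantly low"),
    (40, "low"),
    (60, "comparable"),
    (80, "high"),
    (95, "significantly high"),
    (float("inf"), "high-amount"),
]


def total_spend(transactions, start: date, end: date) -> dict:
    """Sum each cardholder's transaction amounts inside [start, end]."""
    totals = defaultdict(float)
    for card_id, day, amount in transactions:
        if start <= day <= end:
            totals[card_id] += amount
    return dict(totals)


def percentile_rank(value: float, sorted_values: list) -> float:
    """Percentage of cardholders whose total is strictly below value."""
    return 100.0 * bisect_left(sorted_values, value) / len(sorted_values)


def assign_tiers(totals: dict) -> dict:
    sorted_values = sorted(totals.values())
    tiers = {}
    for card_id, amount in totals.items():
        pct = percentile_rank(amount, sorted_values)
        tiers[card_id] = next(name for upper, name in TIERS if pct < upper)
    return tiers


# Hypothetical October transactions for 20 cardholders, plus one outside the window.
txns = [(f"card{k:02d}", date(2023, 10, 1), float(k)) for k in range(1, 21)]
txns.append(("card01", date(2023, 12, 1), 1000.0))
totals = total_spend(txns, date(2023, 10, 1), date(2023, 10, 31))
tiers = assign_tiers(totals)
```

Using "share of cardholders strictly below" as the percentile keeps ties in the same tier; other percentile definitions would shift borderline cardholders by one tier.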

Further, the constructed consumption power model and the trained consumption portrait model can be applied together in scenarios including at least: urban business district economic analysis, real estate policy regulation, issuance of resident consumption vouchers, urban night-economy analysis, and transportation planning and urban development.

Compared with the prior art, the present invention has the following beneficial effects:

(1) The assessment method integrates UnionPay consumption data and operator user data and applies privacy computing and federated learning to create residents' consumption power models and consumption portrait models. Its main goals are to provide accurate consumption capacity assessment and decision support for urban management and business development. It helps to better understand residents' consumption behavior and needs and supports business decision-making, policy formulation, urban planning and other fields, thereby promoting the sustainable development and prosperity of the city.

(2) Data fusion and comprehensiveness: the invention fuses UnionPay consumption data with operator user data, giving city managers more comprehensive information and helping them assess and analyze the consumption capacity and behavior of urban residents more accurately.

(3) Privacy protection: through the private set intersection (PSI) method, the invention protects user privacy, ensuring that sensitive user information is not leaked while achieving secure cross-matching of data.

(4) Joint model training: federated learning allows multiple parties to train models on their local data, avoiding the privacy risks of data sharing and transmission; at the same time, sharing model parameters builds a stronger global model, improving the model's dimensionality and accuracy.

(5) Decision support and urban management: by constructing data models such as consumption capacity and consumption portraits, the invention provides city managers with a powerful tool for urban planning and management decisions, helping cities better meet residents' needs, improve residents' quality of life and promote urban economic development.

(6) Customized analysis: the invention can flexibly customize different models to the specific needs of city managers, such as urban night-economy analysis, city business district ranking, issuance of resident consumption vouchers and real estate policy regulation, supporting in-depth analysis at multiple levels and in multiple fields and providing more targeted data and insights for decisions in different areas.

Description of the drawings

Figure 1 is a flow chart of the assessment method of the present invention.

Detailed description of the embodiments

The present invention is further elaborated below with reference to the accompanying drawings and specific embodiments. The technical features of the various embodiments of the present invention can be combined with one another as long as they do not conflict.

To make the above objectives, features and advantages of the present invention clearer and easier to understand, specific embodiments are described in detail below with reference to the accompanying drawings. Numerous specific details are set forth in the following description to provide a thorough understanding of the invention. However, the invention can be implemented in many ways other than those described here, and those skilled in the art can make similar improvements without departing from its spirit; the invention is therefore not limited to the specific embodiments disclosed below.

Embodiment 1

This embodiment provides a consumption power assessment method based on urban physical indicators that combines data fusion, privacy protection, joint model training and algorithm selection. Specifically, fusing UnionPay consumption data with operator user data merges different data sources and increases the dimensionality of the model data. To protect user privacy, the private set intersection (PSI) method is used to match sensitive user information in encrypted form so that private data is not leaked. With federated learning, multiple parties train models on their local data, avoiding the transmission of sensitive data, while sharing model parameters to build a stronger global model. Algorithms such as logistic regression and linear regression are used to build data models such as consumption capacity and consumption portraits, giving city managers tools for decision support and in-depth analysis. Together, these techniques form a method for mining residents' consumption capacity and consumption portraits, so that the state of the city can be assessed more accurately and comprehensively, problems identified and countermeasures formulated. The method can play a key role in urban planning and management, providing comprehensive data support for urban development while protecting residents' privacy and giving decision-making and resource allocation a more scientific basis.

As shown in Figure 1, the method includes the following steps. Step S1: data source preparation, where the data sources at least include UnionPay consumption data and operator user data. Specifically, basic resident information and consumption information are obtained through the UnionPay consumption data package interface. The UnionPay consumption data includes the user's basic information, such as name, ID number and bank card number, and the user's consumption information, such as consumption amount, consumption date, consumption behavior (consumption type) and consumption location; the consumption information is stored in the UnionPay consumption database.

The specific data structure of the UnionPay consumption data is shown in Table 1:

Table 1

As for preparing the operator user data: because of data security requirements, operator user data involving users' private information cannot leave the operator's database, so it is jointly modeled inside the operator's database through privacy computing. The operator user data includes user portrait data such as mobile phone number, gender, age, occupation, education level, marital status and number of children. The specific data structure provided by the operator is shown in Table 2:

Table 2

Step S2: data source fusion, finding the intersection between the data sources. Specifically, the private set intersection (PSI) method is used to fuse the two data sources, which requires finding their intersection without revealing any other information. Records are matched in encrypted form on the key attributes common to both data sources, the information shared by both is identified, and it is stored in an established intersection data set. That is, UnionPay consumption data and operator user data can be matched in encrypted form on a common key attribute (such as the mobile phone number) to determine the users present in both data sources, and an intersection data set containing the overlapping part of both parties' data is created. This process relies on cryptographic techniques to protect data privacy and ensure that no sensitive plaintext is leaked during data fusion. Encryption is the main security measure used in e-commerce and the most common means of keeping data confidential: important data is transformed into unreadable ciphertext (encryption) for transmission and restored at the destination by the same or a different means (decryption).
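The paragraph above requires the intersection to be found without revealing anything else. A widely used construction with that property is Diffie-Hellman-style PSI based on commutative blinding: each party raises hashed identifiers to a private exponent, and because (H(x)^a)^b = (H(x)^b)^a, doubly blinded values match only for shared identifiers. The sketch below shows that swapped-in technique, not necessarily the method the patent uses. It assumes semi-honest parties, uses a toy modulus (the Mersenne prime 2^127 - 1) instead of a production elliptic-curve group, and omits the shuffling and networking a real deployment needs; `PsiParty`, `dh_psi` and the phone numbers are hypothetical.

```python
import hashlib
import secrets

# Toy group: integers modulo the Mersenne prime 2**127 - 1. Production PSI
# uses an elliptic-curve group with a proper hash-to-curve function.
P = 2**127 - 1


def hash_to_group(identifier: str) -> int:
    """Map an identifier (e.g. a mobile number) to a nonzero element mod P."""
    digest = hashlib.sha256(identifier.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1


class PsiParty:
    def __init__(self, identifiers):
        self.identifiers = list(identifiers)
        self._secret = secrets.randbelow(P - 3) + 2   # private exponent, kept local

    def blind_own(self):
        """First pass: H(x)^k for each of this party's identifiers."""
        return [pow(hash_to_group(x), self._secret, P) for x in self.identifiers]

    def blind_received(self, values):
        """Second pass: raise the other party's blinded values to this party's key."""
        return [pow(v, self._secret, P) for v in values]


def dh_psi(a: PsiParty, b: PsiParty):
    """Identifiers of A that B also holds; A learns only the overlap."""
    a_double = b.blind_received(a.blind_own())        # A -> B -> A, order kept
    b_double = set(a.blind_received(b.blind_own()))   # B -> A (B would shuffle first)
    # (H(x)^ka)^kb == (H(x)^kb)^ka, so equal values mean a shared identifier.
    return [x for x, v in zip(a.identifiers, a_double) if v in b_double]


# Hypothetical mobile numbers held by the two parties.
unionpay_side = PsiParty(["13800000001", "13800000002", "13800000003"])
operator_side = PsiParty(["13800000002", "13800000003", "13900000009"])
common = dh_psi(unionpay_side, operator_side)
```

Unlike exchanging bare hashes, neither party can test a guessed identifier against the other's blinded values without knowing the other party's secret exponent.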

Further, the data source fusion process includes the following steps:

Step S21: hash each of the two data sources with a hash function. Specifically, the two data sources are the UnionPay consumption data (held by participant A, data source X) and the operator user data (held by participant B, data source Y). The goal is to find the intersection of the two data sources while protecting data privacy, using the mobile phone number in each data source as the common key attribute. The MD5 hash function is chosen, and each participant hashes its mobile phone numbers with MD5, as illustrated in Table 3:

Table 3

Step S22: send the hash values of each processed data source to the other party, so that the two parties exchange hash values. Specifically, after participants A and B each hash their own data source, every data element has a hash value. Participant A computes and stores the hash values of data source X and sends them to participant B; participant B computes and stores the hash values of data source Y and sends them to participant A.

Step S23: each party locally compares the other party's hash values with its own, finds the common hash values, and uses them to locate the common intersection data set stored in its own database. Taking secure intersection of mobile phone numbers as an example, both parties use the MD5 digest of the user's 11-digit mobile phone number, written as a 32-character lowercase hexadecimal string, as the matching field. After the private set intersection (PSI) is completed, each party's data still remains in its own database. Private set intersection is a privacy-preserving technique whose purpose is to find the intersection of two data sets without revealing other details of the data, so the data remains private to each party.
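Steps S21 to S23 can be sketched as follows, assuming each record is a dict with a hypothetical `phone` field and using Python's standard `hashlib.md5`. One caveat that goes beyond the source: because 11-digit mobile numbers form a small, enumerable space, an unsalted MD5 digest can be reversed by hashing every candidate number, so a plain digest exchange reveals more than the intersection; keyed or Diffie-Hellman-style PSI protocols are normally used to close that gap.

```python
import hashlib


def md5_phone(phone: str) -> str:
    """MD5 digest of an 11-digit mobile number as 32 lowercase hex characters."""
    if len(phone) != 11 or not phone.isdigit():
        raise ValueError(f"expected an 11-digit mobile number, got {phone!r}")
    return hashlib.md5(phone.encode("ascii")).hexdigest()


def hash_records(records: list[dict]) -> dict[str, dict]:
    """S21: index local records by the digest of their phone number."""
    return {md5_phone(r["phone"]): r for r in records}


def local_intersection(own: dict[str, dict], received: list[str]) -> list[dict]:
    """S23: compare received digests with our own, entirely locally."""
    common = own.keys() & set(received)
    return [own[d] for d in sorted(common)]


# Hypothetical sample rows: party A (card data) and party B (operator data).
party_a = hash_records([
    {"phone": "13800000001", "monthly_spend": 3200.0},
    {"phone": "13800000002", "monthly_spend": 850.0},
])
party_b = hash_records([
    {"phone": "13800000002", "age": 34},
    {"phone": "13900000009", "age": 51},
])

# S22: each side sends only its list of digests to the other.
digests_from_a = list(party_a)
digests_from_b = list(party_b)

matched_at_a = local_intersection(party_a, digests_from_b)
matched_at_b = local_intersection(party_b, digests_from_a)
```

Each party ends up with its own rows for the shared users, which matches the statement that the data stays in each party's database after the intersection.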

Step S3: perform feature engineering on the fused data to extract features relevant to the modeling target. Specifically, feature engineering includes feature construction, in which useful features are selected from the original data and combined into new subsets. The method can obtain features from the UnionPay consumption data and the operator user data and transform and combine them. For example, new features are constructed from original features such as age, income, marital status, consumption behavior and number of children. As a further example, a data set containing the original features is first created, and then several feature constructions are demonstrated: 1. a new feature "household income" is created by multiplying the "income" feature by the "quantity" feature to represent the household's total income; 2. a new feature "high-income occupation" is created by processing the "occupation" feature to determine whether it is a high-income occupation; 3. a new feature "middle-aged and married" is created by applying conditions that determine whether a person is middle-aged and married.
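The three constructions above can be sketched on plain dict records as below. Several details are assumptions because the source leaves them open: the "quantity" feature is read as a hypothetical `household_members` column, and the set of high-income occupations and the 35 to 55 "middle-aged" range are illustrative placeholders, not values from the patent.

```python
# Hypothetical lookup; the source does not say which occupations count.
HIGH_INCOME_OCCUPATIONS = {"physician", "lawyer", "software engineer"}

# Hypothetical definition of "middle-aged" (inclusive bounds).
MIDDLE_AGE_RANGE = (35, 55)


def construct_features(row: dict) -> dict:
    """Return a copy of one user's record with three constructed features added."""
    out = dict(row)
    # 1. household income = income x "quantity" (read here as household_members).
    out["household_income"] = row["income"] * row["household_members"]
    # 2. flag occupations found in the high-income lookup table.
    out["high_income_occupation"] = row["occupation"] in HIGH_INCOME_OCCUPATIONS
    # 3. middle-aged AND married.
    low, high = MIDDLE_AGE_RANGE
    out["middle_aged_married"] = (
        low <= row["age"] <= high and row["marital_status"] == "married"
    )
    return out


users = [
    {"age": 42, "income": 12000, "household_members": 3,
     "occupation": "lawyer", "marital_status": "married"},
    {"age": 27, "income": 8000, "household_members": 1,
     "occupation": "teacher", "marital_status": "single"},
]
constructed = [construct_features(u) for u in users]
```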

Feature engineering also includes feature derivation, in which new features are constructed based on business knowledge or relationships between features. For example, business district economics are computed from consumption behavior, consumption location, consumption time and similar data, forming business district consumption features and night-economy features that are used to analyze and rank the city's business districts. Example process: first, three data tables are defined: behavior_data (consumption behavior), location_data (consumption location) and time_data (consumption time). Then the merge function joins the three tables on the user_id field to obtain combined_data. Next, the groupby function groups consumption by location and computes the total consumption at each location, giving economic_feature. Finally, the rank function ranks the economic feature, giving economic_ranking.
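The merge, groupby and rank process above can be mirrored without pandas, as in the sketch below, which reuses the table and variable names from the text. It assumes one row per user in each table, so joining on `user_id` is unambiguous (with several transactions per user, a transaction-level key would be needed). The column names `amount`, `district` and `hour`, the sample districts, and the 18:00 to 06:00 night window are hypothetical.

```python
from collections import defaultdict


def merge_on_user_id(*tables):
    """Inner-join lists of row dicts on user_id (one row per user per table)."""
    indexed = [{row["user_id"]: row for row in table} for table in tables]
    common_ids = set(indexed[0]).intersection(*indexed[1:])
    combined = []
    for uid in sorted(common_ids):
        merged = {}
        for index in indexed:
            merged.update(index[uid])
        combined.append(merged)
    return combined


def is_night(hour: int, start: int = 18, end: int = 6) -> bool:
    """Hypothetical night-economy window: 18:00 to midnight plus 00:00 to 06:00."""
    return hour >= start or hour < end


def district_features(combined_data):
    """Group by district: total spend and the share spent at night."""
    total = defaultdict(float)
    night = defaultdict(float)
    for row in combined_data:
        total[row["district"]] += row["amount"]
        if is_night(row["hour"]):
            night[row["district"]] += row["amount"]
    return {d: {"total": total[d], "night_share": night[d] / total[d] if total[d] else 0.0}
            for d in total}


def rank_districts(economic_feature):
    """Rank 1 = highest total spend (ties broken by district name)."""
    ordered = sorted(economic_feature, key=lambda d: (-economic_feature[d]["total"], d))
    return {d: i + 1 for i, d in enumerate(ordered)}


behavior_data = [{"user_id": 1, "amount": 100.0}, {"user_id": 2, "amount": 50.0},
                 {"user_id": 3, "amount": 80.0}]
location_data = [{"user_id": 1, "district": "Tianhe"}, {"user_id": 2, "district": "Yuexiu"},
                 {"user_id": 3, "district": "Tianhe"}]
time_data = [{"user_id": 1, "hour": 20}, {"user_id": 2, "hour": 10}, {"user_id": 3, "hour": 12}]

combined_data = merge_on_user_id(behavior_data, location_data, time_data)
economic_feature = district_features(combined_data)
economic_ranking = rank_districts(economic_feature)
```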

Feature engineering also includes feature selection, which proceeds as follows. First, recursive feature elimination (RFE), a machine learning method, is used to identify features with predictive power for the target variable, and features with strong predictive power are preferred. A feature is usually regarded as "strong" when its predictive power for the target variable is significant, that is, when there is a clear correlation between the feature and the target variable. This can be measured with statistical indicators such as the correlation coefficient, mutual information, or analysis of variance; the higher the correlation, the more likely the feature is to be considered strong. The target variable is the output or label of the model being built. In each iteration, the model is trained on the remaining features and an importance score is computed for each feature. Features are then selected by building a model and progressively eliminating those that contribute little to its predictive power. Specifically, a business-related target model is built and trained with RFE. A feature has a "small contribution" when it does not significantly improve the model's predictive performance, meaning that removing it during training changes model performance very little; this can be determined by cross-validation or another performance evaluation method. Such features are the candidates for elimination. Finally, the features are sorted by importance score, and the lowest-scoring features are eliminated until the specified number of features is reached or model performance stops improving.

Example of recursive feature elimination: RFECV (RFE with cross-validation) adjusts the number of selected features automatically. It is useful to plot the cross-validation score obtained for each number of selected features; the plot shows that RFECV can automatically choose the number of features that works well for classification. In outline, RFE starts from all features. The model is first trained on the full feature set, and each feature is ranked by the importance or weight the model assigns to it. In each iteration, RFE removes the one or more lowest-scoring features and retrains the model, repeating until a stopping condition is met, such as a specified number of features or a target performance level. Model performance at each step is measured by accuracy, F1 score, or R-squared value. Features with lower scores are considered to contribute less to model performance and are progressively eliminated to improve it.

Feature engineering also includes feature combination, in which different features are combined to form new features. For example, marital status and number of children can be combined into a family-status feature. People who habitually buy fitness products or use gyms can be grouped into a "fitness enthusiast" feature. Occupations such as doctor, lawyer, or engineer can be combined with education level into a "highly educated occupation" feature. Example of combining gender and consumption behavior: gender is coded as male or female and consumption behavior as high or low, and binary logistic regression is then used to predict consumption behavior. Besides binary logistic regression, polynomial features or feature crosses can also be used to combine features into new features for predicting consumption behavior. Polynomial features expand the original features polynomially to generate higher-order combinations; for example, two features a and b yield the terms ab, a², and b². This can be implemented with the PolynomialFeatures class of the Scikit-Learn machine learning library. A feature cross combines the values of different features, for example by multiplying or dividing the values of features a and b, and can capture interactions between features.

Step S4: Evaluate consumption power using a federated learning method. The evaluation process comprises the following steps:

Step S41: Select a federated learning model. Federated learning falls into three categories: horizontal federated learning, vertical federated learning, and federated transfer learning. Take two institutions in a city as an example. Institution A (UnionPay) holds users' consumption records, and institution B (a telecom operator) holds user data. The two institutions share many users, but the features they record differ. They want to jointly train a stronger model by aggregating, in encrypted form, the different features of the same users, so vertical federated learning is the most suitable choice. Vertical federated learning first requires sample alignment, that is, finding the samples common to the participants (colloquially called "database collision"). Vertical federation increases the feature dimensions of the training samples. In this example, consumption power is graded by the absolute amount spent, and there is no profile information such as the user's occupation, marital status, children, or education level. With consumption data alone, questions like the following cannot be answered: What profile characteristics do low-spending users have? What is their education level? Are they more often single or married? In which occupations or fields are high spenders concentrated? Once the encrypted primary keys (mobile phone numbers) on both sides are matched and the operator's user-profile data is added, each consumption category can be explained more specifically, enabling deeper, multi-dimensional analysis and mining.
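Below is a minimal sketch of the sample-alignment step, following the hash, exchange, and local-compare sequence of steps S21 to S23 in claim 4. It assumes that A and B share a secret HMAC key agreed out of band; the key and the phone numbers are hypothetical. A plain unkeyed hash of an 11-digit phone number can be reversed by brute force, and the keyed hash only protects against outsiders. Because both parties hold the key, each could still test candidate numbers against the other's hashes, which is why dedicated private set intersection protocols (for example, ECDH-based PSI) are used when that matters.

```python
import hashlib
import hmac

# Assumption: A and B share this secret key, agreed out of band.
SHARED_KEY = b"hypothetical-shared-secret"


def hash_ids(ids, key=SHARED_KEY):
    """Step S21: keyed hash of each id; the hash -> id map stays local."""
    return {hmac.new(key, i.encode("utf-8"), hashlib.sha256).hexdigest(): i
            for i in ids}


def local_intersection(own_map, received_hashes):
    """Step S23: match the other party's hashes against one's own records."""
    return sorted(own_map[h] for h in own_map.keys() & received_hashes)


# Hypothetical phone numbers held by A (UnionPay) and B (operator).
a_map = hash_ids(["13800000001", "13800000002", "13800000003"])
b_map = hash_ids(["13800000002", "13800000003", "13800000004"])

# Step S22: only the hash values are exchanged.
a_sent, b_sent = set(a_map), set(b_map)

a_common = local_intersection(a_map, b_sent)
b_common = local_intersection(b_map, a_sent)
print(a_common)  # ['13800000002', '13800000003']
```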

Step S42: Construct the consumption power model. Using local UnionPay consumption data, the amount each resident cardholder in the city spent over the past three months is computed. Consumption power levels are then assigned according to the percentile into which each cardholder's transaction amount falls. The levels are: significantly low, low, average, high, significantly high, and high-spending. The consumption amount thresholds are based on the monthly transactions of cardholders in the city: after the distribution is fitted with a logistic regression model, the thresholds are set at the boundaries of the distribution. An example is shown in Table 4 below:

Table 4
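Below is a sketch of percentile-based level assignment for step S42. The six level names follow the text. The percentile cut points (10, 30, 70, 90, 99) are hypothetical placeholders, because the actual thresholds are those in Table 4. The input is assumed to be per-cardholder three-month spending totals that have already been aggregated.

```python
import numpy as np

# Six levels from step S42, lowest to highest.
LEVELS = ["significantly low", "low", "average", "high",
          "significantly high", "high-spending"]

# Hypothetical percentile cut points; the real thresholds are those of Table 4.
CUT_PERCENTILES = [10, 30, 70, 90, 99]


def assign_levels(amounts, cut_percentiles=CUT_PERCENTILES):
    """Map each cardholder's 3-month spending total to a consumption level."""
    amounts = np.asarray(amounts, dtype=float)
    thresholds = np.percentile(amounts, cut_percentiles)
    # right=True: an amount equal to a threshold stays in the lower level.
    idx = np.digitize(amounts, thresholds, right=True)  # integers 0..5
    return thresholds, [LEVELS[i] for i in idx]


# Illustrative spending totals for 100 cardholders (1 to 100 currency units).
thresholds, labels = assign_levels(np.arange(1, 101))
counts = {lvl: labels.count(lvl) for lvl in LEVELS}
print(thresholds)
print(counts)
```

Five cut points always produce six bins, so each cardholder receives exactly one of the six levels.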

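Step S43: Train the consumption profile model with the selected federated learning model. A vertical federated learning model is chosen for training. For example, company A holds consumption data on the residents of a city, and company B holds profile data on residents of the same city. The users' mobile phone numbers on both sides are matched, and the portion of the participants' data that covers the same users with different features is extracted for joint training. For reasons of user privacy and data security, A and B cannot exchange data directly, so a third-party coordinator C is introduced to keep the data confidential during training. The training process is as follows. Let $\{x_i^A\}_{i\in D_A}$ and $\{x_i^B, y_i\}_{i\in D_B}$ denote the local data of company A and company B, and let $\Theta_A$ and $\Theta_B$ denote the local models trained by A and B, respectively.

Then the objective function is:

$$\min_{\Theta_A,\Theta_B,b}\ \sum_i \left(\Theta_A x_i^A + \Theta_B x_i^B + b - y_i\right)^2 + \frac{\lambda}{2}\left(\lVert\Theta_A\rVert^2 + \lVert\Theta_B\rVert^2\right)$$

This can be simplified further. Absorbing $b$ into $\Theta_B$ (by appending a constant feature 1 to $x_i^B$) and letting $u_i^A = \Theta_A x_i^A$ and $u_i^B = \Theta_B x_i^B$, the loss function becomes:

$$L = \sum_i \left(u_i^A + u_i^B - y_i\right)^2 + \frac{\lambda}{2}\left(\lVert\Theta_A\rVert^2 + \lVert\Theta_B\rVert^2\right)$$

Let $[\![\cdot]\!]$ denote additively homomorphic encryption. Then the encrypted loss function is:

$$[\![L]\!] = \Big[\!\Big[\sum_i (u_i^A)^2 + \tfrac{\lambda}{2}\lVert\Theta_A\rVert^2\Big]\!\Big] + \Big[\!\Big[\sum_i (u_i^B - y_i)^2 + \tfrac{\lambda}{2}\lVert\Theta_B\rVert^2\Big]\!\Big] + 2\sum_i [\![u_i^A]\!]\,(u_i^B - y_i)$$

Let $[\![L_A]\!] = [\![\sum_i (u_i^A)^2 + \tfrac{\lambda}{2}\lVert\Theta_A\rVert^2]\!]$, $[\![L_B]\!] = [\![\sum_i (u_i^B - y_i)^2 + \tfrac{\lambda}{2}\lVert\Theta_B\rVert^2]\!]$, and $[\![L_{AB}]\!] = 2\sum_i [\![u_i^A]\!]\,(u_i^B - y_i)$; then

$$[\![L]\!] = [\![L_A]\!] + [\![L_B]\!] + [\![L_{AB}]\!]$$

Let $[\![d_i]\!] = [\![u_i^A]\!] + [\![u_i^B - y_i]\!]$. Then the gradients of the loss function with respect to $\Theta_A$ and $\Theta_B$ are, respectively:

$$\Big[\!\Big[\frac{\partial L}{\partial \Theta_A}\Big]\!\Big] = 2\sum_i [\![d_i]\!]\,x_i^A + [\![\lambda\Theta_A]\!], \qquad \Big[\!\Big[\frac{\partial L}{\partial \Theta_B}\Big]\!\Big] = 2\sum_i [\![d_i]\!]\,x_i^B + [\![\lambda\Theta_B]\!]$$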

Because decryption is required, the private key is held by the coordinator C. Neither A nor B holds the private key, so exchanging intermediate results does not leak private data. The encrypted gradient each party sends to C can also be masked with a random number known only to that party (A or B), so the gradient itself is not exposed to C either.
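The training loop above can be simulated in plaintext to check that the loss decomposition $L = L_A + L_B + L_{AB}$ and the gradient formulas behave as expected. This is a sketch under strong simplifying assumptions. The homomorphic encryption and C's decryption are omitted entirely, so the numbers that would travel encrypted are plain NumPy arrays. The data is synthetic, labels are placed with B to match the notation $\{x_i^B, y_i\}$, and the learning rate and $\lambda$ are arbitrary choices. Only the per-party masking of gradients is kept, to show that each party can remove its own mask after C returns the result.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic aligned samples (users already matched by phone number).
n = 200
X_A = rng.normal(size=(n, 3))                        # Party A's features
X_B = np.hstack([rng.normal(size=(n, 2)),            # Party B's features
                 np.ones((n, 1))])                   # constant column absorbs b
true_A = np.array([1.0, -2.0, 0.5])
true_B = np.array([0.8, 1.5, 0.3])                   # last entry plays the role of b
y = X_A @ true_A + X_B @ true_B + 0.1 * rng.normal(size=n)  # labels held by B

LAM, LR = 0.01, 0.001


def direct_loss(theta_A, theta_B):
    """L on pooled data; used only to check the decomposition."""
    r = X_A @ theta_A + X_B @ theta_B - y
    return np.sum(r ** 2) + LAM / 2 * (theta_A @ theta_A + theta_B @ theta_B)


def train_round(theta_A, theta_B):
    # Party A: u_i^A and L_A (would be sent encrypted).
    u_A = X_A @ theta_A
    L_A = np.sum(u_A ** 2) + LAM / 2 * (theta_A @ theta_A)
    # Party B: u_i^B - y_i and L_B; L_AB is formed from A's (encrypted) u_A.
    r_B = X_B @ theta_B - y
    L_B = np.sum(r_B ** 2) + LAM / 2 * (theta_B @ theta_B)
    L_AB = 2 * np.sum(u_A * r_B)
    d = u_A + r_B                                    # d_i = u_i^A + (u_i^B - y_i)
    # Each party adds a private random mask before sending its gradient to C.
    mask_A = rng.normal(size=theta_A.shape)
    mask_B = rng.normal(size=theta_B.shape)
    sent_A = 2 * X_A.T @ d + LAM * theta_A + mask_A
    sent_B = 2 * X_B.T @ d + LAM * theta_B + mask_B
    # C would decrypt sent_A / sent_B and return them; each party unmasks.
    grad_A, grad_B = sent_A - mask_A, sent_B - mask_B
    return theta_A - LR * grad_A, theta_B - LR * grad_B, L_A + L_B + L_AB


theta_A, theta_B = np.zeros(3), np.zeros(3)
losses = []
for _ in range(300):
    theta_A, theta_B, loss = train_round(theta_A, theta_B)
    losses.append(loss)

print("theta_A:", theta_A.round(2), "theta_B:", theta_B.round(2))
```

In a real deployment, the arrays `u_A`, `L_AB`, `d`, and the masked gradients would be ciphertexts under C's public key, and only C could decrypt them.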

Step S44: Run prediction with the trained consumption profile model and evaluate it. The consumption profile is obtained by jointly modeling UnionPay and operator data: the portion of the participants' data that covers the same users with different features is extracted and used for federated learning training. In this case, prediction with the consumption profile model proceeds as follows. Step 1: A: none; B: none; C: sends i (the user's mobile phone number) to A and B. Step 2: A computes $u_i^A = \Theta_A x_i^A$ and sends it to C; B computes $u_i^B = \Theta_B x_i^B$ and sends it to C; C computes $u_i^A + u_i^B$ as the prediction result.
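The two-step prediction protocol can be sketched as below. Each party keeps its parameters and features local and returns only its partial score. Encryption of the partial scores is omitted here. The `Party` class, the parameter values, and the phone number are hypothetical, and B's last feature is assumed to be the constant 1 that carries the absorbed intercept $b$.

```python
class Party:
    """A data holder that keeps its parameters and features local."""

    def __init__(self, name, theta, features_by_id):
        self.name = name
        self._theta = list(theta)            # never sent to other parties
        self._features = dict(features_by_id)

    def partial_score(self, user_id):
        # Step 2: u_i = theta . x_i over this party's own features only.
        x = self._features[user_id]
        return sum(t * v for t, v in zip(self._theta, x))


def coordinator_predict(user_id, party_a, party_b):
    # Step 1: C sends the user id i to A and B.
    # Step 2: A and B return u_i^A and u_i^B (encrypted in the real protocol);
    #         C combines them into the prediction u_i^A + u_i^B.
    return party_a.partial_score(user_id) + party_b.partial_score(user_id)


# Hypothetical trained parameters and aligned features keyed by phone number.
party_a = Party("A", [1.0, -2.0], {"13800000002": [3.0, 1.0]})
party_b = Party("B", [0.5, 0.25], {"13800000002": [2.0, 1.0]})  # last feature = 1 (bias)
score = coordinator_predict("13800000002", party_a, party_b)
print(score)  # 2.25
```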

The performance of this consumption profile model is shown in Table 5:

Table 5

Here, the KS (Kolmogorov-Smirnov) statistic measures the maximum gap between a classification model's true positive rate and false positive rate over all classification thresholds. It usually lies between 0 and 1, and the larger (closer to 1) the KS value, the better the model. The F1 score combines a classifier's precision and recall. It is their harmonic mean, F1 = 2 × Precision × Recall / (Precision + Recall), usually computed for the positive class, and it ranges from 0 to 1, with values closer to 1 indicating better performance. AUC (Area Under the Curve) is a performance indicator for the quality of a learner: it is the area under the ROC curve and measures the performance of a binary classifier. The ROC curve plots the true positive rate (vertical axis) against the false positive rate (horizontal axis) as the classification threshold varies. AUC ranges from 0 to 1; a value of 0.5 corresponds to random ranking, and values closer to 1 indicate better performance. All three are common metrics for evaluating model performance.
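The three metrics can be computed with scikit-learn as sketched below. KS is read off the ROC curve as the maximum of TPR minus FPR. The 0.5 cut-off used to turn scores into labels for F1 is an assumption, because the text does not state which threshold was used. The tiny label and score arrays are made up for illustration.

```python
import numpy as np
from sklearn.metrics import f1_score, roc_auc_score, roc_curve


def evaluate(y_true, y_score, threshold=0.5):
    """Return AUC, KS and F1 for a binary classifier's scores."""
    y_score = np.asarray(y_score, dtype=float)
    auc = roc_auc_score(y_true, y_score)
    # KS = max over thresholds of (TPR - FPR), read off the ROC curve.
    fpr, tpr, _ = roc_curve(y_true, y_score)
    ks = float(np.max(tpr - fpr))
    # F1 needs hard labels, so scores are cut at an (assumed) threshold.
    y_pred = (y_score >= threshold).astype(int)
    f1 = f1_score(y_true, y_pred)
    return {"auc": auc, "ks": ks, "f1": f1}


metrics = evaluate([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(metrics)
```

AUC and KS use the scores alone and do not depend on a cut-off. F1 does, so it can change if the threshold used for the consumption levels differs from 0.5.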

As Table 5 shows, the model reaches an AUC of 0.8 and a KS of 0.57, and its performance is stable.

Furthermore, the constructed consumption power model and the trained consumption profile model can be applied together in scenarios that include at least: urban business-district economic analysis, real estate policy regulation, issuance of consumption vouchers to residents, urban night-economy analysis, and transportation planning and urban development. Each application scenario is analyzed along two dimensions, consumption power and consumption profile, for example:

1) Urban business-district economic analysis: Based on these models, city managers can gain an in-depth understanding of the consumption power and behavioral habits of residents in different business districts. This supports the planning and operation of business districts and helps determine suitable commercial development strategies, pricing strategies, and store positioning, so as to best meet residents' needs and increase each district's economic vitality.

2) Real estate policy regulation: The consumption power and cardholder value models can be used to assess housing demand and purchasing power in different areas. The government can adjust real estate policies, such as purchase restrictions and loan policies, based on these models to keep the real estate market stable and sustainable.

3) Issuance of consumption vouchers to residents: By understanding residents' consumption power and degree of spending freedom, city managers can design consumption stimulus policies, such as issuing consumption vouchers, more precisely. This encourages consumption and promotes economic growth while minimizing waste.

4) Urban night-economy analysis: Through the consumption profile and consumption freedom models, city managers can understand residents' night-time consumption preferences and behavior. This is important for developing the city's night economy and night-time service industries, and it helps improve the city's safety and attractiveness.

5) Transportation planning and urban development: These models can also be used for urban transportation planning and infrastructure construction. Understanding the travel habits and consumption power of different residents helps optimize road networks and public transport systems and supports the city's sustainable development.

Finally, it should be noted that the above content only illustrates the technical solution of the present invention and does not limit its scope of protection. Simple modifications or equivalent substitutions made to the technical solution of the present invention by those of ordinary skill in the art do not depart from the essence and scope of the technical solution of the present invention.

Claims (10)

1. A consumption power assessment method based on urban physical indicators, characterized by comprising the following steps:
step S1: data source preparation: the data sources comprise at least UnionPay consumption data and operator user data;
step S2: data source fusion: finding the intersection between the data sources;
step S3: performing feature engineering on the fused data sources;
step S4: evaluating consumption power and consumption profiles using a federated learning method.
2. The consumption power assessment method based on urban physical indicators according to claim 1, wherein in step S1, residents' basic information and consumption information are obtained through a UnionPay consumption data package interface, the UnionPay consumption data comprising the users' basic information and consumption information, and the users' consumption information being stored in a UnionPay consumption database.
3. The consumption power assessment method based on urban physical indicators according to claim 1, wherein in step S1, the users' private data contained in the operator user data is jointly modeled within the operator's database by means of privacy-preserving computation.
4. The consumption power assessment method based on urban physical indicators according to claim 1, wherein in step S2, data source fusion comprises the following steps:
step S21: hashing the two data sources with a hash function;
step S22: each party sends the hash values of its processed data source to the other party, so that the two parties exchange hash values;
step S23: each party locally compares the other party's hash values with its own, finds the common hash values, and uses them to locate the common intersection data set stored in its own database.
5. The consumption power assessment method based on urban physical indicators according to claim 1, wherein in step S3, the feature engineering comprises feature construction, in which useful features are selected from the original data sources and combined into a new subset.
6. The consumption power assessment method based on urban physical indicators according to claim 1, wherein in step S3, the feature engineering comprises feature derivation, the processing being: constructing new features according to business knowledge or relationships between features.
7. The consumption power assessment method based on urban physical indicators according to claim 1, wherein in step S3, the feature engineering comprises feature selection, the processing being: first, using recursive feature elimination, a machine learning method, to identify features with predictive power for the target variable; then selecting features by building a model and progressively eliminating features that contribute little to the model's predictive power, the target variable being the output or label of the model being built; and finally, sorting the features by importance score and eliminating the features with the lowest importance scores until a specified number of features is reached or model performance no longer improves.
8. The consumption power assessment method based on urban physical indicators according to claim 1, wherein in step S3, the feature engineering comprises feature combination, the processing being: combining different features to form new features.
9. The consumption power assessment method based on urban physical indicators according to claim 1, wherein in step S4, the evaluation comprises the following steps:
step S41: selecting a federated learning model;
step S42: constructing and evaluating a consumption power model;
step S43: training the consumption profile model according to the selected federated learning model;
step S44: predicting with and evaluating the trained consumption profile model.
10. The consumption power assessment method based on urban physical indicators according to claim 9, wherein in step S42, the consumption amount of each cardholder over a set period is computed from local UnionPay consumption data, and consumption power levels are assigned according to the percentile of each cardholder's transaction amount, the levels being a significantly low level, a low level, an average level, a high level, a significantly high level, and a high-spending level.
CN202311488106.6A 2023-11-08 2023-11-08 A consumption power assessment method based on urban physical indicators Pending CN117455549A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311488106.6A CN117455549A (en) 2023-11-08 2023-11-08 A consumption power assessment method based on urban physical indicators

Publications (1)

Publication Number Publication Date
CN117455549A true CN117455549A (en) 2024-01-26

Family

ID=89585211

Country Status (1)

Country Link
CN (1) CN117455549A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112163979A (en) * 2020-10-19 2021-01-01 科技谷(厦门)信息技术有限公司 Urban traffic trip data analysis method based on federal learning
CN113240509A (en) * 2021-05-18 2021-08-10 重庆邮电大学 Loan risk assessment method based on multi-source data federal learning
CN113313538A (en) * 2021-06-30 2021-08-27 上海浦东发展银行股份有限公司 User consumption capacity prediction method and device, electronic equipment and storage medium
CN113393357A (en) * 2021-06-03 2021-09-14 八维通科技有限公司 Data center station system suitable for urban traffic trip data service

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20240126