CN109657883B - Multi-source data driving-based bank outlet site selection recommendation method - Google Patents

Info

Publication number
CN109657883B
Authority
CN
China
Prior art keywords
data
bank
site selection
probability
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910077769.6A
Other languages
Chinese (zh)
Other versions
CN109657883A (en)
Inventor
冯辉宗
邓轲
范兴容
许国良
雒江涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University of Post and Telecommunications
Original Assignee
Chongqing University of Post and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University of Post and Telecommunications
Priority to CN201910077769.6A
Publication of CN109657883A
Application granted
Publication of CN109657883B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/02 Banking, e.g. interest calculation or account maintenance

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Economics (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Finance (AREA)
  • Development Economics (AREA)
  • Physics & Mathematics (AREA)
  • Accounting & Taxation (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Game Theory and Decision Science (AREA)
  • Technology Law (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to the field of data mining and big data applications, and in particular to a multi-source data-driven site selection recommendation method for bank outlets, which comprises the following steps: acquiring mobile signaling data and peripheral data of bank outlets, wherein the peripheral data of the bank outlets comprise local public transportation data and web-crawled data; constructing, from the acquired data, five features for the target bank outlets: people flow, user value, traffic congestion index, number of peripheral competing outlets, and per-capita income; training a plurality of site selection classification models with the constructed features, and building a probability-based multi-model fusion site selection recommendation model for bank outlets; and using the constructed recommendation model to recommend the optimal site distribution of bank outlets in a target area according to the features of that area. The method thereby achieves intelligent site selection for bank outlets, addresses the imprecise site selection range and high labor cost of traditional site selection, and yields better economic and social benefits.

Description

Multi-source data driving-based bank outlet site selection recommendation method
Technical Field
The invention relates to the field of data mining and big data, in particular to a multisource data driving-based bank outlet site selection recommendation method.
Background
With the growth of the social economy and commercial banks' increasing demand for services, the user base expands day by day, and the site selection of bank outlets directly affects a bank's development and profitability. Site selection of bank outlets is therefore of great significance for a bank's development and future planning.
Traditional bank site selection methods mainly include manual surveys, expert scoring, and consulting professional firms. These methods first collect data manually over a large area, including the people flow, competing bank outlets, and traffic flow within the candidate area; the data are then combined and presented through data visualization, and experts finally choose the optimal site based on their analysis of the results.
However, these methods are usually subject to personal preference, so the results are not always accurate. Because they cannot be automated, they are inefficient and labor-intensive: they impose a heavy burden in labor and cost and greatly prolong decision time, and thus cannot meet modern requirements for efficiency and accuracy.
Disclosure of Invention
To overcome the above shortcomings, the invention provides a multi-source data-driven site selection recommendation method for bank outlets that uses mobile signaling big data together with peripheral data of bank outlets, the latter comprising local public transportation data and web-crawled data. The method effectively addresses the long lead time and high survey cost of traditional methods and achieves automatic bank site selection that is fast, accurate, and effective.
To achieve this purpose, the invention provides a multi-source data-driven site selection recommendation method for bank outlets, which comprises the following steps:
step one, acquiring mobile signaling data and peripheral data of bank outlets;
step two, constructing, for the target bank outlets and from the acquired data, at least two of the following features: people flow, user value, traffic congestion index, number of peripheral competing outlets, and per-capita income;
step three, training a plurality of site selection classification models with the constructed features, and constructing a probability-based multi-model fusion site selection recommendation model for bank outlets;
step four, recommending the optimal site distribution of bank outlets in the target area according to the features of that area, using the constructed site selection recommendation model.
Further, the mobile signaling data in step one include, but are not limited to, base station data of the bank outlets, trajectory data of mobile users, residence data of mobile users, internet usage data of mobile users, and bank data of mobile users.
The peripheral data of bank outlets in step one include, but are not limited to, local public transportation data, network data, and surrounding housing price data.
Further, step two comprises: selecting an existing bank outlet, delimiting a circular area of radius r around it, and constructing features from the mobile signaling data and the peripheral data of bank outlets. The features comprise a people flow feature, a user value feature, a traffic congestion index feature, a peripheral competing outlet count feature, and a per-capita income feature.
The number of mobile users within the area is counted from the mobile signaling data, and the number of records remaining after deduplication serves as the people flow feature. The user value feature is constructed by jointly considering the internet usage data of mobile users, the bank data of mobile users, and their payment behavior.
For the peripheral data of bank outlets, local public transportation data and data from a map application programming interface (preferably the Baidu Map API) are used to calculate a traffic congestion index feature based on road section speed; the bank outlets within the site selection area are queried through the Baidu Map API and serve as the peripheral competing outlet feature; housing prices of surrounding residential communities are obtained by web crawling, and the per-capita income feature is calculated from them using the city's house price-to-income ratio ε. The per-capita income X within the area is then calculated as:
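The people flow count described above can be sketched as follows. This is a minimal illustration, assuming each signaling record is a (user_id, latitude, longitude) tuple and that "within the area" means within great-circle distance r of the outlet; the record layout, the function names, and the use of the haversine distance are assumptions and are not specified in the source.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def people_flow(records, center, r_m):
    """Count distinct users with at least one record within r_m metres of center.

    records: iterable of (user_id, lat, lon) tuples (hypothetical layout).
    Building a set of user ids performs the deduplication step.
    """
    lat0, lon0 = center
    users = {uid for uid, lat, lon in records
             if haversine_m(lat0, lon0, lat, lon) <= r_m}
    return len(users)
```

A user who appears in several records inside the circle is counted once, which is one plausible reading of the deduplication step.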
Figure BDA0001959429890000021
where X_i denotes the housing price of the i-th residential community, and n is the number of residential communities within the site selection area.
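The formula itself is available here only as a figure. As a rough sketch, assuming the per-capita income is the mean housing price of the n communities divided by the house price-to-income ratio ε, the calculation could look like this; the exact form of the formula and the function name are assumptions.

```python
def per_capita_income(prices, epsilon):
    """Estimate per-capita income as the mean housing price divided by epsilon.

    ASSUMPTION: X = (1 / (epsilon * n)) * sum(X_i). The source gives the
    formula only as an image, so this form is inferred from the definitions.
    """
    if not prices:
        raise ValueError("at least one housing price is required")
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return sum(prices) / (epsilon * len(prices))
```

For example, housing prices of 10000, 12000, and 14000 with ε = 8 give an estimate of 1500 in the same units as the prices.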
Further, the user value is calculated as follows:
a hierarchical structure is established whose upper layer is the user value and whose lower layer consists of three indicators that influence it. The three indicators are, in order: the number of times the mobile user uses the bank's financial app, the payment information of the mobile user at banks, and whether the mobile user is a customer of the bank whose outlet is to be sited. To compute the influence weight of the three indicators on the user score, the indicators are compared pairwise to obtain a comparison matrix A, and the maximum eigenvalue λ_max of A and its eigenvector a = (a1, a2, a3)^T are calculated. Provided the consistency check is satisfied, the user value val is: val = a1·b1 + a2·b2 + a3·b3. Here b1 is the number of times the financial app is used; b2 is the bank payment information, i.e. the number of direct bank payments (for example payments made through a bank app), which is not limited to one bank and may involve any bank; and b3 indicates whether the user is a customer of the bank in question, i.e. the bank for which the model predicts a new outlet. b1 and b2 are normalized values, and b3 takes the value 0 or 1. a1, a2, a3 are the weights of the three indicators calculated by the analytic hierarchy process; in the invention they are the components of the eigenvector above.
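The weight calculation can be illustrated with a small sketch. It assumes the eigenvector is found by power iteration and scaled so that its components sum to 1 (the usual convention in the analytic hierarchy process); the pairwise judgments in the example matrix are hypothetical values chosen for illustration, not values from the source.

```python
def ahp_weights(matrix, iterations=100):
    """Return (lambda_max, weights) for a positive pairwise comparison matrix.

    Uses power iteration; weights are the principal eigenvector scaled to sum to 1.
    """
    n = len(matrix)
    w = [1.0 / n] * n
    lam = float(n)
    for _ in range(iterations):
        v = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
        lam = sum(v)  # because sum(w) == 1, sum(A w) estimates lambda_max
        w = [x / lam for x in v]
    return lam, w


def user_value(weights, b1, b2, b3):
    """val = a1*b1 + a2*b2 + a3*b3, with b1, b2 normalized and b3 in {0, 1}."""
    a1, a2, a3 = weights
    return a1 * b1 + a2 * b2 + a3 * b3


# Hypothetical judgments: indicator 1 is twice as important as indicator 2
# and four times as important as indicator 3 (a perfectly consistent matrix).
A = [[1.0, 2.0, 4.0],
     [0.5, 1.0, 2.0],
     [0.25, 0.5, 1.0]]
lam_max, a = ahp_weights(A)
val = user_value(a, b1=0.5, b2=0.25, b3=1)
```

For this consistent matrix the weights are proportional to (4, 2, 1) and λ_max equals the matrix order 3, which is the case in which the consistency check passes trivially.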
Further, step three comprises: constructing a plurality of site selection classification models from all the feature data. Preferably, the invention constructs three site selection classification models. The first is a support vector machine model, whose objective function is expressed as:
Figure BDA0001959429890000031
after conversion, the objective function is:
Figure BDA0001959429890000032
s.t. y_i(ω^T·x_i + b) ≥ 1, i = 1, ..., n (3)
In formulae (1) to (3), ω is the feature parameter (the weight vector); y_i is the true label of the i-th training sample from step two and takes the value 1 or -1; x_i is the corresponding feature data; and n is the total number of feature sample points.
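Formulae (1) and (2) are available here only as figures. For orientation, the standard hard-margin support vector machine formulation that is consistent with constraint (3) is given below; whether formula (1) in the source writes the margin as 1/‖ω‖ or 2/‖ω‖ is an assumption.

```latex
\max_{\omega,\, b} \; \frac{1}{\lVert \omega \rVert}
\quad\Longleftrightarrow\quad
\min_{\omega,\, b} \; \frac{1}{2}\lVert \omega \rVert^{2}
\qquad \text{s.t.}\quad y_i\,(\omega^{T} x_i + b) \ge 1,\quad i = 1, \dots, n
```

Maximizing the margin and minimizing the squared norm have the same solution, which is why the converted objective is the one normally handed to a quadratic programming solver.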
Further, a second site selection model is constructed in the first layer based on logistic regression, with the prediction function:
Figure BDA0001959429890000033
z = θ^T·x + b (5)
By the maximum likelihood method, the corresponding objective function is:
Figure BDA0001959429890000041
In formulae (4) to (6), H_θ(x) is the logistic-regression-based site selection decision model; H_θ(x_i) is the result obtained for the i-th training sample from step two; θ and b are the parameters to be fitted; and y_i is the true label of the i-th training sample from step two. The corresponding solution is then obtained by gradient descent.
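A minimal sketch of this fitting procedure is shown below. It assumes labels in {0, 1}, the standard sigmoid prediction function, and batch gradient descent on the mean negative log-likelihood; the learning rate, epoch count, and function names are illustrative assumptions.

```python
import math


def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.5, epochs=2000):
    """Fit theta and b by batch gradient descent on the mean negative log-likelihood.

    xs: list of feature vectors; ys: list of labels in {0, 1}.
    """
    d = len(xs[0])
    n = len(xs)
    theta, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_t, grad_b = [0.0] * d, 0.0
        for x, y in zip(xs, ys):
            # Gradient of the log-loss for one sample is (prediction - label) * x.
            err = sigmoid(sum(t * v for t, v in zip(theta, x)) + b) - y
            for k in range(d):
                grad_t[k] += err * x[k]
            grad_b += err
        theta = [t - lr * g / n for t, g in zip(theta, grad_t)]
        b -= lr * grad_b / n
    return theta, b


def predict_proba(theta, b, x):
    """H_theta(x) = sigmoid(theta^T x + b)."""
    return sigmoid(sum(t * v for t, v in zip(theta, x)) + b)
```

In practice a library implementation with regularization would normally be preferred; the loop above only makes the gradient descent step explicit.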
Further, a third site selection model is constructed based on a gradient boosting decision tree (GBDT), with the loss function:
L(y, f(x)) = log(1 + exp(-y·f(x))) (7)
In formula (7), y is the corresponding label value with y ∈ {-1, 1}, f(x) is the strong learner from the previous iteration, and x is the corresponding feature.
It can be understood that, in the method of the invention, the site selection classification models may use different loss functions or different objective functions, all of which fall within the protection scope of the invention; those skilled in the art may select suitable models according to the actual situation, and the invention does not enumerate them.
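The loss in formula (7) and the quantity each new tree is fitted to can be written out directly. The sketch below assumes the usual gradient boosting scheme in which the next weak learner fits the negative gradient of the loss with respect to f(x); the function names are illustrative.

```python
import math


def gbdt_log_loss(y, f):
    """Loss from formula (7): log(1 + exp(-y * f)) for y in {-1, 1}."""
    return math.log(1.0 + math.exp(-y * f))


def negative_gradient(y, f):
    """Pseudo-residual -dL/df = y / (1 + exp(y * f)), the target for the next tree."""
    return y / (1.0 + math.exp(y * f))
```

When the current prediction f(x) already agrees strongly with the label, the pseudo-residual is close to zero, so later trees concentrate on the samples that are still misclassified.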
Further, step three also comprises:
following the above steps, a plurality of different models are constructed by choosing different feature sets and different modeling methods, and the diversity among these models is exploited to build a site selection decision model that generalizes better. The specific implementation is as follows:
1. m different models are selected as first-layer classifiers; since three site selection models are selected in the invention, m = 3.
2. 5-fold cross-validation is performed on each base model of the first-layer classifier: the training set is divided into 5 equal parts; in each round, 1 part together with the test set serves as the prediction set and the other 4 parts serve as the training set. Five rounds of prediction are needed in total, which finally yield prediction probabilities for all records of the training set and 5 sets of prediction probabilities for the test set;
3. the training-set prediction probabilities and the average of the 5 test-set prediction probabilities are taken as features; with m different models in total there are m feature columns. A second-layer classifier based on logistic regression is constructed on these features and used to predict the test set. The loss function of the second-layer classifier is designed as follows:
Figure BDA0001959429890000051
Figure BDA0001959429890000052
In formulae (8) and (9), l(θ) is the loss function of the second-layer classifier; h_θ(x) is the logistic-regression-based site selection decision model in the second layer; h_θ(x_i) is the result obtained for the i-th training sample of the first probability features; x_i is the feature data of the i-th training sample of the first probability features; y_i is the true label of the i-th training sample; and p_i is the output probability corresponding to the true label.
It can be understood that, in the invention, there is only one second-layer classifier, namely logistic regression, and its features are the probabilities output by the first-layer classifiers, i.e. the first probability features; if the first layer has m classifiers, m probabilities are output and serve as the m first probability features of the second layer.
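The construction of the first probability features can be sketched as follows. The sketch assumes each base model exposes fit(X, y) and predict_proba(X) returning a list of probabilities, and it splits the training set into interleaved folds; this interface, the fold layout, and the toy PriorModel are illustrative assumptions rather than details from the source.

```python
def stack_features(model_factories, X_train, y_train, X_test, n_folds=5):
    """Build second-layer features from out-of-fold first-layer probabilities.

    model_factories: callables returning a fresh object with fit(X, y) and
        predict_proba(X) -> list of probabilities (hypothetical interface).
    Returns (train_features, test_features), each with one column per model.
    """
    n = len(X_train)
    folds = [list(range(k, n, n_folds)) for k in range(n_folds)]  # interleaved split
    train_feats = [[0.0] * len(model_factories) for _ in range(n)]
    test_feats = [[0.0] * len(model_factories) for _ in range(len(X_test))]
    for m, factory in enumerate(model_factories):
        for held_out in folds:
            held = set(held_out)
            fit_idx = [i for i in range(n) if i not in held]
            model = factory()
            model.fit([X_train[i] for i in fit_idx], [y_train[i] for i in fit_idx])
            # Out-of-fold probabilities for the held-out part of the training set.
            oof = model.predict_proba([X_train[i] for i in held_out])
            for i, p in zip(held_out, oof):
                train_feats[i][m] = p
            # Test-set probabilities, averaged over the folds.
            for t, p in enumerate(model.predict_proba(X_test)):
                test_feats[t][m] += p / n_folds
    return train_feats, test_feats


class PriorModel:
    """Toy base model (illustration only): predicts the training positive rate."""

    def fit(self, X, y):
        self.rate = sum(y) / len(y)

    def predict_proba(self, X):
        return [self.rate for _ in X]
```

Each training record receives a probability from a model that never saw it during fitting, which is what keeps the second-layer logistic regression from simply memorizing the first-layer outputs.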
h_θ(x) is the logistic-regression-based classification model in the second layer. In h_θ(x_i), x_i denotes the i-th training record, i.e. the i-th set of first probability features; when it is input into the model h_θ(x), the result h_θ(x_i) is a decimal in the range 0 to 1, and y_i is the true label value corresponding to that record, as shown in Table 1:
TABLE 1 True label values corresponding to the first probability features
Fea1    Fea2    Fea3    Fea4    Fea5    True value
0.774   0.723   0.75    0.71    0.5     0.5
0.5     0.3     0.3     0.3     0.4     0
0.4     0.1     0.2     0.5     0.5     0
Fea1 to Fea5 respectively represent 5 sets of first probability features. For i = 3, the corresponding data x_i are 0.4, 0.1, 0.2, 0.5, 0.1, and the corresponding y_i is 0.
p_i is the probability that the logistic-regression-based classification model assigns to the true label; its main function is to improve the prediction accuracy for excellent bank outlets. When the true label is 1, p_i = h_θ(x_i); when the true label is 0, p_i = 1 - h_θ(x_i).
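The definition of p_i translates directly into code. Formulae (8) and (9) are available here only as figures, so the loss below is the plain cross-entropy expressed through p_i and is offered only as a reference point; it is not claimed to be the loss designed in the source.

```python
import math


def true_label_probability(h, y):
    """p_i = h if the true label y is 1, else 1 - h (y in {0, 1})."""
    return h if y == 1 else 1.0 - h


def mean_log_loss(hs, ys):
    """Standard cross-entropy written via p_i: -(1/n) * sum(log p_i).

    NOTE: a generic baseline, not the source's formula (8).
    """
    total = sum(math.log(true_label_probability(h, y)) for h, y in zip(hs, ys))
    return -total / len(hs)
```

Any loss that reweights samples by p_i, for instance to emphasize poorly predicted excellent outlets, can be built on the same helper.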
Further, step four comprises: ranking the candidate sites by predicted probability according to the prediction results and setting a threshold; a candidate whose predicted probability is below the threshold is not suitable as a bank site, and the top k ranked candidates are selected as the final sites.
In the invention, those skilled in the art may set k according to the actual situation or to user requirements; k may be 1 or any integer greater than 1, so that one or more optimal site distributions of bank outlets are output.
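The ranking step can be sketched in a few lines. The sketch assumes the threshold is applied before the top k are taken and that a probability equal to the threshold is still acceptable; both choices, as well as the function name, are assumptions.

```python
def recommend_sites(candidates, threshold, k):
    """Return up to k (site_id, probability) pairs, highest probability first.

    candidates: iterable of (site_id, predicted_probability).
    Candidates below the threshold are discarded before ranking.
    """
    eligible = [(site, p) for site, p in candidates if p >= threshold]
    eligible.sort(key=lambda item: item[1], reverse=True)
    return eligible[:k]
```

If fewer than k candidates pass the threshold, the function simply returns those that do, so a weak area can yield no recommendation at all.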
The invention has the beneficial effects that:
By combining mobile signaling big data with peripheral data of bank outlets through a multi-source data-driven multi-model fusion method, the invention achieves automatic site selection for bank outlets. Compared with traditional methods, it solves the problem that site selection relies excessively on subjective human judgment without sufficient data support: reasonable outlet locations are computed with big data and machine learning, which greatly reduces the influence of subjective judgment and makes the site selection decision more objective.
Drawings
FIG. 1 is a decision flow chart of a multi-source data-driven site selection recommendation method for banking outlets according to the present invention;
FIG. 2 is a flow chart of a feature construction process based on multi-source data according to the present invention;
FIG. 3 is a schematic diagram of a construction process of a probability-based multi-model fusion method according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention provides a multisource data driving-based bank outlet site selection recommendation method, the flow of which is shown in figure 1 and comprises the following steps:
step one, acquiring mobile signaling data and peripheral data of bank outlets;
step two, constructing, for the target bank outlets and from the acquired data, five features: people flow, user value, traffic congestion index, number of peripheral competing outlets, and per-capita income;
step three, training a plurality of site selection classification models with the constructed features, and constructing a probability-based multi-model fusion site selection recommendation model for bank outlets;
step four, recommending the optimal site distribution of bank outlets in the target area according to the features of that area, based on the multi-model fusion site selection recommendation model for bank outlets.
Further, the mobile signaling data in step one include, but are not limited to, base station data of the bank outlets, trajectory data of mobile users, residence data of mobile users, internet usage data of mobile users, and bank data of mobile users. The peripheral data of bank outlets in step one include, but are not limited to, local public transportation data, network data, and surrounding housing price data.
Further, an existing bank outlet is selected, a circular area of radius r is delimited around it, and features are constructed from the mobile signaling data and the peripheral data of bank outlets. The features comprise a people flow feature, a user value feature, a traffic congestion index feature, a peripheral competing outlet count feature, and a per-capita income feature. The feature construction flow is shown in FIG. 2 and includes:
People flow feature: the base stations within the site selection area are determined, the number of mobile users within the area is counted from the mobile signaling data, and the number of records remaining after deduplication gives the people flow data.
User value feature: a user value model is constructed from the internet usage data and the bank data of mobile users, in the following steps:
A hierarchical structure is established in which the upper layer is the user value and the lower layer consists of three indicators that influence it: the number of times the mobile user uses the bank's financial app, the payment information of the mobile user at banks, and whether the mobile user is a customer of the bank whose outlet is to be sited.
A judgment matrix is established: to compute the influence weight of the 3 bottom-layer indicators on the user score, a comparison matrix is constructed by comparing the indicators pairwise, where each entry is the relative weight of the corresponding pair of indicators. After the pairwise comparison, a matrix A is obtained:
Figure BDA0001959429890000071
For the matrix A, its maximum eigenvalue λ_max and the corresponding eigenvector a = (a1, a2, a3)^T, i.e. the weight vector, are obtained.
A consistency check is introduced, i.e.
Figure BDA0001959429890000081
When CR ≤ 0.1, the result has satisfactory consistency; otherwise, the comparison matrix is reconstructed. Here n is the order of the comparison matrix, i.e. the number of indicators that influence the value, so n = 3.
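The consistency check formula is available here only as a figure. In the standard analytic hierarchy process it takes the form below, where CI is the consistency index and RI is the random consistency index (commonly tabulated as 0.58 for n = 3); that the source uses exactly this form is an assumption.

```latex
CI = \frac{\lambda_{\max} - n}{n - 1}, \qquad CR = \frac{CI}{RI}
```

For a perfectly consistent comparison matrix λ_max equals n, so CI and CR are both 0 and the check CR ≤ 0.1 passes.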
Finally, the user value val is obtained as:
val = a1·b1 + a2·b2 + a3·b3
where b1 is the number of times the mobile user uses the bank's financial app, b2 is the payment information of the mobile user at banks, and b3 indicates whether the mobile user is a customer of the bank whose outlet is to be sited.
Traffic congestion index feature: the number of surrounding bus stops is crawled through the Baidu Map API, and a traffic congestion index based on road section speed is calculated from the local bus data and the Baidu Map API data.
The detailed steps for calculating the traffic congestion index based on road section speed are as follows:
the local bus data are uploaded in real time at intervals of about 15 seconds, so the congestion index value A_ij of road section i within the j-th 15 s interval is calculated as:
Figure BDA0001959429890000082
where RS_ij is the reference speed of road section i in the j-th interval, and CS_ij is the calculated speed of road section i in the j-th interval, i.e. the actual running speed. The number of road sections can be set according to actual road conditions, and the total number of intervals can be 2880.
From the congestion index values A_ij, the index value B_j for the site selection area in statistical interval j is calculated as the average of the A_ij over the road sections, weighted by road section length L_i, where N is the total number of road sections within the site selection area. B_j is calculated as:
Figure BDA0001959429890000083
finally, a traffic jam index is obtained, namely data of 12h in total in a peak period (07:
Figure BDA0001959429890000084
where M is the total number of B_j records within one month.
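The two aggregation steps can be sketched as follows. The formula for A_ij itself is available only as a figure, so the sketch takes the A_ij values as given inputs; it assumes B_j is the length-weighted mean of the A_ij and that the final index is the mean of the M recorded B_j values, which is how the surrounding text describes them.

```python
def area_index(a_values, lengths):
    """B_j: length-weighted mean of the section indices A_ij for one interval j.

    a_values[i] is A_ij for road section i; lengths[i] is its length L_i.
    """
    if len(a_values) != len(lengths) or not lengths:
        raise ValueError("need exactly one length per road section")
    return sum(a * l for a, l in zip(a_values, lengths)) / sum(lengths)


def monthly_congestion_index(b_values):
    """Traffic congestion index: mean of the M recorded B_j values in a month."""
    if not b_values:
        raise ValueError("at least one B_j record is required")
    return sum(b_values) / len(b_values)
```

Weighting by length keeps a short but heavily congested section from dominating the index for the whole site selection area.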
Peripheral competing outlet count feature: the bank outlets within the site selection area are queried through the Baidu Map API and serve as the peripheral competing outlet data.
Per-capita income feature: the site selection area is determined, and the housing prices of surrounding residential communities and the house price-to-income ratio ε are crawled from the web; the housing prices of the communities are denoted X1, X2, X3, and so on, and the per-capita income X within the area is calculated as:
Figure BDA0001959429890000091
According to the invention, a plurality of different models can be constructed by choosing different feature sets and different classical classification methods.
As an optional manner, in this embodiment three site selection classification models are constructed. The first is built on a support vector machine using all the feature data, with the objective function expressed as:
Figure BDA0001959429890000092
after conversion, the objective function is:
Figure BDA0001959429890000093
s.t. y_i(ω^T·x_i + b) ≥ 1, i = 1, ..., n
where ω is the feature parameter; y_i is the true label of the i-th training sample from step two and takes the value 1 or -1; x_i is the corresponding feature data; and n is the total number of feature sample points.
Further, a second site selection model is constructed in the first layer based on logistic regression, with the prediction function:
Figure BDA0001959429890000094
z = θ^T·x + b
where H_θ(x) is the logistic-regression-based site selection decision model in the first layer; H_θ(x_i) is the result obtained for the i-th training sample from step two; θ and b are the parameters to be fitted; and y_i is the true label of the i-th training sample from step two.
The corresponding solution is then obtained by gradient descent.
Further, a third site selection model is constructed based on a gradient boosting decision tree (GBDT), with the loss function:
L(y, f(x)) = log(1 + exp(-y·f(x)))
where y is the corresponding label value with y ∈ {-1, 1}, f(x) is the strong learner from the previous iteration, and x is the corresponding feature.
Further, model fusion is performed following the above steps, as shown in FIG. 3: a plurality of different models are constructed by choosing different feature sets and different modeling methods, and the diversity among these models is exploited to build a site selection decision model that generalizes better.
The concrete implementation steps are as follows:
1. m different models are selected as first-layer classifiers;
2. 5-fold cross-validation is performed on each base model of the first-layer classifier: the training set is divided into 5 equal parts; in each round, 1 part together with the test set serves as the prediction set and the other 4 parts serve as the training set. Five rounds of prediction are needed in total, which finally yield prediction probabilities for all records of the training set and 5 sets of prediction probabilities for the test set;
3. the training-set prediction probabilities and the average of the 5 test-set prediction probabilities are taken as features; with m different models in total there are m feature columns. A second-layer classifier based on logistic regression is constructed on these features and used to predict the test set. The loss function of the second-layer classifier is designed as follows:
Figure BDA0001959429890000101
Figure BDA0001959429890000102
where l(θ) is the loss function of the second-layer classifier; h_θ(x) is the logistic-regression-based site selection decision model in the second layer; h_θ(x_i) is the result obtained for the i-th training sample of the first probability features; x_i is the feature data of the i-th training sample of the first probability features; y_i is the true label of the i-th training sample; and p_i is the output probability corresponding to the true label. This ensures that excellent bank outlets carry a higher weight in the model.
Further, the candidate sites are ranked by predicted probability according to the prediction results and a threshold is set; a candidate whose predicted probability is below the threshold is not suitable as a bank site, and the top k ranked candidates are selected as the final sites for the bank.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by hardware related to instructions of a program, and the program may be stored in a computer-readable storage medium, and the storage medium may include: ROM, RAM, magnetic or optical disks, and the like.
The above embodiments further illustrate the purpose, technical solutions, and advantages of the invention in detail. It should be understood that they are only preferred embodiments of the invention and do not limit it; any modification, equivalent substitution, or improvement made within the spirit and principle of the invention shall fall within its protection scope.

Claims (6)

1. A multi-source data-driven bank outlet site selection recommendation method, characterized by comprising the following steps:
step one, acquiring mobile signaling data, including base station data of the bank outlet, trajectory data of mobile users, residence location data of mobile users, internet usage data of mobile users and bank data of mobile users, and acquiring peripheral data of the bank outlet, including local bus data, network data, map data and peripheral housing price data;
step two, constructing features for a target bank outlet from the acquired data, including pedestrian flow, user value, traffic congestion index, number of peripheral competing outlets and per capita income;
pedestrian flow feature: determining the base stations within the site selection range, counting the mobile users within the range using the mobile signaling data, and, after de-duplication, taking the number of records as the pedestrian flow feature;
user value feature: constructing the user value feature from the internet usage data of the mobile users and the bank data of the mobile users; the user value is calculated as follows:
establishing a hierarchical structure whose upper layer is the user value and whose lower layer consists of three indexes that influence the value; calculating the influence weights of the three indexes on the user score by constructing a comparison matrix, that is, comparing the influence of the indexes pairwise to obtain the comparison matrix A, and calculating the maximum eigenvalue λ_max of the comparison matrix A (Figure DEST_PATH_IMAGE002) and its eigenvector a = (a1, a2, a3)^T; after a satisfactory consistency test is passed, the user value (symbol shown in Figure DEST_PATH_IMAGE004) is given by the formula shown in Figure DEST_PATH_IMAGE006; b1 represents the number of times the mobile user uses a bank financial app, b2 represents the payment information of the mobile user at the bank, and b3 represents whether the mobile user is a customer of the bank outlet to be sited, wherein b1 and b2 are normalized values and b3 is a binary value equal to 0 or 1; the three indexes are, in order, the number of times the mobile user uses the bank financial app, the payment information of the mobile user at the bank, and whether the mobile user is a customer of the bank;
traffic congestion index feature: calculating a road-speed-based traffic congestion index by combining the local bus data with data from the application programming interface of map software;
number of peripheral competing outlets feature: querying the bank outlets within the site selection range through the application programming interface of map software, and taking their number as the number of peripheral competing outlets;
per capita income feature: determining the site selection range, and using a web crawler to collect from the network the house prices of the residential communities within the range and the house price-to-income ratio (symbol shown in Figure DEST_PATH_IMAGE014); the per capita income within the range (symbol shown in Figure DEST_PATH_IMAGE016) is then calculated by the formula shown in Figure DEST_PATH_IMAGE018, in which the symbol shown in Figure DEST_PATH_IMAGE020 denotes the house price of the i-th residential community and n is the number of residential communities within the site selection range;
step three, training a plurality of site selection classification models using the constructed features, and constructing a multi-model fusion bank outlet site selection recommendation model;
and step four, using the constructed bank outlet site selection recommendation model to recommend the optimal bank outlet site distribution in a target region according to the features of that region.
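The user value calculation in claim 1 can be sketched as follows. This is a hedged illustration, not the patented formula: the formula images are not reproduced in this text, so the weighted sum a1·b1 + a2·b2 + a3·b3 is an assumption, as are the example comparison matrix, the function names, and the use of Saaty's random index (0.58 for three indexes) in the consistency test.

```python
from typing import List, Tuple


def principal_eigen(matrix: List[List[float]], iterations: int = 100) -> Tuple[float, List[float]]:
    """Power iteration: return (maximum eigenvalue, eigenvector normalized to sum to 1)."""
    n = len(matrix)
    vec = [1.0 / n] * n
    eigenvalue = float(n)
    for _ in range(iterations):
        product = [sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
        # vec sums to 1, so the sum of A*vec estimates the maximum eigenvalue.
        eigenvalue = sum(product)
        vec = [value / eigenvalue for value in product]
    return eigenvalue, vec


def consistency_ratio(lambda_max: float, n: int, random_index: float = 0.58) -> float:
    """Standard AHP check: CR = CI / RI with CI = (lambda_max - n) / (n - 1)."""
    return ((lambda_max - n) / (n - 1)) / random_index


def user_value(weights: List[float], b1: float, b2: float, b3: int) -> float:
    """ASSUMPTION: user value as a weighted sum of the three indexes."""
    return weights[0] * b1 + weights[1] * b2 + weights[2] * b3


# Hypothetical pairwise comparison matrix A for the three indexes.
A = [[1.0, 2.0, 4.0],
     [0.5, 1.0, 2.0],
     [0.25, 0.5, 1.0]]
lambda_max, weights = principal_eigen(A)
cr = consistency_ratio(lambda_max, n=3)
# A consistency ratio below 0.1 is the conventional acceptance level (an assumption here).
if cr < 0.1:
    print(weights, user_value(weights, b1=0.5, b2=0.5, b3=1))
```

Power iteration is used only to keep the sketch dependency-free; any eigenvalue routine would serve the same purpose.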
2. The multi-source data-driven bank outlet site selection recommendation method according to claim 1, wherein the traffic congestion index is calculated as follows: congestion index values of the road segments within an interval (symbol shown in Figure DEST_PATH_IMAGE022) are uploaded in real time from the local bus data; based on these congestion index values, an index value for the site selection range within each statistical interval (symbol shown in Figure DEST_PATH_IMAGE024) is calculated; the index values for the peak hours of each working day are collected and averaged, and the average is taken as the traffic congestion index C;
the congestion index value (Figure DEST_PATH_IMAGE026) is calculated by the formula shown in Figure DEST_PATH_IMAGE028;
the index value within the site selection range (Figure DEST_PATH_IMAGE024) is calculated by the formula shown in Figure DEST_PATH_IMAGE030;
the traffic congestion index C is calculated by the formula shown in Figure DEST_PATH_IMAGE032;
wherein the symbol shown in Figure DEST_PATH_IMAGE034 denotes the reference speed of road segment i in the j-th interval; the symbol shown in Figure DEST_PATH_IMAGE036 denotes the calculated speed of road segment i in the j-th interval, that is, the actual operating speed of road segment i in the j-th interval; the symbol shown in Figure DEST_PATH_IMAGE038 denotes the length of road segment i; N is the total number of road segments within the site selection range; and M is the total number of records of the index value (Figure DEST_PATH_IMAGE024) within one month.
3. The multi-source data-driven bank outlet site selection recommendation method according to claim 1, wherein constructing the probability-based multi-model fusion bank outlet site selection recommendation model in step three comprises: constructing a plurality of different site selection classification models by selecting different feature data sets and applying different classical classification methods, and constructing the probability-based multi-model fusion bank outlet site selection recommendation model from the output probabilities of the plurality of different site selection classification models.
4. The multi-source data-driven bank outlet site selection recommendation method according to claim 3, wherein the probability-based multi-model fusion bank outlet site selection recommendation model is constructed by the following steps:
selecting m different classification models as first-layer classifiers;
performing five-fold cross-validation on each classification model of the first layer, that is, dividing the training set into five equal parts and, in each round, using four parts for training and predicting on the remaining part and on the complete test set, so that after five rounds each classification model yields a predicted probability for each part of the training set and five sets of predicted probabilities for the test set;
averaging the predicted probabilities of the parts of the training set and the five sets of predicted probabilities for the test set, and taking the averages as a first probability feature; and constructing a second-layer classifier based on logistic regression, which fuses the first probability features of the classification models to form the probability-based multi-model fusion bank outlet site selection recommendation model.
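The two-layer construction in claim 4 can be sketched as follows. This is a minimal, dependency-free illustration under stated assumptions: `centroid_trainer` is a hypothetical toy stand-in for the classical first-layer classifiers, the folds are formed by interleaving row indexes, the data are invented, and the second layer uses plain unweighted logistic regression rather than the loss function of claim 5.

```python
import math
from typing import Callable, List, Sequence, Tuple

Row = Sequence[float]
Predictor = Callable[[Row], float]                     # row -> P(label = 1)
Trainer = Callable[[List[Row], List[int]], Predictor]  # (X, y) -> fitted predictor


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def centroid_trainer(feature: int) -> Trainer:
    """Toy first-layer model (hypothetical): scores one feature against the class means."""
    def train(X: List[Row], y: List[int]) -> Predictor:
        pos = [row[feature] for row, label in zip(X, y) if label == 1]
        neg = [row[feature] for row, label in zip(X, y) if label == 0]
        pos_mean, neg_mean = sum(pos) / len(pos), sum(neg) / len(neg)
        midpoint = (pos_mean + neg_mean) / 2.0
        sign = 1.0 if pos_mean >= neg_mean else -1.0
        return lambda row: sigmoid(sign * (row[feature] - midpoint))
    return train


def probability_feature(trainer: Trainer, X_train: List[Row], y_train: List[int],
                        X_test: List[Row], n_folds: int = 5) -> Tuple[List[float], List[float]]:
    """Out-of-fold probabilities for training rows, fold-averaged probabilities for test rows."""
    n = len(X_train)
    out_of_fold = [0.0] * n
    test_sum = [0.0] * len(X_test)
    for fold in range(n_folds):
        held_out = [i for i in range(n) if i % n_folds == fold]
        kept = [i for i in range(n) if i % n_folds != fold]
        predict = trainer([X_train[i] for i in kept], [y_train[i] for i in kept])
        for i in held_out:
            out_of_fold[i] = predict(X_train[i])
        for j, row in enumerate(X_test):
            test_sum[j] += predict(row)
    return out_of_fold, [total / n_folds for total in test_sum]


def fit_logistic(features: List[List[float]], labels: List[int],
                 lr: float = 0.5, epochs: int = 2000) -> List[float]:
    """Second layer: batch gradient descent on the logistic loss; returns [bias, w1, ..., wm]."""
    theta = [0.0] * (len(features[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(theta)
        for row, label in zip(features, labels):
            error = sigmoid(theta[0] + sum(w * v for w, v in zip(theta[1:], row))) - label
            grad[0] += error
            for k, value in enumerate(row):
                grad[k + 1] += error * value
        theta = [t - lr * g / len(features) for t, g in zip(theta, grad)]
    return theta


# Invented data: feature 0 is high and feature 1 is low for suitable sites (label 1).
X_train = [[0.10, 0.90], [0.80, 0.20], [0.20, 0.70], [0.90, 0.10], [0.30, 0.80],
           [0.70, 0.30], [0.15, 0.85], [0.85, 0.25], [0.25, 0.75], [0.75, 0.15]]
y_train = [0, 1] * 5
X_test = [[0.9, 0.1], [0.1, 0.9]]

columns = [probability_feature(centroid_trainer(f), X_train, y_train, X_test) for f in (0, 1)]
meta_train = [list(row) for row in zip(*(oof for oof, _ in columns))]
meta_test = [list(row) for row in zip(*(avg for _, avg in columns))]
theta = fit_logistic(meta_train, y_train)
scores = [sigmoid(theta[0] + sum(w * v for w, v in zip(theta[1:], row))) for row in meta_test]
print(scores)
```

Each first-layer model contributes one probability column, so the second-layer classifier sees m inputs per candidate (here m = 2).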
5. The multi-source data-driven bank outlet site selection recommendation method according to claim 4, wherein the loss function of the second-layer classifier is designed as shown in Figures DEST_PATH_IMAGE040 and DEST_PATH_IMAGE042;
wherein l(θ) (Figure DEST_PATH_IMAGE044) is the loss function of the second-layer classifier; h_θ(x_i) (Figure DEST_PATH_IMAGE046) is the result obtained for the i-th training sample in the first probability feature; x_i (Figure DEST_PATH_IMAGE048) is the feature data of the i-th training sample in the first probability feature; y_i (Figure DEST_PATH_IMAGE050) is the true label value of the i-th training sample in the first probability feature; and p_i (Figure DEST_PATH_IMAGE052) is the output probability corresponding to the true label.
6. The multi-source data-driven bank outlet site selection recommendation method according to claim 1, wherein the optimal bank outlet site distribution is obtained by ranking the predicted probabilities of the candidate addresses and setting a threshold; a candidate address whose predicted probability is below the threshold is not suitable as a bank site, and the top k ranked candidate sites are selected as the final bank site locations.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910077769.6A CN109657883B (en) 2019-01-28 2019-01-28 Multi-source data driving-based bank outlet site selection recommendation method

Publications (2)

Publication Number Publication Date
CN109657883A CN109657883A (en) 2019-04-19
CN109657883B true CN109657883B (en) 2023-04-07

Family

ID=66121486

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910077769.6A Active CN109657883B (en) 2019-01-28 2019-01-28 Multi-source data driving-based bank outlet site selection recommendation method

Country Status (1)

Country Link
CN (1) CN109657883B (en)

Families Citing this family (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110111150A (en) * 2019-05-08 2019-08-09 拉扎斯网络科技(上海)有限公司 Information processing method, device, storage medium and electronic equipment
CN111949834B (en) * 2019-05-17 2024-04-05 嘉兴树融数据科技有限公司 Site selection method and site selection platform system
CN110428196B (en) * 2019-06-26 2023-04-07 深圳市跨越新科技有限公司 Quantitative analysis method and system for site selection of logistics network points
CN110378737B (en) * 2019-07-23 2022-04-05 中国工商银行股份有限公司 Site selection method and device for network points
CN110837930B (en) * 2019-11-07 2023-09-19 腾讯科技(深圳)有限公司 Site selection method, device, equipment and storage medium
CN111144935A (en) * 2019-12-17 2020-05-12 武汉海云健康科技股份有限公司 Big data-based sleep member awakening method and system, server and medium
CN111145452A (en) * 2019-12-31 2020-05-12 中国银行股份有限公司 Site selection method and device for self-service cash recycling machine capable of taking train tickets
CN111274341A (en) * 2020-01-16 2020-06-12 中国建设银行股份有限公司 Site selection method and device for network points
CN111598603A (en) * 2020-03-03 2020-08-28 深圳前海微众银行股份有限公司 Warehouse site selection method, device, equipment and storage medium
CN112150187B (en) * 2020-06-24 2023-04-07 北京化工大学 Competitive facility site selection method considering various customer selection rules
CN112541786A (en) * 2020-12-11 2021-03-23 中信银行股份有限公司 Site selection method and device for network points, electronic equipment and storage medium
CN112561322B (en) * 2020-12-14 2023-04-18 广东电网有限责任公司珠海供电局 Charging pile site selection method based on machine learning
CN112990976B (en) * 2021-03-29 2024-06-14 华南理工大学 Commercial site selection method, system, equipment and medium based on open source data mining
CN113128773B (en) * 2021-04-23 2024-03-29 中国工商银行股份有限公司 Training method of address prediction model, address prediction method and device
CN113379462A (en) * 2021-06-29 2021-09-10 北京百度网讯科技有限公司 Site selection method, device, equipment and storage medium
CN113793171B (en) * 2021-08-27 2023-11-07 浙江新再灵科技股份有限公司 Region dividing method, device, storage medium and equipment based on multidimensional data
CN113988628A (en) * 2021-10-28 2022-01-28 中国银行股份有限公司 ATM position determining method and system
CN114819756B (en) * 2022-06-24 2022-09-27 深圳众城卓越科技有限公司 Wind turbine generator set intelligent site selection method, device and equipment based on classification model

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20140132795A (en) * 2013-05-06 2014-11-19 서울시립대학교 산학협력단 System and method for Estimating of the spatial development patterns based on determination factors of the city form
CN106454729A (en) * 2016-10-09 2017-02-22 广州杰赛科技股份有限公司 Station planning method and device for urban rail transit
CN107316098A (en) * 2017-05-19 2017-11-03 芜湖恒天易开软件科技股份有限公司 A kind of automobile leasing point site selecting method based on user behavior analysis

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6604083B1 (en) * 1997-06-12 2003-08-05 G. William Bailey Market determination based on travel time bands
US20090083128A1 (en) * 2007-09-24 2009-03-26 Introspective Solutions, Llc Predicted variable analysis based on evaluation variables relating to site selection
US10360521B2 (en) * 2015-06-12 2019-07-23 Sap Se Dynamic location recommendation for public service vehicles
CN105516928A (en) * 2016-01-15 2016-04-20 中国联合网络通信有限公司广东省分公司 Position recommending method and system based on position crowd characteristics
CN106548367A (en) * 2016-10-12 2017-03-29 电子科技大学 The site selection model and its applied research of multi-source data
CN106651392A (en) * 2016-12-07 2017-05-10 中国联合网络通信集团有限公司 Intelligent business location selection method, apparatus and system
CN107301221A (en) * 2017-06-16 2017-10-27 华南理工大学 A kind of data digging method of multiple features dimension heap fusion
CN107909433A (en) * 2017-11-14 2018-04-13 重庆邮电大学 A kind of Method of Commodity Recommendation based on big data mobile e-business
CN109242552B (en) * 2018-08-22 2020-09-29 重庆邮电大学 Shop positioning method based on big data
CN109255538A (en) * 2018-09-12 2019-01-22 中国联合网络通信集团有限公司 The appraisal procedure and system of bank branches, server, storage medium

Also Published As

Publication number Publication date
CN109657883A (en) 2019-04-19

Similar Documents

Publication Publication Date Title
CN109657883B (en) Multi-source data driving-based bank outlet site selection recommendation method
Ali et al. A data-driven approach for multi-scale GIS-based building energy modeling for analysis, planning and support decision making
CN109034448B (en) Trajectory prediction method based on vehicle trajectory semantic analysis and deep belief network
CN110543999B (en) Method and device for store site selection
CN107194803A (en) A kind of P2P nets borrow the device of borrower's assessing credit risks
CN104321794B (en) A kind of system and method that the following commercial viability of an entity is determined using multidimensional grading
Chen et al. A case-based distance method for screening in multiple-criteria decision aid
CN109214863B (en) Method for predicting urban house demand based on express delivery data
CN113379227A (en) Industrial park data processing method and device, computer equipment and storage medium
CN110288137A (en) Road network grade road infrastructure structure composition Maintenance Decision making method and system
CN104346425A (en) Method and system of hierarchical internet public sentiment indication system
CN107609771A (en) A kind of supplier's value assessment method
CN111951050B (en) Financial product recommendation method and device
CN110889092A (en) Short-time large-scale activity peripheral track station passenger flow volume prediction method based on track transaction data
CN109727157A (en) A kind of sale of electricity company credit evaluation method suitable for power exchange
CN115099450A (en) Family carbon emission monitoring and accounting platform based on fusion model
CN113052505A (en) Cross-border travel recommendation method, device and equipment based on artificial intelligence
CN111091460A (en) Data processing method and device
CN110287995A (en) The multiple features learning network model method of whole day overhead transport jam situation classification
Zhu et al. Loan default prediction based on convolutional neural network and LightGBM
CN111552882A (en) News influence calculation method and device, computer equipment and storage medium
CN110807543A (en) Investment portfolio optimization method and device based on group decision intelligent search
Ragapriya et al. Machine Learning Based House Price Prediction Using Modified Extreme Boosting
Dong et al. Research on academic early warning model based on improved SVM algorithm
Wang et al. Computer supported data-driven decisions for service personalization: a variable-scale clustering method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant