WO2019179173A1 - Trading area determination method and device - Google Patents

Trading area determination method and device

Info

Publication number
WO2019179173A1
WO2019179173A1 PCT/CN2018/119319 CN2018119319W WO2019179173A1 WO 2019179173 A1 WO2019179173 A1 WO 2019179173A1 CN 2018119319 W CN2018119319 W CN 2018119319W WO 2019179173 A1 WO2019179173 A1 WO 2019179173A1
Authority
WO
WIPO (PCT)
Prior art keywords
stores
business circle
store
value
distance
Prior art date
Application number
PCT/CN2018/119319
Other languages
French (fr)
Chinese (zh)
Inventor
黄凯
钟蛵雩
贾全慧
余泉
Original Assignee
阿里巴巴集团控股有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 阿里巴巴集团控股有限公司
Publication of WO2019179173A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00: Commerce
    • G06Q30/02: Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201: Market modelling; Market analysis; Collecting market data
    • G06Q30/0204: Market segmentation
    • G06Q30/0205: Location or geographical consideration
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00: Commerce
    • G06Q30/02: Marketing; Price estimation or determination; Fundraising
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/23: Clustering techniques

Definitions

  • the embodiments of the present specification relate to the field of machine learning, and more particularly, to a method and apparatus for training a business circle determination model, a method and apparatus for determining a business circle, and a method and apparatus for updating a business circle determination.
  • Offline stores have developed vigorously. Unlike online merchants, offline merchants have physical stores, and these stores tend to cluster: they are geographically concentrated and can therefore be divided into business circles. Business circle information deepens the understanding of a store: it helps identify the offline market of an industry and assists in judging the operating status of the store.
  • Existing business circle information includes business circle information labeled by offline business development (BD) staff and business circle information taken mainly from the business circle results of Dazhong Dianping. All of this business circle information is obtained by manual labeling. Therefore, there is a need for a more effective solution for determining a business circle.
  • the embodiments of the present specification aim to provide a more effective solution for determining a business circle to solve the deficiencies in the prior art.
  • An aspect of the present specification provides a method for training a business circle determination model, comprising: acquiring respective location information of a plurality of stores within a predetermined geographical range and respective business circle labeling information of the plurality of stores; calculating, according to the CFSFDP clustering algorithm and using the location information, the value of the local density ρ of each store, the value of the minimum distance δ to a store of higher density, and the value of their product γ; acquiring business circle determination information of each store according to the respective current thresholds of ρ, δ, and γ; calculating the similarity of all the business circle determination information with respect to all the business circle labeling information, using the business circle determination information and the business circle labeling information of the plurality of stores; and adjusting the respective thresholds of ρ, δ, and γ so that the similarity increases.
  • In one embodiment, the value of the local density ρ of a store i among the plurality of stores is ρ_i, where d_c is a radius threshold, d_ij is the distance between store i and store j among the plurality of stores, i and j are natural numbers less than or equal to the total number of stores, and i ≠ j.
  • In another embodiment, ρ_i is calculated with a formula based on a Gaussian kernel function, with d_c, d_ij, i, and j defined as above.
  • In one embodiment, the location information of store i and store j is expressed as longitude and latitude, (Lon_i, Lat_i) and (Lon_j, Lat_j) respectively, and the distance d_ij is calculated as a spherical distance, where R is the radius of the Earth.
  • In one embodiment, the similarity is represented by a parameter WFS, where:
  • i is an integer from 0 to A;
  • j is an integer from 0 to B;
  • A is the number of labeled business circles;
  • B is the number of determined business circles;
  • N_i is the number of stores in the i-th labeled business circle;
  • N is the total number of stores of the plurality of stores;
  • P_ij is the precision for the i-th labeled business circle and the j-th determined business circle;
  • R_ij is the recall for the i-th labeled business circle and the j-th determined business circle. The set of labeled scattered stores is treated as the 0th labeled business circle, and the set of determined scattered stores as the 0th determined business circle, where a labeled scattered store is a labeled store that does not belong to any labeled business circle, and a determined scattered store is a determined store that does not belong to any determined business circle.
  • In one embodiment, adjusting the respective thresholds of ρ, δ, and γ so that the similarity increases includes adjusting them so that the similarity is maximized.
  • Another aspect of the present specification provides a method for determining a business circle, comprising: acquiring respective location information of a plurality of stores within a predetermined geographical range; calculating, according to the CFSFDP clustering algorithm and using the location information, the value of the local density ρ of each store, the value of the minimum distance δ to a store of higher density, and the value of their product γ; and determining a business circle for each of the plurality of stores by means of the adjusted thresholds of ρ, δ, and γ obtained by the above method of training the business circle determination model.
  • the predetermined geographic range is a predetermined city.
  • Another aspect of the present specification provides a method for updating a business circle determination, comprising: acquiring respective first location information of a plurality of first stores within a predetermined geographic range and first distances between the first stores; acquiring second location information of each of at least one second store within the predetermined geographic range; calculating, using the first location information and the second location information, second distances between the second stores and third distances between any one of the second stores and any one of the first stores; calculating, according to the CFSFDP clustering algorithm and based on the first, second, and third distances, the value of the local density ρ of each of the plurality of first stores and the at least one second store, the value of the minimum distance δ to a store of higher density, and the value of their product γ; and determining a business circle for the plurality of first stores and the at least one second store by means of the adjusted thresholds of ρ, δ, and γ obtained by the above method of training the business circle determination model.
  • the method of updating the business circle determination is performed every predetermined time period.
  • Another aspect of the present specification provides an apparatus for training a business circle determination model, including: a first acquisition unit configured to acquire respective location information of a plurality of stores within a predetermined geographic range and respective business circle labeling information of the plurality of stores; a first calculation unit configured to calculate, according to the CFSFDP clustering algorithm and using the location information, the value of the local density ρ of each store, the value of the minimum distance δ to a store of higher density, and the value of their product γ; a second acquisition unit configured to acquire the business circle determination information of each store according to the respective current thresholds of ρ, δ, and γ; a second calculation unit configured to calculate, using the business circle determination information and the business circle labeling information of the plurality of stores, the similarity of all the business circle determination information with respect to all the business circle labeling information; and a threshold adjustment unit configured to adjust the respective thresholds of ρ, δ, and γ so that the similarity increases.
  • Another aspect of the present specification provides an apparatus for determining a business circle, comprising: an acquisition unit configured to acquire respective location information of a plurality of stores within a predetermined geographical range; a calculation unit configured to calculate, according to the CFSFDP clustering algorithm and using the location information, the value of the local density ρ of each store, the value of the minimum distance δ to a store of higher density, and the value of their product γ; and a determination unit configured to determine a business circle for the plurality of stores by means of the adjusted thresholds of ρ, δ, and γ obtained by the above method of training the business circle determination model.
  • Another aspect of the present specification provides an apparatus for updating a business circle determination, including: a first acquisition unit configured to acquire first location information of each of a plurality of first stores within a predetermined geographic range and first distances between the first stores; a second acquisition unit configured to acquire second location information of each of at least one second store within the predetermined geographic range; a first calculation unit configured to calculate, using the first location information and the second location information, second distances between the second stores and third distances between any one of the second stores and any one of the first stores; a second calculation unit configured to calculate, according to the CFSFDP clustering algorithm and based on the first, second, and third distances, the value of the local density ρ of each of the plurality of first stores and the at least one second store, the value of the minimum distance δ to a store of higher density, and the value of their product γ; and a determination unit configured to determine a business circle for the plurality of first stores and the at least one second store by means of the adjusted thresholds of ρ, δ, and γ obtained by the above method of training the business circle determination model.
  • the business circle can be quickly and accurately determined, and the stability of the determination result can be ensured.
  • the embodiments of the present specification also effectively reduce the computational complexity and optimize the calculation time.
  • FIG. 1 shows a schematic diagram of a system 100 for determining a business circle in accordance with an embodiment of the present specification
  • FIG. 2 is a flow chart showing a method of training a business circle determination model according to an embodiment of the present specification
  • FIG. 3 schematically shows an example of a ρ-δ distribution map
  • FIG. 4 schematically shows an example of a γ distribution map
  • FIG. 5 is a flowchart showing a method of determining a business circle according to an embodiment of the present specification
  • FIG. 6 shows a flow chart of a method for updating a business circle determination according to an embodiment of the present specification
  • Figure 7 illustrates an apparatus 700 for training a business circle decision model in accordance with an embodiment of the present specification
  • Figure 8 illustrates an apparatus 800 for determining a business circle in accordance with an embodiment of the present specification
  • FIG. 9 illustrates an apparatus 900 for updating a business circle determination in accordance with an embodiment of the present specification.
  • FIG. 1 shows a schematic diagram of a system 100 for determining a business circle in accordance with an embodiment of the present specification.
  • system 100 includes a clustering module 11, an evaluation module 12, and a threshold adjustment module 13.
  • the training samples are input to the clustering module 11.
  • the training sample includes the location information of the store and the respective business circle labeling information of the store.
  • The clustering module 11 uses the location information to calculate, according to the CFSFDP clustering algorithm, the value of the local density ρ of each store, the value of the minimum distance δ to a store of higher density, and the value of their product γ, and obtains the business circle determination information of each store according to the respective current thresholds of ρ, δ, and γ.
  • The clustering module 11 transmits the business circle determination information to the evaluation module 12.
  • The evaluation module 12 uses the business circle determination information and the business circle labeling information of the plurality of stores to calculate the similarity of all the determination information with respect to all the labeling information as an evaluation score, and transmits the evaluation score to the threshold adjustment module 13.
  • The threshold adjustment module 13 adjusts the thresholds of ρ, δ, and γ in the clustering module 11 according to the evaluation score so as to increase it and, after multiple adjustments, to maximize it.
  • The full set of store information can then be clustered by the clustering module 11 to obtain the business circle determination result.
  • As shown in FIG. 2, the method includes: in step S21, acquiring location information of each of a plurality of stores within a predetermined geographical range and respective business circle labeling information of the plurality of stores; in step S22, calculating, according to the CFSFDP clustering algorithm and using the location information, the value of the local density ρ of each store, the value of the minimum distance δ to a store of higher density, and the value of their product γ; in step S23, acquiring the business circle determination information of each store according to the respective current thresholds of ρ, δ, and γ; in step S24, calculating the similarity of all the business circle determination information with respect to all the business circle labeling information, using the business circle determination information and the business circle labeling information of each of the plurality of stores; and in step S25, adjusting the respective thresholds of ρ, δ, and γ so that the similarity increases.
  • step S21 location information of each of a plurality of stores within a predetermined geographical range and business circle annotation information of each of the plurality of stores are acquired.
  • the predetermined geographic extent may be, for example, a geographic extent including more than 100 business districts, such as a district, county, etc. of the city.
  • the number of stores in a plurality of stores may be, for example, on the order of several thousand, for example, 3,000.
  • The business circles covered by the plurality of stores exhibit a variety of positional relationships; for example, two business circles may be adjacent, intersecting, or separated.
  • the location information of the store may be expressed in various known forms.
  • the location information of the store may be the latitude and longitude of the store, or the location information of the store may be the city coordinates or the like.
  • the business district labeling information of the store includes whether the store belongs to a certain business circle, and which business district the store belongs to.
  • For example, the business circle labeling information of a store may be given by a labeled business circle field. When the field is 0, the store is a scattered store that does not belong to any business circle; when the field is a natural number, the store belongs to the business circle identified by that number.
  • In step S22, according to the CFSFDP clustering algorithm and using the location information, the value of the local density ρ of each store, the value of the minimum distance δ to a store of higher density, and the value of their product γ are calculated.
  • In one embodiment, the value of the local density ρ of store i among the plurality of stores is ρ_i, where ρ_i is calculated by formula (1).
  • Here the CFSFDP clustering algorithm is used to determine business circles, that is, to cluster the store points. Since the shape of a business circle is generally fixed, d_c is set to 0.2 (i.e., 0.2 km, or 200 m). With this setting, the stability of the clustering result is greatly improved.
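  • The image of formula (1) is not reproduced in this text. For orientation only, the cut-off kernel density from the original CFSFDP paper (Rodriguez and Laio, 2014) is shown below; that formula (1) has exactly this form is an assumption, although it agrees with the definitions of d_c, d_ij, and i ≠ j given in the specification.

```latex
\rho_i = \sum_{j \neq i} \chi\left(d_{ij} - d_c\right),
\qquad
\chi(x) =
\begin{cases}
1, & x < 0 \\
0, & x \ge 0
\end{cases}
```

  • Under this assumption and with d_c = 0.2 km, ρ_i is simply the number of other stores lying within 200 m of store i.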
  • The distance d_ij between store i and store j may be calculated with different formulas depending on the form of the store location information. For example, when the location information of store i and store j is expressed as longitude and latitude, (Lon_i, Lat_i) and (Lon_j, Lat_j) respectively, d_ij is calculated by formula (2).
  • The d_ij calculated by formula (2) is the distance between two points on a spherical surface, where R is the radius of the Earth, with a mean value of 6371 km.
  • In other embodiments, d_ij may be a Euclidean distance, a Minkowski distance, a Manhattan distance, or the like.
  • For example, when the location information of store i and store j is expressed as three-dimensional or two-dimensional coordinates in a city coordinate system, the Euclidean distance between store i and store j can be used as d_ij.
  • Each d_ij can be calculated by, for example, formula (2), thereby obtaining a distance matrix.
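  • As a minimal sketch of the spherical distance and the distance matrix described above: the image of formula (2) is not reproduced in this text, so the haversine form used here is an assumption (it is one standard way to compute a great-circle distance). The function names `great_circle_km` and `distance_matrix` are hypothetical; only R = 6371 km comes from the specification.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius quoted in the specification


def great_circle_km(lon_i, lat_i, lon_j, lat_j, radius_km=EARTH_RADIUS_KM):
    """Haversine distance in km between two (lon, lat) points given in degrees.

    Assumption: the patent's formula (2) is some great-circle formula;
    haversine is used here because it is numerically stable for short distances.
    """
    phi_i, phi_j = math.radians(lat_i), math.radians(lat_j)
    d_phi = phi_j - phi_i
    d_lam = math.radians(lon_j - lon_i)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi_i) * math.cos(phi_j) * math.sin(d_lam / 2) ** 2)
    # min() guards against rounding slightly above 1.0 before asin
    return 2 * radius_km * math.asin(min(1.0, math.sqrt(a)))


def distance_matrix(points):
    """Symmetric matrix of pairwise distances for a list of (lon, lat) tuples."""
    n = len(points)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = great_circle_km(*points[i], *points[j])
            dist[i][j] = dist[j][i] = d
    return dist
```

  • Because the distances come out in kilometres, the radius threshold d_c = 0.2 from the specification can be compared with them directly.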
  • In one embodiment, a calculation formula based on a Gaussian kernel function is introduced, and ρ_i is calculated by formula (3), where d_c is a radius threshold, d_ij is the distance between store i and store j among the plurality of stores, i and j are natural numbers less than or equal to the total number of stores of the plurality of stores, and i ≠ j.
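  • The three CFSFDP quantities can be sketched as follows. The formula images are not reproduced in this text, so this sketch assumes the standard forms from the CFSFDP literature: a Gaussian-kernel density ρ_i = Σ_{j≠i} exp(−(d_ij/d_c)²), δ_i as the distance to the nearest store of higher density (with the densest store given its largest distance to any store, by convention), and γ_i = ρ_i·δ_i. All function names are hypothetical.

```python
import math


def gaussian_density(dist, d_c=0.2):
    """rho_i under the assumed Gaussian kernel; dist is a square distance matrix."""
    n = len(dist)
    return [
        sum(math.exp(-((dist[i][j] / d_c) ** 2)) for j in range(n) if j != i)
        for i in range(n)
    ]


def delta_and_nearest_denser(dist, rho):
    """delta_i and the index of the nearest denser store (-1 for the densest)."""
    n = len(dist)
    # Densest first; ties are broken by index because sorted() is stable.
    order = sorted(range(n), key=lambda i: -rho[i])
    delta = [0.0] * n
    nearest = [-1] * n
    for rank, i in enumerate(order):
        if rank == 0:
            # Convention for the densest store: its largest distance to any store.
            delta[i] = max(dist[i])
            continue
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i] = dist[i][j]
        nearest[i] = j
    return delta, nearest


def gamma(rho, delta):
    """gamma_i = rho_i * delta_i."""
    return [r * d for r, d in zip(rho, delta)]
```

  • Stores with a large γ are the natural candidates for business circle centers, which is what the distribution maps discussed next make visible.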
  • In one embodiment, the business circle centers can be determined by plotting the ρ-δ distribution map.
  • FIG. 3 shows an example of a ρ-δ distribution map.
  • The abscissa of the ρ-δ distribution map is ρ and the ordinate is δ.
  • A cluster center has both a high local density ρ and a large distance δ to any store of higher density; therefore, points located in the upper-right portion of the ρ-δ distribution map may be cluster centers.
  • In FIG. 3, the points drawn in gray levels other than black may be business circle centers.
  • FIG. 4 schematically shows an example of a γ distribution map. As shown in FIG. 4, the ordinate is γ, and the abscissa is the serial number of store i after sorting by the size of γ_i, where each serial number corresponds to one store. The larger γ is, that is, the larger the product ρ·δ, the more likely the store corresponding to that point is a business circle center.
  • In step S23, the business circle determination information of each store is acquired based on the respective current thresholds of ρ, δ, and γ.
  • First, the stores that become business circle centers are determined based on the set thresholds of ρ, δ, and γ.
  • In FIG. 3, the dashed line perpendicular to the ρ axis represents the threshold of ρ, and the dashed line perpendicular to the δ axis represents the threshold of δ; in FIG. 4, the dashed line perpendicular to the γ axis represents the threshold of γ.
  • Points to the upper right of the intersection of the ρ threshold line and the δ threshold line are first business circle centers, and points above the γ threshold line are second business circle centers.
  • Each of the plurality of stores is then clustered, that is, assigned to a business circle.
  • In one embodiment, each store is assigned to the business circle of the nearest store whose density is higher than its own.
  • In another embodiment, each store is assigned to the business circle of the nearest business circle center. When a store is too far (for example, more than 2 km) from every store of higher density, or from every business circle center, the store may be regarded as a scattered store that does not belong to any business circle, or equivalently as belonging to the business circle whose identification number is 0. In this way, the business circle determination information of each store is obtained.
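  • The center selection and assignment described above can be sketched as follows. The specification does not say how the first (ρ and δ thresholds) and second (γ threshold) kinds of center are combined; this sketch treats a store as a center if it meets either criterion, which is an assumption. It uses the nearest-denser-store assignment rule with the 2 km example cut-off; the function name and the parameter `max_link_km` are hypothetical.

```python
def assign_business_circles(dist, rho, delta, rho_th, delta_th, gamma_th,
                            max_link_km=2.0):
    """Return one label per store: 1, 2, ... for business circles, 0 for scattered.

    dist is a square distance matrix in km; rho and delta are per-store lists.
    """
    n = len(rho)
    # Densest first, so a store's nearest denser neighbour is already labeled.
    order = sorted(range(n), key=lambda i: -rho[i])
    labels = [0] * n
    next_id = 1
    for rank, i in enumerate(order):
        is_center = (rho[i] > rho_th and delta[i] > delta_th) \
            or rho[i] * delta[i] > gamma_th  # assumed union of both criteria
        if is_center:
            labels[i] = next_id
            next_id += 1
        elif rank > 0:
            # Nearest store among those that are denser than store i.
            j = min(order[:rank], key=lambda k: dist[i][k])
            if dist[i][j] <= max_link_km:
                labels[i] = labels[j]  # inherit (possibly 0 = scattered)
    return labels
```

  • A store that is not a center and lies more than `max_link_km` from every denser store keeps the label 0, matching the scattered-store convention of the labeled business circle field.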
  • step S24 the similarity of all the business circle determination information with respect to all the business circle label information is calculated by using the business circle determination information and the business circle label information of the plurality of stores.
  • the similarity is a degree indicating that the business circle determination information is similar to the business circle annotation information.
  • the similarity can be calculated in various forms to evaluate the judgment result. For example, Precision, Recall, AUC score, log loss, Accuracy, etc. can all be used to indicate similarity.
  • In one embodiment, the similarity is expressed by a WFS score, where WFS is calculated by formula (5), in which:
  • i is an integer from 0 to A;
  • j is an integer from 0 to B;
  • A is the number of labeled business circles;
  • B is the number of determined business circles;
  • N_i is the number of stores in the i-th labeled business circle;
  • N is the total number of stores of the plurality of stores;
  • P_ij is the precision for the i-th labeled business circle and the j-th determined business circle;
  • R_ij is the recall for the i-th labeled business circle and the j-th determined business circle. The set of labeled scattered stores is treated as the 0th labeled business circle, and the set of determined scattered stores as the 0th determined business circle, where a labeled scattered store is a labeled store that does not belong to any labeled business circle, and a determined scattered store is a determined store that does not belong to any determined business circle.
  • P_ij can be calculated by formula (7), and
  • R_ij can be calculated by formula (8), where:
  • i is an integer from 0 to A;
  • j is an integer from 0 to B;
  • A is the number of labeled business circles;
  • B is the number of determined business circles;
  • x_ij is the number of stores in the i-th labeled business circle that are assigned to the j-th determined business circle.
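  • The score can be sketched as follows. Formulas (5), (7), and (8) are not reproduced in this text, so the sketch assumes that WFS is the usual size-weighted clustering F-measure: P_ij = x_ij / (size of the j-th determined circle), R_ij = x_ij / N_i, F_ij = 2·P_ij·R_ij / (P_ij + R_ij), and WFS = Σ_i (N_i / N)·max_j F_ij. The function name is hypothetical, and label 0 stands for scattered stores on both sides.

```python
from collections import Counter


def weighted_f_score(labeled, determined):
    """Assumed WFS: size-weighted best F-score of each labeled business circle.

    labeled[k] and determined[k] are the circle ids of store k (0 = scattered).
    """
    n = len(labeled)
    labeled_sizes = Counter(labeled)              # N_i
    determined_sizes = Counter(determined)        # size of each determined circle
    overlap = Counter(zip(labeled, determined))   # x_ij, only pairs with x_ij > 0

    best_f = {i: 0.0 for i in labeled_sizes}
    for (i, j), x_ij in overlap.items():
        p = x_ij / determined_sizes[j]            # precision P_ij
        r = x_ij / labeled_sizes[i]               # recall R_ij
        f = 2 * p * r / (p + r)                   # p + r > 0 because x_ij > 0
        best_f[i] = max(best_f[i], f)
    return sum(labeled_sizes[i] / n * best_f[i] for i in labeled_sizes)
```

  • Under these assumptions the score depends only on how stores are grouped, not on the numeric ids of the determined circles, so a determination that matches the labeling up to renumbering scores 1.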
  • In step S25, the respective thresholds of ρ, δ, and γ are adjusted so that the similarity increases.
  • In one embodiment, at least one of the three threshold lines in FIGS. 3 and 4 may be shifted in either direction to obtain an adjusted threshold.
  • The adjusted business circle determination information of each store is then acquired based on the adjusted thresholds of ρ, δ, and γ, and the adjusted similarity is calculated from the business circle labeling information and the adjusted business circle determination information.
  • Depending on how the similarity changes, the threshold line may be moved repeatedly in the direction in which the similarity grows, so that the thresholds of ρ, δ, and γ are adjusted step by step and the similarity keeps increasing.
  • In one embodiment, the respective thresholds of ρ, δ, and γ are adjusted until the similarity is maximized, thereby obtaining the final thresholds of ρ, δ, and γ.
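  • One simple way to realize "move a threshold line in the direction in which the similarity grows" is a coordinate-wise hill climb, sketched below. The specification does not prescribe a search procedure, so the procedure, the function name `tune_thresholds`, and the parameters `step` and `rounds` are all assumptions; `score` stands for any function that clusters with the given thresholds and returns the similarity.

```python
def tune_thresholds(score, start, step, rounds=50):
    """Coordinate-wise hill climbing over the (rho, delta, gamma) thresholds.

    score: callable taking a list of thresholds and returning the similarity.
    start: initial thresholds; step: per-threshold step size.
    Returns the best thresholds found and their score (a local maximum only).
    """
    best = list(start)
    best_score = score(best)
    for _ in range(rounds):
        improved = False
        for k in range(len(best)):
            for direction in (+1, -1):
                trial = list(best)
                trial[k] += direction * step[k]
                s = score(trial)
                if s > best_score:
                    best, best_score, improved = trial, s, True
        if not improved:
            break  # no single move raises the similarity any further
    return best, best_score
```

  • A search of this kind can stop at a local maximum; restarting from several initial thresholds or using a coarse grid first are common ways to reduce that risk.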
  • FIG. 5 is a flowchart of a method for determining a business circle according to an embodiment of the present specification, including: in step S51, acquiring respective location information of a plurality of stores within a predetermined geographic range; in step S52, calculating, according to the CFSFDP clustering algorithm and using the location information, the value of the local density ρ of each store, the value of the minimum distance δ to a store of higher density, and the value of their product γ; and in step S53, determining a business circle for the plurality of stores by means of the adjusted thresholds of ρ, δ, and γ obtained by the above method of training the business circle determination model.
  • the predetermined geographic range may be a predetermined city range, that is, the business circle is determined in units of cities.
  • the process of the step S52 is substantially the same as the step S22 of FIG. 2, and the process of the step S53 is substantially the same as the step S23 of FIG. 2, and details are not described herein again.
  • FIG. 6 shows a flowchart of a method for updating a business circle determination according to an embodiment of the present specification, including: in step S61, acquiring first location information of each of a plurality of first stores within a predetermined geographic range and first distances between the first stores; in step S62, acquiring second location information of each of at least one second store within the predetermined geographic range; in step S63, calculating, using the first location information and the second location information, second distances between the second stores and third distances between any one of the second stores and any one of the first stores; in step S64, calculating, according to the CFSFDP clustering algorithm and based on the first, second, and third distances, the value of the local density ρ of each of the plurality of first stores and the at least one second store, the value of the minimum distance δ to a store of higher density, and the value of their product γ; and in step S65, determining a business circle for the plurality of first stores and the at least one second store by means of the adjusted thresholds of ρ, δ, and γ obtained by the above method of training the business circle determination model.
  • The method shown in FIG. 6 is an incremental, iterative approach. As offline merchants keep expanding and the number of stores N keeps growing, computing the full distance matrix directly has a computational complexity of O(N²). The method shown in FIG. 6 reduces the amount of calculation and thus speeds it up.
  • step S61 first location information of each of the plurality of first stores within the predetermined geographic range and a first distance between the respective first stores are acquired.
  • the predetermined geographic extent may be, for example, a predetermined city.
  • For example, location information of a plurality of stores may be acquired in the initial month M_0, and the distances between the stores may be calculated as described above, thereby obtaining the distance matrix N_0.
  • In month M_1, at least one store is newly added, or the location information of at least one of the existing stores has changed. Such a store is a second store, denoted x_new.
  • A store among the plurality of stores that is unrelated to the second store is a first store, denoted x_old.
  • When the second store is a newly added store,
  • the first stores are all the stores acquired in month M_0.
  • When the second store is an existing store whose location has changed, the first stores are the stores acquired in month M_0 other than those whose location has changed.
  • The first location information of the first stores and the first distances between them can be obtained from the store location information acquired in month M_0 and from the distance matrix N_0 calculated in that month.
  • The first distance is the distance between one x_old and another x_old.
  • step S62 second location information of each of the at least one second store within the predetermined geographic range is acquired.
  • the location information of the newly added second store is acquired.
  • When the second store is a store whose location information has changed in month M_1, the changed location information of the second store is acquired.
  • In step S63, using the first location information and the second location information, second distances between the second stores and third distances between any one of the second stores and any one of the first stores are calculated. That is, the second distance is the distance between one x_new and another x_new, and the third distance is the distance between an x_new and an x_old.
  • In step S64, according to the CFSFDP clustering algorithm and based on the first, second, and third distances, the value of the local density ρ of each of the plurality of first stores and the at least one second store, the value of the minimum distance δ to a store of higher density, and the value of their product γ are calculated.
  • The first, second, and third distances together form a new distance matrix, so that the value of the local density ρ, the value of the minimum distance δ to a store of higher density, and the value of the product γ can be calculated as described above for all the stores, including the first stores and the second stores.
  • In step S65, a business circle is determined for the plurality of first stores and the at least one second store by means of the adjusted thresholds of ρ, δ, and γ obtained by the above method of training the business circle determination model.
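  • The way the three kinds of distance form the new matrix can be sketched as follows. The function name `extend_distance_matrix` and the `metric` callback are hypothetical; the sketch assumes the old matrix rows are ordered like `old_points` and covers only the case of newly added stores (stores whose location changed would first have to be dropped from the old matrix).

```python
def extend_distance_matrix(old_dist, old_points, new_points, metric):
    """Build the full distance matrix while reusing the old block.

    old_dist: matrix of first distances (x_old to x_old), reused as is.
    metric(p, q): distance between two points, called only for pairs
    that involve at least one new store.
    """
    n_old, n_new = len(old_points), len(new_points)
    n = n_old + n_new
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n_old):                      # first distances: copied
        for j in range(n_old):
            dist[i][j] = old_dist[i][j]
    for a, p in enumerate(new_points):
        for i, q in enumerate(old_points):      # third distances: x_new to x_old
            d = metric(p, q)
            dist[n_old + a][i] = dist[i][n_old + a] = d
        for b in range(a + 1, n_new):           # second distances: x_new to x_new
            d = metric(p, new_points[b])
            dist[n_old + a][n_old + b] = dist[n_old + b][n_old + a] = d
    return dist
```

  • With n_old existing stores and n_new new ones, the metric is evaluated n_new·n_old + n_new·(n_new − 1)/2 times instead of once per pair over all stores, which is where the saving comes from when few stores change per period.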
  • This step is basically the same as step S23 in FIG. 2 and step S53 in FIG. 5, and details are not described herein again.
  • The above method of updating the business circle determination may be performed once every predetermined time period, for example, once a month, so that the business circle determination is updated periodically. This update method reduces the computational complexity by at least two orders of magnitude.
  • FIG. 7 shows an apparatus 700 for training a business circle determination model according to an embodiment of the present specification, including: a first acquisition unit 71 configured to acquire respective location information of a plurality of stores within a predetermined geographic range and respective business circle labeling information of the plurality of stores; a first calculation unit 72 configured to calculate, according to the CFSFDP clustering algorithm and using the location information, the value of the local density ρ of each store, the value of the minimum distance δ to a store of higher density, and the value of their product γ; a second acquisition unit 73 configured to acquire the business circle determination information of each store according to the respective current thresholds of ρ, δ, and γ; a second calculation unit 74 configured to calculate, using the business circle determination information and the business circle labeling information of each of the plurality of stores, the similarity of all the business circle determination information with respect to all the business circle labeling information; and a threshold adjustment unit 75 configured to adjust the respective thresholds of ρ, δ, and γ so that the similarity increases.
  • FIG. 8 shows an apparatus 800 for determining a business circle according to an embodiment of the present specification, comprising: an obtaining unit 81 configured to acquire respective location information of a plurality of stores within a predetermined geographical range; and a calculating unit 82 configured to Calculating, according to the CFSFDP clustering algorithm, the value of the local density ⁇ of each store, the value of the minimum distance ⁇ with the higher density store, and the value of the product ⁇ ; and the determining unit 83 configured to pass The adjusted threshold values of ⁇ , ⁇ , and ⁇ obtained by the method for training the business circle determination model are determined for the plurality of stores.
  • FIG. 9 illustrates an apparatus 900 for updating a business circle determination, including: a first obtaining unit 91 configured to acquire first location information of each of a plurality of first stores within a predetermined geographic range and between respective first stores
  • the first obtaining unit 92 is configured to acquire second location information of each of the at least one second store in the predetermined geographic range;
  • the first calculating unit 93 is configured to use the first location information And the second location information, calculating a second distance between the second stores, a third distance between any one of the second stores and any of the first stores;
  • the second calculating unit 94 is configured to: Calculating, according to the CFSFDP clustering algorithm, a value of a local density ⁇ of each of the plurality of first stores and at least one second store based on the first distance, the second distance, and the third distance, and a minimum value of the higher density store a value of the distance ⁇ and a value of the product ⁇ thereof; and a determining unit 95 configured to adjust the threshold values of ⁇ ,
  • The business circle determination model according to an embodiment of the present specification can be evaluated by calculating a Sil score.
  • The Sil score can be calculated by the following formulas (9)-(11):
  • where c_k represents the set of the k-th clustering result,
  • a(i) represents the average distance from point i to all points in the same business circle, and
  • b(i) represents the average distance from point i to all points in the closest business circle p.
  • The method of the embodiments of the present specification needs only the geographic location information of the full set of stores as input, and can determine business circles for them without store-by-store manual determination.
  • In measurements, the business circle coverage of the method reaches 92.5%, and the stores that are not covered are mostly isolated points or dirty data points.
  • The CFSFDP algorithm used in the method of the embodiments of the present specification does not require the number of clusters to be defined in advance; instead, the business circle centers are obtained directly by thresholding.
  • The method of the embodiments of the present specification enhances stability in two ways: first, the optimal parameters are pre-trained using known, highly accurate business circle label information, which ensures the stability of the parameters; second, with stable parameters, obtaining the business circle centers by thresholding ensures that stable results are found when the data is unchanged or changes only slightly.
  • The method of the embodiments of the present specification introduces a distance matrix based on city partitioning and incremental iteration and exploits the time sequence of store evolution, which effectively reduces the computational complexity; in actual measurement, the calculation time was reduced by about a factor of 10.
  • The steps of a method or algorithm described in connection with the embodiments disclosed herein may be implemented in hardware, in a software module executed by a processor, or in a combination of the two.
  • The software module may be placed in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the technical field.

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Finance (AREA)
  • Strategic Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Game Theory and Decision Science (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

Provided in embodiments of the present description are a method and device for training a trading area determination model, a method and device for determining a trading area, and a method and device for updating trading area determination, wherein the method for training a trading area determination model comprises: acquiring location information respective to a plurality of shops in a predetermined geographical range and trading area labeling information respective to the plurality of shops; according to a CFSFDP clustering algorithm, utilizing the location information and calculating the value of a local density ρ of each shop, the value of a minimum distance δ to a shop that has a higher density, and the value of a product γ thereof; acquiring trading area determination information of each shop according to current thresholds respective to ρ, δ and γ; utilizing the trading area determination information and trading area labeling information respective to the plurality of shops, and calculating the degree of similarity between all of the trading area determination information relative to all of the trading area labeling information; and adjusting the thresholds respective to ρ, δ and γ such that the degree of similarity is increased.

Description

Method and device for determining a business circle
Technical field
The embodiments of the present specification relate to the field of machine learning, and more particularly, to a method and apparatus for training a business circle determination model, a method and apparatus for determining a business circle, and a method and apparatus for updating a business circle determination.
Background
In recent years, against the background of "new retail" and "new finance", offline stores have developed vigorously. Compared with online merchants, offline merchants have physical stores, and like attracts like: the stores tend to cluster geographically, so that these physical stores can be assigned to individual business circles. Business circle information deepens the understanding of a store: it helps to identify offline industry markets, to assist in judging a store's operating status, and so on. Existing business circle information includes business circle information tagged by offline BD and business circle information derived mainly from crawled Dianping (大众点评) business circle results. All of this business circle information is obtained by manual labeling. Therefore, a more effective solution for determining business circles is needed.
Summary of the invention
The embodiments of the present specification aim to provide a more effective solution for determining a business circle, so as to remedy the deficiencies of the prior art.
To achieve the above object, one aspect of the present specification provides a method for training a business circle determination model, including: acquiring respective location information of a plurality of stores within a predetermined geographic range and respective business circle label information of the plurality of stores; calculating, according to the CFSFDP clustering algorithm and using the location information, the value of the local density ρ of each store, the value of the minimum distance δ to a higher-density store, and the value of their product γ; acquiring business circle determination information of each store according to the respective current thresholds of ρ, δ, and γ; calculating, using the respective business circle determination information and business circle label information of the plurality of stores, the similarity of all the business circle determination information with respect to all the business circle label information; and adjusting the respective thresholds of ρ, δ, and γ so that the similarity increases.
In one embodiment, in the method for training a business circle determination model, the local density ρ of store i among the plurality of stores has the value ρ_i, where
ρ_i = ∑_j χ(d_ij − d_c),
where χ(d_ij − d_c) = 1 when d_ij − d_c < 0, and χ(d_ij − d_c) = 0 when d_ij − d_c ≥ 0,
where d_c is a radius threshold, d_ij is the distance between store i and store j among the plurality of stores, and i and j are natural numbers less than or equal to the total number of stores, with i ≠ j.
In one embodiment, in the method for training a business circle determination model, the local density ρ of store i among the plurality of stores has the value ρ_i, where
Figure PCTCN2018119319-appb-000001
where d_c is a radius threshold, d_ij is the distance between store i and store j among the plurality of stores, and i and j are natural numbers less than or equal to the total number of stores, with i ≠ j.
In one embodiment, in the method for training a business circle determination model,
the location information of store i and store j is expressed in longitude and latitude as (Lon_i, Lat_i) and (Lon_j, Lat_j), respectively, and their distance d_ij is calculated as follows:
Figure PCTCN2018119319-appb-000002
where R is the radius of the Earth.
In one embodiment, in the method for training a business circle determination model, the similarity is expressed by a parameter WFS, where
Figure PCTCN2018119319-appb-000003
Figure PCTCN2018119319-appb-000004
where i is an integer from 0 to A, j is an integer from 0 to B, A is the number of labeled business circles, B is the number of determined business circles, N_i is the number of stores contained in the i-th labeled business circle, and N is the total number of stores,
P_ij is the precision with respect to the i-th labeled business circle and the j-th determined business circle, and R_ij is the recall with respect to the i-th labeled business circle and the j-th determined business circle. The set of labeled scattered stores is defined as the 0-th labeled business circle, and the set of determined scattered stores is defined as the 0-th determined business circle, where a labeled scattered store is a labeled store that does not belong to any labeled business circle, and a determined scattered store is a determined store that does not belong to any determined business circle.
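The WFS formulas themselves are not reproduced in this text, so the following Python sketch is only one plausible reading of the definitions above. It assumes that P_ij and R_ij are combined into an F-measure (their harmonic mean), that each labeled business circle is matched with its best-scoring determined business circle, and that the results are weighted by N_i/N. The function name `weighted_f_score` and these three choices are assumptions, not the patent's stated formula.

```python
from collections import defaultdict


def weighted_f_score(labels, preds):
    """Assumed WFS: sum over labeled circles i of (N_i / N) * max_j F_ij.

    labels[k] and preds[k] are the labeled and determined circle ids of
    store k; id 0 collects the scattered stores, as in the text.
    """
    n = len(labels)
    by_label, by_pred = defaultdict(set), defaultdict(set)
    for k, (a, b) in enumerate(zip(labels, preds)):
        by_label[a].add(k)
        by_pred[b].add(k)

    score = 0.0
    for members in by_label.values():
        best = 0.0
        for judged in by_pred.values():
            overlap = len(members & judged)
            if overlap == 0:
                continue
            p = overlap / len(judged)    # precision P_ij
            r = overlap / len(members)   # recall R_ij
            best = max(best, 2 * p * r / (p + r))  # assumed F-measure
        score += len(members) / n * best
    return score


labels = [1, 1, 2, 2, 0]
perfect = weighted_f_score(labels, [1, 1, 2, 2, 0])
merged = weighted_f_score(labels, [1, 1, 1, 1, 0])  # two circles merged
```

Under these assumptions a determination that reproduces the labels exactly scores 1, and merging two labeled circles into one lowers the score.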
In one embodiment, in the method for training a business circle determination model, adjusting the respective thresholds of ρ, δ, and γ so that the similarity increases includes adjusting the respective thresholds of ρ, δ, and γ so that the similarity is maximized.
Another aspect of the present specification provides a method for determining a business circle, including: acquiring respective location information of a plurality of stores within a predetermined geographic range; calculating, according to the CFSFDP clustering algorithm and using the location information, the value of the local density ρ of each store, the value of the minimum distance δ to a higher-density store, and the value of their product γ; and determining business circles for the plurality of stores by means of the respective adjusted thresholds of ρ, δ, and γ obtained by the above method for training a business circle determination model.
In one embodiment, in the method for determining a business circle, the predetermined geographic range is a predetermined city.
Another aspect of the present specification provides a method for updating a business circle determination, including: acquiring respective first location information of a plurality of first stores within a predetermined geographic range and a first distance between the respective first stores; acquiring respective second location information of at least one second store within the predetermined geographic range; calculating, using the first location information and the second location information, a second distance between the respective second stores and a third distance between any second store and any first store; calculating, according to the CFSFDP clustering algorithm and based on the first distance, the second distance, and the third distance, the value of the local density ρ, the value of the minimum distance δ to a higher-density store, and the value of their product γ for each of the plurality of first stores and the at least one second store; and determining business circles for the plurality of first stores and the at least one second store by means of the respective adjusted thresholds of ρ, δ, and γ obtained by the above method for training a business circle determination model.
In one embodiment, the method for updating the business circle determination is performed once every predetermined period.
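The reuse of the first distances in the update method can be illustrated with a short Python sketch. It extends an existing distance matrix with new stores and computes only the second (new × new) and third (new × old) distances. The function name `extend_distance_matrix` is hypothetical, and the planar Euclidean distance `math.dist` is an assumption made for brevity; any pairwise distance function could be passed in instead.

```python
import math


def extend_distance_matrix(old_d, old_pts, new_pts, dist=math.dist):
    """Return the full distance matrix for old_pts + new_pts.

    old_d holds the first distances (old x old) and is copied, not recomputed.
    """
    n_old, n_new = len(old_pts), len(new_pts)
    n = n_old + n_new
    d = [[0.0] * n for _ in range(n)]

    # First distances: reuse the stored old x old block.
    for i in range(n_old):
        d[i][:n_old] = old_d[i]

    for a, p in enumerate(new_pts):
        i = n_old + a
        # Third distances: each new store against every old store.
        for j, q in enumerate(old_pts):
            d[i][j] = d[j][i] = dist(p, q)
        # Second distances: each pair of new stores, computed once.
        for b in range(a + 1, n_new):
            j = n_old + b
            d[i][j] = d[j][i] = dist(p, new_pts[b])
    return d


old_pts = [(0.0, 0.0), (3.0, 4.0)]
old_d = [[0.0, 5.0], [5.0, 0.0]]
full = extend_distance_matrix(old_d, old_pts, [(0.0, 4.0)])
```

With n_old existing stores and n_new added stores, this computes n_new·n_old + n_new·(n_new − 1)/2 distances instead of all pairs, which is where the saving comes from when few stores are added per period.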
Another aspect of the present specification provides an apparatus for training a business circle determination model, including: a first acquisition unit configured to acquire respective location information of a plurality of stores within a predetermined geographic range and respective business circle label information of the plurality of stores; a first calculating unit configured to calculate, according to the CFSFDP clustering algorithm and using the location information, the value of the local density ρ of each store, the value of the minimum distance δ to a higher-density store, and the value of their product γ; a second acquisition unit configured to acquire business circle determination information of each store according to the respective current thresholds of ρ, δ, and γ; a second calculating unit configured to calculate, using the respective business circle determination information and business circle label information of the plurality of stores, the similarity of all the business circle determination information with respect to all the business circle label information; and a threshold adjustment unit configured to adjust the respective thresholds of ρ, δ, and γ so that the similarity increases.
Another aspect of the present specification provides an apparatus for determining a business circle, including: an acquisition unit configured to acquire respective location information of a plurality of stores within a predetermined geographic range; a calculating unit configured to calculate, according to the CFSFDP clustering algorithm and using the location information, the value of the local density ρ of each store, the value of the minimum distance δ to a higher-density store, and the value of their product γ; and a determining unit configured to determine business circles for the plurality of stores by means of the respective adjusted thresholds of ρ, δ, and γ obtained by the above method for training a business circle determination model.
Another aspect of the present specification provides an apparatus for updating a business circle determination, including: a first acquisition unit configured to acquire respective first location information of a plurality of first stores within a predetermined geographic range and a first distance between the respective first stores; a second acquisition unit configured to acquire respective second location information of at least one second store within the predetermined geographic range; a first calculating unit configured to calculate, using the first location information and the second location information, a second distance between the respective second stores and a third distance between any second store and any first store; a second calculating unit configured to calculate, according to the CFSFDP clustering algorithm and based on the first distance, the second distance, and the third distance, the value of the local density ρ, the value of the minimum distance δ to a higher-density store, and the value of their product γ for each of the plurality of first stores and the at least one second store; and a determining unit configured to determine business circles for the plurality of first stores and the at least one second store by means of the respective adjusted thresholds of ρ, δ, and γ obtained by the above method for training a business circle determination model.
With the business circle determination solution according to the embodiments of the present specification, business circles can be determined quickly and accurately while the stability of the determination results is ensured. In addition, the embodiments of the present specification effectively reduce the computational complexity and shorten the calculation time.
Brief description of the drawings
The embodiments of the present specification will become clearer from the following description taken in conjunction with the accompanying drawings:
FIG. 1 shows a schematic diagram of a system 100 for determining a business circle according to an embodiment of the present specification;
FIG. 2 shows a flowchart of a method for training a business circle determination model according to an embodiment of the present specification;
FIG. 3 schematically shows an example of a ρ-δ distribution diagram;
FIG. 4 schematically shows an example of a γ distribution diagram;
FIG. 5 shows a flowchart of a method for determining a business circle according to an embodiment of the present specification;
FIG. 6 shows a flowchart of a method for updating a business circle determination according to an embodiment of the present specification;
FIG. 7 shows an apparatus 700 for training a business circle determination model according to an embodiment of the present specification;
FIG. 8 shows an apparatus 800 for determining a business circle according to an embodiment of the present specification; and
FIG. 9 shows an apparatus 900 for updating a business circle determination according to an embodiment of the present specification.
Detailed description
The embodiments of the present specification are described below with reference to the accompanying drawings.
FIG. 1 shows a schematic diagram of a system 100 for determining a business circle according to an embodiment of the present specification. As shown in FIG. 1, the system 100 includes a clustering module 11, an evaluation module 12, and a threshold adjustment module 13. In the training phase, training samples are input to the clustering module 11. The training samples include the respective location information and the respective business circle label information of the stores. According to the CFSFDP clustering algorithm and using the location information, the clustering module 11 calculates the value of the local density ρ of each store, the value of the minimum distance δ to a higher-density store, and the value of their product γ, and acquires the business circle determination information of each store according to the respective current thresholds of ρ, δ, and γ. The clustering module 11 then transmits the business circle determination information to the evaluation module 12. Using the respective business circle determination information and business circle label information of the plurality of stores, the evaluation module 12 calculates the similarity of all the business circle determination information with respect to all the business circle label information as an evaluation score, and transmits the evaluation score to the threshold adjustment module 13. The threshold adjustment module 13 adjusts the thresholds of the parameters ρ, δ, and γ in the clustering module 11 according to the evaluation score so as to raise the evaluation score, and after multiple adjustments the evaluation score reaches its maximum. After the clustering module 11 has been trained to its optimal parameter thresholds, it can be used to cluster the full set of store information to obtain the business circle determination results.
FIG. 2 shows a flowchart of a method for training a business circle determination model according to an embodiment of the present specification. As shown in FIG. 2, the method includes: in step S21, acquiring respective location information of a plurality of stores within a predetermined geographic range and respective business circle label information of the plurality of stores; in step S22, calculating, according to the CFSFDP clustering algorithm and using the location information, the value of the local density ρ of each store, the value of the minimum distance δ to a higher-density store, and the value of their product γ; in step S23, acquiring business circle determination information of each store according to the respective current thresholds of ρ, δ, and γ; in step S24, calculating, using the respective business circle determination information and business circle label information of the plurality of stores, the similarity of all the business circle determination information with respect to all the business circle label information; and in step S25, adjusting the respective thresholds of ρ, δ, and γ so that the similarity increases.
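The loop formed by steps S23 to S25 can be sketched as a search over candidate thresholds. The text does not say how the thresholds are adjusted, so the Python sketch below assumes a plain exhaustive grid search; `tune_thresholds`, `cluster`, `similarity`, and `grid` are hypothetical names, with `cluster` standing in for the clustering module and `similarity` for the evaluation module.

```python
from itertools import product


def tune_thresholds(cluster, similarity, labels, grid):
    """Try every threshold triple and keep the one with the best score.

    cluster(rho_t, delta_t, gamma_t) -> determined circle id per store (S23)
    similarity(labels, preds)        -> evaluation score, higher is better (S24)
    grid                             -> {"rho": [...], "delta": [...], "gamma": [...]}
    """
    best_score, best_thresholds = float("-inf"), None
    for thresholds in product(grid["rho"], grid["delta"], grid["gamma"]):
        preds = cluster(*thresholds)
        score = similarity(labels, preds)
        if score > best_score:  # S25: keep the adjustment that raises the score
            best_score, best_thresholds = score, thresholds
    return best_thresholds, best_score
```

A grid search is the simplest choice when only three thresholds are tuned and each clustering run is cheap; other search strategies would fit the same interface.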
First, in step S21, respective location information of a plurality of stores within a predetermined geographic range and respective business circle label information of the plurality of stores are acquired. The predetermined geographic range may be, for example, a geographic range containing more than 100 business circles, such as a district or county of a city. For the accuracy of the parameters, the number of stores may be on the order of several thousand, for example 3000. Preferably, the business circles covered by the plurality of stores exhibit a variety of positional relationships, for example business circles that are adjacent to, intersect with, or are far from one another. The location information of a store may be expressed in any of various known forms; for example, it may be the longitude and latitude of the store, or city coordinates. The business circle label information of a store includes whether the store belongs to a business circle and, if so, which one. For example, the label information may be represented by a labeled-business-circle field: when the field is 0, the store is a scattered store that does not belong to any business circle; when the field is a natural number, the store belongs to the business circle identified by that natural number.
In step S22, according to the CFSFDP clustering algorithm and using the location information, the value of the local density ρ of each store, the value of the minimum distance δ to a higher-density store, and the value of their product γ are calculated.
First, the calculation of the value of the local density ρ, a parameter of the CFSFDP algorithm used in the embodiments of the present specification, is described.
In one embodiment, the local density ρ of store i among the plurality of stores has the value ρ_i, which is calculated by the following formula (1):
ρ_i = ∑_j χ(d_ij − d_c)       (1)
where χ(d_ij − d_c) = 1 when d_ij − d_c < 0, and χ(d_ij − d_c) = 0 when d_ij − d_c ≥ 0; d_c is a radius threshold, d_ij is the distance between store i and store j among the plurality of stores, and i and j are natural numbers less than or equal to the total number of stores, with i ≠ j.
In the embodiments of the present specification, the CFSFDP clustering algorithm is used to determine business circles, that is, to cluster the store points. Because the shape of a business circle is generally fairly fixed, d_c is set to 0.2 km (i.e., 200 m). This setting greatly improves the stability of the clustering results. The distance d_ij between store i and store j may be calculated with different formulas depending on the form of the store location information. For example, when the location information of store i and store j is expressed in longitude and latitude as (Lon_i, Lat_i) and (Lon_j, Lat_j), respectively, the distance d_ij is calculated by the following formula (2):
Figure PCTCN2018119319-appb-000005
The d_ij calculated by formula (2) is the distance between two points on a sphere, where R is the radius of the Earth, for which the mean value of 6371 km may be used. In another example, d_ij may be a Euclidean distance, a Minkowski distance, a Manhattan distance, or the like. In another example, the location information of store i and store j is expressed as three-dimensional or two-dimensional coordinates in a city coordinate system, so that, for example, the Euclidean distance between them can be calculated from their coordinates and used as d_ij. In the calculation, each d_ij can be computed, for example by formula (2) above, to obtain the distance matrix.
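As an illustration of how the distance matrix might be built from longitude and latitude, the following Python sketch uses the haversine great-circle formula with R = 6371 km. This is an assumption: formula (2) is not reproduced in this text and may take a different but equivalent spherical form. The names `haversine_km` and `distance_matrix` are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, as stated in the text


def haversine_km(lon1, lat1, lon2, lat2, radius=EARTH_RADIUS_KM):
    """Great-circle distance in km between two (lon, lat) points in degrees.

    Assumption: haversine form, used here for illustration only.
    """
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to 1.0 to guard against rounding just above the asin domain.
    return 2 * radius * math.asin(min(1.0, math.sqrt(a)))


def distance_matrix(stores):
    """Symmetric matrix d[i][j] in km for a list of (lon, lat) tuples."""
    n = len(stores)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):  # compute each pair once, mirror it
            d[i][j] = d[j][i] = haversine_km(*stores[i], *stores[j])
    return d


stores = [(120.0, 30.0), (120.001, 30.0)]  # made-up coordinates
d = distance_matrix(stores)
```

The result is in kilometres, so it can be compared directly with d_c = 0.2.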
According to formula (1), when the distance between store i and store j is less than d_c (i.e., 0.2 km), that is, when store j lies within a 200 m radius of store i, the value of χ(d_ij − d_c) is 1; when store j lies outside the 200 m radius of store i, the value of χ(d_ij − d_c) is 0. In other words, ρ_i here is the number of stores within a given radius (200 m) of store i.
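The counting interpretation of formula (1) can be shown in a few lines of Python. The function name `local_density_cutoff` and the toy distance matrix are illustrative assumptions; distances are in km so that d_c = 0.2 corresponds to 200 m.

```python
def local_density_cutoff(d, d_c=0.2):
    """rho_i = number of other stores j with d[i][j] < d_c, per formula (1)."""
    n = len(d)
    return [
        sum(1 for j in range(n) if j != i and d[i][j] < d_c)
        for i in range(n)
    ]


# Toy distance matrix in km: three stores close together, one far away.
d = [
    [0.0, 0.1, 0.15, 1.0],
    [0.1, 0.0, 0.12, 1.1],
    [0.15, 0.12, 0.0, 0.9],
    [1.0, 1.1, 0.9, 0.0],
]
rho = local_density_cutoff(d)  # [2, 2, 2, 0]
```

Because the cutoff kernel yields integer counts, many stores can share the same ρ, as the first three do here.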
In another preferred embodiment, a distance formula based on a Gaussian kernel function is introduced, and $\rho_i$ is calculated by the following formula (3):
Figure PCTCN2018119319-appb-000006
where $d_c$ is the radius threshold, $d_{ij}$ is the distance between store $i$ and store $j$ among the plurality of stores, and $i$ and $j$ are natural numbers less than or equal to the total number of stores, with $i \neq j$. Here $d_c$ and $d_{ij}$ can be obtained in the same way as the corresponding parameters in formula (1) above. The local density $\rho_i$ calculated by formula (3) is a function of the ratio of $d_{ij}$ (for $d_{ij} < d_c$) to $d_c$, and can be regarded as the sum of the distance gains of the stores within 200 m of store $i$. Formula (3) thus describes the local density of a store more reasonably and accurately.
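A hedged sketch of a Gaussian-kernel density follows. Formula (3) is not legible in this text, so the kernel $\exp(-(d_{ij}/d_c)^2)$ used here is an assumption based on the form commonly paired with CFSFDP; the restriction to $d_{ij} < d_c$ follows the paragraph above. The function name is illustrative.

```python
import math


def local_density_gaussian(dist, d_c=0.2):
    """Sum of Gaussian 'distance gains' over stores closer than d_c.

    Assumption: kernel exp(-(d_ij / d_c) ** 2); the specification's
    formula (3) may differ in its exact form.
    """
    n = len(dist)
    return [
        sum(
            math.exp(-((dist[i][j] / d_c) ** 2))
            for j in range(n)
            if j != i and dist[i][j] < d_c
        )
        for i in range(n)
    ]
```

Unlike the cutoff count, this density is continuous: a neighbour at 10 m contributes almost 1, while a neighbour just inside 200 m contributes about 0.37, so stores with the same neighbour count can still be ranked.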
Next, the calculation of $\delta$, the minimum distance to a higher-density store, which is a parameter of the CFSFDP algorithm used in the embodiments of this specification, is described. The value of $\delta$ for store $i$ is denoted $\delta_i$ and can be calculated by the following formula (4):
Figure PCTCN2018119319-appb-000007
It follows from formula (4) that when store $i$ is the point with the highest density $\rho_i$, $\delta_i$ is the maximum of $d_{ij}$ over all stores $j$ other than $i$; when store $i$ is not the highest-density point, $\delta_i$ is the smallest of the distances between store $i$ and the stores of higher density.
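The two cases described above can be sketched directly. The function name is illustrative, and the handling of ties (several stores sharing the highest density all take the maximum-distance branch) is an assumption the text does not address.

```python
def min_distance_to_higher_density(dist, rho):
    """delta_i: distance to the nearest store with strictly higher density.

    For a store with no denser store, delta_i is its largest distance
    to any other store (0.0 if it is the only store).
    """
    n = len(dist)
    delta = []
    for i in range(n):
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        if higher:
            delta.append(min(higher))
        else:
            delta.append(max((dist[i][j] for j in range(n) if j != i), default=0.0))
    return delta
```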
After $\rho_i$ and $\delta_i$ have been determined for each store point, business circle centers can be identified by plotting a $\rho$-$\delta$ distribution diagram. FIG. 3 shows an example of such a diagram, in which the abscissa is $\rho$ and the ordinate is $\delta$. As is known to those skilled in the art, in the CFSFDP algorithm a cluster center has both a high local density $\rho$ and a high distance $\delta$ to higher-density points; therefore, points in the upper right part of the $\rho$-$\delta$ diagram are likely to be cluster centers. For example, in the $\rho$-$\delta$ diagram of FIG. 3, any of the points of various gray levels, other than the black points, may be a business circle center.
In addition, after $\rho_i$ and $\delta_i$ have been determined for each store point, their product $\gamma_i = \rho_i \cdot \delta_i$ can be calculated. The $\gamma_i$ values of all stores can then be sorted and plotted in descending order in a $\gamma$ distribution diagram. FIG. 4 schematically shows an example of a $\gamma$ distribution diagram, in which the ordinate is $\gamma$ and the abscissa is the rank of each store after sorting by $\gamma_i$, each rank corresponding to one store. The larger $\gamma$ is, the larger $\rho \cdot \delta$ is, and the more likely the corresponding store is to be a business circle center.
Returning to FIG. 2, in step S23, the business circle determination information of each store is obtained according to the current thresholds of $\rho$, $\delta$, and $\gamma$.
In the embodiments of this specification, the stores that become business circle centers are determined from the set thresholds of $\rho$, $\delta$, and $\gamma$ by combining, for example, the $\rho$-$\delta$ diagram of FIG. 3 and the $\gamma$ diagram of FIG. 4. For example, in FIG. 3 the dashed line perpendicular to the $\rho$ axis represents the $\rho$ threshold and the dashed line perpendicular to the $\delta$ axis represents the $\delta$ threshold; in FIG. 4 the dashed line perpendicular to the $\gamma$ axis represents the $\gamma$ threshold. Once the threshold lines are drawn, the points in FIG. 3 to the upper right of the intersection of the $\rho$ and $\delta$ threshold lines are first business circle centers, and the points in FIG. 4 above the $\gamma$ threshold line are second business circle centers. The intersection of the first business circle centers determined from FIG. 3 and the second business circle centers determined from FIG. 4 is then taken, yielding the stores that finally serve as business circle centers.
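The threshold-based selection can be sketched without any plotting, since the threshold lines only define two index sets whose intersection is kept. The function names are illustrative, and the use of strict inequalities for "above" and "upper right" is an assumption.

```python
def gamma_ranking(rho, delta):
    """Store indices sorted by gamma_i = rho_i * delta_i, largest first."""
    gamma = [r * d for r, d in zip(rho, delta)]
    return sorted(range(len(gamma)), key=lambda i: gamma[i], reverse=True)


def select_centers(rho, delta, rho_t, delta_t, gamma_t):
    """Indices of stores that pass both the (rho, delta) test and the gamma test."""
    n = len(rho)
    # First candidates: upper right of the rho and delta threshold lines.
    first = {i for i in range(n) if rho[i] > rho_t and delta[i] > delta_t}
    # Second candidates: above the gamma threshold line.
    second = {i for i in range(n) if rho[i] * delta[i] > gamma_t}
    return sorted(first & second)
```

Requiring both tests rejects a store that clears the $\rho$ and $\delta$ thresholds only marginally, because its product $\gamma$ then stays small.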
After the business circle centers are obtained, each of the plurality of stores is clustered, that is, assigned to a business circle. Specifically, in one embodiment, each store is assigned to the business circle of the nearest store point whose density is higher than its own. In another embodiment, each store is assigned to the business circle of the nearest business circle center. When a store is too far, for example more than 2 km, from every store of higher density or from every business circle center, the store may be regarded as a scattered store that belongs to no business circle, or equivalently as belonging to the business circle with identification number 0. The business circle determination information of each store is thereby obtained.
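A minimal sketch of the first embodiment (assignment through the nearest higher-density store) follows. Processing stores in descending order of density guarantees that the neighbour's label is already known. Several details are assumptions not fixed by the text: ties in density are broken by sort order, a store whose nearest denser neighbour is scattered becomes scattered too, and the function name and label numbering (centers numbered from 1) are illustrative.

```python
def assign_business_circles(dist, rho, centers, max_km=2.0):
    """Label each store with a business circle id; 0 means scattered store."""
    n = len(dist)
    labels = [0] * n
    for k, c in enumerate(centers, start=1):
        labels[c] = k  # each center starts its own circle
    center_set = set(centers)
    order = sorted(range(n), key=lambda i: rho[i], reverse=True)
    for pos, i in enumerate(order):
        if i in center_set:
            continue
        denser = order[:pos]  # stores already processed, i.e. denser ones
        if not denser:
            continue  # densest store is not a center: leave it scattered
        nearest = min(denser, key=lambda j: dist[i][j])
        if dist[i][nearest] <= max_km:
            labels[i] = labels[nearest]
    return labels
```

The second embodiment would replace `denser` with the list of centers, so that each store takes the label of its nearest center subject to the same distance limit.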
In step S24, the business circle determination information and the business circle annotation information of the plurality of stores are used to calculate the similarity of all the business circle determination information with respect to all the business circle annotation information.
The similarity indicates the degree to which the business circle determination information resembles the business circle annotation information. It can be calculated in various forms to evaluate the determination result; for example, precision, recall, AUC score, log loss, or accuracy can each be used to express the similarity.
In one embodiment of this specification, the similarity is expressed as a WFS score, where WFS is calculated by formula (5):
Figure PCTCN2018119319-appb-000008
where $f_{ij}$ is calculated by formula (6):
Figure PCTCN2018119319-appb-000009
where $i$ is an integer from 0 to $A$, $j$ is an integer from 0 to $B$, $A$ is the number of annotated business circles, $B$ is the number of determined business circles, $N_i$ is the number of stores in the $i$-th annotated business circle, $N$ is the total number of stores, $P_{ij}$ is the precision with respect to the $i$-th annotated business circle and the $j$-th determined business circle, and $R_{ij}$ is the recall with respect to the same pair. The set of annotated scattered stores is treated as the 0th annotated business circle, and the set of determined scattered stores is treated as the 0th determined business circle, where an annotated scattered store is an annotated store that belongs to no annotated business circle and a determined scattered store is a determined store that belongs to no determined business circle.
Specifically, $P_{ij}$ can be calculated by the following formula (7), and $R_{ij}$ by the following formula (8):
Figure PCTCN2018119319-appb-000010
Figure PCTCN2018119319-appb-000011
where $i$ is an integer from 0 to $A$, $j$ is an integer from 0 to $B$, $A$ is the number of annotated business circles, and $B$ is the number of determined business circles. $x_{ij}$ is the number of stores in the $i$-th annotated business circle that are assigned to the $j$-th determined business circle. The set of annotated scattered stores is taken as the 0th annotated business circle ($i = 0$) and the set of determined scattered stores as the 0th determined business circle ($j = 0$), so that scattered stores are counted uniformly by formulas (7) and (8).
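The score can be sketched as below. Formulas (5) to (8) are not legible in this text, so the sketch assumes the usual class-weighted F-measure built from the quantities defined above: $P_{ij} = x_{ij} / \sum_i x_{ij}$, $R_{ij} = x_{ij} / N_i$, $f_{ij} = 2 P_{ij} R_{ij} / (P_{ij} + R_{ij})$, and $WFS = \sum_i (N_i / N) \max_j f_{ij}$. The function name and the list-based inputs are illustrative.

```python
from collections import Counter


def wfs_score(annotated, determined):
    """Weighted F-score between annotated and determined circle ids.

    Both inputs hold one circle id per store; id 0 marks scattered stores.
    Assumption: the standard weighted F-measure; the specification's
    formulas (5)-(8) may differ in detail.
    """
    n = len(annotated)
    x = Counter(zip(annotated, determined))  # x[(i, j)] = x_ij
    n_annotated = Counter(annotated)         # N_i
    n_determined = Counter(determined)       # size of determined circle j
    score = 0.0
    for i, n_i in n_annotated.items():
        best = 0.0
        for j, m_j in n_determined.items():
            x_ij = x[(i, j)]
            if x_ij == 0:
                continue  # no overlap: f_ij is taken as 0
            p, r = x_ij / m_j, x_ij / n_i
            best = max(best, 2 * p * r / (p + r))
        score += n_i / n * best
    return score
```

Because each annotated circle is matched to its best determined circle, the score does not depend on how the determined circles happen to be numbered.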
In step S25, the thresholds of $\rho$, $\delta$, and $\gamma$ are adjusted so that the similarity increases.
Referring again to FIG. 3 and FIG. 4, at least one of the three threshold lines in FIG. 3 and FIG. 4 can be moved to either side of its current position to obtain an adjusted threshold. After the move, as in steps S23 and S24 above, the adjusted business circle determination information of each store is obtained from the adjusted thresholds of $\rho$, $\delta$, and $\gamma$, and the adjusted similarity is calculated from the business circle annotation information and the adjusted determination information. Depending on how the similarity changes, the threshold lines can be moved repeatedly in the direction that increases the similarity, so that the thresholds of $\rho$, $\delta$, and $\gamma$ are adjusted continually and the similarity keeps improving. In one embodiment, the thresholds are adjusted until the similarity is maximized, thereby obtaining the optimal thresholds of $\rho$, $\delta$, and $\gamma$.
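One way to realize "move a threshold line in the direction that increases the similarity" is coordinate-wise hill climbing, sketched below. The search strategy, the fixed step sizes, and the `evaluate` callback (which is expected to rerun steps S23 and S24 for a given set of thresholds and return the similarity) are all assumptions; the specification does not prescribe a particular search procedure.

```python
def tune_thresholds(evaluate, start, steps, max_rounds=100):
    """Coordinate-wise hill climbing over named thresholds.

    evaluate: callable taking a dict of thresholds, returning a similarity.
    start:    dict of initial thresholds, e.g. {"rho": ..., "delta": ..., "gamma": ...}.
    steps:    dict giving the step size used for each threshold.
    """
    best = dict(start)
    best_score = evaluate(best)
    for _ in range(max_rounds):
        improved = False
        for name, step in steps.items():
            for direction in (+1, -1):
                trial = dict(best)
                trial[name] = best[name] + direction * step
                score = evaluate(trial)
                if score > best_score:  # keep only moves that raise similarity
                    best, best_score, improved = trial, score, True
        if not improved:
            break  # no single move helps any more
    return best, best_score
```

Hill climbing stops at a local optimum of the similarity; an exhaustive grid over the three thresholds is a simple alternative when the global maximum is required and the grid is small.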
FIG. 5 shows a flowchart of a method for determining business circles according to an embodiment of this specification, comprising the following steps: in step S51, obtaining the location information of each of a plurality of stores within a predetermined geographic range; in step S52, calculating, according to the CFSFDP clustering algorithm and using the location information, the value of the local density $\rho$ of each store, the value of its minimum distance $\delta$ to a higher-density store, and the value of their product $\gamma$; and, in step S53, determining business circles for the plurality of stores using the adjusted thresholds of $\rho$, $\delta$, and $\gamma$ obtained by the above method for training a business circle determination model.
In the method shown in FIG. 5, the predetermined geographic range may be a predetermined city, that is, business circles are determined city by city. Step S52 is substantially the same as step S22 in FIG. 2, and step S53 is substantially the same as step S23 in FIG. 2, so they are not described again here.
FIG. 6 shows a flowchart of a method for updating a business circle determination according to an embodiment of this specification, comprising: in step S61, obtaining the first location information of each of a plurality of first stores within a predetermined geographic range and the first distances between the first stores; in step S62, obtaining the second location information of each of at least one second store within the predetermined geographic range; in step S63, calculating, using the first and second location information, the second distances between the second stores and the third distances between any second store and any first store; in step S64, calculating, according to the CFSFDP clustering algorithm and based on the first, second, and third distances, the value of the local density $\rho$, the value of the minimum distance $\delta$ to a higher-density store, and the value of their product $\gamma$ for each of the first stores and the at least one second store; and, in step S65, determining business circles for the first stores and the at least one second store using the adjusted thresholds of $\rho$, $\delta$, and $\gamma$ obtained by the method according to any one of claims 1-6.
The method shown in FIG. 6 is an incremental, iterative method. Because offline merchants keep expanding and the number of stores keeps growing, computing the distance matrix directly has a computational complexity of $O(N^2)$. The method shown in FIG. 6 reduces the amount of computation and thus speeds up the calculation.
Specifically, first, in step S61, the first location information of each of a plurality of first stores within a predetermined geographic range and the first distances between the first stores are obtained. The predetermined geographic range may be, for example, a predetermined city. In one example, the location information of a plurality of stores is obtained in an initial month $M_0$, and the distances between the stores are calculated as described above to obtain a distance matrix $N_0$. In the following month $M_1$, at least one store has been added, or the location information of some of the existing stores has changed. The added stores or the stores whose location has changed are denoted second stores, or $x_{new}$, and the stores among the plurality of stores that are unrelated to the second stores are the first stores, or $x_{old}$. For example, when the second stores are newly added stores, the first stores are all the stores obtained in month $M_0$. In another example, when the second stores are existing stores whose location has changed, the first stores are the stores that remain after the relocated stores are removed from all the stores obtained in month $M_0$.
When the business circle determination is updated in month $M_1$, the first location information of the first stores and the first distances between them can be taken directly from the store location information obtained in month $M_0$ and the distance matrix $N_0$ calculated then; the first distances are the distances between one $x_{old}$ store and another.
In step S62, the second location information of each of at least one second store within the predetermined geographic range is obtained. In the example above, when at least one second store is added in month $M_1$, the month following $M_0$, the location information of the newly added second stores is obtained. Alternatively, when the location information of a second store among the existing stores changes in month $M_1$, the changed location information of that second store is obtained.
In step S63, the first and second location information is used to calculate the second distances between the second stores and the third distances between any second store and any first store. That is, the second distances between $x_{new}$ and $x_{new}$ stores and the third distances between $x_{new}$ and $x_{old}$ stores are calculated.
In step S64, according to the CFSFDP clustering algorithm and based on the first, second, and third distances, the value of the local density $\rho$, the value of the minimum distance $\delta$ to a higher-density store, and the value of their product $\gamma$ are calculated for each of the first stores and the at least one second store. The first, second, and third distances together form a new distance matrix, from which $\rho$, $\delta$, and $\gamma$ can be calculated as described above for all stores, first and second alike.
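The assembly of the new distance matrix from the first, second, and third distances can be sketched as follows. The function name, the argument layout, and the pluggable `distance` callback are assumptions for illustration; for relocated stores, their rows and columns are assumed to have been dropped from the old matrix beforehand so that they can be passed in as new points.

```python
def extend_distance_matrix(old_dist, old_points, new_points, distance):
    """Build the updated matrix while reusing all old-to-old distances.

    old_dist:   matrix of first distances (between the first stores)
    old_points: locations of the first stores, in the order of old_dist
    new_points: locations of the second stores
    distance:   callable (p, q) -> distance between two locations
    """
    n_old = len(old_points)
    points = list(old_points) + list(new_points)
    n = len(points)
    dist = [[0.0] * n for _ in range(n)]
    # First distances: copied, never recomputed.
    for i in range(n_old):
        for j in range(n_old):
            dist[i][j] = old_dist[i][j]
    # Third distances (new to old) and second distances (new to new).
    for a in range(n_old, n):
        for b in range(a):
            dist[a][b] = dist[b][a] = distance(points[a], points[b])
    return dist
```

With $m$ second stores and $n$ first stores, only $m \cdot n + m(m-1)/2$ distances are evaluated instead of all $(n+m)(n+m-1)/2$ pairs, which is where the saving comes from when $m$ is small relative to $n$.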
Finally, in step S65, business circles are determined for the first stores and the at least one second store using the adjusted thresholds of $\rho$, $\delta$, and $\gamma$ obtained by the above method for training a business circle determination model. This step is substantially the same as step S23 in FIG. 2 and step S53 in FIG. 5 and is not described again here.
The above method for updating the business circle determination can be executed once per predetermined period, for example once a month, so that the determination is updated regularly. The update method also reduces the computational complexity by at least two orders of magnitude.
FIG. 7 shows an apparatus 700 for training a business circle determination model according to an embodiment of this specification, comprising: a first obtaining unit 71 configured to obtain location information of each of a plurality of stores within a predetermined geographic range and business circle annotation information of each of the plurality of stores; a first calculation unit 72 configured to calculate, according to the CFSFDP clustering algorithm and using the location information, the value of the local density $\rho$ of each store, the value of its minimum distance $\delta$ to a higher-density store, and the value of their product $\gamma$; a second obtaining unit 73 configured to obtain the business circle determination information of each store according to the current thresholds of $\rho$, $\delta$, and $\gamma$; a second calculation unit 74 configured to calculate, using the business circle determination information and the business circle annotation information of the plurality of stores, the similarity of all the determination information with respect to all the annotation information; and a threshold adjustment unit 75 configured to adjust the thresholds of $\rho$, $\delta$, and $\gamma$ so that the similarity increases.
FIG. 8 shows an apparatus 800 for determining business circles according to an embodiment of this specification, comprising: an obtaining unit 81 configured to obtain location information of each of a plurality of stores within a predetermined geographic range; a calculation unit 82 configured to calculate, according to the CFSFDP clustering algorithm and using the location information, the value of the local density $\rho$ of each store, the value of its minimum distance $\delta$ to a higher-density store, and the value of their product $\gamma$; and a determination unit 83 configured to determine business circles for the plurality of stores using the adjusted thresholds of $\rho$, $\delta$, and $\gamma$ obtained by the above method for training a business circle determination model.
FIG. 9 shows an apparatus 900 for updating a business circle determination, comprising: a first obtaining unit 91 configured to obtain the first location information of each of a plurality of first stores within a predetermined geographic range and the first distances between the first stores; a second obtaining unit 92 configured to obtain the second location information of each of at least one second store within the predetermined geographic range; a first calculation unit 93 configured to calculate, using the first and second location information, the second distances between the second stores and the third distances between any second store and any first store; a second calculation unit 94 configured to calculate, according to the CFSFDP clustering algorithm and based on the first, second, and third distances, the value of the local density $\rho$, the value of the minimum distance $\delta$ to a higher-density store, and the value of their product $\gamma$ for each of the first stores and the at least one second store; and a determination unit 95 configured to determine business circles for the first stores and the at least one second store using the adjusted thresholds of $\rho$, $\delta$, and $\gamma$ obtained by the above method for training a business circle determination model.
The business circle determination model according to the embodiments of this specification can be evaluated by calculating a SIL score. The SIL score can be calculated by the following formulas (9)-(11):
Figure PCTCN2018119319-appb-000012
$$a(i) = \operatorname{avg}(d_{ij}), \quad x_i, x_j \in c_k \qquad (10)$$
$$b(i) = \operatorname{avg}(d_{ij}), \quad x_i \in c_k,\ x_j \in c_p \qquad (11)$$
where $c_k$ denotes the set of the $k$-th clustering result, $a(i)$ is the average distance from point $i$ to all points in its own circle, and $b(i)$ is the average distance from point $i$ to all points in the nearest business circle $p$. Assuming business circles of equal size, a point on the boundary between two circles has $s(i) = 0$. Therefore, when the overall coefficient is evaluated, the proportion of stores with $s(i) > 0$, that is, the proportion of effectively classified stores, is taken as the final SIL score. This SIL evaluation verified that the business circle determination model according to the embodiments of this specification has good determination performance.
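The SIL score described above can be sketched as the share of stores with a positive silhouette value. Formula (9) is not legible in this text, so the sketch assumes the standard silhouette coefficient $s(i) = (b(i) - a(i)) / \max(a(i), b(i))$. Taking the "nearest" circle as the one with the smallest average distance, and setting $s(i) = 0$ for a store that is alone in its circle or when only one circle exists, are further assumptions; the function name is illustrative.

```python
def sil_score(dist, labels):
    """Fraction of stores whose silhouette value s(i) is greater than 0."""
    n = len(labels)
    if n == 0:
        return 0.0
    members = {}
    for idx, lab in enumerate(labels):
        members.setdefault(lab, []).append(idx)
    positive = 0
    for i in range(n):
        own = [j for j in members[labels[i]] if j != i]
        others = [m for lab, m in members.items() if lab != labels[i]]
        if not own or not others:
            continue  # s(i) is taken as 0 in these degenerate cases
        a = sum(dist[i][j] for j in own) / len(own)
        b = min(sum(dist[i][j] for j in m) / len(m) for m in others)
        denom = max(a, b)
        s = (b - a) / denom if denom > 0 else 0.0
        if s > 0:
            positive += 1
    return positive / n
```

A caller that wants to exclude scattered stores (circle 0) from the evaluation can filter them out of `dist` and `labels` before calling the function.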
The method of the embodiments of this specification only needs the geographic location information of the full set of stores as input in order to determine their business circles, with no need for manual, store-by-store determination. In actual measurement, the business circle coverage of the method reaches 92.5%, and the uncovered stores are mostly isolated points or dirty data points.
The CFSFDP algorithm used in the method of the embodiments of this specification requires no advance definition; instead, it obtains the business circle centers directly by thresholding.
The method of the embodiments of this specification enhances stability in two respects. First, the optimal parameters are trained in advance on known, high-accuracy annotated business circle information, which ensures that the parameters are stable. Second, with stable parameters, obtaining business circle centers by thresholding keeps the business circle discovery results stable when the data is unchanged or changes only slightly.
In addition, the method of the embodiments of this specification builds the distance matrix by city partitioning and incremental iteration, exploiting the temporal nature of store evolution; this effectively reduces the computational complexity, and in actual measurement the computation time improved by a factor of about 10.
Those of ordinary skill in the art will further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To illustrate the interchangeability of hardware and software clearly, the composition and steps of each example have been described above generally in terms of function. Whether these functions are executed in hardware or in software depends on the specific application and the design constraints of the technical solution. Those of ordinary skill in the art may implement the described functions in different ways for each specific application, but such implementations should not be considered to go beyond the scope of this application.
The steps of the methods or algorithms described in connection with the embodiments disclosed herein may be implemented in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The specific embodiments described above explain the objectives, technical solutions, and beneficial effects of the present invention in further detail. It should be understood that the above are merely specific embodiments of the present invention and are not intended to limit its scope of protection; any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (20)

  1. A method for training a business circle determination model, comprising:
    obtaining location information of each of a plurality of stores within a predetermined geographic range and business circle annotation information of each of the plurality of stores;
    calculating, according to the CFSFDP clustering algorithm and using the location information, the value of the local density $\rho$ of each store, the value of its minimum distance $\delta$ to a higher-density store, and the value of their product $\gamma$;
    obtaining business circle determination information of each store according to the current thresholds of $\rho$, $\delta$, and $\gamma$;
    calculating, using the business circle determination information and the business circle annotation information of the plurality of stores, the similarity of all the business circle determination information with respect to all the business circle annotation information; and
    adjusting the thresholds of $\rho$, $\delta$, and $\gamma$ so that the similarity increases.
  2. The method for training a business circle determination model according to claim 1, wherein the value of the local density $\rho$ of store $i$ among the plurality of stores is $\rho_i$, where
    $\rho_i = \sum_j \chi(d_{ij} - d_c)$,
    where $\chi(d_{ij} - d_c) = 1$ when $d_{ij} - d_c < 0$, and $\chi(d_{ij} - d_c) = 0$ when $d_{ij} - d_c \geq 0$,
    $d_c$ is a radius threshold, $d_{ij}$ is the distance between store $i$ and store $j$ among the plurality of stores, and $i$ and $j$ are natural numbers less than or equal to the total number of stores, with $i \neq j$.
  3. The method for training a business circle determination model according to claim 1, wherein the value of the local density $\rho$ of store $i$ among the plurality of stores is $\rho_i$, where
    Figure PCTCN2018119319-appb-100001
    where $d_c$ is a radius threshold, $d_{ij}$ is the distance between store $i$ and store $j$ among the plurality of stores, and $i$ and $j$ are natural numbers less than or equal to the total number of stores, with $i \neq j$.
  4. The method of training a business circle determination model according to claim 2 or 3, wherein
    the location information of store i and store j is expressed in longitude and latitude as $(Lon_i, Lat_i)$ and $(Lon_j, Lat_j)$, respectively, and the distance $d_{ij}$ between them is calculated as follows:
    [Formula image: Figure PCTCN2018119319-appb-100002]
    where R is the radius of the Earth.
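The distance formula of claim 4 is given as an image that is not reproduced in this text, so its exact form cannot be confirmed here. The sketch below uses the haversine formula, one common way to compute a great-circle distance from longitude/latitude pairs and an Earth radius R; treating it as the claimed formula is an assumption, as are the function name `haversine` and the value chosen for R.

```python
import math

EARTH_RADIUS_M = 6371000.0  # assumed mean Earth radius R, in metres


def haversine(lon_i, lat_i, lon_j, lat_j, radius=EARTH_RADIUS_M):
    """Great-circle distance between two (longitude, latitude) points in degrees."""
    phi_i, phi_j = math.radians(lat_i), math.radians(lat_j)
    d_phi = phi_j - phi_i
    d_lam = math.radians(lon_j - lon_i)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi_i) * math.cos(phi_j) * math.sin(d_lam / 2) ** 2)
    # Clamp against rounding so asin never sees a value above 1.
    return 2 * radius * math.asin(math.sqrt(min(1.0, a)))


# One degree of longitude along the equator.
d = haversine(0.0, 0.0, 1.0, 0.0)
print(round(d))
```

Any other spherical distance formula could be substituted without changing the density computations that consume $d_{ij}$.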
  5. The method of training a business circle determination model according to claim 1, wherein the similarity is expressed by a parameter WFS, where
    [Formula image: Figure PCTCN2018119319-appb-100003]
    [Formula image: Figure PCTCN2018119319-appb-100004]
    where i is an integer from 0 to A, j is an integer from 0 to B, A is the number of labeled business circles, B is the number of determined business circles, $N_i$ is the number of stores contained in the i-th labeled business circle, and N is the total number of stores in the plurality of stores,
    $P_{ij}$ is the precision for the i-th labeled business circle and the j-th determined business circle, and $R_{ij}$ is the recall for the i-th labeled business circle and the j-th determined business circle, where the set of labeled scattered stores is taken as the 0th labeled business circle and the set of determined scattered stores is taken as the 0th determined business circle, a labeled scattered store being a labeled store that does not belong to any labeled business circle, and a determined scattered store being a determined store that does not belong to any determined business circle.
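The two WFS formulas of claim 5 are images that are not reproduced in this text. The sketch below therefore rests on an assumed form: a per-pair F-score $F_{ij} = 2 P_{ij} R_{ij} / (P_{ij} + R_{ij})$, with $\mathrm{WFS} = \sum_i (N_i / N) \max_j F_{ij}$. Only the symbols $P_{ij}$, $R_{ij}$, $N_i$, N, and the convention that index 0 holds the scattered stores come from the claim; the combination rule and the function name `wfs` are hypothetical.

```python
def wfs(labeled, determined):
    """Assumed weighted F-score between labeled and determined business circles.

    Both arguments are lists of sets of store ids; index 0 holds the
    scattered stores, as in the claim.
    """
    n_total = sum(len(circle) for circle in labeled)
    score = 0.0
    for lab in labeled:
        if not lab:
            continue
        best = 0.0
        for det in determined:
            overlap = len(lab & det)
            if overlap == 0:
                continue
            precision = overlap / len(det)  # P_ij
            recall = overlap / len(lab)     # R_ij
            best = max(best, 2 * precision * recall / (precision + recall))
        score += len(lab) / n_total * best  # weight N_i / N
    return score


# Hypothetical data: store 5 is scattered, stores 1-4 form two labeled circles.
labeled = [{5}, {1, 2}, {3, 4}]
merged = [{5}, {1, 2, 3, 4}]  # a determination that merges the two circles

print(wfs(labeled, labeled), wfs(labeled, merged))
```

Under this assumed form a perfect determination scores 1, and merging two labeled circles into one lowers the score because precision drops for both.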
  6. The method of training a business circle determination model according to claim 1, wherein adjusting the respective thresholds of ρ, δ, and γ such that the similarity increases comprises adjusting the respective thresholds of ρ, δ, and γ such that the similarity is maximized.
  7. A method of determining a business circle, comprising:
    acquiring respective location information of a plurality of stores within a predetermined geographic range;
    calculating, according to the CFSFDP clustering algorithm and using the location information, for each store the value of the local density ρ, the value of the minimum distance δ to a store of higher density, and the value of their product γ; and
    determining business circles for the plurality of stores using the respective adjusted thresholds of ρ, δ, and γ obtained by the method according to any one of claims 1-6.
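The quantities δ and γ named in claim 7 can be sketched as follows, assuming the densities ρ are already available. For a store with no denser store, the sketch takes δ as that store's largest distance to any other store, which is the usual CFSFDP convention rather than something stated in this claim. The function name `delta_gamma` and the sample data are hypothetical.

```python
def delta_gamma(dist, rho):
    """Return (delta, gamma) for each store.

    delta_i is the minimum distance to any store with strictly higher density;
    for a store with no denser store, delta_i is its maximum distance to any store.
    gamma_i is the product rho_i * delta_i.
    """
    n = len(dist)
    delta = []
    for i in range(n):
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        delta.append(min(higher) if higher else max(dist[i]))
    gamma = [r * d for r, d in zip(rho, delta)]
    return delta, gamma


# Hypothetical distances (metres) and densities for four stores.
dist = [
    [0, 50, 80, 900],
    [50, 0, 60, 950],
    [80, 60, 0, 1000],
    [900, 950, 1000, 0],
]
rho = [1, 2, 1, 0]

delta, gamma = delta_gamma(dist, rho)
print(delta, gamma)
```

Store 1 has both the highest density and a large δ, so its γ stands out, which is the signature of a cluster-center candidate in density-peak clustering.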
  8. The method of determining a business circle according to claim 7, wherein the predetermined geographic range is a predetermined city.
  9. A method of updating a business circle determination, comprising:
    acquiring respective first location information of a plurality of first stores within a predetermined geographic range and first distances between the respective first stores;
    acquiring respective second location information of at least one second store within the predetermined geographic range;
    calculating, using the first location information and the second location information, second distances between the respective second stores and third distances between any one of the second stores and any one of the first stores;
    calculating, according to the CFSFDP clustering algorithm and based on the first distances, the second distances, and the third distances, for each of the plurality of first stores and the at least one second store the value of the local density ρ, the value of the minimum distance δ to a store of higher density, and the value of their product γ; and
    determining business circles for the plurality of first stores and the at least one second store using the respective adjusted thresholds of ρ, δ, and γ obtained by the method according to any one of claims 1-6.
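The update in claim 9 reuses the first distances and computes only the second and third distances for newly added stores. A minimal sketch of that bookkeeping follows; the function name `extend_distance_matrix` is hypothetical, and a planar Euclidean metric (`math.dist`, Python 3.8+) stands in for the longitude/latitude distance purely for illustration.

```python
import math


def extend_distance_matrix(first_dist, first_pts, second_pts, metric):
    """Build the full distance matrix for first + second stores.

    first_dist (first distances) is copied as-is; only distances that
    involve a second store are computed with `metric`.
    """
    n1, n2 = len(first_pts), len(second_pts)
    n = n1 + n2
    full = [[0.0] * n for _ in range(n)]
    for i in range(n1):
        for j in range(n1):
            full[i][j] = first_dist[i][j]  # reuse first distances
    for a in range(n2):
        for i in range(n1):
            d = metric(second_pts[a], first_pts[i])  # third distance
            full[n1 + a][i] = full[i][n1 + a] = d
        for b in range(a + 1, n2):
            d = metric(second_pts[a], second_pts[b])  # second distance
            full[n1 + a][n1 + b] = full[n1 + b][n1 + a] = d
    return full


# Hypothetical planar coordinates: two existing stores, one new store.
first_pts = [(0, 0), (3, 4)]
first_dist = [[0, 5], [5, 0]]
second_pts = [(0, 8)]

full = extend_distance_matrix(first_dist, first_pts, second_pts, math.dist)
print(full)
```

With n1 existing stores and n2 new ones, this computes n1·n2 + n2(n2−1)/2 distances instead of recomputing all pairs.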
  10. The method of updating a business circle determination according to claim 9, wherein the method is performed once every predetermined time period.
  11. An apparatus for training a business circle determination model, comprising:
    a first acquisition unit configured to acquire respective location information of a plurality of stores within a predetermined geographic range and respective business circle labeling information of the plurality of stores;
    a first calculation unit configured to calculate, according to the CFSFDP clustering algorithm and using the location information, for each store the value of the local density ρ, the value of the minimum distance δ to a store of higher density, and the value of their product γ;
    a second acquisition unit configured to acquire business circle determination information of each store according to the respective current thresholds of ρ, δ, and γ;
    a second calculation unit configured to calculate, using the business circle determination information and the business circle labeling information of each of the plurality of stores, the similarity of all the business circle determination information with respect to all the business circle labeling information; and
    a threshold adjustment unit configured to adjust the respective thresholds of ρ, δ, and γ such that the similarity increases.
  12. The apparatus for training a business circle determination model according to claim 11, wherein the value of the local density ρ of store i among the plurality of stores is $\rho_i$, where
    $\rho_i = \sum_j \chi(d_{ij} - d_c)$,
    where $\chi(d_{ij} - d_c) = 1$ when $d_{ij} - d_c < 0$, and $\chi(d_{ij} - d_c) = 0$ when $d_{ij} - d_c \ge 0$,
    $d_c$ is a radius threshold, $d_{ij}$ is the distance between store i and store j among the plurality of stores, i and j are natural numbers less than or equal to the total number of stores in the plurality of stores, and $i \ne j$.
  13. The apparatus for training a business circle determination model according to claim 11, wherein the value of the local density ρ of store i among the plurality of stores is $\rho_i$, where
    [Formula image: Figure PCTCN2018119319-appb-100005]
    where $d_c$ is a radius threshold, $d_{ij}$ is the distance between store i and store j among the plurality of stores, i and j are natural numbers less than or equal to the total number of stores in the plurality of stores, and $i \ne j$.
  14. The apparatus for training a business circle determination model according to claim 12 or 13, wherein
    the location information of store i and store j is expressed in longitude and latitude as $(Lon_i, Lat_i)$ and $(Lon_j, Lat_j)$, respectively, and the distance $d_{ij}$ between them is calculated as follows:
    [Formula image: Figure PCTCN2018119319-appb-100006]
    where R is the radius of the Earth.
  15. The apparatus for training a business circle determination model according to claim 11, wherein the similarity is expressed by a parameter WFS, where
    [Formula image: Figure PCTCN2018119319-appb-100007]
    [Formula image: Figure PCTCN2018119319-appb-100008]
    where i is an integer from 0 to A, j is an integer from 0 to B, A is the number of labeled business circles, B is the number of determined business circles, $N_i$ is the number of stores contained in the i-th labeled business circle, and N is the total number of stores in the plurality of stores,
    $P_{ij}$ is the precision for the i-th labeled business circle and the j-th determined business circle, and $R_{ij}$ is the recall for the i-th labeled business circle and the j-th determined business circle, where the set of labeled scattered stores is taken as the 0th labeled business circle and the set of determined scattered stores is taken as the 0th determined business circle, a labeled scattered store being a labeled store that does not belong to any labeled business circle, and a determined scattered store being a determined store that does not belong to any determined business circle.
  16. The apparatus for training a business circle determination model according to claim 11, wherein the threshold adjustment unit is further configured to adjust the respective thresholds of ρ, δ, and γ such that the similarity is maximized.
  17. An apparatus for determining a business circle, comprising:
    an acquisition unit configured to acquire respective location information of a plurality of stores within a predetermined geographic range;
    a calculation unit configured to calculate, according to the CFSFDP clustering algorithm and using the location information, for each store the value of the local density ρ, the value of the minimum distance δ to a store of higher density, and the value of their product γ; and
    a determination unit configured to determine business circles for the plurality of stores using the respective adjusted thresholds of ρ, δ, and γ obtained by the method according to any one of claims 1-6.
  18. The apparatus for determining a business circle according to claim 17, wherein the predetermined geographic range is a predetermined city.
  19. An apparatus for updating a business circle determination, comprising:
    a first acquisition unit configured to acquire respective first location information of a plurality of first stores within a predetermined geographic range and first distances between the respective first stores;
    a second acquisition unit configured to acquire respective second location information of at least one second store within the predetermined geographic range;
    a first calculation unit configured to calculate, using the first location information and the second location information, second distances between the respective second stores and third distances between any one of the second stores and any one of the first stores;
    a second calculation unit configured to calculate, according to the CFSFDP clustering algorithm and based on the first distances, the second distances, and the third distances, for each of the plurality of first stores and the at least one second store the value of the local density ρ, the value of the minimum distance δ to a store of higher density, and the value of their product γ; and
    a determination unit configured to determine business circles for the plurality of first stores and the at least one second store using the respective adjusted thresholds of ρ, δ, and γ obtained by the method according to any one of claims 1-6.
  20. The apparatus for updating a business circle determination according to claim 19, wherein the apparatus is operated once every predetermined time period.
PCT/CN2018/119319 2018-03-20 2018-12-05 Trading area determination method and device WO2019179173A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810231483.4A CN108596648B (en) 2018-03-20 2018-03-20 Business circle judgment method and device
CN201810231483.4 2018-03-20

Publications (1)

Publication Number Publication Date
WO2019179173A1 (en) 2019-09-26

Family

ID=63626938

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/119319 WO2019179173A1 (en) 2018-03-20 2018-12-05 Trading area determination method and device

Country Status (3)

Country Link
CN (1) CN108596648B (en)
TW (1) TWI711983B (en)
WO (1) WO2019179173A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111369284A (en) * 2020-03-03 2020-07-03 浙江网商银行股份有限公司 Target object type determination method and device
CN111815361A (en) * 2020-07-10 2020-10-23 北京思特奇信息技术股份有限公司 Region boundary calculation method and device, electronic equipment and storage medium
CN112783963A (en) * 2021-03-17 2021-05-11 上海数喆数据科技有限公司 Enterprise offline and online multi-source data integration method and device based on business circle division
CN116308501A (en) * 2023-05-24 2023-06-23 北京骑胜科技有限公司 Method, apparatus, device and medium for managing operation area of shared vehicle

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108596648B (en) * 2018-03-20 2020-07-17 阿里巴巴集团控股有限公司 Business circle judgment method and device
CN110175865A (en) * 2019-04-23 2019-08-27 国网浙江省电力有限公司湖州供电公司 Electric car charging real time pricing method based on ubiquitous cognition technology
CN111091417B (en) * 2019-12-12 2023-10-31 拉扎斯网络科技(上海)有限公司 Site selection method and device
CN111210269B (en) * 2020-01-02 2020-09-18 平安科技(深圳)有限公司 Object identification method based on big data, electronic device and storage medium
CN111932318B (en) * 2020-09-21 2021-01-19 腾讯科技(深圳)有限公司 Region division method and device, electronic equipment and computer readable storage medium
CN112016326A (en) * 2020-09-25 2020-12-01 北京百度网讯科技有限公司 Map area word recognition method and device, electronic equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105574014A (en) * 2014-10-13 2016-05-11 北京明略软件系统有限公司 Commercial district division method and system
CN107657474A (en) * 2017-07-31 2018-02-02 石河子大学 The determination method and service end on a kind of commercial circle border
CN108596648A (en) * 2018-03-20 2018-09-28 阿里巴巴集团控股有限公司 A kind of commercial circle determination method and device

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104111946B (en) * 2013-04-19 2018-08-07 腾讯科技(深圳)有限公司 Clustering method based on user interest and device
CN106649331B (en) * 2015-10-29 2020-09-11 阿里巴巴集团控股有限公司 Business circle identification method and equipment
US20170308929A1 (en) * 2016-04-25 2017-10-26 Chian Chiu Li Social Network Based Advertisement
CN106339416B (en) * 2016-08-15 2019-11-08 常熟理工学院 Educational data clustering method based on grid fast searching density peaks
CN106777984B (en) * 2016-12-19 2019-02-22 福州大学 A method of photovoltaic array Working state analysis and fault diagnosis are realized based on density clustering algorithm
CN106649877A (en) * 2017-01-06 2017-05-10 广东工业大学 Density peak-based big data mining method and apparatus
CN107563789A (en) * 2017-07-31 2018-01-09 石河子大学 Data processing method, system, terminal and computer-readable recording medium

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111369284A (en) * 2020-03-03 2020-07-03 浙江网商银行股份有限公司 Target object type determination method and device
CN111369284B (en) * 2020-03-03 2023-08-15 浙江网商银行股份有限公司 Target object type determining method and device
CN111815361A (en) * 2020-07-10 2020-10-23 北京思特奇信息技术股份有限公司 Region boundary calculation method and device, electronic equipment and storage medium
CN112783963A (en) * 2021-03-17 2021-05-11 上海数喆数据科技有限公司 Enterprise offline and online multi-source data integration method and device based on business circle division
CN112783963B (en) * 2021-03-17 2023-04-28 上海数喆数据科技有限公司 Enterprise offline and online multi-source data integration method and device based on business district division
CN116308501A (en) * 2023-05-24 2023-06-23 北京骑胜科技有限公司 Method, apparatus, device and medium for managing operation area of shared vehicle
CN116308501B (en) * 2023-05-24 2023-10-17 北京骑胜科技有限公司 Method, apparatus, device and medium for managing operation area of shared vehicle

Also Published As

Publication number Publication date
CN108596648B (en) 2020-07-17
TW201941116A (en) 2019-10-16
TWI711983B (en) 2020-12-01
CN108596648A (en) 2018-09-28

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application. Ref document number: 18911316; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase. Ref country code: DE
122 EP: PCT application non-entry in European phase. Ref document number: 18911316; Country of ref document: EP; Kind code of ref document: A1