WO2021027407A1 - Risky user identification method and apparatus, computer device, and storage medium

Publication number
WO2021027407A1
Authority
WO
WIPO (PCT)
Prior art keywords
location data
location
user
preset
terminal
Prior art date
Application number
PCT/CN2020/098579
Other languages
French (fr)
Chinese (zh)
Inventor
丁露涛
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2021027407A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0635 Risk analysis of enterprise or organisation activities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]
    • G06Q30/0609 Buyer or seller confidence or verification

Definitions

  • This application relates to the field of computer data processing, and in particular to a risk user identification method, device, computer equipment and computer-readable storage medium.
  • the embodiments of the present application provide a method, device, computer equipment, and storage medium for identifying risky users, aiming to solve the problems of low identification accuracy and slow identification speed of risky users.
  • an embodiment of the present application provides a risk user identification method, which includes:
  • if order data sent by a user through a terminal is received, acquiring a location data set corresponding to the terminal, the location data set including at least two pieces of location information of the terminal;
  • if the centroid matches the reserved location data corresponding to the user, determining whether the centroid matches the preset risk location data;
  • if the centroid matches the preset risk location data, determining that the user is a risk user and determining the order data corresponding to the user as risk data.
  • an embodiment of the present application provides a risk user identification device, which includes:
  • the first obtaining unit is configured to obtain a location data set corresponding to the terminal if the order data sent by the user through the terminal is received, the location data set including at least two pieces of location information of the terminal;
  • the first clustering unit is configured to perform clustering processing on the location data set according to a preset first clustering algorithm to obtain a location data cluster corresponding to the location data set after the clustering processing;
  • the second clustering unit is configured to perform clustering processing on the location data cluster according to a preset second clustering algorithm to obtain the centroid corresponding to the location data cluster after the clustering processing;
  • the first determining unit is configured to determine whether the centroid matches the reserved location data corresponding to the user;
  • the second determining unit is configured to determine whether the centroid matches the preset risk location data if the centroid matches the reserved location data corresponding to the user;
  • the order determination unit is configured to determine that the user is a risk user and determine the order data corresponding to the user as risk data if the centroid matches the preset risk location data.
  • an embodiment of the present application provides a computer device, which includes a memory, a processor, and a computer program stored on the memory and capable of running on the processor, wherein the processor performs the following steps when executing the computer program:
  • if order data sent by a user through a terminal is received, acquiring a location data set corresponding to the terminal, the location data set including at least two pieces of location information of the terminal;
  • if the centroid matches the reserved location data corresponding to the user, determining whether the centroid matches the preset risk location data;
  • if the centroid matches the preset risk location data, determining that the user is a risk user and determining the order data corresponding to the user as risk data.
  • the embodiments of the present application also provide a computer-readable storage medium, wherein the computer-readable storage medium stores a computer program which, when executed by a processor, causes the processor to perform the following steps:
  • if order data sent by a user through a terminal is received, acquiring a location data set corresponding to the terminal, the location data set including at least two pieces of location information of the terminal;
  • if the centroid matches the reserved location data corresponding to the user, determining whether the centroid matches the preset risk location data;
  • if the centroid matches the preset risk location data, determining that the user is a risk user and determining the order data corresponding to the user as risk data.
  • The location data set is clustered by the preset first clustering algorithm and then by the preset second clustering algorithm to obtain the centroid; risk users are then identified according to the centroid, the reserved location data corresponding to the user, and the preset risk location data. The whole process is not affected by subjective human factors, which helps improve the accuracy and speed of risk user identification.
  • FIG. 1 is a schematic flowchart of a risk user identification method provided by an embodiment of this application;
  • FIG. 2 is a schematic diagram of an application scenario of a risk user identification method provided by an embodiment of this application;
  • FIG. 3 is another schematic flowchart of a risk user identification method provided by an embodiment of this application.
  • FIG. 4 is another schematic flowchart of a risk user identification method provided by an embodiment of this application.
  • FIG. 5 is another schematic flowchart of a risk user identification method provided by an embodiment of this application.
  • FIG. 6 is another schematic flowchart of a risk user identification method provided by an embodiment of this application.
  • FIG. 7 is a schematic block diagram of a risk user identification device provided by an embodiment of this application.
  • FIG. 8 is another schematic block diagram of a risk user identification device provided by an embodiment of this application.
  • FIG. 9 is another schematic block diagram of a risk user identification device provided by an embodiment of this application.
  • FIG. 10 is another schematic block diagram of a risk user identification device provided by an embodiment of this application.
  • FIG. 11 is another schematic block diagram of a risk user identification device provided by an embodiment of this application.
  • FIG. 12 is a schematic block diagram of a computer device provided by an embodiment of this application.
  • FIG. 1 is a schematic flowchart of a risk user identification method provided by an embodiment of the application.
  • the risk user identification method provided in the embodiment of the present application can be applied to the server 20.
  • the server 20 may be a server used in an enterprise to process order data for risk user identification.
  • the server 20 may be an independent server, or a server cluster composed of multiple servers.
  • the server 20 can establish a communication connection with the terminal 10 for data exchange.
  • the server 20 can establish a communication connection with the terminal 10 to receive order data sent by the terminal.
  • the terminal 10 may be an electronic terminal such as a mobile phone, a tablet computer, or a desktop computer.
  • the risk user identification method includes steps S110-S160.
  • the terminal can realize data interaction by establishing a communication connection with the server.
  • the user can send order data to the server by operating the terminal.
  • the order data may be order data of various commodities.
  • order data includes, but is not limited to: travel order data, takeaway order data, insurance order data, etc.
  • The location data set corresponding to the terminal includes at least two pieces of location information of the terminal.
  • Acquiring the location data set corresponding to the terminal may specifically mean acquiring multiple pieces of location information corresponding to the terminal; the set of acquired location information is the location data set corresponding to the terminal.
  • step S110 includes but is not limited to steps S111-S112.
  • S111 If the order data sent by the user through the terminal is received, generate a location acquisition time range according to the sending time of the order data and a preset time period.
  • the preset time period can be set according to actual needs.
  • the preset time period is, for example, 7 days, 30 days, 60 days, and so on.
  • Generating the location acquisition time range according to the sending time of the order data and the preset time period specifically includes: subtracting the preset time period from the sending time of the order data to obtain the location acquisition start time; determining the sending time of the order data as the location acquisition end time; and determining the time range between the location acquisition start time and the location acquisition end time as the location acquisition time range.
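The time range computation described above can be sketched in a few lines of Python. This is a minimal illustration, not the filing's implementation; the function name `location_acquisition_range` and the 7-day period in the usage lines are assumptions chosen for the example.

```python
from datetime import datetime, timedelta

def location_acquisition_range(order_sent_at, preset_period):
    """Return (start, end) of the location acquisition time range.

    Hypothetical helper: the names are illustrative only.
    """
    start = order_sent_at - preset_period  # sending time minus the preset time period
    end = order_sent_at                    # the sending time is the end of the range
    return start, end

# Usage: an order sent on 2019-1-14 16:52:55 with a 7-day preset time period.
sent = datetime(2019, 1, 14, 16, 52, 55)
start, end = location_acquisition_range(sent, timedelta(days=7))
print(start, "to", end)
```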
  • Each terminal corresponds to a unique terminal identification code, and the terminal identification code is, for example, International Mobile Equipment Identity (IMEI).
  • the preset location database is used to store the acquired location information corresponding to the terminal.
  • the position information includes position coordinates and a coordinate acquisition time corresponding to the position coordinates, and the position coordinates include longitude coordinate information and latitude coordinate information.
  • the method for obtaining the location coordinates corresponding to the terminal includes, but is not limited to, a global positioning system (Global Positioning System, GPS), a mobile location base station system (Location Based Service, LBS), or a combination thereof.
  • the storage format of the position information may be "L1, L2; T"; where L1 represents longitude coordinate information, L2 represents latitude coordinate information, and T represents coordinate acquisition time.
  • For example, the location information is: 114.059818, 22.540215; 2019-1-14 16:52:55. Here 114.059818 is the longitude coordinate information, 22.540215 is the latitude coordinate information, and 2019-1-14 16:52:55 is the coordinate acquisition time.
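A record in the "L1, L2; T" storage format could be parsed roughly as follows. This is a sketch under assumptions: the function name `parse_location_info`, the dictionary keys, and the exact timestamp pattern are illustrative and are not specified by the source.

```python
from datetime import datetime

def parse_location_info(record):
    """Parse one 'L1, L2; T' record into longitude, latitude and acquisition time."""
    coords, timestamp = record.split(";", 1)   # coordinates before ';', time after
    lon_text, lat_text = coords.split(",")     # L1 is longitude, L2 is latitude
    return {
        "longitude": float(lon_text),
        "latitude": float(lat_text),
        # Assumed timestamp layout: year-month-day hour:minute:second
        "acquired_at": datetime.strptime(timestamp.strip(), "%Y-%m-%d %H:%M:%S"),
    }

info = parse_location_info("114.059818, 22.540215; 2019-1-14 16:52:55")
print(info)
```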
  • Generating the location data set according to the location information matching the location acquisition time range specifically includes: determining whether the coordinate acquisition time corresponding to the location information in the preset location database is within the location acquisition time range; if it is, determining that the location information matches the location acquisition time range, and storing the matching location information to form the location data set; if it is not, determining that the location information does not match the location acquisition time range.
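The filtering step above might look like the following sketch. The in-memory list standing in for the preset location database, the `terminal_id` and `acquired_at` keys, the placeholder identifiers, and the inclusive bounds are all assumptions made for illustration.

```python
from datetime import datetime

def build_location_data_set(database, terminal_id, start, end):
    """Keep the records of one terminal whose acquisition time lies in [start, end]."""
    return [
        row for row in database
        if row["terminal_id"] == terminal_id       # look up by terminal identification code
        and start <= row["acquired_at"] <= end     # inclusive bounds are an assumption
    ]

# Hypothetical database rows; "imei-A" and "imei-B" are placeholder identifiers.
database = [
    {"terminal_id": "imei-A", "longitude": 114.059818, "latitude": 22.540215,
     "acquired_at": datetime(2019, 1, 14, 16, 52, 55)},
    {"terminal_id": "imei-A", "longitude": 114.060000, "latitude": 22.541000,
     "acquired_at": datetime(2018, 12, 1, 9, 0, 0)},
    {"terminal_id": "imei-B", "longitude": 113.264400, "latitude": 23.129100,
     "acquired_at": datetime(2019, 1, 10, 8, 30, 0)},
]
data_set = build_location_data_set(
    database, "imei-A", datetime(2019, 1, 7, 16, 52, 55), datetime(2019, 1, 14, 16, 52, 55)
)
print(len(data_set))
```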
  • step S210 may be further included before step S110.
  • S210 Acquire location information of the terminal according to a preset time interval, and store the location information in the preset location database.
  • The preset time interval can be set according to actual needs.
  • The preset time interval is, for example, 5 minutes, 10 minutes, or 30 minutes. The smaller the preset time interval, the higher the recognition accuracy. The preset time interval is less than or equal to the preset time period.
  • S120 Perform clustering processing on the location data set according to a preset first clustering algorithm to obtain a location data cluster corresponding to the location data set after the clustering processing.
  • the preset first clustering algorithm may be the DBSCAN algorithm (Density-Based Spatial Clustering of Applications with Noise, a density-based clustering method with noise).
  • the DBSCAN algorithm is a density-based spatial clustering algorithm. The algorithm divides areas with sufficient density into clusters, and finds clusters of arbitrary shapes in a noisy spatial database. The DBSCAN algorithm defines clusters as the largest collection of densely connected points.
  • the DBSCAN algorithm can operate normally.
  • the calculation parameters include the scanning radius Eps and the minimum number of points contained MinPts.
  • The scanning radius Eps represents the radius of the circular neighborhood centered on point P, where P is any unvisited data point in the data set;
  • The minimum number of contained points MinPts represents the minimum number of points contained in the neighborhood centered on point P. If the number of points in the neighborhood centered on point P with scanning radius Eps is not less than MinPts, then point P is called a core point.
  • The calculation parameters can be adjusted according to actual needs. If MinPts remains unchanged and the scanning radius Eps is too large, most data points will be grouped into the same cluster; if the scanning radius Eps is too small, a single cluster will be split. If the scanning radius Eps remains unchanged and MinPts is too large, points in the same cluster will be marked as outliers; if MinPts is too small, a large number of core points will be found. In a specific implementation, the scanning radius Eps can be set to 2 kilometers, and MinPts can be set to 5.
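To make the roles of Eps and MinPts concrete, here is a compact pure-Python DBSCAN sketch over (longitude, latitude) points. It is an illustration under assumptions, not the filing's implementation: great-circle distance is used as the metric, the Earth radius constant and all function names are chosen for this example, and the brute-force neighbor search is only suitable for small data sets.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch

def haversine_km(p, q):
    """Great-circle distance in km between two (longitude, latitude) points in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def dbscan(points, eps_km=2.0, min_pts=5):
    """Return one label per point: 0, 1, ... for clusters and -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbors(i):
        # Indices of all points within eps_km of point i (point i included).
        return [j for j, q in enumerate(points) if haversine_km(points[i], q) <= eps_km]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE              # not a core point; may become a border point later
            continue
        cluster += 1                       # point i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster        # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also a core point: keep expanding
    return labels

# Usage: two dense groups of five points each, plus one isolated point.
points = ([(114.0598 + 0.001 * k, 22.5402) for k in range(5)]
          + [(113.2644, 23.1291 + 0.001 * k) for k in range(5)]
          + [(116.4000, 39.9000)])
labels = dbscan(points, eps_km=2.0, min_pts=5)
print(labels)
```

With Eps = 2 km and MinPts = 5 as in the text, each dense group forms one cluster and the isolated point is labeled as noise.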
  • S130 Perform clustering processing on the position data clusters according to a preset second clustering algorithm to obtain a centroid corresponding to the position data clusters after the clustering processing.
  • the preset second clustering algorithm may be the K-means algorithm (K-Means Clustering Algorithm, K-means clustering algorithm).
  • The K-means algorithm uses K pre-selected objects as the initial cluster centers; the value of K needs to be set in advance. It then calculates the distance between each object and each cluster center, and assigns each object to the cluster center closest to it.
  • A cluster center and the objects assigned to it represent a cluster. Once all objects have been assigned, the cluster center of each cluster is recalculated based on the objects currently in the cluster. This process is repeated until the termination condition is met.
  • The termination condition can be that no (or a minimum number of) objects are reassigned to different clusters, that no (or a minimum number of) cluster centers change again, or that the sum of squared errors reaches a local minimum.
  • There may be one or more location data clusters. Clustering processing is performed on each location data cluster according to the K-means algorithm to obtain the centroid corresponding to that location data cluster, with the value of K set to 1 in advance.
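With K set to 1, every point in a cluster is assigned to the single center, so the update step reduces to averaging the cluster's points. The sketch below illustrates this; the function names, the use of -1 as a noise label, and averaging raw longitude and latitude values (a planar approximation that is reasonable only for clusters a few kilometers wide) are assumptions of this example.

```python
def kmeans_single_centroid(cluster_points, max_iter=10):
    """K-means with K = 1 on (longitude, latitude) pairs; returns the centroid."""
    center = cluster_points[0]                   # pre-selected initial cluster center
    n = len(cluster_points)
    for _ in range(max_iter):
        # Assignment step: with K = 1 all points belong to the only center.
        # Update step: recompute the center as the mean of its points.
        new_center = (sum(p[0] for p in cluster_points) / n,
                      sum(p[1] for p in cluster_points) / n)
        if new_center == center:                 # termination: the center no longer changes
            break
        center = new_center
    return center

def centroids_by_cluster(points, labels, noise_label=-1):
    """Group points by cluster label (skipping noise) and compute one centroid each."""
    groups = {}
    for point, label in zip(points, labels):
        if label != noise_label:
            groups.setdefault(label, []).append(point)
    return {label: kmeans_single_centroid(group) for label, group in groups.items()}

square = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
print(kmeans_single_centroid(square))
print(centroids_by_cluster(square + [(9.0, 9.0)], [0, 0, 0, 0, -1]))
```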
  • S140 Determine whether the centroid matches the reserved location data corresponding to the user.
  • The reserved location data corresponding to the user is location information pre-stored in the server by the user, and includes but is not limited to home address information, office address information, and the like.
  • the preset risk location data type may be one or more, and the preset risk location data type may be determined according to the type of the order data and the preset type mapping relationship.
  • the preset type mapping relationship is used to determine the corresponding relationship between the order data type and the preset risk location data type.
  • For example, if the order data is insurance order data and the insurance policy type corresponding to the insurance order data is critical illness insurance, the type of the preset risk location data is a hospital.
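One simple way to represent the preset type mapping relationship is a lookup table keyed by order data type. The sketch below is hypothetical: the key structure, the string labels, and the function name `risk_location_types` are not given in the source; only the critical illness insurance to hospital pairing comes from the example in the text.

```python
# Hypothetical preset type mapping: order data type -> preset risk location data types.
PRESET_TYPE_MAPPING = {
    ("insurance", "critical illness insurance"): ["hospital"],
}

def risk_location_types(order_type, policy_type=None):
    """Return the preset risk location data types for an order, or an empty list."""
    return PRESET_TYPE_MAPPING.get((order_type, policy_type), [])

print(risk_location_types("insurance", "critical illness insurance"))
```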
  • Judging whether the centroid matches the reserved location data corresponding to the user serves to judge whether the error of the centroid obtained after clustering by the preset first clustering algorithm and the preset second clustering algorithm is smaller than the preset error threshold. If the centroid matches the reserved location data corresponding to the user, the error of the obtained centroid is determined to be less than the preset error threshold, and risk user identification can be performed according to the obtained centroid. If the centroid does not match the reserved location data corresponding to the user, the error of the obtained centroid is determined to be not less than the preset error threshold, and a reminder message is sent to the manager to prompt the manager to modify the calculation parameters, which improves the accuracy of the obtained centroid and thus the accuracy of risk user identification.
  • the preset error threshold can be set according to actual requirements, and the preset error threshold is, for example, 1 kilometer.
  • step S140 includes but is not limited to steps S141-S143.
  • Calculating the distance difference between the centroid and the reserved location data corresponding to the user may be implemented by a first formula, and the first formula may be the Haversine formula. The first formula is specifically:
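For reference, the standard Haversine great-circle distance between two points can be written as below. The symbols are assumptions of this note and may differ from the notation used in the original filing: d is the distance, R is the Earth's radius (about 6371 km), φ1 and φ2 are the latitudes, and λ1 and λ2 are the longitudes of the two points, all in radians.

```latex
d = 2R \arcsin\left( \sqrt{ \sin^{2}\!\left(\frac{\varphi_2 - \varphi_1}{2}\right)
    + \cos\varphi_1 \, \cos\varphi_2 \, \sin^{2}\!\left(\frac{\lambda_2 - \lambda_1}{2}\right) } \right)
```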
  • S142 Determine whether the distance difference between the centroid and the reserved location data corresponding to the user is less than a preset first difference threshold.
  • The preset first difference threshold may be set according to actual requirements; for example, it may be set to 1 km.
  • If the distance difference between the centroid and the reserved location data corresponding to the user is less than the preset first difference threshold, it is determined that the centroid matches the reserved location data corresponding to the user. If the distance difference is not less than the preset first difference threshold, it is determined that the centroid does not match the reserved location data corresponding to the user.
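The distance calculation and threshold comparison described above can be sketched as follows. This assumes the standard Haversine formula with a mean Earth radius of 6371 km and (longitude, latitude) pairs in degrees; the function names and the sample coordinates are illustrative only.

```python
import math

def haversine_km(p, q, radius_km=6371.0):
    """Great-circle distance in km between two (longitude, latitude) points in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))

def centroid_matches(centroid, location, threshold_km=1.0):
    """True if the distance is strictly less than the preset difference threshold."""
    return haversine_km(centroid, location) < threshold_km

centroid = (114.059818, 22.540215)
home = (114.061000, 22.541000)       # hypothetical reserved home address
print(haversine_km(centroid, home), centroid_matches(centroid, home))
```

The same check applies unchanged to the preset risk location data with the preset second difference threshold.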
  • If the centroid matches the reserved location data corresponding to the user, this indicates that the error of the centroid obtained after clustering through the preset first clustering algorithm and the preset second clustering algorithm is less than the preset error threshold. The obtained centroid is therefore highly reliable and can be used for risk user identification, so it is then determined whether the centroid matches the preset risk location data.
  • step S150 includes but is not limited to steps S151-S153.
  • Calculating the distance difference between the centroid and the preset risk location data can be implemented by a second formula, and the second formula can be the Haversine formula.
  • the second formula is specifically:
  • S152 Determine whether the distance difference between the centroid and the preset risk location data is less than a preset second difference threshold.
  • the preset second difference threshold may be set according to actual requirements, for example, the preset second difference threshold may be set to 1 km.
  • If the distance difference between the centroid and the preset risk location data is less than the preset second difference threshold, it is determined that the centroid matches the preset risk location data. If the distance difference is not less than the preset second difference threshold, it is determined that the centroid does not match the preset risk location data.
  • S160 If the centroid matches the preset risk location data, determine that the user is a risk user and determine the order data corresponding to the user as risk data.
  • For example, if the order data is insurance order data and the centroid matches the preset risk location data, this indicates that the user was active at the preset risk location (such as a hospital) before sending the insurance order data. There is therefore a risk that the user is taking out insurance while already ill, and the user is determined to be a risk user. The order data corresponding to the user is determined as risk data for subsequent monitoring or manual follow-up.
  • For example, if the order data is insurance order data and the user is a risk user, this indicates that the insurance order data has a high probability of fraud. The insurance policy corresponding to the user is then determined as a risk insurance policy so that reviewers can investigate it manually, which reduces the risk of insurance policy fraud.
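The decision flow of steps S140 to S160 can be summarized in one function. This is a sketch under stated assumptions: the source does not say how several centroids or several reserved locations are combined, so this example treats a match as any centroid lying within the threshold of any listed place; the function names, return labels, and sample coordinates are hypothetical.

```python
import math

def haversine_km(p, q, radius_km=6371.0):
    """Great-circle distance in km between two (longitude, latitude) points in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))

def any_match(centroids, places, threshold_km):
    """Assumption: one centroid within the threshold of one place counts as a match."""
    return any(haversine_km(c, place) < threshold_km for c in centroids for place in places)

def identify_risk_user(centroids, reserved_places, risk_places,
                       first_threshold_km=1.0, second_threshold_km=1.0):
    # S140: centroids are trusted only if they match the user's reserved location data.
    if not any_match(centroids, reserved_places, first_threshold_km):
        return "recalibrate"        # remind the manager to modify the calculation parameters
    # S150: compare the trusted centroids with the preset risk location data.
    if any_match(centroids, risk_places, second_threshold_km):
        return "risk_user"          # S160: the order data is treated as risk data
    return "not_risk_user"

home = (114.0598, 22.5402)          # hypothetical reserved location
hospital = (114.0850, 22.5480)      # hypothetical preset risk location
print(identify_risk_user([home, (114.0852, 22.5481)], [home], [hospital]))
```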
  • FIG. 7 is a schematic block diagram of a risk user identification device 100 provided by an embodiment of the present application. As shown in FIG. 7, corresponding to the above risk user identification method, the present application also provides a risk user identification device 100.
  • The risk user identification device 100 includes units for executing the above risk user identification method, and the device 100 may be configured in a server.
  • the server may be an independent server or a server cluster composed of multiple servers.
  • The device 100 includes a first obtaining unit 110, a first clustering unit 120, a second clustering unit 130, a first judging unit 140, a second judging unit 150, and an order determining unit 160.
  • The first obtaining unit 110 is configured to obtain a location data set corresponding to the terminal if the order data sent by the user through the terminal is received, the location data set including at least two pieces of location information of the terminal.
  • the first obtaining unit 110 includes a first generating unit 111 and a second generating unit 112.
  • the first generating unit 111 is configured to generate a location acquisition time range according to the sending time of the order data and a preset time period if the order data sent by the user through the terminal is received.
  • The second generating unit 112 is configured to obtain, from a preset location database and according to the terminal identification code corresponding to the terminal, the location information matching the location acquisition time range, and to generate the location data set according to the location information matching the location acquisition time range.
  • the device 100 further includes a position storage unit 210.
  • The location storage unit 210 is configured to obtain location information of the terminal according to a preset time interval, and store the location information in the preset location database.
  • the first clustering unit 120 is configured to perform clustering processing on the position data set according to a preset first clustering algorithm to obtain a position data cluster corresponding to the position data set after the clustering processing.
  • the second clustering unit 130 is configured to perform clustering processing on the position data clusters according to a preset second clustering algorithm to obtain the centroid corresponding to the position data clusters after the clustering processing.
  • the first determining unit 140 is configured to determine whether the user is a risk user according to the centroid, the reserved location data corresponding to the user, and preset risk location data.
  • the first judgment unit 140 includes a first calculation unit 141, a fourth judgment unit 142 and a second determination unit 143.
  • The first calculation unit 141 is configured to calculate the distance difference between the centroid and the reserved location data corresponding to the user.
  • The fourth determining unit 142 is configured to determine whether the distance difference between the centroid and the reserved location data corresponding to the user is less than a preset first difference threshold.
  • The second determining unit 143 is configured to determine that the centroid matches the reserved location data corresponding to the user if the distance difference between the centroid and the reserved location data corresponding to the user is less than the preset first difference threshold.
  • The second determining unit 150 is configured to determine whether the centroid matches the preset risk location data if the centroid matches the reserved location data corresponding to the user.
  • The second judgment unit 150 includes a second calculation unit 151, a fifth judgment unit 152 and a third determination unit 153.
  • The second calculation unit 151 is configured to calculate the distance difference between the centroid and the preset risk location data if the centroid matches the reserved location data corresponding to the user.
  • The fifth determining unit 152 is configured to determine whether the distance difference between the centroid and the preset risk location data is smaller than a preset second difference threshold.
  • The third determining unit 153 is configured to determine that the centroid matches the preset risk location data if the distance difference between the centroid and the preset risk location data is less than the preset second difference threshold.
  • the order determination unit 160 is configured to determine the order data corresponding to the user as risk data if the user is a risk user.
  • the above-mentioned apparatus 100 may be implemented in the form of a computer program, and the computer program may run on a computer device as shown in FIG.
  • FIG. 12 is a schematic block diagram of a computer device according to an embodiment of the present application.
  • The computer device 500 may be a server.
  • the server may be an independent server or a server cluster composed of multiple servers.
  • the computer device 500 includes a processor 520, a memory, and a network interface 550 connected through a system bus 510, where the memory may include a non-volatile storage medium 530 and an internal memory 540.
  • the non-volatile storage medium 530 can store an operating system 531 and a computer program 532.
  • the processor 520 can execute a risk user identification method.
  • the processor 520 is used to provide calculation and control capabilities, and support the operation of the entire computer device 500.
  • the internal memory 540 provides an environment for the running of a computer program in a non-volatile storage medium.
  • the network interface 550 is used for network communication with other devices.
  • The schematic block diagram of the computer device is only a block diagram of the part of the structure related to the solution of the present application, and does not limit the computer device 500 to which the solution of the present application is applied. A specific computer device 500 may include more or fewer components than shown in the figure, combine certain components, or have a different component arrangement.
  • the processor 520 is configured to run a computer program stored in a memory to implement any embodiment of the risk user identification method described above.
  • the processor 520 may be a central processing unit (Central Processing Unit, CPU), and the processor 520 may also be other general-purpose processors, digital signal processors (Digital Signal Processors, DSPs), Application Specific Integrated Circuit (ASIC), Field-Programmable Gate Array (FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc.
  • the general-purpose processor may be a microprocessor or the processor may also be any conventional processor.
  • the computer program may be stored in a storage medium, and the storage medium may be a computer-readable storage medium.
  • the computer program is executed by at least one processor in the computer system to implement the process steps of the foregoing method embodiment.
  • the computer-readable storage medium may be non-volatile or volatile.
  • the storage medium stores a computer program that, when executed by a processor, implements any embodiment of the risk user identification method described above.
  • The computer-readable storage medium may be a USB flash drive, a removable hard disk, a read-only memory (ROM), a magnetic disk, an optical disk, or any other medium that can store program code.

Landscapes

  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Engineering & Computer Science (AREA)
  • Strategic Management (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Accounting & Taxation (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Finance (AREA)
  • Theoretical Computer Science (AREA)
  • Development Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Educational Administration (AREA)
  • Game Theory and Decision Science (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A risky user identification method and apparatus, a computer device, and a storage medium, which are applied to the field of data processing and belong to artificial intelligence technology. The method comprises: if order data sent by a user through a terminal is received, acquiring a location data set corresponding to the terminal; performing clustering processing on the location data set according to a pre-set first clustering algorithm to obtain a location data cluster corresponding to the location data set after the clustering processing; performing clustering processing on the location data cluster according to a pre-set second clustering algorithm to obtain a centroid corresponding to the location data cluster after the clustering processing; determining, according to the centroid, reserved location data corresponding to the user, and pre-set risky location data, whether the user is a risky user; and if the user is a risky user, determining the order data corresponding to the user to be risky data.

Description

Risky user identification method and apparatus, computer device, and storage medium
This application claims priority to Chinese patent application No. 201910746104.X, filed with the China Patent Office on August 13, 2019 and entitled "Risky user identification method and apparatus, computer device, and storage medium", the entire contents of which are incorporated herein by reference.
Technical field
This application relates to the field of computer data processing, and in particular to a risky user identification method and apparatus, a computer device, and a computer-readable storage medium.
Background
With the rapid development of the Internet, a growing range of products can be traded online (for example, commodity transactions and service transactions). To keep online transactions secure, risky users (for example, advertisers who run fraudulent websites, merchants who sell illegal products, and users who forge information to commit insurance fraud) need to be identified and kept out of transactions. In the prior art, risky users are screened by having staff manually review the order data for risk after the user submits an order. The inventor has realized that manual identification of risky users is susceptible to subjective factors, which lowers identification accuracy, and is time-consuming, which slows identification.
Summary of the invention
Embodiments of this application provide a risky user identification method and apparatus, a computer device, and a storage medium, which aim to solve the problems of low accuracy and slow speed in identifying risky users.
In a first aspect, an embodiment of this application provides a risky user identification method, which includes:
if order data sent by a user through a terminal is received, acquiring a location data set corresponding to the terminal, the location data set including at least two pieces of location information of the terminal;
performing clustering processing on the location data set according to a preset first clustering algorithm to obtain a location data cluster corresponding to the clustered location data set;
performing clustering processing on the location data cluster according to a preset second clustering algorithm to obtain a centroid corresponding to the clustered location data cluster;
determining whether the centroid matches reserved location data corresponding to the user;
if the centroid matches the reserved location data corresponding to the user, determining whether the centroid matches preset risky location data;
if the centroid matches the preset risky location data, determining that the user is a risky user and determining the order data corresponding to the user to be risky data.
In a second aspect, an embodiment of this application provides a risky user identification apparatus, which includes:
a first acquiring unit, configured to acquire, if order data sent by a user through a terminal is received, a location data set corresponding to the terminal, the location data set including at least two pieces of location information of the terminal;
a first clustering unit, configured to perform clustering processing on the location data set according to a preset first clustering algorithm to obtain a location data cluster corresponding to the clustered location data set;
a second clustering unit, configured to perform clustering processing on the location data cluster according to a preset second clustering algorithm to obtain a centroid corresponding to the clustered location data cluster;
a first determining unit, configured to determine whether the centroid matches reserved location data corresponding to the user;
a second determining unit, configured to determine, if the centroid matches the reserved location data corresponding to the user, whether the centroid matches preset risky location data;
an order determining unit, configured to determine, if the centroid matches the preset risky location data, that the user is a risky user and to determine the order data corresponding to the user to be risky data.
In a third aspect, an embodiment of this application further provides a computer device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor performs the following steps when executing the program:
if order data sent by a user through a terminal is received, acquiring a location data set corresponding to the terminal, the location data set including at least two pieces of location information of the terminal;
performing clustering processing on the location data set according to a preset first clustering algorithm to obtain a location data cluster corresponding to the clustered location data set;
performing clustering processing on the location data cluster according to a preset second clustering algorithm to obtain a centroid corresponding to the clustered location data cluster;
determining whether the centroid matches reserved location data corresponding to the user;
if the centroid matches the reserved location data corresponding to the user, determining whether the centroid matches preset risky location data;
if the centroid matches the preset risky location data, determining that the user is a risky user and determining the order data corresponding to the user to be risky data.
In a fourth aspect, an embodiment of this application further provides a computer-readable storage medium, wherein the computer-readable storage medium stores a computer program which, when executed by a processor, causes the processor to perform the following steps:
if order data sent by a user through a terminal is received, acquiring a location data set corresponding to the terminal, the location data set including at least two pieces of location information of the terminal;
performing clustering processing on the location data set according to a preset first clustering algorithm to obtain a location data cluster corresponding to the clustered location data set;
performing clustering processing on the location data cluster according to a preset second clustering algorithm to obtain a centroid corresponding to the clustered location data cluster;
determining whether the centroid matches reserved location data corresponding to the user;
if the centroid matches the reserved location data corresponding to the user, determining whether the centroid matches preset risky location data;
if the centroid matches the preset risky location data, determining that the user is a risky user and determining the order data corresponding to the user to be risky data.
In the embodiments of this application, the location data set is clustered by the preset first clustering algorithm and the preset second clustering algorithm to obtain a centroid, and risky users are then identified according to the centroid, the reserved location data corresponding to the user, and the preset risky location data. The whole process is unaffected by human subjective factors, which helps improve both the accuracy and the speed of risky user identification.
Description of the drawings
To explain the technical solutions of the embodiments of this application more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show some embodiments of this application; those of ordinary skill in the art can obtain other drawings from them without creative work.
FIG. 1 is a schematic flowchart of a risky user identification method provided by an embodiment of this application;
FIG. 2 is a schematic diagram of an application scenario of a risky user identification method provided by an embodiment of this application;
FIG. 3 is another schematic flowchart of a risky user identification method provided by an embodiment of this application;
FIG. 4 is another schematic flowchart of a risky user identification method provided by an embodiment of this application;
FIG. 5 is another schematic flowchart of a risky user identification method provided by an embodiment of this application;
FIG. 6 is another schematic flowchart of a risky user identification method provided by an embodiment of this application;
FIG. 7 is a schematic block diagram of a risky user identification apparatus provided by an embodiment of this application;
FIG. 8 is another schematic block diagram of a risky user identification apparatus provided by an embodiment of this application;
FIG. 9 is another schematic block diagram of a risky user identification apparatus provided by an embodiment of this application;
FIG. 10 is another schematic block diagram of a risky user identification apparatus provided by an embodiment of this application;
FIG. 11 is another schematic block diagram of a risky user identification apparatus provided by an embodiment of this application;
FIG. 12 is a schematic block diagram of a computer device provided by an embodiment of this application.
Detailed description
The technical solutions in the embodiments of this application are described clearly and completely below with reference to the drawings in the embodiments. The described embodiments are some, not all, of the embodiments of this application. All other embodiments obtained by those of ordinary skill in the art from the embodiments in this application without creative work fall within the protection scope of this application.
Refer to FIG. 1, which is a schematic flowchart of a risky user identification method provided by an embodiment of this application. The method can be applied in a server 20. The server 20 may be a server used within an enterprise to process order data for risky user identification, and may be an independent server or a server cluster composed of multiple servers. The server 20 can establish a communication connection with a terminal 10 to exchange data, for example to receive order data sent by the terminal 10. The terminal 10 may be an electronic terminal such as a mobile phone, a tablet computer, or a desktop computer.
As shown in FIG. 1, the risky user identification method includes steps S110-S160.
S110: if order data sent by a user through a terminal is received, acquire a location data set corresponding to the terminal, the location data set including at least two pieces of location information of the terminal.
The terminal exchanges data with the server over an established communication connection. By operating the terminal, the user can send order data to the server. The order data may be order data for any kind of product, including but not limited to travel order data, takeaway order data, and insurance order data.
The location data set corresponding to the terminal includes at least two pieces of location information of the terminal. Specifically, the location data set may be acquired by acquiring multiple pieces of location information corresponding to the terminal; the set of the acquired pieces of location information is the location data set corresponding to the terminal.
In some embodiments, as shown in FIG. 3, step S110 includes but is not limited to steps S111-S112.
S111: if order data sent by a user through a terminal is received, generate a location acquisition time range according to the sending time of the order data and a preset time period.
The preset time period can be set according to actual needs, for example 7 days, 30 days, or 60 days. The location acquisition time range is generated as follows: the preset time period is subtracted from the sending time of the order data to obtain a location acquisition start time; the sending time of the order data is taken as the location acquisition end time; and the time range between the start time and the end time is taken as the location acquisition time range.
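The computation in S111 can be sketched in a few lines of Python. This is only an illustration: the function name `location_time_range` and the choice to express the preset time period in days are assumptions, not part of the application.

```python
from datetime import datetime, timedelta


def location_time_range(sent_at: datetime, period_days: int = 30):
    """Return (start, end) of the location acquisition time range.

    `period_days` stands in for the preset time period (hypothetical parameter).
    """
    start = sent_at - timedelta(days=period_days)  # sending time minus preset period
    end = sent_at                                  # sending time is the end of the range
    return start, end


# Example: an order sent on 2019-01-14 with a 7-day preset period.
start, end = location_time_range(datetime(2019, 1, 14, 16, 52, 55), period_days=7)
```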
S112: according to the terminal identification code corresponding to the terminal, acquire from a preset location database the location information that matches the location acquisition time range, and generate the location data set from that location information.
Each terminal corresponds to a unique terminal identification code, for example an International Mobile Equipment Identity (IMEI). The preset location database stores the location information acquired for the terminal. Each piece of location information includes location coordinates and the time at which those coordinates were acquired; the location coordinates include a longitude and a latitude. The location coordinates of the terminal can be obtained by, among other means, the Global Positioning System (GPS), a base-station-based Location Based Service (LBS), or a combination of the two.
Specifically, location information may be stored in the format "L1,L2;T", where L1 is the longitude, L2 is the latitude, and T is the coordinate acquisition time. For example, in the location information "114.059818,22.540215;2019-1-14 16:52:55", 114.059818 is the longitude, 22.540215 is the latitude, and 2019-1-14 16:52:55 is the coordinate acquisition time.
Specifically, the location data set is generated from the matching location information as follows: determine whether the coordinate acquisition time of each piece of location information in the preset location database falls within the location acquisition time range; if it does, the location information matches the location acquisition time range and is stored to form the location data set; if it does not, the location information does not match the location acquisition time range.
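A minimal sketch of the filtering in S112 follows, assuming records are available as strings in the "L1,L2;T" format described above. The helper names `parse_record` and `build_location_set` are hypothetical, and treating both ends of the time range as inclusive is an assumption, since the text only says "within" the range.

```python
from datetime import datetime


def parse_record(raw: str):
    """Parse 'longitude,latitude;timestamp' into (lon, lat, datetime)."""
    coords, timestamp = raw.split(";")
    lon, lat = (float(v) for v in coords.split(","))
    return lon, lat, datetime.strptime(timestamp.strip(), "%Y-%m-%d %H:%M:%S")


def build_location_set(records, start, end):
    """Keep the coordinates whose acquisition time lies in [start, end]."""
    parsed = (parse_record(r) for r in records)
    return [(lon, lat) for lon, lat, t in parsed if start <= t <= end]


records = [
    "114.059818,22.540215;2019-1-14 16:52:55",  # inside the range below
    "114.060000,22.540000;2018-12-01 08:00:00",  # outside the range below
]
location_set = build_location_set(records, datetime(2019, 1, 7), datetime(2019, 1, 15))
```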
In some embodiments, as shown in FIG. 4, step S210 may also be performed before step S110.
S210: acquire the location information of the terminal at a preset time interval, and store the location information in the preset location database.
The preset time interval can be set according to actual needs, for example 5 minutes, 10 minutes, or 30 minutes. The smaller the preset time interval, the higher the identification accuracy. The preset time interval is less than or equal to the preset time period.
S120: perform clustering processing on the location data set according to a preset first clustering algorithm to obtain a location data cluster corresponding to the clustered location data set.
The preset first clustering algorithm may be the DBSCAN algorithm (Density-Based Spatial Clustering of Applications with Noise), a density-based spatial clustering algorithm. It divides regions of sufficient density into clusters and can find clusters of arbitrary shape in a noisy spatial database; DBSCAN defines a cluster as the maximal set of density-connected points.
Operating parameters are set for the DBSCAN algorithm in advance so that it can run. The operating parameters are the scanning radius Eps and the minimum number of points MinPts. (1) The scanning radius Eps is the radius of the circular neighborhood centered on a point P, where P is any unvisited data point in the data set. (2) The minimum number of points MinPts is the minimum number of points that the neighborhood centered on P must contain. If the neighborhood centered on P with radius Eps contains no fewer than MinPts points, P is called a core point.
The operating parameters can be adjusted according to actual needs. With MinPts fixed, too large an Eps gathers most data points into the same cluster, while too small an Eps causes a cluster to split. With Eps fixed, too large a MinPts causes points that belong to the same cluster to be marked as outliers, while too small a MinPts produces a large number of core points. In a specific implementation, Eps may be set to 2 kilometers and MinPts to 5.
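To make the roles of Eps and MinPts concrete, the following is a small pure-Python sketch of DBSCAN over (longitude, latitude) points. It is an illustration under stated assumptions rather than the application's implementation: the great-circle distance is used as the metric, a point counts as a member of its own neighborhood, and noise is labeled -1. The default values mirror the 2 km and 5 points mentioned above.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(p, q):
    """Great-circle distance in km between two (lon, lat) points in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (*p, *q))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def dbscan(points, eps_km=2.0, min_pts=5):
    """Return one label per point: 0, 1, ... for clusters and -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbours(i):
        # The neighbourhood of a point includes the point itself.
        return [j for j, q in enumerate(points) if haversine_km(points[i], q) <= eps_km]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE            # may later be claimed as a border point
            continue
        cluster += 1                     # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster      # noise reachable from a core point is a border point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:  # j is also a core point: keep expanding
                queue.extend(reachable)
    return labels
```

With these defaults, five fixes recorded within a few hundred meters of each other form one cluster, while a single fix in another city is labeled as noise.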
S130: perform clustering processing on the location data cluster according to a preset second clustering algorithm to obtain a centroid corresponding to the clustered location data cluster.
The preset second clustering algorithm may be the K-means algorithm (K-means clustering algorithm). K-means takes K pre-selected objects as the initial cluster centers, and the value of K must be set in advance. It then computes the distance between each object and each cluster center and assigns each object to its nearest cluster center. A cluster center together with the objects assigned to it represents a cluster. Once all objects have been assigned, the center of each cluster is recomputed from the objects currently in that cluster. This process repeats until a termination condition is met. The termination condition may be that no objects (or fewer than a minimum number) are reassigned to a different cluster, that no cluster centers (or fewer than a minimum number) change, or that the sum of squared errors reaches a local minimum.
There may be one or more location data clusters. Clustering the location data clusters according to the preset second clustering algorithm to obtain the corresponding centroids specifically means applying K-means to each location data cluster separately to obtain the centroid of that cluster, with K preset to 1 in every case.
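With K fixed at 1 there is only one cluster center, so every point is assigned to it and the K-means update step reduces to taking the mean of the cluster's points. The sketch below computes that mean per cluster. It assumes points are (longitude, latitude) pairs, that cluster labels come from an earlier clustering step with -1 marking noise (a hypothetical convention), and that each cluster is small enough for a plain arithmetic mean of coordinates to be a reasonable centroid.

```python
def centroid(cluster):
    """Mean (lon, lat) of a cluster: the fixed point of K-means when K = 1."""
    n = len(cluster)
    return (sum(lon for lon, _ in cluster) / n, sum(lat for _, lat in cluster) / n)


def centroids_by_cluster(points, labels):
    """Group points by cluster label and return {label: centroid}, skipping noise (-1)."""
    clusters = {}
    for point, label in zip(points, labels):
        if label != -1:
            clusters.setdefault(label, []).append(point)
    return {label: centroid(members) for label, members in clusters.items()}


points = [(0.0, 0.0), (2.0, 2.0), (10.0, 10.0)]
labels = [0, 0, -1]
result = centroids_by_cluster(points, labels)
```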
S140: determine whether the centroid matches the reserved location data corresponding to the user.
The reserved location data corresponding to the user is location information that the user has stored on the server in advance, including but not limited to home address information and office address information.
There may be one or more types of preset risky location data. The type of the preset risky location data can be determined from the type of the order data and a preset type mapping relationship, which defines the correspondence between order data types and preset risky location data types.
For example, suppose the order data is insurance order data and the policy type of the insurance order is critical illness insurance. From the preset type mapping relationship and the type of the order data, the type of the preset risky location data can be determined to be hospitals.
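The type mapping relationship can be pictured as a simple lookup table. In the sketch below only the critical illness insurance to hospital entry comes from the example above; the key and value strings, the table name, and the fallback to an empty list for unmapped order types are illustrative assumptions.

```python
# Hypothetical mapping from order data type to risky location types.
RISK_LOCATION_TYPES = {
    "critical_illness_insurance": ["hospital"],  # the example given in the description
}


def risk_location_types(order_type: str):
    """Return the risky location types for an order type, or [] if none is mapped."""
    return RISK_LOCATION_TYPES.get(order_type, [])
```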
Determining whether the centroid matches the reserved location data corresponding to the user serves to check whether the deviation of the centroid obtained by the preset first and second clustering algorithms is less than a preset error threshold. If the centroid matches the reserved location data, the deviation is less than the preset error threshold, and risky user identification can proceed on the basis of the centroid. If the centroid does not match the reserved location data, the deviation is not less than the preset error threshold; a reminder message is then sent to an administrator to prompt modification of the operating parameters, so as to improve the accuracy of the centroid and, in turn, the accuracy of risky user identification. The preset error threshold can be set according to actual needs, for example 1 kilometer.
In some embodiments, as shown in FIG. 5, step S140 includes but is not limited to steps S141-S143.
S141: calculate the distance between the centroid and the reserved location data corresponding to the user.
The distance between the centroid and the reserved location data corresponding to the user can be calculated with a first formula, which may be the Haversine formula. The first formula is:
$$\operatorname{haversin}\left(\frac{d_1}{R}\right) = \operatorname{haversin}(\varphi_2 - \varphi_1) + \cos(\varphi_1)\cos(\varphi_2)\operatorname{haversin}(\Delta\lambda)$$
where haversin(θ) = sin²(θ/2) = (1 − cos(θ))/2; d1 is the distance between the centroid and the reserved location data corresponding to the user; R is the radius of the Earth, for which the mean value 6371 kilometers can be used; φ1 and φ2 are the latitudes of the centroid and of the reserved location data; and Δλ is the difference between the longitudes of the centroid and of the reserved location data.
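Solving the Haversine relation for the distance gives d = 2R·arcsin(√h), where h is the right-hand side of the formula. A short Python sketch of that calculation follows; the function name and the argument order (latitude first, in degrees) are choices made for this example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius used in the description


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    delta_lambda = math.radians(lon2 - lon1)
    # h = haversin(phi2 - phi1) + cos(phi1) * cos(phi2) * haversin(delta_lambda)
    h = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(delta_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


# Distance between a centroid and a reserved location (illustrative coordinates).
d1 = haversine_km(22.540215, 114.059818, 22.5400, 114.0600)
```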
S142: determine whether the distance between the centroid and the reserved location data corresponding to the user is less than a preset first difference threshold.
The preset first difference threshold can be set according to actual needs, for example 1 kilometer.
S143: if the distance between the centroid and the reserved location data corresponding to the user is less than the preset first difference threshold, determine that the centroid matches the reserved location data corresponding to the user.
If the distance between the centroid and the reserved location data corresponding to the user is less than the preset first difference threshold, the centroid matches the reserved location data. If the distance is not less than the preset first difference threshold, the centroid does not match the reserved location data.
S150: if the centroid matches the reserved location data corresponding to the user, determine whether the centroid matches the preset risky location data.
If the centroid matches the reserved location data corresponding to the user, the deviation of the centroid obtained by the preset first and second clustering algorithms is less than the preset error threshold, so the centroid is reliable enough to be used for risky user identification, and it is then determined whether the centroid matches the preset risky location data.
In some embodiments, as shown in FIG. 6, step S150 includes but is not limited to steps S151-S153.
S151: if the centroid matches the reserved location data corresponding to the user, calculate the distance between the centroid and the preset risky location data.
The distance between the centroid and the preset risky location data can be calculated with a second formula, which may be the Haversine formula. The second formula is:
Figure PCTCN2020098579-appb-000002
where haversin(θ) = sin²(θ/2) = (1 − cos(θ))/2; d2 is the distance between the centroid and the preset risk location data; R is the radius of the Earth, for which the mean value of 6371 km may be used; φ1 and φ2 denote the latitudes of the centroid and of the preset risk location data; and Δλ denotes the difference between the longitudes of the centroid and of the preset risk location data.
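Because the formula image is not reproduced in this text, the standard textbook form of the Haversine formula is sketched here for reference, written with the symbols defined above. It is assumed, not confirmed, that the figure shows this same form.

```latex
\operatorname{haversin}\!\left(\frac{d_2}{R}\right)
  = \operatorname{haversin}(\varphi_2 - \varphi_1)
  + \cos\varphi_1 \cos\varphi_2 \,\operatorname{haversin}(\Delta\lambda),
\qquad
d_2 = 2R \arcsin\sqrt{\sin^2\!\frac{\varphi_2 - \varphi_1}{2}
  + \cos\varphi_1 \cos\varphi_2 \,\sin^2\!\frac{\Delta\lambda}{2}}
```

The second expression follows from the first by substituting haversin(θ) = sin²(θ/2) and solving for d2.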
S152: Determine whether the distance between the centroid and the preset risk location data is less than a preset second difference threshold.
The preset second difference threshold may be set according to actual requirements; for example, it may be set to 1 km.
S153: If the distance between the centroid and the preset risk location data is less than the preset second difference threshold, determine that the centroid matches the preset risk location data.
If the distance between the centroid and the preset risk location data is less than the preset second difference threshold, it is determined that the centroid matches the preset risk location data. If that distance is not less than the preset second difference threshold, it is determined that the centroid does not match the preset risk location data.
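Steps S151-S153 can be illustrated with a minimal Python sketch. It assumes that locations are (latitude, longitude) pairs in decimal degrees; the function names `haversine_km` and `matches_risk_location` are hypothetical and do not come from the application. The Earth radius of 6371 km and the example threshold of 1 km are the values mentioned in the text.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, as suggested in the text


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    # haversin(x) = sin^2(x / 2)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def matches_risk_location(centroid, risk_location, threshold_km=1.0):
    """S151-S153: match only if the distance is strictly below the threshold."""
    d2 = haversine_km(*centroid, *risk_location)  # S151
    return d2 < threshold_km                      # S152 / S153


# Hypothetical coordinates: a centroid and a hospital a few hundred metres away.
print(matches_risk_location((22.5431, 114.0579), (22.5480, 114.0600)))
```

The strict comparison mirrors the wording "less than" in S152; a distance exactly equal to the threshold counts as no match.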
S160: If the centroid matches the preset risk location data, determine that the user is a risk user and determine the order data corresponding to the user as risk data.
Assuming the order data is insurance order data, if the centroid matches the preset risk location data, this indicates that the user was active at a preset risk location (such as a hospital) before sending the insurance order data and may be applying for insurance while already ill; the user is therefore determined to be a risk user.
If the user is a risk user, the order data corresponding to the user carries a relatively high risk and needs to be determined as risk data for subsequent monitoring or manual follow-up. For example, assuming the order data is insurance order data, if the user is a risk user, the insurance order data is more likely to involve fraud, so the policy corresponding to the user is determined to be a risk policy for reviewers to check manually, which reduces the risk of policy fraud.
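The decision flow of steps S140 to S160 can be sketched as follows, operating on distances that have already been computed. The names `Decision` and `classify_order` are hypothetical. The first threshold has no default here because this passage does not give a value for it, and returning "not flagged" when the centroid fails the reserved-location check is an assumption of this sketch, since this passage does not say how that case is handled.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    is_risk_user: bool
    order_is_risk_data: bool
    reason: str


def classify_order(d_reserved_km, d_risk_km, first_threshold_km,
                   second_threshold_km=1.0):
    """Apply the two threshold checks in order and flag the order if both pass."""
    # S140: the centroid must match the user's reserved location data.
    if not d_reserved_km < first_threshold_km:
        # Assumption: an unmatched centroid is treated as "not flagged".
        return Decision(False, False, "centroid does not match reserved location")
    # S150: the centroid is then compared with the preset risk location data.
    if not d_risk_km < second_threshold_km:
        return Decision(False, False, "centroid does not match risk location")
    # S160: both checks passed, so the user and the order are flagged.
    return Decision(True, True, "centroid matches a preset risk location")


print(classify_order(d_reserved_km=0.3, d_risk_km=0.5, first_threshold_km=2.0))
```

Returning a reason string alongside the flags is a design choice for this sketch; it gives a reviewer doing the manual follow-up something to read.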
FIG. 7 is a schematic block diagram of a risk user identification apparatus 100 provided by an embodiment of the present application. As shown in FIG. 7, corresponding to the above risk user identification method, the present application also provides a risk user identification apparatus 100. The apparatus 100 includes units for executing the above risk user identification method and may be configured in a server. The server may be an independent server or a server cluster composed of multiple servers. As shown in FIG. 7, the apparatus 100 includes a first acquiring unit 110, a first clustering unit 120, a second clustering unit 130, a first judging unit 140, a second judging unit 150, and an order determining unit 160.
The first acquiring unit 110 is configured to acquire, if order data sent by a user through a terminal is received, a location data set corresponding to the terminal, the location data set including at least two pieces of location information of the terminal.
In some embodiments, as shown in FIG. 8, the first acquiring unit 110 includes a first generating unit 111 and a second generating unit 112. The first generating unit 111 is configured to generate, if order data sent by a user through a terminal is received, a location acquisition time range according to the sending time of the order data and a preset time period. The second generating unit 112 is configured to obtain, from a preset location database and according to the terminal identification code corresponding to the terminal, the location information matching the location acquisition time range, and to generate the location data set from that location information.
In some embodiments, as shown in FIG. 9, the apparatus 100 further includes a location storage unit 210. The location storage unit 210 is configured to acquire the location information of the terminal at a preset time interval and store the location information in the preset location database.
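The work of the two generating units can be sketched as follows. The start time is the sending time minus the preset time period and the end time is the sending time itself, as set out in claim 7. The record layout `(terminal_id, timestamp, lat, lon)`, the function names, and the inclusive range bounds are assumptions made for illustration; the application does not specify a storage format.

```python
from datetime import datetime, timedelta


def acquisition_range(sent_at, period):
    """Start = sending time minus the preset period; end = sending time."""
    return sent_at - period, sent_at


def build_location_set(records, terminal_id, sent_at, period):
    """Collect the (lat, lon) fixes of one terminal that fall inside the range.

    `records` is a hypothetical stand-in for the preset location database:
    an iterable of (terminal_id, timestamp, lat, lon) tuples.
    """
    start, end = acquisition_range(sent_at, period)
    return [(lat, lon) for tid, ts, lat, lon in records
            if tid == terminal_id and start <= ts <= end]  # inclusive bounds assumed


sent = datetime(2020, 6, 28, 12, 0)
records = [
    ("T1", datetime(2020, 6, 27, 12, 0), 22.5, 114.0),  # inside the range
    ("T1", datetime(2020, 6, 20, 12, 0), 22.6, 114.1),  # too old
    ("T2", datetime(2020, 6, 28, 11, 0), 30.0, 120.0),  # different terminal
]
print(build_location_set(records, "T1", sent, timedelta(days=3)))
```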
The first clustering unit 120 is configured to cluster the location data set according to a preset first clustering algorithm to obtain the location data cluster corresponding to the clustered location data set.
The second clustering unit 130 is configured to cluster the location data cluster according to a preset second clustering algorithm to obtain the centroid corresponding to the clustered location data cluster.
The first judging unit 140 is configured to determine whether the user is a risk user according to the centroid, the reserved location data corresponding to the user, and the preset risk location data.
In some embodiments, as shown in FIG. 10, the first judging unit 140 includes a first calculation unit 141, a fourth judging unit 142, and a second determining unit 143. The first calculation unit 141 is configured to calculate the distance between the centroid and the reserved location data corresponding to the user. The fourth judging unit 142 is configured to determine whether the distance between the centroid and the reserved location data corresponding to the user is less than a preset first difference threshold. The second determining unit 143 is configured to determine, if the distance between the centroid and the reserved location data corresponding to the user is less than the preset first difference threshold, that the centroid matches the reserved location data corresponding to the user.
The second judging unit 150 is configured to determine, if the centroid matches the reserved location data corresponding to the user, whether the centroid matches the preset risk location data.
In some embodiments, as shown in FIG. 11, the second judging unit 150 includes a second calculation unit 151, a fifth judging unit 152, and a third determining unit 153. The second calculation unit 151 is configured to calculate, if the centroid matches the reserved location data corresponding to the user, the distance between the centroid and the preset risk location data. The fifth judging unit 152 is configured to determine whether the distance between the centroid and the preset risk location data is less than a preset second difference threshold. The third determining unit 153 is configured to determine, if the distance between the centroid and the preset risk location data is less than the preset second difference threshold, that the centroid matches the preset risk location data.
The order determining unit 160 is configured to determine, if the user is a risk user, the order data corresponding to the user as risk data.
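The two-stage clustering performed by units 120 and 130 can be illustrated with a minimal sketch. This passage does not name the first or second clustering algorithm, so both stages here are stand-ins chosen for illustration only: stage one groups points linked by hops of at most `eps_deg` (measured in plain degree space, a simplification) and keeps the largest group, and stage two takes the arithmetic mean of that group, which is what k-means with a single cluster converges to. All names and the `eps_deg` value are hypothetical.

```python
def largest_cluster(points, eps_deg=0.01):
    """Stage 1 stand-in: group points linked by hops <= eps_deg, keep the largest."""
    unvisited = set(range(len(points)))
    best = []
    while unvisited:
        seed = unvisited.pop()
        group, frontier = [seed], [seed]
        while frontier:
            i = frontier.pop()
            near = [j for j in unvisited
                    if (points[i][0] - points[j][0]) ** 2
                    + (points[i][1] - points[j][1]) ** 2 <= eps_deg ** 2]
            for j in near:
                unvisited.remove(j)
            group.extend(near)
            frontier.extend(near)
        if len(group) > len(best):
            best = group
    return [points[i] for i in sorted(best)]


def centroid(cluster):
    """Stage 2 stand-in: arithmetic mean of a non-empty cluster."""
    n = len(cluster)
    return (sum(p[0] for p in cluster) / n, sum(p[1] for p in cluster) / n)


# Three nearby fixes and one outlier (hypothetical coordinates).
points = [(22.500, 114.000), (22.501, 114.001), (22.502, 114.000), (30.0, 120.0)]
print(centroid(largest_cluster(points)))
```

Discarding the outlier before averaging is the point of running two stages: a single stray fix far from the user's usual area would otherwise drag the centroid away from it.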
It should be noted that those skilled in the art will clearly understand that, for the specific implementation of the risk user identification apparatus 100 and its units, reference may be made to the corresponding descriptions in the foregoing method embodiments; for convenience and brevity, they are not repeated here.
The above apparatus 100 may be implemented in the form of a computer program, and the computer program may run on a computer device as shown in FIG. 12.
Please refer to FIG. 12, which is a schematic block diagram of a computer device provided by an embodiment of the present application. The computer device 500 may be a server. The server may be an independent server or a server cluster composed of multiple servers. The computer device 500 includes a processor 520, a memory, and a network interface 550 connected through a system bus 510, where the memory may include a non-volatile storage medium 530 and an internal memory 540.
The non-volatile storage medium 530 may store an operating system 531 and a computer program 532. When executed, the computer program 532 causes the processor 520 to perform a risk user identification method. The processor 520 provides computing and control capabilities and supports the operation of the entire computer device 500. The internal memory 540 provides an environment for running the computer program stored in the non-volatile storage medium; when that computer program is executed by the processor 520, it causes the processor 520 to perform a risk user identification method. The network interface 550 is used for network communication with other devices. Those skilled in the art will understand that the schematic block diagram of the computer device shows only the part of the structure that is relevant to the solution of the present application and does not limit the computer device 500 to which the solution is applied. A specific computer device 500 may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
The processor 520 is configured to run the computer program stored in the memory to implement any embodiment of the above risk user identification method.
It should be understood that, in the embodiments of the present application, the processor 520 may be a central processing unit (CPU), or it may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. The general-purpose processor may be a microprocessor or any conventional processor.
Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments may be completed by a computer program instructing the relevant hardware. The computer program may be stored in a storage medium, which may be a computer-readable storage medium. The computer program is executed by at least one processor in the computer system to implement the process steps of the above method embodiments.
Therefore, the present application also provides a computer-readable storage medium. The computer-readable storage medium may be non-volatile or volatile. The storage medium stores a computer program that, when executed by a processor, implements any embodiment of the above risk user identification method.
The computer-readable storage medium may be any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a magnetic disk, or an optical disc.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus, device, and method may be implemented in other ways. For example, the apparatus embodiments described above are only illustrative; the division into units is only a division by logical function, and other divisions are possible in an actual implementation. Those skilled in the art will clearly understand that, for convenience and brevity, the specific working processes of the apparatus, device, and units described above may be found in the corresponding processes of the foregoing method embodiments and are not repeated here. The above are only specific implementations of the present application, and the protection scope of the present application is not limited to them. Any person skilled in the art can readily conceive of various equivalent modifications or replacements within the technical scope disclosed in the present application, and these modifications or replacements shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (20)

  1. A risk user identification method, wherein the method comprises:
    if order data sent by a user through a terminal is received, acquiring a location data set corresponding to the terminal, the location data set including at least two pieces of location information of the terminal;
    clustering the location data set according to a preset first clustering algorithm to obtain the location data cluster corresponding to the clustered location data set;
    clustering the location data cluster according to a preset second clustering algorithm to obtain the centroid corresponding to the clustered location data cluster;
    determining whether the centroid matches the reserved location data corresponding to the user;
    if the centroid matches the reserved location data corresponding to the user, determining whether the centroid matches preset risk location data;
    if the centroid matches the preset risk location data, determining that the user is a risk user and determining the order data corresponding to the user as risk data.
  2. The method according to claim 1, wherein the acquiring, if order data sent by a user through a terminal is received, of the location data set corresponding to the terminal comprises:
    if the order data sent by the user through the terminal is received, generating a location acquisition time range according to the sending time of the order data and a preset time period;
    obtaining, from a preset location database and according to the terminal identification code corresponding to the terminal, the location information matching the location acquisition time range, and generating the location data set from the location information matching the location acquisition time range.
  3. The method according to claim 1, wherein, before the acquiring, if order data sent by a user through a terminal is received, of the location data set corresponding to the terminal, the method further comprises:
    acquiring the location information of the terminal at a preset time interval, and storing the location information in a preset location database.
  4. The method according to claim 1, wherein the determining of whether the centroid matches the reserved location data corresponding to the user comprises:
    calculating the distance between the centroid and the reserved location data corresponding to the user;
    determining whether the distance between the centroid and the reserved location data corresponding to the user is less than a preset first difference threshold;
    if the distance between the centroid and the reserved location data corresponding to the user is less than the preset first difference threshold, determining that the centroid matches the reserved location data corresponding to the user.
  5. The method according to claim 1, wherein the determining, if the centroid matches the reserved location data corresponding to the user, of whether the centroid matches the preset risk location data comprises:
    if the centroid matches the reserved location data corresponding to the user, calculating the distance between the centroid and the preset risk location data;
    determining whether the distance between the centroid and the preset risk location data is less than a preset second difference threshold;
    if the distance between the centroid and the preset risk location data is less than the preset second difference threshold, determining that the centroid matches the preset risk location data.
  6. The method according to claim 1, wherein the acquiring of the location data set corresponding to the terminal comprises:
    acquiring a plurality of pieces of location information corresponding to the terminal to obtain the location data set corresponding to the terminal, wherein the set of the plurality of pieces of location information corresponding to the terminal is the location data set corresponding to the terminal.
  7. The method according to claim 2, wherein the generating of the location acquisition time range according to the sending time of the order data and the preset time period comprises:
    subtracting the preset time period from the sending time of the order data to obtain a location acquisition start time, and determining the sending time of the order data as a location acquisition end time;
    determining the time range between the location acquisition start time and the location acquisition end time as the location acquisition time range.
  8. The method according to claim 2, wherein the generating of the location data set from the location information matching the location acquisition time range comprises:
    determining whether the coordinate acquisition time corresponding to location information in the preset location database is within the location acquisition time range;
    if the coordinate acquisition time corresponding to the location information in the preset location database is within the location acquisition time range, determining that the location information matches the location acquisition time range, and storing the location information matching the location acquisition time range to form the location data set.
  9. A risk user identification apparatus, wherein the apparatus comprises:
    a first acquiring unit, configured to acquire, if order data sent by a user through a terminal is received, a location data set corresponding to the terminal, the location data set including at least two pieces of location information of the terminal;
    a first clustering unit, configured to cluster the location data set according to a preset first clustering algorithm to obtain the location data cluster corresponding to the clustered location data set;
    a second clustering unit, configured to cluster the location data cluster according to a preset second clustering algorithm to obtain the centroid corresponding to the clustered location data cluster;
    a first judging unit, configured to determine whether the centroid matches the reserved location data corresponding to the user;
    a second judging unit, configured to determine, if the centroid matches the reserved location data corresponding to the user, whether the centroid matches preset risk location data;
    an order determining unit, configured to determine, if the centroid matches the preset risk location data, that the user is a risk user and to determine the order data corresponding to the user as risk data.
  10. A computer device, comprising a memory and a processor connected to the memory;
    the memory being configured to store a computer program, and the processor being configured to run the computer program stored in the memory to perform the following steps:
    if order data sent by a user through a terminal is received, acquiring a location data set corresponding to the terminal, the location data set including at least two pieces of location information of the terminal;
    clustering the location data set according to a preset first clustering algorithm to obtain the location data cluster corresponding to the clustered location data set;
    clustering the location data cluster according to a preset second clustering algorithm to obtain the centroid corresponding to the clustered location data cluster;
    determining whether the centroid matches the reserved location data corresponding to the user;
    if the centroid matches the reserved location data corresponding to the user, determining whether the centroid matches preset risk location data;
    if the centroid matches the preset risk location data, determining that the user is a risk user and determining the order data corresponding to the user as risk data.
  11. The computer device according to claim 10, wherein the acquiring, if order data sent by a user through a terminal is received, of the location data set corresponding to the terminal comprises:
    if the order data sent by the user through the terminal is received, generating a location acquisition time range according to the sending time of the order data and a preset time period;
    obtaining, from a preset location database and according to the terminal identification code corresponding to the terminal, the location information matching the location acquisition time range, and generating the location data set from the location information matching the location acquisition time range.
  12. The computer device according to claim 10, wherein, before the acquiring, if order data sent by a user through a terminal is received, of the location data set corresponding to the terminal, the steps further comprise:
    acquiring the location information of the terminal at a preset time interval, and storing the location information in a preset location database.
  13. The computer device according to claim 10, wherein the determining of whether the centroid matches the reserved location data corresponding to the user comprises:
    calculating the distance between the centroid and the reserved location data corresponding to the user;
    determining whether the distance between the centroid and the reserved location data corresponding to the user is less than a preset first difference threshold;
    if the distance between the centroid and the reserved location data corresponding to the user is less than the preset first difference threshold, determining that the centroid matches the reserved location data corresponding to the user.
  14. The computer device according to claim 10, wherein the determining, if the centroid matches the reserved location data corresponding to the user, of whether the centroid matches the preset risk location data comprises:
    if the centroid matches the reserved location data corresponding to the user, calculating the distance between the centroid and the preset risk location data;
    determining whether the distance between the centroid and the preset risk location data is less than a preset second difference threshold;
    if the distance between the centroid and the preset risk location data is less than the preset second difference threshold, determining that the centroid matches the preset risk location data.
  15. The computer device according to claim 10, wherein the acquiring of the location data set corresponding to the terminal comprises:
    acquiring a plurality of pieces of location information corresponding to the terminal to obtain the location data set corresponding to the terminal, wherein the set of the plurality of pieces of location information corresponding to the terminal is the location data set corresponding to the terminal.
  16. The computer device according to claim 11, wherein the generating of the location acquisition time range according to the sending time of the order data and the preset time period comprises:
    subtracting the preset time period from the sending time of the order data to obtain a location acquisition start time, and determining the sending time of the order data as a location acquisition end time;
    determining the time range between the location acquisition start time and the location acquisition end time as the location acquisition time range.
  17. The computer device according to claim 11, wherein the generating of the location data set from the location information matching the location acquisition time range comprises:
    determining whether the coordinate acquisition time corresponding to location information in the preset location database is within the location acquisition time range;
    if the coordinate acquisition time corresponding to the location information in the preset location database is within the location acquisition time range, determining that the location information matches the location acquisition time range, and storing the location information matching the location acquisition time range to form the location data set.
  18. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program which, when executed by a processor, causes the processor to perform the following operations:
    if order data sent by a user through a terminal is received, acquiring a location data set corresponding to the terminal, the location data set comprising at least two pieces of location information of the terminal;
    performing clustering processing on the location data set according to a preset first clustering algorithm to obtain location data clusters corresponding to the clustered location data set;
    performing clustering processing on the location data clusters according to a preset second clustering algorithm to obtain a centroid corresponding to the clustered location data clusters;
    determining whether the centroid matches reserved location data corresponding to the user;
    if the centroid matches the reserved location data corresponding to the user, determining whether the centroid matches the preset risk location data; and
    if the centroid matches the preset risk location data, determining that the user is a risky user and determining the order data corresponding to the user as risk data.
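The operations in claim 18 can be illustrated with a small, dependency-free sketch. The claim does not name either clustering algorithm in this passage, so everything concrete here is an assumption: the first stage is a greedy distance-threshold grouping, the second stage takes the mean of the most populated cluster (the point to which k-means with k = 1 would converge), "matching" means lying within a tolerance distance, and coordinates are treated as points on a local plane in kilometres.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) in km on a local plane; an assumption


def _distance(a: Point, b: Point) -> float:
    return math.hypot(a[0] - b[0], a[1] - b[1])


def first_stage_clusters(points: Sequence[Point], radius: float) -> List[List[Point]]:
    """Greedy grouping: a point joins the first cluster that already holds
    a member within `radius`; otherwise it starts a new cluster."""
    clusters: List[List[Point]] = []
    for p in points:
        for cluster in clusters:
            if any(_distance(p, q) <= radius for q in cluster):
                cluster.append(p)
                break
        else:
            clusters.append([p])
    return clusters


def centroid(cluster: Sequence[Point]) -> Point:
    """Arithmetic mean of the cluster members."""
    n = len(cluster)
    return (sum(p[0] for p in cluster) / n, sum(p[1] for p in cluster) / n)


def is_risky_user(points: Sequence[Point],
                  reserved: Point,
                  risk_locations: Sequence[Point],
                  radius: float = 1.0,
                  tolerance: float = 2.0) -> bool:
    """Follow the claimed decision chain under the assumptions stated above."""
    if len(points) < 2:
        raise ValueError("the location data set needs at least two positions")
    clusters = first_stage_clusters(points, radius)
    main_cluster = max(clusters, key=len)  # most frequently visited area
    c = centroid(main_cluster)
    if _distance(c, reserved) > tolerance:
        # The quoted claim does not define this branch; returning False
        # is only a placeholder.
        return False
    # Risky only if the centroid also matches a preset risk location.
    return any(_distance(c, r) <= tolerance for r in risk_locations)


positions = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (50.0, 50.0)]
print(is_risky_user(positions, reserved=(0.0, 0.0), risk_locations=[(1.0, 1.0)]))
print(is_risky_user(positions, reserved=(0.0, 0.0), risk_locations=[(30.0, 30.0)]))
```

In a production setting a density-based algorithm and a geodesic distance would likely replace the greedy grouping and the planar distance used here; the sketch only shows how the two clustering stages feed the two matching checks.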
  19. The computer-readable storage medium according to claim 18, wherein acquiring the location data set corresponding to the terminal if order data sent by the user through the terminal is received comprises:
    if order data sent by the user through the terminal is received, generating a location acquisition time range according to the sending time of the order data and a preset time period; and
    acquiring, according to a terminal identification code corresponding to the terminal, location information matching the location acquisition time range from a preset location database, and generating the location data set according to the location information matching the location acquisition time range.
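One way to picture the lookup in claim 19 is a query keyed on the terminal identification code and the time range. The sketch below uses an in-memory SQLite table; the table name, column names, terminal IDs, and sample rows are all hypothetical, and timestamps are stored as ISO 8601 strings so that string comparison follows chronological order.

```python
import sqlite3
from datetime import datetime, timedelta

# Hypothetical stand-in for the "preset location database".
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE location (terminal_id TEXT, lat REAL, lon REAL, acquired_at TEXT)"
)
conn.executemany(
    "INSERT INTO location VALUES (?, ?, ?, ?)",
    [
        ("T-001", 22.54, 114.05, "2019-08-10T09:00:00"),
        ("T-001", 22.55, 114.06, "2019-08-12T18:30:00"),
        ("T-001", 22.60, 114.10, "2019-07-01T08:00:00"),  # before the range
        ("T-002", 31.23, 121.47, "2019-08-11T10:00:00"),  # other terminal
    ],
)


def fetch_location_data_set(conn, terminal_id, order_sent_at, period):
    """Return rows for one terminal acquired within the time range
    [order_sent_at - period, order_sent_at]."""
    start = (order_sent_at - period).isoformat()
    end = order_sent_at.isoformat()
    cursor = conn.execute(
        "SELECT lat, lon, acquired_at FROM location "
        "WHERE terminal_id = ? AND acquired_at BETWEEN ? AND ? "
        "ORDER BY acquired_at",
        (terminal_id, start, end),
    )
    return cursor.fetchall()


data_set = fetch_location_data_set(
    conn, "T-001", datetime(2019, 8, 13, 12, 0, 0), timedelta(days=7)
)
print(data_set)
```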
  20. The computer-readable storage medium according to claim 18, wherein, before acquiring the location data set corresponding to the terminal if order data sent by the user through the terminal is received, the operations further comprise:
    acquiring location information of the terminal at a preset time interval, and storing the location information in preset location data.
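The periodic collection in claim 20 could be sketched as a simple sampling loop. The names `sample_locations` and `read_position` and the list used as storage are assumptions; the sleep and clock functions are passed in so the example runs instantly instead of waiting for the real interval.

```python
import time
from datetime import datetime
from typing import Callable, List, Tuple

Sample = Tuple[float, float, datetime]  # (lat, lon, acquisition time)


def sample_locations(read_position: Callable[[], Tuple[float, float]],
                     store: List[Sample],
                     interval_seconds: float,
                     samples: int,
                     sleep: Callable[[float], None] = time.sleep,
                     now: Callable[[], datetime] = datetime.now) -> None:
    """Read the terminal position `samples` times, waiting the preset
    interval between reads, and append each reading to `store`."""
    for i in range(samples):
        lat, lon = read_position()
        store.append((lat, lon, now()))
        if i < samples - 1:
            sleep(interval_seconds)


# Hypothetical terminal that always reports the same coordinates.
store: List[Sample] = []
sample_locations(
    read_position=lambda: (22.54, 114.05),
    store=store,
    interval_seconds=600,
    samples=3,
    sleep=lambda seconds: None,  # skip real waiting in this demonstration
)
print(len(store))
```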
PCT/CN2020/098579 2019-08-13 2020-06-28 Risky user identification method and apparatus, computer device, and storage medium WO2021027407A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910746104.X 2019-08-13
CN201910746104.XA CN110689218A (en) 2019-08-13 2019-08-13 Risk user identification method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
WO2021027407A1 (en) 2021-02-18

Family

ID=69108197

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/098579 WO2021027407A1 (en) 2019-08-13 2020-06-28 Risky user identification method and apparatus, computer device, and storage medium

Country Status (2)

Country Link
CN (1) CN110689218A (en)
WO (1) WO2021027407A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110689218A (en) * 2019-08-13 2020-01-14 平安科技(深圳)有限公司 Risk user identification method and device, computer equipment and storage medium
CN111400663B (en) * 2020-03-17 2022-06-14 深圳前海微众银行股份有限公司 Model training method, device, equipment and computer readable storage medium
CN112527936A (en) * 2020-12-16 2021-03-19 平安科技(深圳)有限公司 Statistical method and device for disaster density, computer equipment and storage medium
CN112907257B (en) * 2021-04-26 2024-03-26 中国工商银行股份有限公司 Risk threshold determining method and device and electronic equipment

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170300919A1 (en) * 2014-12-30 2017-10-19 Alibaba Group Holding Limited Transaction risk detection method and apparatus
CN109118119A (en) * 2018-09-06 2019-01-01 多点生活(成都)科技有限公司 Air control model generating method and device
CN109191226A (en) * 2018-06-29 2019-01-11 阿里巴巴集团控股有限公司 risk control method and device
CN109409902A (en) * 2018-09-04 2019-03-01 平安普惠企业管理有限公司 Risk subscribers recognition methods, device, computer equipment and storage medium
CN109978075A (en) * 2019-04-04 2019-07-05 江苏满运软件科技有限公司 Vehicle dummy location information identifying method, device, electronic equipment, storage medium
CN110689218A (en) * 2019-08-13 2020-01-14 平安科技(深圳)有限公司 Risk user identification method and device, computer equipment and storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106651603A (en) * 2016-12-29 2017-05-10 平安科技(深圳)有限公司 Risk evaluation method and apparatus based on position service
CN109784636A (en) * 2018-12-13 2019-05-21 中国平安财产保险股份有限公司 Fraudulent user recognition methods, device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN110689218A (en) 2020-01-14

Similar Documents

Publication Publication Date Title
WO2021027407A1 (en) Risky user identification method and apparatus, computer device, and storage medium
TWI693527B (en) Position information processing method and device
CN109995884B (en) Method and apparatus for determining precise geographic location
US20140122684A1 (en) Early access to user-specific data for behavior prediction
US10496993B1 (en) DNS-based device geolocation
US20190289433A1 (en) Mobile device location proofing
CN111104825A (en) Face registry updating method, device, equipment and medium
US9787557B2 (en) Determining semantic place names from location reports
CN107948274B (en) Transaction authentication method and system, server, and storage medium
CN111512288A (en) Mapping entities to accounts
US10542384B1 (en) Efficient risk model computations
CN108319550A (en) A kind of test system and test method
CN110991903A (en) Service personnel allocation method, device, equipment and storage medium
CN111222960A (en) Room source recommendation method and system based on public traffic zone
CN110764979A (en) Log identification method, system, electronic device and computer readable medium
US10762103B2 (en) Calculating representative location information for network addresses
US10708713B2 (en) Systems and methods for beacon location verification
CN110619253B (en) Identity recognition method and device
CN112037052B (en) User behavior detection method and device
CN106817296B (en) Information recommendation test method and device and electronic equipment
CN107292111B (en) Information processing method and server
US10536466B1 (en) Risk assessment of electronic communication using time zone data
CN111310242B (en) Method and device for generating device fingerprint, storage medium and electronic device
CN113765850A (en) Internet of things anomaly detection method and device, computing equipment and computer storage medium
US20220222752A1 (en) Methods for analyzing insurance data and devices thereof

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20851871

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20851871

Country of ref document: EP

Kind code of ref document: A1