WO2021027407A1

WO2021027407A1 - Risky user identification method and apparatus, computer device, and storage medium

Info

Publication number: WO2021027407A1
Application number: PCT/CN2020/098579
Authority: WO
Inventors: 丁露涛
Original assignee: 平安科技（深圳）有限公司
Priority date: 2019-08-13
Filing date: 2020-06-28
Publication date: 2021-02-18
Also published as: CN110689218A

Abstract

A risky user identification method and apparatus, a computer device, and a storage medium, wherein same are applied to the field of data processing and belong to artificial intelligence technology. The method comprises: if order data sent by a user through a terminal is received, acquiring a location data set corresponding to the terminal; performing clustering processing on the location data set according to a pre-set first clustering algorithm to obtain a location data cluster corresponding to the location data set after the clustering processing; performing clustering processing on the location data cluster according to a pre-set second clustering algorithm to obtain a centroid corresponding to the location data cluster after the clustering processing; determining, according to the centroid, reserved location data corresponding to the user, and pre-set risky location data, whether the user is a risky user; and if the user is a risky user, determining the order data corresponding to the user to be risky data.

Description

Risk user identification method, device, computer equipment and storage medium

This application claims priority to the Chinese patent application filed with the Chinese Patent Office on August 13, 2019, with application number 201910746104.X and invention title "risk user identification method, device, computer equipment and storage medium", the entire content of which is incorporated in this application by reference.

Technical field

This application relates to the field of computer data processing, and in particular to a risk user identification method, device, computer equipment and computer-readable storage medium.

Background technique

With the rapid development of the Internet, more and more products can be traded via the Internet (such as commodity transactions, service transactions, etc.). In order to ensure the security of Internet transactions, it is necessary to identify risky users (for example, advertisers who operate fraudulent websites, businesses that sell illegal products, users who falsify information to commit insurance fraud, etc.) and prevent them from participating in transactions. In the prior art, one way to check for risky users is to manually review the order data for risk after the user submits an order. The inventor realized that manual identification of risky users is not only susceptible to subjective factors, resulting in low identification accuracy, but also time-consuming, resulting in slow identification speed.

Summary of the invention

The embodiments of the present application provide a method, device, computer equipment, and storage medium for identifying risky users, aiming to solve the problems of low identification accuracy and slow identification speed of risky users.

In the first aspect, an embodiment of the present application provides a risk user identification method, which includes:

If the order data sent by the user through the terminal is received, the location data set corresponding to the terminal is acquired, where the location data set includes at least two pieces of location information of the terminal;

Performing clustering processing on the location data set according to a preset first clustering algorithm to obtain a location data cluster corresponding to the location data set after the clustering processing;

Performing clustering processing on the location data clusters according to a preset second clustering algorithm to obtain the centroid corresponding to the location data clusters after the clustering processing;

Judging whether the center of mass matches the reserved location data corresponding to the user;

If the centroid matches the reserved location data corresponding to the user, determining whether the centroid matches the preset risk location data;

If the centroid matches the preset risk location data, it is determined that the user is a risk user and the order data corresponding to the user is determined as risk data.

In the second aspect, an embodiment of the present application provides a risk user identification device, which includes:

The first obtaining unit is configured to obtain a location data set corresponding to the terminal if the order data sent by the user through the terminal is received, the location data set including at least two pieces of location information of the terminal;

The first clustering unit is configured to perform clustering processing on the position data set according to a preset first clustering algorithm to obtain a position data cluster corresponding to the position data set after the clustering processing;

The second clustering unit is configured to perform clustering processing on the position data clusters according to a preset second clustering algorithm to obtain the centroid corresponding to the position data clusters after the clustering processing;

The first determining unit is configured to determine whether the center of mass matches the reserved location data corresponding to the user;

A second determining unit, configured to determine whether the center of mass matches the preset risk location data if the center of mass matches the reserved location data corresponding to the user;

The order determination unit is configured to determine that the user is a risk user and determine the order data corresponding to the user as risk data if the center of mass matches the preset risk location data.

In the third aspect, an embodiment of the present application provides a computer device, which includes a memory, a processor, and a computer program stored on the memory and capable of running on the processor, wherein the processor performs the following steps when executing the computer program:

Performing clustering processing on the location data clusters according to a preset second clustering algorithm to obtain the centroid corresponding to the location data clusters after the clustering processing;

In a fourth aspect, the embodiments of the present application also provide a computer-readable storage medium, wherein the computer-readable storage medium stores a computer program, which when executed by a processor causes the processor to perform the following steps:

In the embodiments of this application, the position data set is clustered by the preset first clustering algorithm and then by the preset second clustering algorithm to obtain the centroid; risk users are then identified according to the centroid, the reserved position data corresponding to the user, and the preset risk location data. The whole process is not affected by human subjective factors, which helps improve the accuracy and speed of risk user identification.

Description of the drawings

In order to explain the technical solutions of the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description show some embodiments of the present application. Those of ordinary skill in the art can obtain other drawings based on these drawings without creative work.

FIG. 1 is a schematic flowchart of a risk user identification method provided by an embodiment of this application;

FIG. 2 is a schematic diagram of an application scenario of a risk user identification method provided by an embodiment of this application;

FIG. 3 is another schematic flowchart of a risk user identification method provided by an embodiment of this application;

FIG. 4 is another schematic flowchart of a risk user identification method provided by an embodiment of this application;

FIG. 5 is another schematic flowchart of a risk user identification method provided by an embodiment of this application;

FIG. 6 is another schematic flowchart of a risk user identification method provided by an embodiment of this application;

FIG. 7 is a schematic block diagram of a risk user identification device provided by an embodiment of this application;

FIG. 8 is another schematic block diagram of a risk user identification device provided by an embodiment of this application;

FIG. 9 is another schematic block diagram of a risk user identification device provided by an embodiment of this application;

FIG. 10 is another schematic block diagram of a risk user identification device provided by an embodiment of this application;

FIG. 11 is another schematic block diagram of a risk user identification device provided by an embodiment of this application;

FIG. 12 is a schematic block diagram of a computer device provided by an embodiment of this application.

Detailed description

The technical solutions in the embodiments of the present application will be described clearly and completely in conjunction with the accompanying drawings in the embodiments of the present application. Obviously, the described embodiments are part of the embodiments of the present application, not all of them. Based on the embodiments in this application, all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the protection scope of this application.

Please refer to FIG. 1, which is a schematic flowchart of a risk user identification method provided by an embodiment of the application. The risk user identification method provided in the embodiment of the present application can be applied to the server 20. The server 20 may be a server used in an enterprise to process order data for risk user identification. The server 20 may be an independent server, or a server cluster composed of multiple servers. The server 20 can establish a communication connection with the terminal 10 for data exchange. For example, the server 20 can establish a communication connection with the terminal 10 to receive order data sent by the terminal. Wherein, the terminal 10 may be an electronic terminal such as a mobile phone, a tablet computer, or a desktop computer.

As shown in Fig. 1, the risk user identification method includes steps S110-S160.

S110: If order data sent by a user through a terminal is received, a location data set corresponding to the terminal is obtained, where the location data set includes at least two pieces of location information of the terminal.

The terminal can realize data interaction by establishing a communication connection with the server. The user can send order data to the server by operating the terminal. The order data may be order data of various commodities. For example, order data includes, but is not limited to: travel order data, takeaway order data, insurance order data, etc.

The location data set corresponding to the terminal includes at least two pieces of location information of the terminal. Specifically, the location data set may be acquired by obtaining multiple pieces of location information corresponding to the terminal; the set of these pieces of location information is the location data set corresponding to the terminal.

In some embodiments, as shown in FIG. 3, step S110 includes but is not limited to steps S111-S112.

S111: If the order data sent by the user through the terminal is received, generate a location acquisition time range according to the sending time of the order data and a preset time period.

The preset time period can be set according to actual needs. The preset time period is, for example, 7 days, 30 days, 60 days, and so on. Generating the location acquisition time range according to the sending time of the order data and the preset time period specifically includes: subtracting the preset time period from the sending time of the order data to obtain the location acquisition start time; determining the sending time of the order data as the location acquisition end time; and determining the time range between the location acquisition start time and the location acquisition end time as the location acquisition time range.
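The computation of the location acquisition time range can be sketched as follows. This is a minimal illustration, not the applicant's implementation; the function name `location_acquisition_range` and the 7-day period are assumptions chosen for the example.

```python
from datetime import datetime, timedelta


def location_acquisition_range(order_sent_at: datetime, preset_period: timedelta):
    """Return (start, end) of the location acquisition time range (hypothetical helper)."""
    start = order_sent_at - preset_period  # start = sending time minus the preset time period
    end = order_sent_at                    # end = sending time of the order data
    return start, end


# Example: an order sent on 2019-01-14 with an assumed preset time period of 7 days.
sent = datetime(2019, 1, 14, 16, 52, 55)
start, end = location_acquisition_range(sent, timedelta(days=7))
print(start, "to", end)
```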

S112: According to the terminal identification code corresponding to the terminal, obtain the location information matching the location acquisition time range from a preset location database, and generate the location data set according to the location information matching the location acquisition time range.

Each terminal corresponds to a unique terminal identification code, and the terminal identification code is, for example, International Mobile Equipment Identity (IMEI). The preset location database is used to store the acquired location information corresponding to the terminal. The position information includes position coordinates and a coordinate acquisition time corresponding to the position coordinates, and the position coordinates include longitude coordinate information and latitude coordinate information. The method for obtaining the location coordinates corresponding to the terminal includes, but is not limited to, a global positioning system (Global Positioning System, GPS), a mobile location base station system (Location Based Service, LBS), or a combination thereof.

Specifically, the storage format of the position information may be "L1, L2; T", where L1 represents longitude coordinate information, L2 represents latitude coordinate information, and T represents coordinate acquisition time. For example, the location information may be: 114.059818, 22.540215; 2019-1-14 16:52:55, where 114.059818 is the longitude coordinate information, 22.540215 is the latitude coordinate information, and 2019-1-14 16:52:55 is the coordinate acquisition time.

Specifically, generating the position data set according to the position information matching the position acquisition time range proceeds as follows: determine whether the coordinate acquisition time corresponding to each piece of position information in the preset position database is within the position acquisition time range. If the coordinate acquisition time is within the position acquisition time range, the position information is determined to match the position acquisition time range, and the matching position information is stored to form the position data set. If the coordinate acquisition time is not within the position acquisition time range, the position information is determined not to match the position acquisition time range.
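As an illustration, parsing records in the "L1, L2; T" format and filtering them by the position acquisition time range might look like the sketch below. The names `LocationInfo`, `parse_location` and `build_location_data_set` are hypothetical, the records stand in for rows of the preset position database, and treating both ends of the range as inclusive is an assumption.

```python
from datetime import datetime
from typing import NamedTuple


class LocationInfo(NamedTuple):
    longitude: float       # L1
    latitude: float        # L2
    acquired_at: datetime  # T, the coordinate acquisition time


def parse_location(record: str) -> LocationInfo:
    """Parse one record stored as "L1, L2; T"."""
    coords, timestamp = record.split(";", 1)
    lon, lat = (float(part) for part in coords.split(","))
    acquired_at = datetime.strptime(timestamp.strip(), "%Y-%m-%d %H:%M:%S")
    return LocationInfo(lon, lat, acquired_at)


def build_location_data_set(records, start, end):
    """Keep records whose acquisition time lies in [start, end] (inclusive by assumption)."""
    parsed = (parse_location(record) for record in records)
    return [loc for loc in parsed if start <= loc.acquired_at <= end]


records = [
    "114.059818, 22.540215; 2019-1-14 16:52:55",  # inside the range below
    "114.061000, 22.541000; 2018-12-1 09:00:00",  # too old, filtered out
]
data_set = build_location_data_set(
    records,
    start=datetime(2019, 1, 7, 16, 52, 55),
    end=datetime(2019, 1, 14, 16, 52, 55),
)
print(data_set)
```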

In some embodiments, as shown in FIG. 4, step S210 may be further included before step S110.

S210: Acquire location information of the terminal at a preset time interval, and store the location information in the preset location database.

The preset time interval can be set according to actual needs. The preset time interval is, for example, 5 minutes, 10 minutes, and 30 minutes. The smaller the preset time interval, the higher the recognition accuracy. Wherein, the preset time interval is less than or equal to the preset time period.

S120: Perform clustering processing on the location data set according to a preset first clustering algorithm to obtain a location data cluster corresponding to the location data set after the clustering processing.

The preset first clustering algorithm may be the DBSCAN algorithm (Density-Based Spatial Clustering of Applications with Noise). The DBSCAN algorithm is a density-based spatial clustering algorithm. It divides areas of sufficient density into clusters and can find clusters of arbitrary shape in a noisy spatial database. The DBSCAN algorithm defines a cluster as a maximal set of density-connected points.

For the DBSCAN algorithm to operate, its calculation parameters must be preset. The calculation parameters include the scanning radius Eps and the minimum number of contained points MinPts. (1) The scanning radius Eps represents the radius of the circular neighborhood centered on point P, where P is any unvisited data point in the data set; (2) MinPts represents the minimum number of points that must be contained within the neighborhood centered on point P. If the number of points in the neighborhood centered on point P with scanning radius Eps is not less than MinPts, point P is called a core point.

The calculation parameters can be adjusted according to actual needs. If MinPts remains unchanged and the scanning radius Eps is too large, most data points will be clustered into the same cluster; if Eps is too small, a single cluster will be split into several. If Eps remains unchanged and MinPts is too large, points belonging to the same cluster will be determined to be outliers; if MinPts is too small, a large number of core points will be found. In a specific implementation, the scanning radius Eps can be set to 2 kilometers, and the minimum number of contained points MinPts can be set to 5.
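To make the roles of Eps and MinPts concrete, the following is a small, unoptimized DBSCAN sketch in pure Python. It is not the applicant's implementation: measuring Eps as a great-circle distance, representing points as (longitude, latitude) tuples in degrees, and labeling noise as -1 are all assumptions made for the example.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(p, q):
    """Great-circle distance in km between two (longitude, latitude) points in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def dbscan(points, eps_km=2.0, min_pts=5):
    """Return one label per point: 0, 1, ... for clusters and -1 for noise."""
    NOISE, UNVISITED = -1, None
    labels = [UNVISITED] * len(points)

    def neighbors(i):
        # The Eps-neighborhood of a point includes the point itself.
        return [j for j, q in enumerate(points) if haversine_km(points[i], q) <= eps_km]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # not a core point; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:  # j is a core point, so keep expanding
                queue.extend(reachable)
    return labels


# Six fixes within about 1 km of each other plus one fix far away.
nearby = [(114.0590 + 0.001 * k, 22.5400 + 0.001 * k) for k in range(6)]
points = nearby + [(116.40, 39.90)]
labels = dbscan(points, eps_km=2.0, min_pts=5)
print(labels)
```

With the example parameters from the text (Eps of 2 kilometers, MinPts of 5), the six nearby fixes form one position data cluster and the distant fix is treated as noise.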

S130: Perform clustering processing on the position data clusters according to a preset second clustering algorithm to obtain a centroid corresponding to the position data clusters after the clustering processing.

The preset second clustering algorithm may be the K-means algorithm (K-means clustering algorithm). The K-means algorithm uses K pre-selected objects as the initial cluster centers, so the value of K must be set in advance. The distance between each object and each cluster center is then calculated, and each object is assigned to the cluster center closest to it. A cluster center together with the objects assigned to it represents a cluster. Once all objects have been assigned, the cluster center of each cluster is recalculated from the objects currently in the cluster. This process is repeated until a termination condition is met. The termination condition can be that no (or a minimum number of) objects are reassigned to different clusters, that no (or a minimum number of) cluster centers change, or that the sum of squared errors reaches a local minimum.

The number of the position data clusters may be one or more, and the position data clusters are clustered according to a preset second clustering algorithm to obtain the centroid corresponding to the position data clusters after the clustering processing. Specifically, clustering is performed on each position data cluster according to K-means to obtain the centroid corresponding to the position data cluster after the clustering processing. Among them, the value of K is set to 1 in advance.
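When K is 1, the K-means update has a single cluster center, and after one update that center is the arithmetic mean of all points in the cluster and no longer changes. Under that observation, the centroid of each position data cluster can be sketched as below. The function name `cluster_centroids`, the -1 noise label, and averaging longitude and latitude directly (a planar approximation that is reasonable only for clusters a few kilometers wide) are assumptions of this example.

```python
from collections import defaultdict


def cluster_centroids(points, labels):
    """Map each cluster label to the mean (longitude, latitude) of its points."""
    grouped = defaultdict(list)
    for point, label in zip(points, labels):
        if label != -1:  # assumed convention: -1 marks noise and is skipped
            grouped[label].append(point)
    return {
        label: (
            sum(p[0] for p in members) / len(members),
            sum(p[1] for p in members) / len(members),
        )
        for label, members in grouped.items()
    }


points = [(114.0, 22.0), (114.5, 22.5), (116.4, 39.9)]
labels = [0, 0, -1]
centroids = cluster_centroids(points, labels)
print(centroids)  # {0: (114.25, 22.25)}
```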

S140: Determine whether the center of mass matches the reserved location data corresponding to the user.

The reserved position data corresponding to the user is position information pre-stored in the server by the user, and the reserved position data corresponding to the user includes but is not limited to home address information, office address information, and the like.

The preset risk location data type may be one or more, and the preset risk location data type may be determined according to the type of the order data and the preset type mapping relationship. The preset type mapping relationship is used to determine the corresponding relationship between the order data type and the preset risk location data type.

For example, suppose that the order data is insurance order data, and the insurance policy type corresponding to the insurance order data is critical illness insurance. According to the preset type mapping relationship and the type of order data, it can be determined that the type of the preset risk location data is a hospital.
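A preset type mapping relationship of this kind could be represented as a simple lookup table. The sketch below is purely illustrative: the keys, the name `RISK_LOCATION_TYPES`, and the helper `risk_location_types` are hypothetical, and only the critical illness to hospital entry comes from the example above.

```python
# Hypothetical mapping from (order data type, sub-type) to risk location types.
RISK_LOCATION_TYPES = {
    ("insurance", "critical illness"): ["hospital"],
}


def risk_location_types(order_type, sub_type):
    """Return the risk location types for an order, or an empty list if none are mapped."""
    return RISK_LOCATION_TYPES.get((order_type, sub_type), [])


print(risk_location_types("insurance", "critical illness"))  # ['hospital']
```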

Judging whether the centroid matches the reserved location data corresponding to the user serves to judge whether the error of the centroid obtained after clustering by the preset first clustering algorithm and the preset second clustering algorithm is smaller than the preset error threshold. If the centroid matches the reserved location data corresponding to the user, the error of the obtained centroid is determined to be less than the preset error threshold, and risk user identification can be performed according to the obtained centroid. If the centroid does not match the reserved location data corresponding to the user, the error of the obtained centroid is determined to be not less than the preset error threshold, and a reminder message is sent to the manager to remind the manager to modify the calculation parameters, so as to improve the accuracy of the obtained centroid and thereby the accuracy of risk user identification. The preset error threshold can be set according to actual requirements and is, for example, 1 kilometer.

In some embodiments, as shown in FIG. 5, step S140 includes but is not limited to steps S141-S143.

S141. Calculate the distance difference between the center of mass and the reserved location data corresponding to the user.

Calculating the distance difference between the centroid and the reserved position data corresponding to the user may be implemented by a first formula, and the first formula may be the Haversine formula. The first formula is specifically:

haversin(d1/R) = haversin(φ2 − φ1) + cos(φ1)·cos(φ2)·haversin(Δλ)

Where haversin(θ) = sin²(θ/2) = (1 − cos(θ))/2; d1 is the distance difference between the centroid and the reserved position data corresponding to the user; R is the radius of the Earth, which can be taken as the average value of 6371 kilometers; φ1 and φ2 represent the latitudes of the centroid and of the reserved position data; Δλ represents the difference between the longitudes of the centroid and the reserved position data.

S142: Determine whether the distance difference between the center of mass and the reserved location data corresponding to the user is less than a preset first difference threshold.

The preset first difference threshold may be set according to actual requirements, for example, the preset first difference threshold may be set to 1 km.

S143: If the distance difference between the centroid and the reserved position data corresponding to the user is less than a preset first difference threshold, determine that the centroid matches the reserved position data corresponding to the user.

If the distance difference between the centroid and the reserved position data corresponding to the user is less than the preset first difference threshold, it is determined that the centroid matches the reserved position data corresponding to the user. If the distance difference between the centroid and the reserved position data corresponding to the user is not less than the preset first difference threshold, it is determined that the centroid does not match the reserved position data corresponding to the user.
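Steps S141 to S143 can be sketched as a Haversine distance followed by a threshold test. This is an illustration only: the function names, the (longitude, latitude) tuple layout, and the sample coordinates for the reserved position are assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # average Earth radius R


def haversin(theta):
    """haversin(theta) = sin^2(theta / 2)."""
    return math.sin(theta / 2) ** 2


def haversine_km(p, q):
    """Distance in km between two (longitude, latitude) points given in degrees."""
    lam1, phi1, lam2, phi2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    h = haversin(phi2 - phi1) + math.cos(phi1) * math.cos(phi2) * haversin(lam2 - lam1)
    # Invert haversin(d / R) = h to obtain d.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def matches(centroid, reference, threshold_km=1.0):
    """S142/S143: the points match if their distance is below the threshold."""
    return haversine_km(centroid, reference) < threshold_km


centroid = (114.059818, 22.540215)
reserved = (114.065000, 22.543000)  # hypothetical reserved home address
print(haversine_km(centroid, reserved), matches(centroid, reserved))
```

The same two functions apply unchanged to the comparison with the preset risk location data, using the preset second difference threshold instead of the first.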

S150: If the center of mass matches the reserved location data corresponding to the user, determine whether the center of mass matches the preset risk location data.

If the centroid matches the reserved location data corresponding to the user, this indicates that the error of the centroid obtained after clustering by the preset first clustering algorithm and the preset second clustering algorithm is less than the preset error threshold. The obtained centroid is therefore highly reliable and can be used for risk user identification, that is, for determining whether the centroid matches the preset risk location data.

In some embodiments, as shown in FIG. 6, step S150 includes but is not limited to steps S151-S153.

S151: If the center of mass matches the reserved location data corresponding to the user, calculate the distance difference between the center of mass and the preset risk location data.

Calculating the distance difference between the centroid and the preset risk location data can be implemented by a second formula, and the second formula can be the Haversine formula. The second formula is specifically:

haversin(d2/R) = haversin(φ2 − φ1) + cos(φ1)·cos(φ2)·haversin(Δλ)

Where haversin(θ) = sin²(θ/2) = (1 − cos(θ))/2; d2 is the distance difference between the centroid and the preset risk location data; R is the radius of the Earth, which can be taken as the average value of 6371 kilometers; φ1 and φ2 represent the latitudes of the centroid and of the preset risk location data; Δλ represents the difference between the longitudes of the centroid and the preset risk location data.

S152: Determine whether the distance difference between the centroid and the preset risk location data is less than a preset second difference threshold.

The preset second difference threshold may be set according to actual requirements, for example, the preset second difference threshold may be set to 1 km.

S153: If the distance difference between the center of mass and the preset risk location data is less than a preset second difference threshold, determine that the center of mass matches the preset risk location data.

If the distance difference between the centroid and the preset risk location data is less than a preset second difference threshold, it is determined that the centroid matches the preset risk location data. If the distance difference between the centroid and the preset risk location data is not less than the preset second difference threshold, it is determined that the centroid does not match the preset risk location data.

S160: If the center of mass matches the preset risk location data, determine that the user is a risk user and determine the order data corresponding to the user as risk data.

Assuming that the order data is insurance order data, if the centroid matches the preset risk location data, this indicates that the user was active at the preset risk location (such as a hospital) before sending the insurance order data. There is therefore a risk that the user is taking out insurance while already ill, and the user is determined to be a risk user.

If the user is a risk user, this indicates that the order data corresponding to the user carries a greater risk, and the order data corresponding to the user needs to be determined as risk data for subsequent monitoring or manual follow-up. For example, assuming that the order data is insurance order data, if the user is a risk user, the insurance order data has a high probability of fraud, and the insurance policy corresponding to the user is determined to be a risk insurance policy so that reviewers can investigate it manually, which reduces the risk of insurance policy fraud.
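The decision flow of steps S140 to S160 can be summarized in a short sketch. It takes already computed distances as inputs and follows the text literally for a single centroid; the names `Outcome` and `classify`, the default thresholds of 1 km, and the handling of the non-matching branch as a recalibration reminder are assumptions drawn from the examples above rather than a definitive implementation.

```python
from enum import Enum


class Outcome(Enum):
    RECALIBRATE = "centroid unreliable: remind the manager to adjust the calculation parameters"
    NOT_RISKY = "user not flagged"
    RISKY = "risk user: order data determined as risk data"


def classify(d_reserved_km, d_risk_km, first_threshold_km=1.0, second_threshold_km=1.0):
    """Apply S140 (reserved location check), then S150/S160 (risk location check)."""
    if d_reserved_km >= first_threshold_km:
        # S140 fails: the centroid does not match the reserved location data.
        return Outcome.RECALIBRATE
    if d_risk_km < second_threshold_km:
        # S150 succeeds, so S160 flags the user and the order data.
        return Outcome.RISKY
    return Outcome.NOT_RISKY


print(classify(d_reserved_km=0.6, d_risk_km=0.4))  # Outcome.RISKY
print(classify(d_reserved_km=0.6, d_risk_km=3.2))  # Outcome.NOT_RISKY
print(classify(d_reserved_km=5.0, d_risk_km=0.4))  # Outcome.RECALIBRATE
```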

FIG. 7 is a schematic block diagram of a risk user identification device 100 provided by an embodiment of the present application. As shown in FIG. 7, corresponding to the above risk user identification method, the present application also provides a risk user identification device 100. The risk user identification device 100 includes units for executing the above risk user identification method, and the device 100 may be configured in a server. The server may be an independent server or a server cluster composed of multiple servers. As shown in FIG. 7, the device 100 includes a first obtaining unit 110, a first clustering unit 120, a second clustering unit 130, a first judging unit 140, a second judging unit 150, and an order determining unit 160.

The first obtaining unit 110 is configured to obtain a location data set corresponding to the terminal if the order data sent by the user through the terminal is received, the location data set including at least two pieces of location information of the terminal.

In some embodiments, as shown in FIG. 8, the first obtaining unit 110 includes a first generating unit 111 and a second generating unit 112. The first generating unit 111 is configured to generate a location acquisition time range according to the sending time of the order data and a preset time period if the order data sent by the user through the terminal is received. The second generating unit 112 is configured to obtain the location information matching the location acquisition time range from a preset location database according to the terminal identification code corresponding to the terminal, and to generate the location data set according to the location information matching the location acquisition time range.

In some embodiments, as shown in FIG. 9, the device 100 further includes a location storage unit 210. The location storage unit 210 is configured to obtain location information of the terminal at a preset time interval, and store the location information in the preset location database.

The first clustering unit 120 is configured to perform clustering processing on the position data set according to a preset first clustering algorithm to obtain a position data cluster corresponding to the position data set after the clustering processing.

The second clustering unit 130 is configured to perform clustering processing on the position data clusters according to a preset second clustering algorithm to obtain the centroid corresponding to the position data clusters after the clustering processing.

The first determining unit 140 is configured to determine whether the user is a risk user according to the centroid, the reserved location data corresponding to the user, and preset risk location data.

In some embodiments, as shown in FIG. 10, the first judgment unit 140 includes a first calculation unit 141, a fourth judgment unit 142 and a second determination unit 143. The first calculation unit 141 is configured to calculate the distance difference between the centroid and the reserved position data corresponding to the user. The fourth judgment unit 142 is configured to determine whether the distance difference between the centroid and the reserved position data corresponding to the user is less than a preset first difference threshold. The second determination unit 143 is configured to determine that the centroid matches the reserved position data corresponding to the user if the distance difference between the centroid and the reserved position data corresponding to the user is less than the preset first difference threshold.

The second determining unit 150 is configured to determine whether the center of mass matches the preset risk location data if the center of mass matches the reserved location data corresponding to the user.

In some embodiments, as shown in FIG. 11, the second judgment unit 150 includes a second calculation unit 151, a fifth judgment unit 152 and a third determination unit 153. The second calculation unit 151 is configured to calculate the distance difference between the centroid and the preset risk location data if the centroid matches the reserved position data corresponding to the user. The fifth judgment unit 152 is configured to determine whether the distance difference between the centroid and the preset risk location data is less than a preset second difference threshold. The third determination unit 153 is configured to determine that the centroid matches the preset risk location data if the distance difference between the centroid and the preset risk location data is less than the preset second difference threshold.

The order determination unit 160 is configured to determine the order data corresponding to the user as risk data if the user is a risk user.

It should be noted that those skilled in the art can clearly understand that, for the specific implementation process of the above risk user identification device 100 and each unit, reference can be made to the corresponding description in the foregoing method embodiment. For convenience and brevity of description, details are not repeated here.

The above-mentioned apparatus 100 may be implemented in the form of a computer program, and the computer program may run on a computer device as shown in FIG.

Please refer to FIG. 12, which is a schematic block diagram of a computer device according to an embodiment of the present application. The computer device 500 may be a server. The server may be an independent server or a server cluster composed of multiple servers. The computer device 500 includes a processor 520, a memory, and a network interface 550 connected through a system bus 510, where the memory may include a non-volatile storage medium 530 and an internal memory 540.

The non-volatile storage medium 530 can store an operating system 531 and a computer program 532. When the computer program 532 is executed, it causes the processor 520 to perform a risk user identification method. The processor 520 provides computing and control capabilities and supports the operation of the entire computer device 500. The internal memory 540 provides an environment for running the computer program 532 stored in the non-volatile storage medium 530; when the computer program 532 is executed by the processor 520, the processor 520 performs the risk user identification method. The network interface 550 is used for network communication with other devices. Those skilled in the art can understand that the schematic block diagram of the computer device shows only the part of the structure related to the solution of the present application and does not limit the computer device 500 to which the solution is applied. A specific computer device 500 may include more or fewer components than shown in the figure, combine certain components, or have a different component arrangement.

Wherein, the processor 520 is configured to run a computer program stored in a memory to implement any embodiment of the risk user identification method described above.

It should be understood that, in this embodiment of the application, the processor 520 may be a central processing unit (CPU), or may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. The general-purpose processor may be a microprocessor or any conventional processor.

Those of ordinary skill in the art can understand that all or part of the processes in the methods of the foregoing embodiments can be implemented by computer programs instructing relevant hardware. The computer program may be stored in a storage medium, and the storage medium may be a computer-readable storage medium. The computer program is executed by at least one processor in the computer system to implement the process steps of the foregoing method embodiment.

Therefore, this application also provides a computer-readable storage medium. The computer-readable storage medium may be non-volatile or volatile. The storage medium stores a computer program that, when executed by a processor, implements any embodiment of the risk user identification method described above.

The computer-readable storage medium may be any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a magnetic disk, or an optical disk.

In the several embodiments provided in this application, it should be understood that the disclosed devices, equipment, and methods can be implemented in other ways. For example, the device embodiments described above are only illustrative, and the division of the units is only a logical function division; other division methods are possible in actual implementation. Those skilled in the art can clearly understand that, for convenience and conciseness of description, the specific working processes of the devices, equipment and units described above can refer to the corresponding processes in the foregoing method embodiments, which are not repeated here. The above are only specific implementations of this application, but the protection scope of this application is not limited to them. Anyone familiar with the technical field can easily conceive of various equivalent modifications or replacements within the technical scope disclosed in this application, and these modifications or replacements shall be covered by the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.

Claims

A method for identifying risky users, wherein the method includes:

If the order data sent by the user through the terminal is received, the location data set corresponding to the terminal is acquired, the location data set including at least two pieces of location information of the terminal;

Performing clustering processing on the location data set according to a preset first clustering algorithm to obtain a location data cluster corresponding to the location data set after the clustering processing;

Performing clustering processing on the location data cluster according to a preset second clustering algorithm to obtain the centroid corresponding to the location data cluster after the clustering processing;

Judging whether the centroid matches the reserved location data corresponding to the user;

If the centroid matches the reserved location data corresponding to the user, determining whether the centroid matches the preset risk location data;

If the centroid matches the preset risk location data, it is determined that the user is a risk user and the order data corresponding to the user is determined as risk data.
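The two clustering steps of the method above can be illustrated with the sketch below. The claim language here does not name either clustering algorithm, so both stages are stand-ins chosen for brevity: the first stage groups points linked by hops no longer than a hypothetical radius `eps` and keeps the largest group, and the second stage takes the arithmetic mean of that group as its centroid. Planar (x, y) coordinates and the function names are also assumptions.

```python
import math


def first_stage_cluster(points, eps):
    """Stand-in first clustering: return the largest group of points
    connected by chains of hops no longer than eps."""
    unvisited = set(range(len(points)))
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        cluster, frontier = [seed], [seed]
        while frontier:
            i = frontier.pop()
            # Collect still-unassigned neighbours of point i within eps.
            near = [j for j in unvisited if math.dist(points[i], points[j]) <= eps]
            for j in near:
                unvisited.remove(j)
            cluster.extend(near)
            frontier.extend(near)
        clusters.append([points[i] for i in cluster])
    return max(clusters, key=len)


def centroid(cluster):
    """Stand-in second clustering: arithmetic mean of the cluster's points."""
    n = len(cluster)
    return (sum(p[0] for p in cluster) / n, sum(p[1] for p in cluster) / n)
```

Isolated points, such as a single stray position far from the others, fall outside the retained cluster and therefore do not shift the centroid. The sketch expects a non-empty input, consistent with a location data set that holds at least two pieces of location information.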
The method according to claim 1, wherein, if the order data sent by the user through the terminal is received, obtaining the location data set corresponding to the terminal comprises:

If the order data sent by the user through the terminal is received, the location acquisition time range is generated according to the sending time of the order data and the preset time period;

According to the terminal identification code corresponding to the terminal, obtaining, from a preset location database, the location information matching the location acquisition time range, and generating the location data set according to the location information matching the location acquisition time range.
The method according to claim 1, wherein, if the order data sent by the user through the terminal is received, before obtaining the location data set corresponding to the terminal, the method further comprises:

Obtaining the location information of the terminal at a preset time interval, and storing the location information in a preset location database.
The method of claim 1, wherein the determining whether the centroid matches the reserved location data corresponding to the user comprises:

Calculating the distance difference between the centroid and the reserved location data corresponding to the user;

Judging whether the distance difference between the centroid and the reserved location data corresponding to the user is less than a preset first difference threshold;

If the distance difference between the centroid and the reserved position data corresponding to the user is less than the preset first difference threshold, it is determined that the centroid matches the reserved position data corresponding to the user.
The method according to claim 1, wherein, if the centroid matches the reserved location data corresponding to the user, determining whether the centroid matches the preset risk location data comprises:

If the centroid matches the reserved location data corresponding to the user, calculating the distance difference between the centroid and the preset risk location data;

Judging whether the distance difference between the centroid and the preset risk location data is less than a preset second difference threshold;

If the distance difference between the centroid and the preset risk location data is less than a preset second difference threshold, it is determined that the centroid matches the preset risk location data.
The method according to claim 1, wherein said acquiring a location data set corresponding to said terminal comprises:

Obtaining a plurality of pieces of location information corresponding to the terminal, wherein the set of the plurality of pieces of location information corresponding to the terminal is the location data set corresponding to the terminal.
The method according to claim 2, wherein said generating a position acquisition time range according to the sending time of the order data and a preset time period comprises:

Subtracting the preset time period from the sending time of the order data to obtain a location acquisition start time, and determining the sending time of the order data as the location acquisition end time;

The time range between the position acquisition start time and the position acquisition end time is determined as the position acquisition time range.
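As a worked illustration of the time range computation above, the following sketch subtracts the preset time period from the sending time to get the start of the range and uses the sending time itself as the end. The function name and the 24-hour period in the usage note are hypothetical.

```python
from datetime import datetime, timedelta


def location_acquisition_range(send_time: datetime, preset_period: timedelta):
    """Return (start, end) of the location acquisition time range."""
    start = send_time - preset_period  # location acquisition start time
    end = send_time                    # location acquisition end time
    return start, end
```

With an order sent at 12:00 on 13 August 2019 and a preset period of 24 hours, the range runs from 12:00 on 12 August 2019 to 12:00 on 13 August 2019.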
The method according to claim 2, wherein said generating said location data set according to said location information matching said location acquisition time range comprises:

Determine whether the coordinate acquisition time corresponding to the position information in the preset position database is within the position acquisition time range;

If the coordinate acquisition time corresponding to the position information in the preset position database is within the position acquisition time range, determining that the position information matches the position acquisition time range, and storing the matching position information to form the location data set.
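The time-range filtering above might look like the sketch below. The record layout (`terminal_id`, `acquired_at`, `position`) is hypothetical, an in-memory list stands in for the preset position database, and treating both ends of the range as inclusive is an assumption.

```python
from datetime import datetime


def build_location_data_set(records, terminal_id, start, end):
    """Keep positions of the given terminal whose coordinate acquisition
    time lies within [start, end] (inclusive bounds assumed)."""
    return [
        r["position"]
        for r in records
        if r["terminal_id"] == terminal_id and start <= r["acquired_at"] <= end
    ]
```

Records from other terminals, or records acquired outside the range, are left out of the resulting location data set.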
A risk user identification device, wherein the device includes:

The first obtaining unit is configured to obtain a location data set corresponding to the terminal if the order data sent by the user through the terminal is received, the location data set including at least two pieces of location information of the terminal;

The first clustering unit is configured to perform clustering processing on the position data set according to a preset first clustering algorithm to obtain a position data cluster corresponding to the position data set after the clustering processing;

The second clustering unit is configured to perform clustering processing on the position data clusters according to a preset second clustering algorithm to obtain the centroid corresponding to the position data clusters after the clustering processing;

The first determining unit is configured to determine whether the centroid matches the reserved location data corresponding to the user;

The second determining unit is configured to determine whether the centroid matches the preset risk location data if the centroid matches the reserved location data corresponding to the user;

The order determination unit is configured to determine that the user is a risk user and determine the order data corresponding to the user as risk data if the centroid matches the preset risk location data.
A computer device, the computer device includes a memory, and a processor connected to the memory;

The memory is used to store a computer program; the processor is used to run the computer program stored in the memory to perform the following steps:

If the order data sent by the user through the terminal is received, the location data set corresponding to the terminal is acquired, the location data set including at least two pieces of location information of the terminal;

Performing clustering processing on the location data set according to a preset first clustering algorithm to obtain a location data cluster corresponding to the location data set after the clustering processing;

Performing clustering processing on the location data cluster according to a preset second clustering algorithm to obtain the centroid corresponding to the location data cluster after the clustering processing;

Judging whether the centroid matches the reserved location data corresponding to the user;

If the centroid matches the reserved location data corresponding to the user, determining whether the centroid matches the preset risk location data;

If the centroid matches the preset risk location data, it is determined that the user is a risk user and the order data corresponding to the user is determined as risk data.
The computer device according to claim 10, wherein, if the order data sent by the user through the terminal is received, obtaining the location data set corresponding to the terminal comprises:

If the order data sent by the user through the terminal is received, the location acquisition time range is generated according to the sending time of the order data and the preset time period;

According to the terminal identification code corresponding to the terminal, obtaining, from a preset location database, the location information matching the location acquisition time range, and generating the location data set according to the location information matching the location acquisition time range.
The computer device according to claim 10, wherein, if the order data sent by the user through the terminal is received, before obtaining the location data set corresponding to the terminal, the steps further comprise:

Obtaining the location information of the terminal at a preset time interval, and storing the location information in a preset location database.
The computer device according to claim 10, wherein said determining whether said centroid matches said reserved location data corresponding to said user comprises:

Calculating the distance difference between the centroid and the reserved location data corresponding to the user;

Judging whether the distance difference between the centroid and the reserved location data corresponding to the user is less than a preset first difference threshold;

If the distance difference between the centroid and the reserved position data corresponding to the user is less than the preset first difference threshold, it is determined that the centroid matches the reserved position data corresponding to the user.
The computer device of claim 10, wherein, if the centroid matches the reserved location data corresponding to the user, determining whether the centroid matches the preset risk location data comprises:

If the centroid matches the reserved location data corresponding to the user, calculating the distance difference between the centroid and the preset risk location data;

Judging whether the distance difference between the centroid and the preset risk location data is less than a preset second difference threshold;

If the distance difference between the centroid and the preset risk location data is less than a preset second difference threshold, it is determined that the centroid matches the preset risk location data.
The computer device according to claim 10, wherein said acquiring a location data set corresponding to said terminal comprises:

Obtaining a plurality of pieces of location information corresponding to the terminal, wherein the set of the plurality of pieces of location information corresponding to the terminal is the location data set corresponding to the terminal.
The computer device according to claim 11, wherein said generating a location acquisition time range according to the sending time of the order data and a preset time period comprises:

Subtracting the preset time period from the sending time of the order data to obtain a location acquisition start time, and determining the sending time of the order data as the location acquisition end time;

The time range between the position acquisition start time and the position acquisition end time is determined as the position acquisition time range.
The computer device according to claim 11, wherein said generating said location data set according to said location information matching the time range of said location acquisition comprises:

Determine whether the coordinate acquisition time corresponding to the position information in the preset position database is within the position acquisition time range;

If the coordinate acquisition time corresponding to the position information in the preset position database is within the position acquisition time range, determining that the position information matches the position acquisition time range, and storing the matching position information to form the location data set.
A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program that, when executed by a processor, causes the processor to perform the following operations:

If the order data sent by the user through the terminal is received, the location data set corresponding to the terminal is acquired, the location data set including at least two pieces of location information of the terminal;

Performing clustering processing on the location data set according to a preset first clustering algorithm to obtain a location data cluster corresponding to the location data set after the clustering processing;

Performing clustering processing on the location data cluster according to a preset second clustering algorithm to obtain the centroid corresponding to the location data cluster after the clustering processing;

Judging whether the centroid matches the reserved location data corresponding to the user;

If the centroid matches the reserved location data corresponding to the user, determining whether the centroid matches the preset risk location data;

If the centroid matches the preset risk location data, it is determined that the user is a risk user and the order data corresponding to the user is determined as risk data.
The computer-readable storage medium of claim 18, wherein, if the order data sent by the user through the terminal is received, obtaining the location data set corresponding to the terminal comprises:

If the order data sent by the user through the terminal is received, the location acquisition time range is generated according to the sending time of the order data and the preset time period;

According to the terminal identification code corresponding to the terminal, obtaining, from a preset location database, the location information matching the location acquisition time range, and generating the location data set according to the location information matching the location acquisition time range.
The computer-readable storage medium according to claim 18, wherein, if the order data sent by the user through the terminal is received, before obtaining the location data set corresponding to the terminal, the operations further comprise:

Obtaining the location information of the terminal at a preset time interval, and storing the location information in a preset location database.