CN111311276B - Identification method and device for abnormal user group and readable storage medium - Google Patents

Identification method and device for abnormal user group and readable storage medium

Info

Publication number
CN111311276B
CN111311276B CN202010082795.0A CN202010082795A CN111311276B CN 111311276 B CN111311276 B CN 111311276B CN 202010082795 A CN202010082795 A CN 202010082795A CN 111311276 B CN111311276 B CN 111311276B
Authority
CN
China
Prior art keywords
user
value
users
similarity
service
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010082795.0A
Other languages
Chinese (zh)
Other versions
CN111311276A (en)
Inventor
韩跃盛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Mininglamp Software System Co ltd
Original Assignee
Beijing Mininglamp Software System Co ltd

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/018 Certifying business or products
    • G06Q30/0185 Product, service or business identity fraud
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/04 Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Finance (AREA)
  • Physics & Mathematics (AREA)
  • Accounting & Taxation (AREA)
  • General Physics & Mathematics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Data Mining & Analysis (AREA)
  • Economics (AREA)
  • Development Economics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Technology Law (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application provides a method and a device for identifying an abnormal user group, and a readable storage medium. The identification method comprises the following steps: generating a plurality of user sets based on the service start time of each user and a preset time interval; determining a plurality of users that repeatedly appear together across the user sets as a user group, and calculating a first similarity value among all users in the user group under each service dimension; determining a second similarity value among all users of the user group based on the first similarity value under each service dimension and the corresponding dimension weight coefficient; and sorting the second similarity values of the user groups in a preset order, and determining the user groups ranked within a preset number as abnormal user groups. In this way, a plurality of associated abnormal users are determined at once through an abnormal user group, which improves the efficiency and accuracy of abnormal-user judgment.

Description

Identification method and device for abnormal user group and readable storage medium
Technical Field
The present application relates to the field of data analysis and statistics technologies, and in particular, to a method and apparatus for identifying an abnormal user group, and a readable storage medium.
Background
With the rapid development of internet finance, businesses in the financial field rely more and more on information technology. When conducting financial transactions over the internet, some users perform abnormal operations in order to carry out false transactions, which seriously affects the security of the internet finance system.
At present, abnormal users are typically screened by judging each single transaction of a single user against fixed rules. This approach requires a large amount of data to be examined, the accuracy obtainable from a single record is low, and only a few users can be qualitatively analyzed at the same time, all of which reduce the efficiency and accuracy of abnormal-user judgment.
Disclosure of Invention
In view of the above, an object of the present application is to provide a method and apparatus for identifying abnormal user groups, and a readable storage medium, which are capable of improving the efficiency and accuracy of abnormal user judgment by determining at least one abnormal user group by analyzing the similarity between users in the divided user groups and determining a plurality of associated abnormal users based on the abnormal user group.
The embodiment of the application provides a method for identifying an abnormal user group, which comprises the following steps:
dividing a plurality of users based on the acquired service start time indicated by the service information of each user and a preset time interval to generate a plurality of user sets;
determining a plurality of users repeatedly associated in each user set as a user group, and calculating a first similarity value among all users in the user group under each service dimension;
linearly weighting the first similarity value under each service dimension and the corresponding dimension weight coefficient, and determining second similarity values among all users of the user group;
and sorting the second similarity values among all users of each of the determined user groups in a preset order, and determining the user groups corresponding to the second similarity values ranked within the preset number as abnormal user groups.
Further, the service dimension includes one of a numeric service dimension and a category service dimension.
Further, when the service dimension includes a numeric service dimension, calculating a first similarity value between all users in the user group by:
for the same service dimension, acquiring a characteristic value corresponding to each user in the user group under the service dimension, and adding the characteristic values corresponding to all users to determine a characteristic value sum;
differencing the characteristic value corresponding to each user with the characteristic value sum to obtain a plurality of first differences, and determining a first difference absolute value of each first difference;
adding the first difference absolute values to obtain an absolute value sum, and determining a first quotient of the absolute value sum and the characteristic value sum as an initial first numerical similarity;
and normalizing the initial first numerical similarity, and determining a first similarity value among all users in the user group.
Further, when the service dimension includes a category type service dimension, a first similarity value between all users in the user group is calculated through the following steps:
aiming at the same service dimension, acquiring a service type corresponding to each user in the service dimension, and counting the total number of the service types and the total number of the users in the dimension;
calculating a second quotient of the total number of the service types and the total number of the users as an initial first similarity value among all users in the user group;
and normalizing the initial first similarity value, and determining a first similarity value among all users in the user group.
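The category-type quotient can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `initial_category_similarity` is hypothetical, "the total number of the service types" is read as the number of distinct types observed in the group, and the quotient is taken as types divided by users. The normalization step is not shown.

```python
def initial_category_similarity(service_types):
    """Quotient of distinct service types to users for one category dimension.

    service_types maps each user in the group to that user's type in the
    dimension (for example, the account opening institution).
    Assumption: "total number of service types" means distinct types.
    """
    if not service_types:
        raise ValueError("the user group is empty")
    type_total = len(set(service_types.values()))  # distinct types in the group
    user_total = len(service_types)                # users in the group
    return type_total / user_total


# Hypothetical group: three users, two distinct account opening institutions.
group = {"user A": "institution X", "user B": "institution X", "user C": "institution Y"}
print(initial_category_similarity(group))
```

Under this reading, a group whose members all share one type gets the smallest quotient, so a smaller initial value corresponds to more uniform behavior within the group.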
Further, the first similarity value is determined by:
determining initial first similarity values corresponding to each user group under the same service dimension, and determining the largest initial first similarity value and the smallest initial first similarity value in all the initial first similarity values;
respectively differencing the maximum initial first similarity value with each initial first similarity value to obtain a plurality of second difference values;
determining a third difference value between the maximum initial first similarity value and the minimum initial first similarity value as a normalization coefficient;
and multiplying each second difference value by the normalization coefficient to determine a plurality of first similarity values.
The embodiment of the application also provides a device for identifying the abnormal user group, which comprises:
the user set generation module is used for dividing a plurality of users to generate a plurality of user sets based on the acquired service start time indicated by the service information of each user and a preset time interval;
The first similarity value calculation module is used for determining a plurality of users repeatedly associated and appearing in each user set as a user group, and calculating first similarity values among all users in the user group under each service dimension;
the second similarity value determining module is used for linearly weighting the first similarity value under each service dimension and the corresponding dimension weight coefficient to determine the second similarity value among all users of the user group;
and the abnormal user group determining module is used for sorting the second similarity values among all users of each of the determined user groups in a preset order, and determining the user groups corresponding to the second similarity values ranked within the preset number as abnormal user groups.
Further, the service dimension includes one of a numeric service dimension and a category service dimension.
Further, when the service dimension includes a numeric service dimension, the first similarity value calculating module is configured to calculate a first similarity value between all users in the user group by:
for the same service dimension, acquiring a characteristic value corresponding to each user in the user group under the service dimension, and adding the characteristic values corresponding to all users to determine a characteristic value sum;
differencing the characteristic value corresponding to each user with the characteristic value sum to obtain a plurality of first differences, and determining a first difference absolute value of each first difference;
adding the first difference absolute values to obtain an absolute value sum, and determining a first quotient of the absolute value sum and the characteristic value sum as an initial first numerical similarity;
and normalizing the initial first numerical similarity, and determining a first similarity value among all users in the user group.
Further, when the service dimension includes a category type service dimension, the first similarity value calculating module is configured to calculate a first similarity value between all users in the user group by:
aiming at the same service dimension, acquiring a service type corresponding to each user in the service dimension, and counting the total number of the service types and the total number of the users in the dimension;
calculating a second quotient of the total number of the service types and the total number of the users as an initial first similarity value among all users in the user group;
and normalizing the initial first similarity value, and determining a first similarity value among all users in the user group.
Further, the first similarity value calculation module is configured to determine the first similarity value by:
determining initial first similarity values corresponding to each user group under the same service dimension, and determining the largest initial first similarity value and the smallest initial first similarity value in all the initial first similarity values;
respectively differencing the maximum initial first similarity value with each initial first similarity value to obtain a plurality of second difference values;
determining a third difference value between the maximum initial first similarity value and the minimum initial first similarity value as a normalization coefficient;
and multiplying each second difference value by the normalization coefficient to determine a plurality of first similarity values.
The embodiment of the application also provides electronic equipment, which comprises: a processor, a memory and a bus, said memory storing machine readable instructions executable by said processor, said processor and said memory communicating over the bus when the electronic device is running, said machine readable instructions when executed by said processor performing the steps of the method of identifying an abnormal community of users as described above.
The embodiment of the present application also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the method for identifying an abnormal user community as described above.
The method, the device and the readable storage medium for identifying an abnormal user group provided by the embodiments of the application divide a plurality of users to generate a plurality of user sets based on the service start time indicated by the acquired service information of each user and a preset time interval; determine a plurality of users repeatedly associated in each user set as a user group, and calculate a first similarity value among all users in the user group under each service dimension; linearly weight the first similarity value under each service dimension with the corresponding dimension weight coefficient to determine a second similarity value among all users of the user group; and sort the second similarity values among all users of each of the determined user groups in a preset order, and determine the user groups corresponding to the second similarity values ranked within the preset number as abnormal user groups.
In this way, a plurality of user sets are determined according to the service information starting time indicated by the service information of each user, a plurality of users repeatedly associated in the plurality of user sets are determined as one user group, first similarity values of all users in the user groups in each dimension are determined, second similarity values among all users in each user group are determined according to the weight coefficient corresponding to each service dimension, all user groups are ordered according to the second similarity values, the user groups with the similarity values at the preset positions are determined as abnormal user groups, and a plurality of associated abnormal users can be determined through the abnormal user groups at the same time, so that the judging efficiency and the judging accuracy of the abnormal users can be improved.
In order to make the above objects, features and advantages of the present application more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments will be briefly described below, it being understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and other related drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a system architecture diagram of one possible application scenario;
FIG. 2 is a flowchart of a method for identifying an abnormal user group according to an embodiment of the present application;
FIG. 3 is a flowchart of calculating a first similarity value between all users in the user group when the service dimension includes a numeric service dimension according to an embodiment of the present application;
FIG. 4 is a flowchart of calculating a first similarity value between all users in the user group when the service dimension includes a category type service dimension according to an embodiment of the present application;
FIG. 5 is a flowchart for determining a first similarity value according to an embodiment of the present application;
FIG. 6 is a schematic diagram of a device for identifying an abnormal user group according to an embodiment of the present application;
FIG. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. The components of the embodiments of the present application generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the application, as presented in the figures, is not intended to limit the scope of the application, as claimed, but is merely representative of selected embodiments of the application. Based on the embodiments of the present application, every other embodiment obtained by a person skilled in the art without making any inventive effort falls within the scope of protection of the present application.
First, an application scenario to which the present application is applicable will be described. The method and the device can be applied to the technical field of data analysis and statistics: at least one abnormal user group is determined by analyzing the similarity among the users in the divided user groups, and a plurality of associated abnormal users are determined at the same time through the abnormal user group, which improves the efficiency and accuracy of abnormal-user judgment. Referring to FIG. 1, FIG. 1 is a system structure diagram of one possible application scenario. As shown in FIG. 1, the system includes a service information database and an identification device. The service information database stores the service information generated when a user performs a service, where the service information includes service feature data and basic feature data. After the identification device acquires the service information, it determines a plurality of user groups according to the service information and determines a plurality of abnormal user groups according to the similarity among all users in each user group.
Research shows that, at the present stage, abnormal users are typically screened by judging each single transaction of a single user against fixed rules. This requires a large amount of data to be examined, the accuracy obtainable from a single transaction is low, and only a few users can be qualitatively analyzed at the same time, which affects the efficiency and accuracy of abnormal-user judgment.
Based on the above, the embodiment of the application provides a method for identifying abnormal user groups, which determines at least one abnormal user group through analyzing the similarity between users in the divided user groups, and determines a plurality of associated abnormal users based on the abnormal user groups, thereby being beneficial to improving the judging efficiency and accuracy of the abnormal users.
Referring to fig. 2, fig. 2 is a flowchart of a method for identifying an abnormal user group according to an embodiment of the present application. As shown in fig. 2, the method for identifying an abnormal user group provided by the embodiment of the present application includes:
Step 201, dividing a plurality of users to generate a plurality of user sets based on the service start time indicated by the acquired service information of each user and a preset time interval.
In this step, a plurality of pieces of service information of a plurality of users are obtained from the service information database, and the users are divided according to the service start time indicated by each piece of service information (that is, the time at which the user started the service) and a preset time interval, so as to generate a plurality of user sets.
Because one user may correspond to more than one piece of service information, the user sets generated by dividing the users according to the service start time of each piece of service information may contain the same user; that is, the same user may belong to different user sets.
Here, the specific process of dividing the users according to their service start times may be as follows: all the acquired service start times of the plurality of users are sorted in chronological order, and, starting from the user with the earliest service start time, the user sets are divided according to the preset time interval until the user with the latest service start time has been assigned.
For example, suppose there are five users, user A, user B, user C, user D and user E, and that only one piece of service information is obtained for each user. Analysis shows that the service start time of user A is 8:00, that of user B is 9:00, that of user C is 9:30, that of user D is 9:45, and that of user E is 10:00. Arranged in order of service start time, the queue is user A, user B, user C, user D, user E. Assuming that the preset time interval is 1 hour, dividing the five users yields three user sets: (user A), (user B, user C, user D) and (user E).
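The division in this example can be reproduced with a short Python sketch. It is only one possible reading: the function name `partition_by_start_time` is hypothetical, the windows are assumed to be consecutive, fixed-width and half-open, anchored at the earliest start time, and each user is assumed to have a single service record as in the example.

```python
from datetime import datetime, timedelta


def partition_by_start_time(start_times, interval):
    """Group users into consecutive windows of width `interval`.

    Assumption: windows are half-open, [t0 + k*interval, t0 + (k+1)*interval),
    where t0 is the earliest service start time.
    """
    if not start_times:
        return []
    earliest = min(start_times.values())
    buckets = {}
    # Visit users in chronological order so each set keeps that order.
    for user, start in sorted(start_times.items(), key=lambda kv: kv[1]):
        index = (start - earliest) // interval  # timedelta // timedelta gives an int
        buckets.setdefault(index, []).append(user)
    return [buckets[i] for i in sorted(buckets)]


day = datetime(2020, 1, 1)  # arbitrary date, chosen only for illustration
start_times = {
    "A": day.replace(hour=8, minute=0),
    "B": day.replace(hour=9, minute=0),
    "C": day.replace(hour=9, minute=30),
    "D": day.replace(hour=9, minute=45),
    "E": day.replace(hour=10, minute=0),
}
user_sets = partition_by_start_time(start_times, timedelta(hours=1))
print(user_sets)  # [['A'], ['B', 'C', 'D'], ['E']]
```

To handle users with several service records, the same function could be fed one entry per record rather than per user, in which case a user may land in more than one set.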
Step 202, determining a plurality of users repeatedly associated in each user set as a user group, and calculating a first similarity value among all users in the user group under each service dimension.
In this step, after the plurality of user sets are generated in step 201, the users included in each user set are analyzed, users that repeatedly appear together across the user sets are determined as one user group, and, for each user group, a first similarity value among all users included in the user group is calculated under each of a plurality of service dimensions.
The service dimension comprises one of a numeric service dimension and a category service dimension. Taking transaction information as an example of service information, the dimensions may include the number of transactions, the transaction amount, the transaction counterparty, the account opening date, the age, the account opening institution, and the like. Dimensions such as the number of transactions, the transaction amount and the age can be described quantitatively by specific numerical values, so they are numeric service dimensions. Dimensions such as the transaction counterparty, the account opening date and the account opening institution are described qualitatively: different descriptions differ only in type and cannot be expressed by specific numerical values, so they are category service dimensions.
Here, the users that repeatedly appear together may be determined by the FP-Growth algorithm, which adopts the following strategy: the database from which frequent item sets are mined is compressed into a frequent pattern tree (FP-tree) that still retains the item set association information. The FP-tree is a special prefix tree consisting of a frequent-item header table and an item prefix tree. The FP-Growth algorithm uses this structure to accelerate the whole mining process, thereby determining a plurality of frequently associated users.
For example, suppose four user sets have been divided: user set 1 (user 1, user 2, user 3, user 4, user 5, user 6), user set 2 (user 4, user 5, user 7, user 8, user 9), user set 3 (user 1, user 2, user 3, user 7) and user set 4 (user 1, user 2, user 3, user 9). The FP-Growth algorithm then yields two user groups: (user 1, user 2, user 3) and (user 4, user 5).
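The result of this example can be checked with a small Python sketch. It does not implement FP-Growth: it is a brute-force stand-in that intersects every combination of user sets, which finds the same maximal co-occurring groups on small inputs but scales combinatorially. The function name `repeated_user_groups` and the thresholds (a group must appear in at least two user sets and contain at least two users) are assumptions, since the text does not state them.

```python
from itertools import combinations


def repeated_user_groups(user_sets, min_support=2, min_size=2):
    """Return maximal groups of users that co-occur in >= min_support user sets.

    Brute-force substitute for FP-Growth: a group has support >= k exactly
    when it is contained in the intersection of some k user sets.
    """
    candidates = set()
    for combo in combinations(user_sets, min_support):
        common = frozenset.intersection(*(frozenset(s) for s in combo))
        if len(common) >= min_size:
            candidates.add(common)
    # Keep only groups that are not strictly contained in another candidate.
    maximal = [g for g in candidates if not any(g < other for other in candidates)]
    return sorted(tuple(sorted(g)) for g in maximal)


user_sets = [
    [1, 2, 3, 4, 5, 6],  # user set 1
    [4, 5, 7, 8, 9],     # user set 2
    [1, 2, 3, 7],        # user set 3
    [1, 2, 3, 9],        # user set 4
]
groups = repeated_user_groups(user_sets)
print(groups)  # [(1, 2, 3), (4, 5)]
```

Users 7 and 9 each appear in two sets but never together with another repeated user, so the size threshold excludes them, which matches the two groups given in the example.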
Here, after the user group is determined, a first similarity value among all users included in the user group is calculated under each service dimension that can be obtained from the users' service information. A user group therefore has as many first similarity values as there are service dimensions indicated in the service information of the users it contains.
Step 203, linearly weighting the first similarity value under each service dimension and the corresponding dimension weight coefficient, and determining a second similarity value between all users of the user group.
In this step, after obtaining the first similarity value of the user group in each service dimension in step 202, a weight coefficient corresponding to each service dimension is determined, each first similarity value and the corresponding weight coefficient are weighted linearly, and a second similarity value between all users of the user group is determined.
Here, the weight coefficient of each service dimension may be determined based on historical calculations; alternatively, the service dimensions may be ranked by how much emphasis each should receive in the calculation result, and the weight coefficient of each service dimension set in turn according to that ranking.
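The linear weighting of step 203 amounts to a weighted sum over service dimensions. A minimal Python sketch follows; the function name `second_similarity`, the dimension names and all numeric values are hypothetical, and the choice of weights that sum to 1 is an assumption rather than something the text requires.

```python
def second_similarity(first_similarities, weights):
    """Weighted sum of per-dimension first similarity values for one user group."""
    missing = set(first_similarities) - set(weights)
    if missing:
        raise ValueError(f"no weight coefficient for dimensions: {sorted(missing)}")
    return sum(weights[dim] * value for dim, value in first_similarities.items())


# Hypothetical first similarity values of one user group, per service dimension.
first = {"transaction_count": 0.8, "transaction_amount": 0.5, "opening_institution": 1.0}
# Hypothetical dimension weight coefficients.
weights = {"transaction_count": 0.5, "transaction_amount": 0.25, "opening_institution": 0.25}

score = second_similarity(first, weights)
print(round(score, 3))  # 0.775
```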
Step 204, sorting the second similarity values among all users of each of the determined user groups in a preset order, and determining the user groups corresponding to the second similarity values ranked within the preset number as abnormal user groups.
In this step, steps 201 to 203 are repeated to determine a plurality of user groups and the second similarity value of each user group. The second similarity values of the user groups are sorted from largest to smallest, and the user groups ranked within the preset number are determined as abnormal user groups.
Here, the preset number may be set according to historical records or according to the required number of abnormal user groups; for example, the top three or the top ten user groups may be selected.
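Step 204 can be sketched as a descending sort followed by a cut at the preset number. The function name `top_abnormal_groups`, the group members and the scores below are hypothetical values chosen only to show the selection.

```python
def top_abnormal_groups(group_scores, preset_number):
    """Return the user groups with the largest second similarity values."""
    ranked = sorted(group_scores.items(), key=lambda kv: kv[1], reverse=True)
    return [group for group, _ in ranked[:preset_number]]


# Hypothetical second similarity values keyed by user group.
group_scores = {
    ("user 1", "user 2", "user 3"): 0.91,
    ("user 4", "user 5"): 0.42,
    ("user 6", "user 7"): 0.77,
}
abnormal = top_abnormal_groups(group_scores, preset_number=2)
print(abnormal)  # [('user 1', 'user 2', 'user 3'), ('user 6', 'user 7')]
```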
The method for identifying abnormal user groups provided by the embodiment of the application divides a plurality of users to generate a plurality of user sets based on the acquired service start time indicated by the service information of each user and a preset time interval; determining a plurality of users repeatedly associated in each user set as a user group, and calculating a first similarity value among all users in the user group under each service dimension; linearly weighting the first similarity value under each service dimension and the corresponding dimension weight coefficient, and determining second similarity values among all users of the user group; and ordering the second similarity values among all users of each of the determined user communities according to a preset arrangement order, and determining the user communities corresponding to the second similarity values which are ordered before the preset number as abnormal user communities.
In this way, a plurality of user sets are determined according to the service information starting time indicated by the service information of each user, a plurality of users repeatedly associated in the plurality of user sets are determined as one user group, first similarity values of all users in the user groups in each dimension are determined, second similarity values among all users in each user group are determined according to the weight coefficient corresponding to each service dimension, all user groups are ordered according to the second similarity values, the user groups with the similarity values at the preset positions are determined as abnormal user groups, and a plurality of associated abnormal users can be determined through the abnormal user groups at the same time, so that the judging efficiency and the judging accuracy of the abnormal users can be improved.
Referring to fig. 3, fig. 3 is a flowchart illustrating calculation of a first similarity value between all users in the user group when the service dimension includes a numeric service dimension according to an embodiment of the present application. As shown in fig. 3, a first similarity value between all users in the community of users is calculated by:
Step 301, for the same service dimension, obtaining the feature value corresponding to each user in the user group in that service dimension, adding the feature values corresponding to all users, and determining a feature value sum.
In this step, for the same service dimension, the feature value of each user in the user group in that dimension is determined, and the feature values of all users are added together to obtain the feature value sum.
A user may have no feature value in a given dimension; in that case the feature value of that user may be counted as 0 when computing the feature value sum.
Taking transaction information as the service information and the number of transaction counterparties as the service feature dimension as an example, suppose the user group contains three users, user A, user B and user C, where user A has 2 transaction counterparties, user B has 5 and user C has 9. Summing the counterparty counts of user A, user B and user C gives a feature value sum of 16.
Step 302, taking the difference between the feature value corresponding to each user and the feature value sum to obtain a plurality of first differences, and determining the first difference absolute value of each first difference.
In this step, after the feature value sum is determined in step 301, the feature value sum is subtracted from the feature value corresponding to each user in the user group to obtain a plurality of first differences, and the absolute value of each first difference is taken, which removes the sign of differences smaller than 0 and facilitates subsequent calculation.
Continuing the example, the feature value sum is 16. The feature value of user A is 2, so its first difference from the feature value sum is -14, and taking the absolute value gives a first difference absolute value of 14; the feature value of user B is 5, so its first difference is -11 and its first difference absolute value is 11; the feature value of user C is 9, so its first difference is -7 and its first difference absolute value is 7.
Step 303, adding the first difference absolute values to obtain an absolute value sum, and determining a first quotient of the absolute value sum and the feature value sum as an initial first numerical similarity.
In this step, the first difference absolute values obtained for the user group in step 302 are added together to obtain an absolute value sum, the absolute value sum is divided by the feature value sum, and the resulting first quotient is determined as the initial first numerical similarity.
Continuing the example, the first difference absolute value 14 of user A, the first difference absolute value 11 of user B and the first difference absolute value 7 of user C are summed to obtain an absolute value sum of 32, and the absolute value sum 32 is divided by the feature value sum 16 to obtain an initial first numerical similarity of 2.
Step 304, normalizing the initial first numerical similarity, and determining a first similarity value among all users in the user group.
In this step, the initial first numerical similarity determined in step 303 is normalized, that is, converted into a value between 0 and 1, and the normalized result is determined as the first similarity value among all users in the user group.
Here, the initial similarity value of a user group is normalized with reference to the initial similarity values of the other user groups in the same service dimension.
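Steps 301 to 303 can be sketched in a few lines. The function name `initial_numeric_similarity` is hypothetical, missing feature values are represented as `None` and counted as 0 as described above, and raising an error when the feature value sum is 0 is an assumption, since the application does not address that case.

```python
def initial_numeric_similarity(feature_values):
    """Compute the initial first numerical similarity of one user group.

    feature_values -- one value per user; None means the user has no
                      feature value in this dimension and is counted as 0.
    """
    values = [0 if v is None else v for v in feature_values]
    total = sum(values)  # step 301: feature value sum
    if total == 0:
        # Assumption: the source does not say how a zero sum is handled.
        raise ValueError("feature value sum is 0; similarity is undefined")
    # Step 302: absolute value of (feature value - feature value sum).
    abs_diffs = [abs(v - total) for v in values]
    # Step 303: absolute value sum divided by the feature value sum.
    return sum(abs_diffs) / total


# Users A, B and C with 2, 5 and 9 transaction counterparties.
similarity = initial_numeric_similarity([2, 5, 9])
```

For the worked example this reproduces the figures in the text: a feature value sum of 16, an absolute value sum of 32 and an initial first numerical similarity of 2.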
Further, referring to fig. 4, fig. 4 is a flowchart illustrating calculation of a first similarity value between all users in the user group when the service dimension includes a category type service dimension according to an embodiment of the present application. As shown in fig. 4, a first similarity value between all users in the community of users is calculated by:
Step 401, for the same service dimension, obtaining the service type corresponding to each user in that service dimension, and counting the total number of service types and the total number of users in the dimension.
In this step, for the same category-type service dimension, the service type corresponding to each user in the user group is determined, and, based on a qualitative analysis of each user's service type, the number of distinct service types in the dimension and the number of users included in the user group are counted.
Taking transaction information as the service information and the account-opening institution as the service feature dimension as an example, the transaction information of the users shows that the account-opening institution of user A in the user group is the Industrial and Commercial Bank, that of user B is the Bank of Communications, and that of user C is the Industrial and Commercial Bank. In the account-opening institution dimension, the total number of service types is therefore 2 and the total number of users is 3.
Step 402, determining a second quotient of the total number of service types and the total number of users as the initial first similarity value among all users in the user group.
In this step, the total number of service types determined in step 401 is divided by the total number of users, and the resulting second quotient is determined as the initial first similarity value among all users in the user group.
Continuing the example, the total number of service types is 2 and the total number of users is 3, and dividing the former by the latter gives an initial first similarity value of approximately 0.67.
Step 403, normalizing the initial first similarity value, and determining a first similarity value among all users in the user group.
In this step, the initial first similarity value determined in step 402 is normalized, that is, converted into a value between 0 and 1, and the normalized result is determined as the first similarity value among all users in the user group.
Here, the similarity value for a category-type service dimension is generally already a value between 0 and 1, so normalization may be omitted.
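Steps 401 and 402 reduce to counting distinct service types and dividing by the number of users. The sketch below uses a hypothetical function name, `initial_category_similarity`, and placeholder institution labels; rejecting an empty user group is an assumption not covered by the application.

```python
def initial_category_similarity(service_types):
    """Compute the initial first similarity value for a category-type dimension.

    service_types -- one service type per user in the user group
    """
    if not service_types:
        # Assumption: an empty user group is treated as invalid input.
        raise ValueError("user group has no users")
    # Step 401: total number of distinct service types and of users.
    type_count = len(set(service_types))
    user_count = len(service_types)
    # Step 402: second quotient of the two totals.
    return type_count / user_count


# Users A and C share one account-opening institution; user B uses another.
similarity = initial_category_similarity(
    ["institution_1", "institution_2", "institution_1"]
)
```

Two service types over three users gives 2/3, roughly 0.67. The more users share the same service type, the smaller this quotient becomes.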
Further, referring to fig. 5, fig. 5 is a flowchart of determining a first similarity value according to an embodiment of the present application, and as shown in fig. 5, the first similarity value is determined by:
Step 501, determining the initial first similarity value corresponding to each user group in the same service dimension, and determining the largest initial first similarity value and the smallest initial first similarity value among all the initial first similarity values.
In this step, in the same service dimension, the initial first similarity values determined for the user groups in that dimension are arranged from largest to smallest (or from smallest to largest), and the largest initial first similarity value and the smallest initial first similarity value are determined.
Taking the number of transaction counterparties as the service feature dimension as an example, suppose there are three user groups in total: the initial first similarity value of user group 1 in this dimension is 2, that of user group 2 is 1.5, and that of user group 3 is 3. The largest initial first similarity value is therefore 3 (user group 3), and the smallest initial first similarity value is 1.5 (user group 2).
Step 502, taking the difference between the largest initial first similarity value and each initial first similarity value to obtain a plurality of second differences.
In this step, after the largest initial first similarity value is determined in step 501, the initial first similarity value of each user group is subtracted from the largest initial first similarity value to obtain a plurality of second differences.
Continuing the example, the second difference between the largest initial similarity value 3 and user group 1 is 1, the second difference between the largest initial similarity value 3 and user group 2 is 1.5, and the second difference between the largest initial similarity value 3 and user group 3 is 0.
Step 503, determining a third difference value between the largest initial first similarity value and the smallest initial first similarity value as a normalization coefficient.
In this step, the normalization coefficient is determined by subtracting the maximum initial first similarity value and the minimum initial first similarity value determined in step 501.
Corresponding to the above example, the largest initial first similarity value is 3 for user community 3, the smallest initial first similarity value is 1.5 for user community 2, and the difference between the two is 1.5, i.e. the normalization factor is 1.5.
Step 504, determining a plurality of first similarity values by dividing each second difference by the normalization coefficient.
In this step, each second difference of each user group in the same service dimension is divided by the normalization coefficient, and the resulting quotient is determined as the first similarity value of that user group.
Continuing the example, the second difference of user group 1 is 1 and the normalization coefficient is 1.5, so the first similarity value of user group 1 is 0.67; the second difference of user group 2 is 1.5 and the normalization coefficient is 1.5, so the first similarity value of user group 2 is 1; the second difference of user group 3 is 0 and the normalization coefficient is 1.5, so the first similarity value of user group 3 is 0.
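Steps 501 to 504 amount to a min-max style normalization across user groups. The sketch below follows the worked example, in which each second difference is divided by the normalization coefficient. The function name `normalize_similarities` is hypothetical, and returning 0 for every group when all initial values are equal is an assumption, since the application does not cover a zero normalization coefficient.

```python
def normalize_similarities(initial_values):
    """Normalize the initial first similarity values of all user groups
    in one service dimension to the range 0 to 1."""
    largest = max(initial_values)    # step 501
    smallest = min(initial_values)   # step 501
    coefficient = largest - smallest  # step 503: normalization coefficient
    if coefficient == 0:
        # Assumption: with identical values, every group is mapped to 0.
        return [0.0 for _ in initial_values]
    # Steps 502 and 504: (largest - value) / normalization coefficient.
    return [(largest - value) / coefficient for value in initial_values]


# Initial first similarity values of user groups 1, 2 and 3.
first_similarities = normalize_similarities([2, 1.5, 3])
rounded = [round(value, 2) for value in first_similarities]
```

For the three user groups in the example, `rounded` holds 0.67, 1.0 and 0.0. The group with the smallest initial value receives the largest normalized value, so the normalization also inverts the ordering.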
According to the method for identifying abnormal user groups provided by the embodiment of the application, when the service dimension includes a numerical service dimension, for the same service dimension, the feature value corresponding to each user in the user group in that service dimension is obtained, and the feature values corresponding to all users are added to determine a feature value sum; the difference between the feature value corresponding to each user and the feature value sum is taken to obtain a plurality of first differences, and the first difference absolute value of each first difference is determined; the first difference absolute values are added to obtain an absolute value sum, and a first quotient of the absolute value sum and the feature value sum is determined as an initial first numerical similarity; the initial first numerical similarity is normalized, and a first similarity value among all users in the user group is determined. When the service dimension includes a category-type service dimension, for the same service dimension, the service type corresponding to each user in that service dimension is obtained, and the total number of service types and the total number of users in the dimension are counted; a second quotient of the total number of service types and the total number of users is determined as the initial first similarity value among all users in the user group; the initial first similarity value is normalized, and a first similarity value among all users in the user group is determined.
Therefore, the first similarity value of all users in the user group can be determined under different types of service dimensions, the first similarity value can be accurately and conveniently calculated, and further accurate calculation of the subsequent second similarity value is facilitated.
Referring to fig. 6, fig. 6 is a schematic structural diagram of an apparatus for identifying abnormal user groups according to an embodiment of the present application. As shown in fig. 6, the identification device 600 includes:
the user set generating module 610 is configured to divide the plurality of users into a plurality of user sets based on the service start time indicated by the acquired service information of each user and a preset time interval.
The first similarity value calculating module 620 is configured to determine, as a user group, a plurality of users that repeatedly appear in association in each user set, and calculate, in each service dimension, a first similarity value between all users in the user group.
A second similarity value determining module 630, configured to linearly weight the first similarity value and the corresponding dimension weight coefficient under each service dimension, and determine a second similarity value between all users of the user group.
The abnormal user community determining module 640 is configured to rank the determined second similarity values among all users of each of the plurality of user communities according to a preset ranking order, and determine a plurality of user communities corresponding to the plurality of second similarity values ranked before the preset number as a plurality of abnormal user communities.
Further, the service dimension includes one of a numeric service dimension and a category service dimension.
Further, when the service dimension includes a numeric service dimension, the first similarity value calculation module 620 is configured to calculate a first similarity value between all users in the user group by:
for the same service dimension, acquiring a characteristic value corresponding to each user in the user group under the service dimension, adding the characteristic values corresponding to all users, and determining a characteristic value sum;
respectively differencing the characteristic value corresponding to each user with the characteristic value sum value to obtain a plurality of first difference values, and determining a first difference absolute value of each first difference value;
adding the absolute value of the first difference to obtain an absolute value sum, and determining a first quotient of the absolute value sum and the characteristic value sum as an initial first numerical similarity;
and normalizing the initial first numerical similarity, and determining a first similarity value among all users in the user group.
Further, when the service dimension includes a category type service dimension, the first similarity value calculating module 620 is configured to calculate a first similarity value among all users in the user group by:
for the same service dimension, acquiring a service type corresponding to each user in the service dimension, and counting the total number of the service types and the total number of the users in the dimension;
determining a second quotient of the total number of the service types and the total number of the users as an initial first similarity value among all users in the user group;
and normalizing the initial first numerical similarity, and determining a first similarity value among all users in the user group.
Further, the first similarity value calculation module 620 is configured to determine the first similarity value by:
determining initial first similarity values corresponding to each user group under the same service dimension, and determining the largest initial first similarity value and the smallest initial first similarity value in all the initial first similarity values;
respectively differencing the maximum initial first similarity value with each initial first similarity value to obtain a plurality of second difference values;
determining a third difference value between the maximum initial first similarity value and the minimum initial first similarity value as a normalization coefficient;
and dividing each second difference value by the normalization coefficient to determine a plurality of first similarity values.
The device for identifying abnormal user groups provided by the embodiment of the application divides a plurality of users to generate a plurality of user sets based on the acquired service start time indicated by the service information of each user and a preset time interval; determining a plurality of users repeatedly associated in each user set as a user group, and calculating a first similarity value among all users in the user group under each service dimension; linearly weighting the first similarity value under each service dimension and the corresponding dimension weight coefficient, and determining second similarity values among all users of the user group; and ordering the second similarity values among all users of each of the determined user communities according to a preset arrangement order, and determining the user communities corresponding to the second similarity values which are ordered before the preset number as abnormal user communities.
In this way, a plurality of user sets is determined according to the service start time indicated by the service information of each user; users that repeatedly appear together across the user sets are determined as one user group; a first similarity value among all users in each user group is determined in each service dimension; a second similarity value among all users in each user group is determined according to the weight coefficient corresponding to each service dimension; and all user groups are ranked by their second similarity values, with the user groups at the preset positions determined as abnormal user groups. Because a plurality of associated abnormal users can be determined at the same time through an abnormal user group, both the efficiency and the accuracy of identifying abnormal users can be improved.
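The work of the second similarity value determining module 630 and the abnormal user community determining module 640 can be illustrated with a short sketch. The function names, dimension names, weight values and group scores below are all hypothetical, and whether the preset arrangement order is descending or ascending is left as a parameter because the application does not fix it.

```python
def second_similarity(first_similarities, weights):
    """Linearly weight the first similarity values of one user group.

    first_similarities -- mapping: service dimension -> first similarity value
    weights            -- mapping: service dimension -> dimension weight coefficient
    """
    return sum(weights[dim] * value for dim, value in first_similarities.items())


def top_groups(group_scores, preset_number, descending=True):
    """Rank user groups by second similarity and keep the first preset_number."""
    ranked = sorted(group_scores.items(), key=lambda item: item[1], reverse=descending)
    return [group for group, _ in ranked[:preset_number]]


# Hypothetical weights and first similarity values for three user groups.
weights = {"counterparty_count": 0.6, "account_institution": 0.4}
groups = {
    "group_1": {"counterparty_count": 0.5, "account_institution": 1.0},
    "group_2": {"counterparty_count": 1.0, "account_institution": 0.5},
    "group_3": {"counterparty_count": 0.0, "account_institution": 0.0},
}
scores = {name: second_similarity(values, weights) for name, values in groups.items()}
abnormal = top_groups(scores, preset_number=2)
```

With these illustrative numbers, group_2 and group_1 rank first and second in descending order and would be reported as the abnormal user groups.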
Referring to fig. 7, fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the application. As shown in fig. 7, the electronic device 700 includes a processor 710, a memory 720, and a bus 730.
The memory 720 stores machine-readable instructions executable by the processor 710, and when the electronic device 700 is running, the processor 710 communicates with the memory 720 through the bus 730, and when the machine-readable instructions are executed by the processor 710, the steps of the method for identifying an abnormal user community in the method embodiments shown in fig. 2 to 5 can be executed, and detailed implementation of the method embodiments will be omitted herein.
The embodiment of the present application further provides a computer readable storage medium, where a computer program is stored, where the computer program when executed by a processor may perform the steps of the method for identifying an abnormal user group in the method embodiment shown in the foregoing fig. 2 to 5, and the specific implementation manner may refer to the method embodiment and will not be described herein.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, and are not repeated herein.
In the several embodiments provided by the present application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. The above-described apparatus embodiments are merely illustrative, for example, the division of the units is merely a logical function division, and there may be other manners of division in actual implementation, and for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be through some communication interface, device or unit indirect coupling or communication connection, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer readable storage medium executable by a processor. Based on this understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art or in a part of the technical solution, in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
Finally, it should be noted that the above embodiments are only specific implementations of the present application, intended to illustrate its technical solutions rather than to limit them, and the protection scope of the present application is not limited thereto. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that, within the technical scope disclosed by the present application, any person skilled in the art may still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of their technical features; such modifications, changes or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application, and shall all be covered by the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (8)

1. A method of identifying an abnormal community of users, the method comprising:
dividing a plurality of users based on the acquired service start time indicated by the service information of each user and a preset time interval to generate a plurality of user sets;
determining a plurality of users repeatedly associated in each user set as a user group, and calculating a first similarity value among all users in the user group under each service dimension;
linearly weighting the first similarity value under each service dimension and the corresponding dimension weight coefficient, and determining second similarity values among all users of the user group;
according to a preset arrangement order, ordering second similarity values among all users of each of the determined user communities, and determining a plurality of user communities corresponding to the second similarity values before the preset quantity as a plurality of abnormal user communities;
the service dimension comprises one of a numerical service dimension and a category service dimension;
when the service dimension comprises a numerical service dimension, the first similarity value is determined based on the absolute value of the difference between the characteristic value corresponding to each user in the user group and the sum of the characteristic values of all users in the same service dimension;
when the service dimension includes a category type service dimension, the first similarity value is determined based on the quotient value between the total number of service types corresponding to the users and the total number of the users in the same service dimension.
2. The identification method of claim 1, wherein when the business dimension comprises a numeric business dimension, calculating a first similarity value between all users in the community of users by:
for the same service dimension, acquiring a characteristic value corresponding to each user in the user group under the service dimension, adding the characteristic values corresponding to all users, and determining a characteristic value sum;
respectively differencing the characteristic value corresponding to each user with the characteristic value sum value to obtain a plurality of first difference values, and determining a first difference absolute value of each first difference value;
adding the absolute value of the first difference to obtain an absolute value sum, and determining a first quotient of the absolute value sum and the characteristic value sum as an initial first numerical similarity;
and normalizing the initial first numerical similarity, and determining a first similarity value among all users in the user group.
3. The identification method according to claim 1, wherein when the service dimension comprises a category service dimension, a first similarity value between all users in the user community is calculated by:
for the same service dimension, acquiring a service type corresponding to each user in the service dimension, and counting the total number of the service types and the total number of the users in the dimension;
determining a second quotient of the total number of the service types and the total number of the users as an initial first similarity value among all users in the user group;
and normalizing the initial first numerical similarity, and determining a first similarity value among all users in the user group.
4. A method of identifying as claimed in claim 2 or 3, wherein the first similarity value is determined by:
determining initial first similarity values corresponding to each user group under the same service dimension, and determining the largest initial first similarity value and the smallest initial first similarity value in all the initial first similarity values;
respectively differencing the maximum initial first similarity value with each initial first similarity value to obtain a plurality of second difference values;
determining a third difference value between the maximum initial first similarity value and the minimum initial first similarity value as a normalization coefficient;
and dividing each second difference value by the normalization coefficient to determine a plurality of first similarity values.
5. An identification device for an abnormal user community, the identification device comprising:
the user set generation module is used for dividing a plurality of users to generate a plurality of user sets based on the acquired service start time indicated by the service information of each user and a preset time interval;
the first similarity value calculation module is used for determining a plurality of users repeatedly associated and appearing in each user set as a user group, and calculating first similarity values among all users in the user group under each service dimension;
the second similarity value determining module is used for linearly weighting the first similarity value under each service dimension and the corresponding dimension weight coefficient to determine the second similarity value among all users of the user group;
the abnormal user community determining module is used for ordering the second similarity values among all users of each user community in the determined multiple user communities according to a preset arrangement order, and determining the multiple user communities corresponding to the multiple second similarity values ordered before the preset number as multiple abnormal user communities;
the service dimension comprises one of a numerical service dimension and a category service dimension;
when the service dimension comprises a numerical service dimension, the first similarity value is determined based on the absolute value of the difference between the characteristic value corresponding to each user in the user group and the sum of the characteristic values of all users in the same service dimension;
when the service dimension includes a category type service dimension, the first similarity value is determined based on the total number of service types corresponding to the users and the quotient value between the total number of the users in the same service dimension.
6. The identification device of claim 5, wherein when the business dimension comprises a numeric business dimension, the first similarity value calculation module is configured to calculate a first similarity value between all users in the community of users by:
for the same service dimension, acquiring a characteristic value corresponding to each user in the user group under the service dimension, adding the characteristic values corresponding to all users, and determining a characteristic value sum;
respectively differencing the characteristic value corresponding to each user with the characteristic value sum value to obtain a plurality of first difference values, and determining a first difference absolute value of each first difference value;
adding the absolute value of the first difference to obtain an absolute value sum, and determining a first quotient of the absolute value sum and the characteristic value sum as an initial first numerical similarity;
and normalizing the initial first numerical similarity, and determining a first similarity value among all users in the user group.
7. An electronic device, comprising: a processor, a memory and a bus, said memory storing machine readable instructions executable by said processor, said processor and said memory communicating via said bus when the electronic device is running, said machine readable instructions when executed by said processor performing the steps of the method of identifying an abnormal community of users according to any one of claims 1 to 4.
8. A computer-readable storage medium, on which a computer program is stored, which computer program, when being executed by a processor, performs the steps of the method of identifying an abnormal user community according to any one of claims 1 to 4.
CN202010082795.0A 2020-02-07 2020-02-07 Identification method and device for abnormal user group and readable storage medium Active CN111311276B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010082795.0A CN111311276B (en) 2020-02-07 2020-02-07 Identification method and device for abnormal user group and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010082795.0A CN111311276B (en) 2020-02-07 2020-02-07 Identification method and device for abnormal user group and readable storage medium

Publications (2)

Publication Number Publication Date
CN111311276A CN111311276A (en) 2020-06-19
CN111311276B true CN111311276B (en) 2023-08-29

Family

ID=71161807

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010082795.0A Active CN111311276B (en) 2020-02-07 2020-02-07 Identification method and device for abnormal user group and readable storage medium

Country Status (1)

Country Link
CN (1) CN111311276B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111767315B (en) * 2020-06-29 2023-07-04 北京奇艺世纪科技有限公司 Black product identification method and device, electronic equipment and storage medium
CN113706181B (en) * 2021-10-30 2022-02-08 杭银消费金融股份有限公司 Service processing detection method and system based on user behavior characteristics

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9231962B1 (en) * 2013-11-12 2016-01-05 Emc Corporation Identifying suspicious user logins in enterprise networks
CN105808988A (en) * 2014-12-31 2016-07-27 阿里巴巴集团控股有限公司 Method and device for identifying exceptional account
CN110032583A (en) * 2019-03-12 2019-07-19 平安科技(深圳)有限公司 Fraud group identification method and device, readable storage medium and terminal device
CN110046929A (en) * 2019-03-12 2019-07-23 平安科技(深圳)有限公司 Fraud group identification method and device, readable storage medium and terminal device
CN110335139A (en) * 2019-06-21 2019-10-15 深圳前海微众银行股份有限公司 Similarity-based evaluation method, device, equipment and readable storage medium
CN110457707A (en) * 2019-08-16 2019-11-15 秒针信息技术有限公司 Method and device for extracting notional-word keywords, electronic equipment and readable storage medium
CN110610182A (en) * 2018-06-15 2019-12-24 武汉安天信息技术有限责任公司 User track similarity judgment method and related device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
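Jiang Xin (蒋鑫). Design and Implementation of a Social Network Abnormal User Identification System Based on Attribute Reduction. China Master's Theses Full-text Database (Information Science and Technology). 2018, full text. *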

Also Published As

Publication number Publication date
CN111311276A (en) 2020-06-19

Similar Documents

Publication Publication Date Title
CN107423613B (en) Method and device for determining device fingerprint according to similarity and server
CN112669138B (en) Data processing method and related equipment
CN111311276B (en) Identification method and device for abnormal user group and readable storage medium
CN111144941A (en) Merchant score generation method, device, equipment and readable storage medium
CN113051291A (en) Work order information processing method, device, equipment and storage medium
CN113448955B (en) Data set quality evaluation method and device, computer equipment and storage medium
CN111160329A (en) Root cause analysis method and device
CN112927061A (en) User operation detection method and program product
CN114780606B (en) Big data mining method and system
CN111859057B (en) Data feature processing method and data feature processing device
CN110019762B (en) Problem positioning method, storage medium and server
EP3451611B1 (en) Method and apparatus for setting mobile device identifier
CN112990989A (en) Value prediction model input data generation method, device, equipment and medium
CN112101024A (en) Target object identification system based on app information
CN110728585A (en) Authority guaranteeing method, device, equipment and storage medium
CN114281867A (en) Data association method, device, storage medium and program product
CN115115403A (en) Method and device for classifying customers in target customer group, electronic equipment and storage medium
CN114817518A (en) License handling method, system and medium based on big data archive identification
CN110827144B (en) Application risk evaluation method and application risk evaluation device for user and electronic equipment
CN110570301B (en) Risk identification method, device, equipment and medium
CN114021116A (en) Construction method of homologous analysis knowledge base, homologous analysis method and device
CN112232962A (en) Transaction index processing method, device and equipment
CN111507397A (en) Abnormal data analysis method and device
CN110209763B (en) Data processing method, device and computer readable storage medium
CN115600112B (en) Method, device, equipment and medium for obtaining behavior prediction model training set

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant