CN111311276A - Abnormal user group identification method, identification device and readable storage medium - Google Patents


Info

Publication number
CN111311276A
Authority
CN
China
Prior art keywords
user
users
similarity
value
dimension
Legal status
Granted
Application number
CN202010082795.0A
Other languages
Chinese (zh)
Other versions
CN111311276B (en)
Inventor
韩跃盛
Current Assignee
Beijing Mininglamp Software System Co ltd
Original Assignee
Beijing Mininglamp Software System Co ltd
Application filed by Beijing Mininglamp Software System Co ltd
Priority to CN202010082795.0A
Publication of CN111311276A
Application granted
Publication of CN111311276B
Status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/018 Certifying business or products
    • G06Q30/0185 Product, service or business identity fraud
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/04 Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Accounting & Taxation (AREA)
  • General Physics & Mathematics (AREA)
  • Finance (AREA)
  • Marketing (AREA)
  • Data Mining & Analysis (AREA)
  • General Business, Economics & Management (AREA)
  • Economics (AREA)
  • Development Economics (AREA)
  • Strategic Management (AREA)
  • Artificial Intelligence (AREA)
  • Technology Law (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application provides an identification method, an identification device and a readable storage medium for abnormal user groups. The identification method comprises the following steps: generating a plurality of user sets based on the start time of each user's service and a preset time interval; determining a plurality of users that are repeatedly associated in the user sets as a user group, and calculating a first similarity value between all users in the user group under each service dimension; determining a second similarity value between all users of the user group based on the first similarity value under each service dimension and the corresponding dimension weight coefficient; and sorting the second similarity values of the user groups in a preset order, and determining the user groups ranked within a preset number of top positions as abnormal user groups. In this way, a plurality of associated abnormal users are determined through the abnormal user groups, which improves the efficiency and accuracy of identifying abnormal users.

Description

Abnormal user group identification method, identification device and readable storage medium
Technical Field
The present application relates to the field of data analysis and statistics, and in particular, to a method and an apparatus for identifying an abnormal user group, and a readable storage medium.
Background
With the rapid development of internet finance, services in the financial field increasingly rely on information technology. When conducting financial transactions over the internet, some users may perform abnormal operations to carry out false transactions, which seriously threatens the security of the internet financial system.
At present, abnormal users are screened by judging each single transaction of each single user against fixed rules, that is, by checking whether one transaction of one user is abnormal. The volume of data to be judged is large, the accuracy of judgments based on a single record is low, and only a small number of users can be analyzed qualitatively, all of which limits the efficiency and accuracy of identifying abnormal users.
Disclosure of Invention
In view of the above, an object of the present application is to provide an identification method, an identification apparatus and a readable storage medium for abnormal user groups, in which at least one abnormal user group is determined by analyzing the similarity among users in the divided user groups, and a plurality of associated abnormal users are determined from that abnormal user group, which helps improve the efficiency and accuracy of identifying abnormal users.
The embodiment of the application provides an identification method of an abnormal user group, which comprises the following steps:
dividing a plurality of users into a plurality of user sets based on the service start time indicated by the acquired service information of each user and a preset time interval;
determining a plurality of users that are repeatedly associated in the user sets as a user group, and calculating a first similarity value between all users in the user group under each service dimension;
linearly weighting the first similarity value under each service dimension with the corresponding dimension weight coefficient to determine a second similarity value among all users of the user group;
and sorting the second similarity values of the determined user groups in a preset order, and determining the user groups whose second similarity values rank within a preset number of top positions as abnormal user groups.
Further, the business dimension is either a numerical business dimension or a category business dimension.
Further, when the business dimension comprises a numerical business dimension, a first similarity value between all users in the user group is calculated by:
for the same service dimension, acquiring the feature value of each user in the user group under that service dimension, and adding the feature values of all the users to obtain a feature value sum;
computing the difference between each user's feature value and the feature value sum to obtain a plurality of first difference values, and determining the absolute value of each first difference value;
adding the absolute values of the first differences to obtain an absolute value sum, and determining the first quotient of the absolute value sum divided by the feature value sum as an initial first numerical similarity;
and normalizing the initial first numerical similarity to determine a first similarity value among all users in the user group.
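The numerical-dimension steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the function name `numeric_initial_similarity` is hypothetical, a missing feature value is counted as 0 (as the detailed description suggests), and a zero feature value sum returns 0.0, a case the text does not address. Normalization is left to a separate step.

```python
from typing import Optional, Sequence


def numeric_initial_similarity(feature_values: Sequence[Optional[float]]) -> float:
    """Initial first numerical similarity of one user group in one numerical dimension.

    Hypothetical helper: `feature_values` holds one entry per user; None marks a
    user without a value in this dimension and is counted as 0.
    """
    values = [0.0 if v is None else float(v) for v in feature_values]
    feature_sum = sum(values)  # feature value sum
    if feature_sum == 0:
        return 0.0  # assumption: the description does not cover a zero sum
    # First differences (feature value minus sum) and their absolute values.
    abs_sum = sum(abs(v - feature_sum) for v in values)
    # First quotient: absolute value sum divided by the feature value sum.
    return abs_sum / feature_sum
```

With counterparty counts of 2, 5 and 9, the feature value sum is 16, the absolute differences are 14, 11 and 7, and the function returns 32 / 16 = 2.0. Read literally, whenever all feature values are non-negative each absolute difference equals the sum minus the user's own value, so the quotient equals n - 1 for a group of n users; the sketch follows the wording as written rather than substituting another formula.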
Further, when the business dimension comprises a category business dimension, a first similarity value between all users in the user group is calculated by:
for the same service dimension, acquiring the service type of each user under that service dimension, and counting the total number of service types and the total number of users in that dimension;
determining the second quotient of the total number of service types divided by the total number of users as the initial first similarity value among all users in the user group;
and normalizing the initial first similarity value to determine a first similarity value among all users in the user group.
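A minimal sketch of the category-dimension quotient, assuming each user in the group has exactly one category value in the dimension (for example, an account opening institution) and that "total number of service types" means the number of distinct values. `category_initial_similarity` is a hypothetical name.

```python
from typing import Hashable, Sequence


def category_initial_similarity(service_types: Sequence[Hashable]) -> float:
    """Initial first similarity of one user group in one category dimension.

    Hypothetical helper: `service_types` holds the category of each user; each
    user is assumed to have exactly one value in this dimension.
    """
    if not service_types:
        raise ValueError("a user group must contain at least one user")
    distinct_types = len(set(service_types))  # total number of service types
    total_users = len(service_types)  # total number of users
    return distinct_types / total_users  # second quotient
```

Three users sharing two institutions give 2 / 3, while four users at a single institution give 0.25, so under this reading a lower value indicates users that share more categories.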
Further, the first similarity value is determined by:
determining the initial first similarity value of each user group under the same service dimension, and determining the maximum and the minimum among all these initial first similarity values;
subtracting the maximum initial first similarity value from each initial first similarity value to obtain a plurality of second difference values;
determining a third difference between the maximum initial first similarity value and the minimum initial first similarity value as a normalization coefficient;
and dividing each second difference value by the normalization coefficient to determine a plurality of first similarity values.
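The normalization across user groups for one service dimension can be sketched as a min-max scaling. The translated text says the maximum is subtracted from each initial value; this sketch assumes the difference is taken as maximum minus value, so results fall in [0, 1] and a group with a smaller initial value (less spread, or fewer distinct categories) scores higher. The opposite reading only flips the sign of every result, which reverses the ranking in step 204. `normalize_across_groups` is a hypothetical name, and returning all zeros when every group has the same initial value is an assumption.

```python
from typing import List, Sequence


def normalize_across_groups(initial_values: Sequence[float]) -> List[float]:
    """First similarity values of all user groups in one service dimension.

    Assumption: second difference = maximum - initial value, so outputs lie in [0, 1].
    """
    if not initial_values:
        return []
    largest, smallest = max(initial_values), min(initial_values)
    coefficient = largest - smallest  # third difference = normalization coefficient
    if coefficient == 0:
        return [0.0] * len(initial_values)  # hypothetical guard: all groups identical
    # Divide each second difference by the normalization coefficient.
    return [(largest - v) / coefficient for v in initial_values]
```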
The embodiment of the present application further provides an identification apparatus for an abnormal user group, where the identification apparatus includes:
a user set generation module, configured to divide a plurality of users into a plurality of user sets based on the service start time indicated by the acquired service information of each user and a preset time interval;
a first similarity value calculation module, configured to determine a plurality of users that are repeatedly associated in the user sets as a user group and to calculate the first similarity value between all users in the user group under each service dimension;
a second similarity value determining module, configured to linearly weight the first similarity value under each service dimension with the corresponding dimension weight coefficient to determine a second similarity value among all users of the user group;
and an abnormal user group determining module, configured to sort the second similarity values of the determined user groups in a preset order and to determine the user groups whose second similarity values rank within a preset number of top positions as abnormal user groups.
Further, the business dimension is either a numerical business dimension or a category business dimension.
Further, when the business dimension includes a numerical business dimension, the first similarity value calculation module is configured to calculate a first similarity value between all users in the user group by:
for the same service dimension, acquiring the feature value of each user in the user group under that service dimension, and adding the feature values of all the users to obtain a feature value sum;
computing the difference between each user's feature value and the feature value sum to obtain a plurality of first difference values, and determining the absolute value of each first difference value;
adding the absolute values of the first differences to obtain an absolute value sum, and determining the first quotient of the absolute value sum divided by the feature value sum as an initial first numerical similarity;
and normalizing the initial first numerical similarity to determine a first similarity value among all users in the user group.
Further, when the business dimension includes a category business dimension, the first similarity value calculation module is configured to calculate a first similarity value between all users in the user group by:
for the same service dimension, acquiring the service type of each user under that service dimension, and counting the total number of service types and the total number of users in that dimension;
determining the second quotient of the total number of service types divided by the total number of users as the initial first similarity value among all users in the user group;
and normalizing the initial first similarity value to determine a first similarity value among all users in the user group.
Further, the first similarity value calculation module is configured to determine the first similarity value by:
determining the initial first similarity value of each user group under the same service dimension, and determining the maximum and the minimum among all these initial first similarity values;
subtracting the maximum initial first similarity value from each initial first similarity value to obtain a plurality of second difference values;
determining a third difference between the maximum initial first similarity value and the minimum initial first similarity value as a normalization coefficient;
and dividing each second difference value by the normalization coefficient to determine a plurality of first similarity values.
An embodiment of the present application further provides an electronic device, including a processor, a memory and a bus. The memory stores machine-readable instructions executable by the processor; when the electronic device is running, the processor and the memory communicate via the bus, and the machine-readable instructions, when executed by the processor, perform the steps of the method for identifying an abnormal user group as described above.
The embodiment of the present application further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the steps of the method for identifying an abnormal user group as described above are performed.
According to the identification method, identification apparatus and readable storage medium for abnormal user groups provided by the embodiments of the application, a plurality of users are divided into a plurality of user sets based on the service start time indicated by the acquired service information of each user and a preset time interval; a plurality of users that are repeatedly associated in the user sets are determined as a user group, and a first similarity value between all users in the user group is calculated under each service dimension; the first similarity value under each service dimension is linearly weighted with the corresponding dimension weight coefficient to determine a second similarity value among all users of the user group; and the second similarity values of the determined user groups are sorted in a preset order, and the user groups whose second similarity values rank within a preset number of top positions are determined as abnormal user groups.
In this way, a plurality of user sets are determined from the service start time indicated by each user's service information; users that are repeatedly associated across the user sets are determined as a user group; the first similarity value of all users in the user group is determined under each dimension; the second similarity value among all users in each user group is determined using the weight coefficient of each service dimension; all user groups are sorted by their second similarity values; and the user groups ranked within the preset number of top positions are determined as abnormal user groups. A plurality of associated abnormal users can thus be identified at once through an abnormal user group, which improves the efficiency and accuracy of identifying abnormal users.
In order to make the aforementioned objects, features and advantages of the present application more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained from the drawings without inventive effort.
FIG. 1 is a block diagram of a possible application scenario;
FIG. 2 is a flowchart of an abnormal user group identification method according to an embodiment of the present disclosure;
FIG. 3 is a flowchart for calculating a first similarity value between all users in the user group when the business dimension includes a numerical business dimension according to an embodiment of the present application;
FIG. 4 is a flowchart for calculating a first similarity value between all users in the user group when the business dimension includes a category business dimension according to an embodiment of the present application;
FIG. 5 is a flowchart for determining a first similarity value according to an embodiment of the present application;
FIG. 6 is a schematic structural diagram of an apparatus for identifying an abnormal user group according to an embodiment of the present application;
FIG. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all the embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. Every other embodiment that can be obtained by a person skilled in the art without making creative efforts based on the embodiments of the present application falls within the protection scope of the present application.
First, an application scenario to which the present application applies is described. The method and apparatus can be applied in the technical field of data analysis and statistics: at least one abnormal user group is determined by analyzing the similarity between users in the divided user groups, and a plurality of associated abnormal users are determined at once through the abnormal user group, which helps improve the efficiency and accuracy of identifying abnormal users. Referring to fig. 1, which is a system structure diagram of a possible application scenario, the system includes a service information database and an identification device. The service information database stores the service information generated when users perform services, where the service information includes service feature data and basic feature data. After obtaining the service information, the identification device determines a plurality of user groups from it and determines a plurality of abnormal user groups according to the similarity between all users in each user group.
Research shows that, at present, abnormal users are screened by judging each single transaction of each single user against fixed rules, that is, by checking whether one transaction of one user is abnormal. The volume of data to be judged is large, the accuracy of judgments based on a single record is low, and only a small number of users can be analyzed qualitatively, which affects the efficiency and accuracy of identifying abnormal users.
Based on this, the embodiment of the application provides an identification method of an abnormal user group, at least one abnormal user group is determined through analysis of similarity among users in the divided user group, and a plurality of associated abnormal users are determined based on the abnormal user group, which is beneficial to improving efficiency and accuracy of abnormal user judgment.
Referring to fig. 2, fig. 2 is a flowchart illustrating a method for identifying an abnormal user group according to an embodiment of the present application. As shown in fig. 2, the method for identifying an abnormal user group provided in the embodiment of the present application includes:
Step 201, dividing a plurality of users into a plurality of user sets based on the service start time indicated by the acquired service information of each user and a preset time interval.
In this step, multiple pieces of service information of multiple users are obtained from the service information database, and the users are divided according to the service start time indicated by each piece of service information and a preset time interval to generate a plurality of user sets.
Since one user may correspond to more than one piece of service information, the same user may exist in a plurality of user sets generated by dividing the user according to the service start time of each piece of service information, that is, the same user may belong to different user sets.
Here, users may be divided by service start time as follows: sort all the obtained service start times of the users in chronological order; then, starting from the user with the earliest service start time, divide the users into user sets at the preset time interval until the user with the latest service start time has been assigned.
For example, suppose there are five users, user A, user B, user C, user D and user E, and only one piece of service information is obtained for each user. Analysis shows that the service start time is 8:00 for user A, 9:00 for user B, 9:30 for user C, 9:45 for user D and 10:00 for user E. Arranging the users in order of service start time gives the queue A, B, C, D, E. Assuming the preset time interval is 1 hour, the five users are divided into three user sets: (user A), (user B, user C, user D) and (user E).
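The division into user sets can be sketched as fixed windows anchored at the earliest service start time. This is an assumption-laden illustration rather than the applicant's code: `split_user_sets` is a hypothetical name, windows are taken as half-open (a start time exactly on a boundary opens the next window), and empty windows produce no user set. Under these assumptions the sketch reproduces the three sets of the example.

```python
from datetime import datetime, timedelta
from itertools import groupby
from typing import Iterable, List, Tuple


def split_user_sets(
    records: Iterable[Tuple[str, datetime]], interval: timedelta
) -> List[List[str]]:
    """Divide users into user sets by service start time.

    `records` holds one (user_id, service_start_time) pair per piece of service
    information, so a user with several records may land in several sets.
    """
    ordered = sorted(records, key=lambda record: record[1])  # chronological order
    if not ordered:
        return []
    earliest = ordered[0][1]
    user_sets = []
    # Window index = whole intervals elapsed since the earliest start time.
    for _, window in groupby(ordered, key=lambda record: (record[1] - earliest) // interval):
        users: List[str] = []
        for user_id, _start in window:
            if user_id not in users:  # keep each user once per set
                users.append(user_id)
        user_sets.append(users)
    return user_sets
```

Feeding in the five start times from the example (8:00, 9:00, 9:30, 9:45 and 10:00) with `timedelta(hours=1)` yields `[['A'], ['B', 'C', 'D'], ['E']]`.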
Step 202, determining a plurality of users that are repeatedly associated in the user sets as a user group, and calculating a first similarity value between all users in the user group under each service dimension.
In this step, after the user sets are generated in step 201, the users contained in each user set are analyzed, users that repeatedly appear together across the user sets are determined as a user group, and, for each user group, a first similarity value between all of its users is calculated under each of the plurality of business dimensions.
The business dimension is either a numerical business dimension or a category business dimension. Taking transaction information as an example of service information, the dimensions may include the number of transactions, transaction amount, transaction counterparties, account opening date, age, account opening institution and so on. The number of transactions, transaction amount, age and similar features can be described quantitatively by specific numerical values, so they belong to numerical business dimensions. Transaction counterparties, account opening date, account opening institution and similar features are qualitative descriptions: different values differ only in category and cannot be described by specific numerical values, so these feature dimensions belong to category business dimensions.
Here, the users that repeatedly appear in association may be determined by the FP-Growth algorithm, which adopts the following strategy: the database that provides the frequent itemsets is compressed into a frequent pattern tree (FP-tree) while the association information of the itemsets is retained. The FP-tree is a special prefix tree composed of a frequent-item header table and an item prefix tree, and the FP-Growth algorithm uses this structure to speed up the whole mining process, thereby determining a plurality of frequently associated users.
For example, suppose four user sets have been divided at this stage: user set 1 (user 1, user 2, user 3, user 4, user 5, user 6), user set 2 (user 4, user 5, user 7, user 8, user 9), user set 3 (user 1, user 2, user 3, user 7) and user set 4 (user 1, user 2, user 3, user 9). Two user groups, (user 1, user 2, user 3) and (user 4, user 5), can be obtained through the FP-Growth algorithm.
Here, after a user group is determined, a first similarity value between all of its users is calculated under each business dimension that can be obtained from the users' service information. A user group therefore has as many first similarity values as there are business dimensions indicated in the service information of its users.
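A compact FP-Growth sketch in Python that recovers the two user groups of the example. Each user set is treated as a transaction and each user as an item. The minimum support of 2, and the rule of reporting only maximal frequent itemsets with at least two users as user groups, are assumptions made here because the description does not state how groups are extracted from the mined itemsets; `fp_growth` and `user_groups` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Hashable, List, Sequence, Tuple

Transaction = Tuple[Sequence[Hashable], int]  # (items, count)


class FPNode:
    def __init__(self, item, parent):
        self.item = item
        self.parent = parent
        self.count = 0
        self.children = {}


def fp_growth(transactions: List[Transaction], min_support: int,
              suffix: Tuple = ()) -> Dict[FrozenSet, int]:
    """Return every frequent itemset (with its support) found by FP-Growth."""
    # Count item supports, then keep only the frequent items.
    counts: Dict[Hashable, int] = defaultdict(int)
    for items, count in transactions:
        for item in set(items):
            counts[item] += count
    frequent = {item: n for item, n in counts.items() if n >= min_support}

    # Build the FP-tree; header maps each item to the nodes that hold it.
    root = FPNode(None, None)
    header: Dict[Hashable, List[FPNode]] = defaultdict(list)
    for items, count in transactions:
        path = sorted((i for i in set(items) if i in frequent),
                      key=lambda i: (-frequent[i], str(i)))
        node = root
        for item in path:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += count
            node = child

    # Mine each item's conditional pattern base recursively.
    results: Dict[FrozenSet, int] = {}
    for item, support in frequent.items():
        itemset = suffix + (item,)
        results[frozenset(itemset)] = support
        conditional: List[Transaction] = []
        for node in header[item]:
            prefix = []
            parent = node.parent
            while parent is not None and parent.item is not None:
                prefix.append(parent.item)
                parent = parent.parent
            if prefix:
                conditional.append((prefix, node.count))
        if conditional:
            results.update(fp_growth(conditional, min_support, itemset))
    return results


def user_groups(user_sets: List[List[Hashable]], min_support: int = 2) -> List[List[Hashable]]:
    """Maximal frequent itemsets with at least two users (assumed grouping rule)."""
    frequent = fp_growth([(users, 1) for users in user_sets], min_support)
    candidates = [s for s in frequent if len(s) >= 2]
    maximal = [s for s in candidates if not any(s < other for other in candidates)]
    return sorted(sorted(s) for s in maximal)
```

For the four user sets of the example, `user_groups` returns `[[1, 2, 3], [4, 5]]`: users 1, 2 and 3 appear together in three sets and users 4 and 5 in two, while every other pair co-occurs only once.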
Step 203, linearly weighting the first similarity value under each service dimension with the corresponding dimension weight coefficient to determine a second similarity value between all users of the user group.
In this step, after the first similarity value of the user group under each service dimension is obtained in step 202, the weight coefficient of each service dimension is determined, and the first similarity values are linearly weighted with their weight coefficients to determine the second similarity value between all users of the user group.
Here, the weight coefficient of each service dimension may be set based on historical calculations, or the service dimensions may be ranked according to how much emphasis the calculation result should place on each of them, and the weight coefficients assigned in that order.
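Step 203 amounts to a weighted sum over service dimensions. The sketch below assumes the first similarity values are already normalized and that the caller supplies one weight coefficient per dimension; `second_similarity`, the dimension names and the weights in the example are hypothetical, not figures from the application.

```python
from typing import Mapping


def second_similarity(first_similarity: Mapping[str, float],
                      weights: Mapping[str, float]) -> float:
    """Linearly weight one user group's first similarity values.

    Both mappings are keyed by service dimension name; every dimension present in
    `first_similarity` must have a weight coefficient.
    """
    missing = set(first_similarity) - set(weights)
    if missing:
        raise KeyError(f"no weight coefficient for dimensions: {sorted(missing)}")
    return sum(weights[d] * value for d, value in first_similarity.items())
```

For example, weights of 0.6 for transaction amount and 0.4 for the number of counterparties, applied to first similarity values of 0.8 and 0.5, give a second similarity value of about 0.68.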
Step 204, sorting the second similarity values of the determined user groups in a preset order, and determining the user groups whose second similarity values rank within a preset number of top positions as abnormal user groups.
In this step, steps 201 to 203 are repeated to determine a plurality of user groups and the second similarity value of each user group; the second similarity values are sorted from largest to smallest, and the user groups ranked within the preset number of top positions are determined as abnormal user groups.
Here, the preset number may be set based on historical settings or on the required number of abnormal user groups; for example, the top three or the top ten user groups may be taken.
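Step 204 reduces to sorting user groups by their second similarity values in descending order and keeping the first `preset_number` of them. In this sketch `top_abnormal_groups`, the group keys and the scores are hypothetical, and ties keep their input order because Python's sort is stable.

```python
from typing import Dict, Hashable, List, Tuple


def top_abnormal_groups(second_similarities: Dict[Tuple[Hashable, ...], float],
                        preset_number: int) -> List[Tuple[Hashable, ...]]:
    """Return the user groups whose second similarity ranks within the top `preset_number`."""
    # Sort from largest to smallest second similarity value.
    ranked = sorted(second_similarities.items(), key=lambda item: item[1], reverse=True)
    return [group for group, _ in ranked[:preset_number]]
```

With scores of 0.9, 0.4 and 0.7 for groups (1, 2, 3), (4, 5) and (7, 9), a preset number of 2 selects (1, 2, 3) and (7, 9).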
According to the method for identifying an abnormal user group provided by the embodiment of the application, a plurality of users are divided into a plurality of user sets based on the service start time indicated by the acquired service information of each user and a preset time interval; a plurality of users that are repeatedly associated in the user sets are determined as a user group, and a first similarity value between all users in the user group is calculated under each service dimension; the first similarity value under each service dimension is linearly weighted with the corresponding dimension weight coefficient to determine a second similarity value among all users of the user group; and the second similarity values of the determined user groups are sorted in a preset order, and the user groups whose second similarity values rank within a preset number of top positions are determined as abnormal user groups.
Referring to fig. 3, fig. 3 is a flowchart of calculating the first similarity value between all users in the user group when the service dimension includes a numerical service dimension according to an embodiment of the present application. As shown in fig. 3, the first similarity value between all users in the user group is calculated through the following steps:
Step 301: for the same service dimension, obtain the feature value of each user in the user group under the service dimension, and add the feature values of all users to determine a feature value sum.
In this step, for the same service dimension, the feature value of each user in the user group under that dimension is determined, and the feature values of all users are added to obtain the feature value sum.
If a user has no feature value in this dimension, that user's feature value may be recorded as 0 when determining the feature value sum.
Take the service information as transaction information and the service feature dimension as the number of counterparties. Suppose the user group contains three users, user A, user B and user C, with 2, 5 and 9 counterparties respectively. Adding the counterparty counts of users A, B and C gives a feature value sum of 16.
Step 302: subtract the feature value sum from the feature value of each user to obtain a plurality of first difference values, and determine the first difference absolute value of each first difference value.
In this step, after the feature value sum is determined in step 301, it is subtracted from the feature value of each user in the user group to obtain a plurality of first difference values. The absolute value of each difference that is less than 0 is then taken to facilitate subsequent calculation.
Continuing the example:
- User A has a feature value of 2 and the feature value sum is 16, so the first difference is 2 - 16 = -14, and its absolute value is 14.
- User B has a feature value of 5, so the first difference is 5 - 16 = -11, and its absolute value is 11.
- User C has a feature value of 9, so the first difference is 9 - 16 = -7, and its absolute value is 7.
Step 303: add the first difference absolute values to obtain an absolute value sum, and determine the first quotient of the absolute value sum divided by the feature value sum as the initial first numerical similarity.
In this step, the first difference absolute values of the user group obtained in step 302 are added to obtain the absolute value sum. The absolute value sum is then divided by the feature value sum, and the resulting first quotient is determined as the initial first numerical similarity.
Continuing the example, the first difference absolute values of user A (14), user B (11) and user C (7) sum to 32. Dividing 32 by the feature value sum of 16 gives an initial first numerical similarity of 2.
Step 304: normalize the initial first numerical similarity to determine the first similarity value between all users in the user group.
In this step, the initial first numerical similarity determined in step 303 is normalized, that is, converted into a value between 0 and 1. The normalized result is determined as the first similarity value between all users in the user group.
Here, the initial similarity value of a user group is normalized against the initial similarity values of the other user groups in the same service dimension.
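Steps 301 to 303 can be sketched for a single user group in a single numerical dimension. The function name `initial_numeric_similarity` is hypothetical. Treating a missing feature value (`None`) as 0 follows the description above. Raising an error when the feature value sum is 0 is an assumption, since the description does not cover that case.

```python
from typing import Optional, Sequence


def initial_numeric_similarity(values: Sequence[Optional[float]]) -> float:
    """Hypothetical sketch of steps 301-303 for one user group in one numerical dimension."""
    # A user with no feature value in this dimension is counted as 0.
    xs = [0.0 if v is None else float(v) for v in values]
    total = sum(xs)  # step 301: feature value sum
    if total == 0:
        # Assumption: the description does not say how to handle a zero sum.
        raise ValueError("feature value sum is 0; the first quotient is undefined")
    abs_diffs = [abs(x - total) for x in xs]  # step 302: |feature value - sum|
    return sum(abs_diffs) / total  # step 303: first quotient


# Counterparty counts of users A, B and C from the example above.
print(initial_numeric_similarity([2, 5, 9]))  # 2.0
```

The formula as written has one notable property. Let S be the feature value sum. When all feature values are non-negative, each |x_i - S| equals S - x_i, so the absolute value sum is (n - 1)S and the quotient equals n - 1 for a group of n users. This is why the three-user example yields 2.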
Further, referring to fig. 4, fig. 4 is a flowchart of calculating the first similarity value between all users in the user group when the service dimension includes a category-type service dimension according to an embodiment of the present application. As shown in fig. 4, the first similarity value between all users in the user group is calculated through the following steps:
Step 401: for the same service dimension, obtain the service type of each user under the service dimension, and count the total number of service types and the total number of users in the dimension.
In this step, for the same category-type service dimension, the service type of each user in the user group is determined. The number of distinct service types in the dimension and the number of users in the user group are then counted by qualitative analysis of each user's service type.
Take the service information as transaction information and the service feature dimension as the account-opening institution. The transaction information shows that user A in the user group opened an account at a business bank, user B at the Bank of Communications, and user C at a business bank. Under the account-opening-institution dimension, the total number of service types is therefore 2 and the total number of users is 3.
Step 402: determine the second quotient of the total number of service types divided by the total number of users as the initial first similarity value between all users in the user group.
In this step, the total number of service types determined in step 401 is divided by the total number of users. The resulting second quotient is determined as the initial first similarity value between all users in the user group.
Continuing the example, the total number of service types is 2 and the total number of users is 3. Dividing the two gives an initial first similarity value of 2/3 ≈ 0.67.
Step 403: normalize the initial first similarity value to determine the first similarity value between all users in the user group.
In this step, the initial first similarity value determined in step 402 is normalized, that is, converted into a value between 0 and 1. The normalized result is determined as the first similarity value between all users in the user group.
Here, for a category-type service dimension, the initial similarity value generally already lies between 0 and 1, because the number of service types cannot exceed the number of users. Normalization may therefore be omitted.
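Steps 401 and 402 reduce to counting distinct service types within a user group. The following is a minimal sketch. The function name `initial_category_similarity` is hypothetical, and the labels `bank_x` and `bank_y` are placeholders standing in for the account-opening institutions.

```python
from typing import Hashable, Sequence


def initial_category_similarity(service_types: Sequence[Hashable]) -> float:
    """Hypothetical sketch of steps 401-402 for one user group in one category-type dimension."""
    if not service_types:
        raise ValueError("the user group is empty")
    total_types = len(set(service_types))  # step 401: total number of distinct service types
    total_users = len(service_types)  # step 401: total number of users
    return total_types / total_users  # step 402: second quotient


# Users A and C share an institution; user B differs (placeholder labels).
print(round(initial_category_similarity(["bank_x", "bank_y", "bank_x"]), 2))  # 0.67
```

A smaller quotient means more users in the group share the same service type. The quotient always lies in (0, 1], because the number of distinct types is at least 1 and at most the number of users.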
Further, referring to fig. 5, fig. 5 is a flowchart for determining a first similarity value according to an embodiment of the present application, and as shown in fig. 5, the first similarity value is determined through the following steps:
Step 501: determine the initial first similarity value of each user group in the same service dimension, and determine the maximum and the minimum among all the initial first similarity values.
In this step, for the same service dimension, the initial first similarity value of each user group in the dimension is determined. The values are then arranged in descending order to find the maximum initial first similarity value and the minimum initial first similarity value.
Take the number of counterparties as the service feature dimension, and suppose there are three user groups in total. The counterparty-count similarity is 2 for user group 1, 1.5 for user group 2 and 3 for user group 3. The maximum initial first similarity value is therefore 3 (user group 3), and the minimum is 1.5 (user group 2).
Step 502: subtract each initial first similarity value from the maximum initial first similarity value to obtain a plurality of second difference values.
In this step, after the maximum initial first similarity value is determined in step 501, the initial first similarity value of each user group is subtracted from it to obtain a plurality of second difference values.
Continuing the example:
- The second difference of user group 1 is 3 - 2 = 1.
- The second difference of user group 2 is 3 - 1.5 = 1.5.
- The second difference of user group 3 is 3 - 3 = 0.
Step 503: determine the third difference between the maximum initial first similarity value and the minimum initial first similarity value as the normalization coefficient.
In this step, the minimum initial first similarity value determined in step 501 is subtracted from the maximum initial first similarity value to determine the normalization coefficient.
Continuing the example, the maximum initial first similarity value is 3 (user group 3) and the minimum is 1.5 (user group 2). Their difference, and therefore the normalization coefficient, is 1.5.
Step 504: divide each second difference value by the normalization coefficient to determine a plurality of first similarity values.
In this step, each second difference value of each user group in the same service dimension is divided by the normalization coefficient to determine the first similarity value of that user group.
Continuing the example, the normalization coefficient is 1.5:
- User group 1 has a second difference value of 1, so its first similarity value is 0.67.
- User group 2 has a second difference value of 1.5, so its first similarity value is 1.
- User group 3 has a second difference value of 0, so its first similarity value is 0.
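Steps 501 to 504 amount to an inverted min-max normalization across user groups within one service dimension. The group with the largest initial value maps to 0, and the group with the smallest maps to 1. The sketch below uses a hypothetical `normalize_across_groups` helper that takes a mapping from group ID to initial value. Raising an error when all initial values are equal (a normalization coefficient of 0) is an assumption, since the description leaves that case open.

```python
from typing import Dict


def normalize_across_groups(initial: Dict[str, float]) -> Dict[str, float]:
    """Hypothetical sketch of steps 501-504 for one service dimension."""
    max_v = max(initial.values())  # step 501: maximum initial first similarity value
    min_v = min(initial.values())  # step 501: minimum initial first similarity value
    coeff = max_v - min_v  # step 503: normalization coefficient
    if coeff == 0:
        # Assumption: the description does not define the all-equal case.
        raise ValueError("all initial values are equal; normalization coefficient is 0")
    # Step 502: second difference = max - value; step 504: divide by the coefficient.
    return {gid: (max_v - v) / coeff for gid, v in initial.items()}


result = normalize_across_groups({"group1": 2, "group2": 1.5, "group3": 3})
print({gid: round(v, 2) for gid, v in result.items()})
# {'group1': 0.67, 'group2': 1.0, 'group3': 0.0}
```

Subtracting from the maximum inverts the scale. Both initial measures above grow as users in a group become more varied, so after this step a higher first similarity value means a more homogeneous group.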
The method for identifying an abnormal user group provided by the embodiment of the present application handles the two types of service dimension as follows.

When the service dimension includes a numerical service dimension, for the same service dimension:
- The feature value of each user in the user group under the service dimension is obtained, and the feature values of all users are added to determine the feature value sum.
- The feature value sum is subtracted from each user's feature value to obtain a plurality of first difference values, and the first difference absolute value of each first difference value is determined.
- The first difference absolute values are added to obtain an absolute value sum, and the first quotient of the absolute value sum divided by the feature value sum is determined as the initial first numerical similarity.
- The initial first numerical similarity is normalized to determine the first similarity value between all users in the user group.

When the service dimension includes a category-type service dimension, for the same service dimension:
- The service type of each user under the service dimension is obtained, and the total number of service types and the total number of users in the dimension are counted.
- The second quotient of the total number of service types divided by the total number of users is determined as the initial first similarity value between all users in the user group.
- The initial first similarity value is normalized to determine the first similarity value between all users in the user group.
Therefore, the first similarity value between all users in a user group can be determined accurately and conveniently under different types of service dimensions, which facilitates the subsequent accurate calculation of the second similarity value.
Referring to fig. 6, fig. 6 is a schematic structural diagram of an apparatus for identifying an abnormal user group according to an embodiment of the present application. As shown in fig. 6, the identification apparatus 600 includes:
a user set generating module 610, configured to divide a plurality of users into a plurality of user sets based on the service start time indicated by the acquired service information of each user and a preset time interval;
a first similarity value calculation module 620, configured to determine a plurality of users repeatedly associated in the user sets as a user group, and to calculate the first similarity value between all users in the user group under each service dimension;
a second similarity value determining module 630, configured to linearly weight the first similarity value under each service dimension with the corresponding dimension weight coefficient to determine the second similarity value between all users of the user group; and
an abnormal user group determining module 640, configured to sort the second similarity values of the plurality of determined user groups in a preset sorting order, and to determine the user groups whose second similarity values rank within the preset number as abnormal user groups.
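The first module and the grouping part of the second module can be sketched as follows. The description does not define how user sets are windowed or what counts as "repeatedly associated", so this entire sketch is a hypothetical interpretation. It makes four assumptions:
- Each service record is a `(user_id, start_time)` pair.
- User sets are fixed, non-overlapping windows of length `interval`.
- Two users are associated if they fall in the same user set at least `min_repeats` times.
- A user group is a connected component of such associated pairs.

```python
from collections import Counter, defaultdict
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple


def build_user_sets(records: Iterable[Tuple[str, float]], interval: float) -> List[Set[str]]:
    """Hypothetical module 610: bucket users by service start time into fixed windows."""
    buckets: Dict[int, Set[str]] = defaultdict(set)
    for user, start in records:
        buckets[int(start // interval)].add(user)
    return [buckets[k] for k in sorted(buckets)]


def repeated_groups(user_sets: List[Set[str]], min_repeats: int = 2) -> List[Set[str]]:
    """Hypothetical grouping step of module 620 (connected components of repeated pairs)."""
    # Count in how many user sets each pair of users appears together.
    pair_counts: Counter = Counter()
    for user_set in user_sets:
        pair_counts.update(combinations(sorted(user_set), 2))

    parent: Dict[str, str] = {}

    def find(u: str) -> str:
        # Union-find root lookup with path halving.
        parent.setdefault(u, u)
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    # Union every pair that co-occurs in at least `min_repeats` user sets.
    for (a, b), count in pair_counts.items():
        if count >= min_repeats:
            parent[find(a)] = find(b)

    groups: Dict[str, Set[str]] = defaultdict(set)
    for u in list(parent):
        groups[find(u)].add(u)
    return list(groups.values())


records = [("A", 0), ("B", 5), ("C", 8), ("A", 15), ("B", 18), ("D", 19), ("C", 25)]
sets = build_user_sets(records, interval=10)
print([sorted(s) for s in sets])  # [['A', 'B', 'C'], ['A', 'B', 'D'], ['C']]
print([sorted(g) for g in repeated_groups(sets)])  # [['A', 'B']]
```

In this illustrative data, only users A and B share a window twice, so they form the single user group. Sliding windows or a different association rule would be equally plausible readings of the description.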
Further, the business dimension includes one of a numeric business dimension and a category business dimension.
Further, when the business dimension includes a numerical business dimension, the first similarity value calculation module 620 is configured to calculate a first similarity value between all users in the user community by:
for the same service dimension, obtaining the feature value of each user in the user group under the service dimension, and adding the feature values of all users to determine a feature value sum;
subtracting the feature value sum from the feature value of each user to obtain a plurality of first difference values, and determining the first difference absolute value of each first difference value;
adding the first difference absolute values to obtain an absolute value sum, and determining the first quotient of the absolute value sum divided by the feature value sum as an initial first numerical similarity;
and normalizing the initial first numerical similarity to determine a first similarity value among all users in the user community.
Further, when the business dimension includes a category-type business dimension, the first similarity value calculating module 620 is configured to calculate a first similarity value between all users in the user community by:
for the same service dimension, obtaining the service type of each user under the service dimension, and counting the total number of service types and the total number of users in the dimension;
determining the second quotient of the total number of service types divided by the total number of users as an initial first similarity value between all users in the user group;
and normalizing the initial first similarity value to determine a first similarity value between all users in the user group.
Further, the first similarity value calculating module 620 is configured to determine the first similarity value by:
determining initial first similarity values corresponding to each user group under the same service dimension, and determining the maximum initial first similarity value and the minimum initial first similarity value in all the initial first similarity values;
subtracting each initial first similarity value from the maximum initial first similarity value to obtain a plurality of second difference values;
determining a third difference between the maximum initial first similarity value and the minimum initial first similarity value as a normalization coefficient;
and dividing each second difference value by the normalization coefficient to determine a plurality of first similarity values.
The apparatus for identifying an abnormal user group provided by the embodiment of the present application divides a plurality of users into a plurality of user sets based on the service start time indicated by the acquired service information of each user and a preset time interval. It determines a plurality of users repeatedly associated in the user sets as a user group, and calculates the first similarity value between all users in the user group under each service dimension. It linearly weights the first similarity value under each service dimension with the corresponding dimension weight coefficient to determine the second similarity value between all users of the user group. Finally, it sorts the second similarity values of the plurality of determined user groups in a preset sorting order and determines the user groups whose second similarity values rank within the preset number as abnormal user groups.
Therefore, the apparatus determines a plurality of user sets from the service start time indicated by each user's service information, and determines users repeatedly associated in the user sets as a user group. It then determines the first similarity value between all users in the user group under each dimension, and the second similarity value between all users in each user group using the weight coefficient of each service dimension. It sorts all user groups by the second similarity value and determines the user groups ranked within the preset number as abnormal user groups. In this way, a plurality of associated abnormal users can be identified at once through an abnormal user group, which improves the efficiency and accuracy of identifying abnormal users.
Referring to fig. 7, fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure. As shown in fig. 7, the electronic device 700 includes a processor 710, a memory 720, and a bus 730.
The memory 720 stores machine-readable instructions executable by the processor 710. When the electronic device 700 runs, the processor 710 communicates with the memory 720 through the bus 730. When the machine-readable instructions are executed by the processor 710, the steps of the method for identifying an abnormal user group in the method embodiments shown in fig. 2 to fig. 5 may be performed.
An embodiment of the present application further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the steps of the method for identifying an abnormal user group in the method embodiments shown in fig. 2 to fig. 5 may be executed.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on this understanding, the technical solution of the present application, or the part of it that contributes to the prior art, may be embodied in the form of a software product. The software product is stored in a storage medium and includes instructions that cause a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
Finally, it should be noted that the above embodiments are only specific embodiments of the present application. They are used to illustrate, not to limit, its technical solutions, and the protection scope of the present application is not limited to them. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand the following. Within the technical scope disclosed in the present application, anyone familiar with the art may still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of their technical features. Such modifications, changes or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application, and shall all be covered by the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. An identification method for an abnormal user community, the identification method comprising:
dividing a plurality of users into a plurality of user sets based on the service start time indicated by the acquired service information of each user and a preset time interval;
determining a plurality of users which are repeatedly associated in the user sets as a user group, and calculating a first similarity value between all users in the user group under each service dimension;
linearly weighting the first similarity value under each service dimension with the corresponding dimension weight coefficient to determine a second similarity value between all users of the user group;
and sorting the second similarity values of the plurality of determined user groups in a preset sorting order, and determining the user groups whose second similarity values rank within the preset number as abnormal user groups.
2. The identification method of claim 1, wherein the business dimension comprises one of a numeric business dimension and a category business dimension.
3. The method of claim 2, wherein when the business dimension comprises a numeric business dimension, calculating a first similarity value between all users in the user community by:
for the same service dimension, obtaining the feature value of each user in the user group under the service dimension, and adding the feature values of all users to determine a feature value sum;
subtracting the feature value sum from the feature value of each user to obtain a plurality of first difference values, and determining the first difference absolute value of each first difference value;
adding the first difference absolute values to obtain an absolute value sum, and determining the first quotient of the absolute value sum divided by the feature value sum as an initial first numerical similarity;
and normalizing the initial first numerical similarity to determine a first similarity value among all users in the user community.
4. The method of claim 2, wherein when the business dimension comprises a category business dimension, calculating a first similarity value between all users in the user community by:
for the same service dimension, obtaining the service type of each user under the service dimension, and counting the total number of service types and the total number of users in the dimension;
determining the second quotient of the total number of service types divided by the total number of users as an initial first similarity value between all users in the user group;
and normalizing the initial first similarity value to determine a first similarity value between all users in the user group.
5. An identification method according to claim 3 or 4, characterized in that the first similarity value is determined by:
determining initial first similarity values corresponding to each user group under the same service dimension, and determining the maximum initial first similarity value and the minimum initial first similarity value in all the initial first similarity values;
subtracting each initial first similarity value from the maximum initial first similarity value to obtain a plurality of second difference values;
determining a third difference between the maximum initial first similarity value and the minimum initial first similarity value as a normalization coefficient;
and dividing each second difference value by the normalization coefficient to determine a plurality of first similarity values.
6. An apparatus for identifying an abnormal user group, the apparatus comprising:
the user set generation module is used for dividing a plurality of users into a plurality of user sets based on the service start time indicated by the acquired service information of each user and a preset time interval;
the first similarity value calculation module is used for determining a plurality of users repeatedly associated in the user sets as a user group, and calculating the first similarity value between all users in the user group under each service dimension;
the second similarity value determining module is used for linearly weighting the first similarity value under each service dimension with the corresponding dimension weight coefficient to determine a second similarity value between all users of the user group;
and the abnormal user group determining module is used for sorting the second similarity values of the plurality of determined user groups in a preset sorting order, and determining the user groups whose second similarity values rank within the preset number as abnormal user groups.
7. The identification device of claim 6, wherein the business dimension comprises one of a numeric business dimension and a category business dimension.
8. The identification device of claim 7, wherein when the business dimension comprises a numeric business dimension, the first similarity value calculation module is configured to calculate the first similarity value between all users in the user community by:
for the same service dimension, obtaining the feature value of each user in the user group under the service dimension, and adding the feature values of all users to determine a feature value sum;
subtracting the feature value sum from the feature value of each user to obtain a plurality of first difference values, and determining the first difference absolute value of each first difference value;
adding the first difference absolute values to obtain an absolute value sum, and determining the first quotient of the absolute value sum divided by the feature value sum as an initial first numerical similarity;
and normalizing the initial first numerical similarity to determine a first similarity value among all users in the user community.
9. An electronic device, comprising: processor, memory and bus, said memory storing machine-readable instructions executable by said processor, said processor and said memory communicating over said bus when the electronic device is running, said machine-readable instructions when executed by said processor performing the steps of the method of identification of an abnormal user community as claimed in any one of claims 1 to 5.
10. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, which computer program, when being executed by a processor, performs the steps of the method for identifying an abnormal user community as claimed in any one of claims 1 to 5.
CN202010082795.0A 2020-02-07 2020-02-07 Identification method and device for abnormal user group and readable storage medium Active CN111311276B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010082795.0A CN111311276B (en) 2020-02-07 2020-02-07 Identification method and device for abnormal user group and readable storage medium

Publications (2)

Publication Number Publication Date
CN111311276A true CN111311276A (en) 2020-06-19
CN111311276B CN111311276B (en) 2023-08-29

Family

ID=71161807

Country Status (1)

Country Link
CN (1) CN111311276B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111767315A (en) * 2020-06-29 2020-10-13 北京奇艺世纪科技有限公司 Black-industry identification method and device, electronic equipment and storage medium
CN113706181A (en) * 2021-10-30 2021-11-26 杭银消费金融股份有限公司 Service processing detection method and system based on user behavior characteristics

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9231962B1 (en) * 2013-11-12 2016-01-05 Emc Corporation Identifying suspicious user logins in enterprise networks
CN105808988A (en) * 2014-12-31 2016-07-27 阿里巴巴集团控股有限公司 Method and device for identifying exceptional account
CN110032583A (en) * 2019-03-12 2019-07-19 平安科技(深圳)有限公司 A kind of recognition methods of fraud clique, device, readable storage medium storing program for executing and terminal device
CN110046929A (en) * 2019-03-12 2019-07-23 平安科技(深圳)有限公司 A kind of recognition methods of fraud clique, device, readable storage medium storing program for executing and terminal device
CN110335139A (en) * 2019-06-21 2019-10-15 深圳前海微众银行股份有限公司 Appraisal procedure, device, equipment and readable storage medium storing program for executing based on similarity
CN110457707A (en) * 2019-08-16 2019-11-15 秒针信息技术有限公司 Extracting method, device, electronic equipment and the readable storage medium storing program for executing of notional word keyword
CN110610182A (en) * 2018-06-15 2019-12-24 武汉安天信息技术有限责任公司 User track similarity judgment method and related device

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9231962B1 (en) * 2013-11-12 2016-01-05 EMC Corporation Identifying suspicious user logins in enterprise networks
CN105808988A (en) * 2014-12-31 2016-07-27 阿里巴巴集团控股有限公司 Method and device for identifying an abnormal account
CN110610182A (en) * 2018-06-15 2019-12-24 武汉安天信息技术有限责任公司 User trajectory similarity determination method and related device
CN110032583A (en) * 2019-03-12 2019-07-19 平安科技(深圳)有限公司 Fraud gang identification method and device, readable storage medium and terminal device
CN110046929A (en) * 2019-03-12 2019-07-23 平安科技(深圳)有限公司 Fraud gang identification method and device, readable storage medium and terminal device
CN110335139A (en) * 2019-06-21 2019-10-15 深圳前海微众银行股份有限公司 Similarity-based evaluation method, apparatus, device and readable storage medium
CN110457707A (en) * 2019-08-16 2019-11-15 秒针信息技术有限公司 Content-word keyword extraction method and device, electronic equipment and readable storage medium

Non-Patent Citations (1)

Title
蒋鑫: "Design and Implementation of a Social Network Abnormal User Identification System Based on Attribute Reduction" *

Cited By (4)

Publication number Priority date Publication date Assignee Title
CN111767315A (en) * 2020-06-29 2020-10-13 北京奇艺世纪科技有限公司 Black-industry (online fraud) identification method and device, electronic equipment and storage medium
CN111767315B (en) * 2020-06-29 2023-07-04 北京奇艺世纪科技有限公司 Black-industry (online fraud) identification method and device, electronic equipment and storage medium
CN113706181A (en) * 2021-10-30 2021-11-26 杭银消费金融股份有限公司 Service processing detection method and system based on user behavior characteristics
CN113706181B (en) * 2021-10-30 2022-02-08 杭银消费金融股份有限公司 Service processing detection method and system based on user behavior characteristics

Similar Documents

Publication Title
CN107066616B (en) Account processing method and device and electronic equipment
CN107423613B (en) Method and device for determining device fingerprint according to similarity and server
CN110033170B (en) Method and device for identifying risky merchants
CN111507470A (en) Abnormal account identification method and device
CN111311276B (en) Identification method and device for abnormal user group and readable storage medium
CN109711424A (en) Decision-tree-based behavior rule acquisition method, device and equipment
CN113448955B (en) Data set quality evaluation method and device, computer equipment and storage medium
CN111160329A (en) Root cause analysis method and device
CN112395881A (en) Material label construction method and device, readable storage medium and electronic equipment
CN112200259A (en) Information gain text feature selection method and classification device based on classification and screening
CN111291567A (en) Evaluation method and device for manual labeling quality, electronic equipment and storage medium
CN116610821B (en) Knowledge graph-based enterprise risk analysis method, system and storage medium
CN110019762B (en) Problem positioning method, storage medium and server
CN112632000A (en) Log file clustering method and device, electronic equipment and readable storage medium
CN112101024A (en) Target object identification system based on app information
CN115859932A (en) Log template extraction method and device, electronic equipment and storage medium
CN115766215A (en) Abnormal flow detection method and device
CN114817518A (en) License handling method, system and medium based on big data archive identification
CN111859057B (en) Data feature processing method and data feature processing device
CN110570301B (en) Risk identification method, device, equipment and medium
CN112232962A (en) Transaction index processing method, device and equipment
CN115600112B (en) Method, device, equipment and medium for obtaining behavior prediction model training set
CN113379004B (en) Data table classification method and device, electronic equipment and storage medium
CN115358323A (en) Feature screening method and device
CN114662822A (en) Audit model determination method and device and electronic equipment

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant