CN110070383B - Abnormal user identification method and device based on big data analysis - Google Patents

Abnormal user identification method and device based on big data analysis

Info

Publication number
CN110070383B
Authority
CN
China
Prior art keywords
user
abnormal
users
virtual asset
similarity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811025042.5A
Other languages
Chinese (zh)
Other versions
CN110070383A (en)
Inventor
黄强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Life Insurance Company of China Ltd
Original Assignee
Ping An Life Insurance Company of China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Life Insurance Company of China Ltd
Priority to CN201811025042.5A
Publication of CN110070383A
Application granted
Publication of CN110070383B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0207 Discounts or incentives, e.g. coupons or rebates
    • G06Q30/0225 Avoiding frauds

Abstract

The disclosure relates to the technical field of big data, and in particular to an abnormal user identification method based on big data analysis, an abnormal user identification device based on big data analysis, an electronic device, and a storage medium. The method comprises the following steps: acquiring virtual asset acquisition data of each user in a user set within a preset period; establishing a time tag vector for each user according to the virtual asset acquisition data; randomly selecting one user as a target user, and calculating the similarity between the target user and the other users by using the time tag vectors; and identifying users whose similarity to the target user is greater than a first preset threshold as abnormal users. By calculating the similarity between users from each user's time tag vector, the method and device obtain the abnormal users with high mutual similarity, and thereby effectively identify abnormal users who maliciously acquire virtual assets in batches.

Description

Abnormal user identification method and device based on big data analysis
Technical Field
The disclosure relates to the technical field of big data, and in particular to an abnormal user identification method based on big data analysis, an abnormal user identification device based on big data analysis, an electronic device, and a storage medium.
Background
With the rapid development of electronic commerce, merchants launch various marketing campaigns from time to time to increase user activity and user stickiness. For example, a certain number of virtual assets, such as points or coupons, may be issued to users, who can spend them, thereby attracting more users. However, as marketing activities become more widespread and the scope in which virtual assets can be used keeps expanding, protecting the security of virtual assets has become an important issue. In particular, there are now cases in which malicious actors register accounts in bulk and obtain virtual assets in bulk for profit.
To cope with the above situation, the prior art mostly verifies the identity of an account before virtual assets are issued and controls issuance accordingly. However, this approach has a drawback: its accuracy in judging abnormal accounts is low, and misjudgments occur easily.
It should be noted that the information disclosed in the above background section is only for enhancing understanding of the background of the present disclosure and thus may include information that does not constitute prior art known to those of ordinary skill in the art.
Disclosure of Invention
It is an object of the present disclosure to provide an abnormal user identification method based on big data analysis, an abnormal user identification apparatus based on big data analysis, an electronic device, and a storage medium, which overcome, at least in part, one or more problems caused by the limitations and disadvantages of the related art.
Other features and advantages of the present disclosure will be apparent from the following detailed description, or may be learned in part by the practice of the disclosure.
According to a first aspect of embodiments of the present disclosure, there is provided an abnormal user identification method based on big data analysis, the method including:
acquiring virtual asset acquisition data of each user in a user set in a preset period;
establishing a time tag vector for the user according to the virtual asset acquisition data;
randomly selecting one user as a target user, and calculating the similarity between the target user and other users by using the time tag vector;
and identifying users whose similarity to the target user is greater than a first preset threshold as abnormal users.
In one exemplary embodiment of the present disclosure, the virtual asset acquisition data includes virtual asset acquisition time and/or check-in tag data; or the acquisition time and/or the number of check-in tags.
In an exemplary embodiment of the present disclosure, after acquiring the virtual asset acquisition data of the user within a preset period, the method further includes: screening the user according to the virtual asset acquisition data of the user to acquire a suspected user set, including:
screening users whose number of virtual assets is greater than a second preset threshold to obtain a suspected user set; or
screening users whose virtual asset acquisition time and/or sign-in tag data is greater than a third preset threshold to obtain a suspected user set.
In an exemplary embodiment of the present disclosure, the method further comprises: classifying the users in the suspected user set according to the virtual asset acquisition time and/or the sign-in tag data to obtain a plurality of similar user sets.
In an exemplary embodiment of the present disclosure, the randomly selecting one of the users as a target user, and calculating the similarity between the target user and other users using a time tag vector includes:
randomly selecting a user from the suspected user set as a target user;
and calculating the similarity of the target user and other users in the similar user set to which the target user belongs by using the time tag vector.
In an exemplary embodiment of the present disclosure, the method further comprises: judging whether an abnormal user has been misjudged, including:
judging whether the abnormal user has a whitelist behavior, and if so, judging the abnormal user to be a non-abnormal user; or
extracting login device information corresponding to the virtual asset acquisition time and/or sign-in tag data of the abnormal user, and judging the abnormal user to be a non-abnormal user if the replacement frequency of the login device information is less than a fourth preset threshold.
In an exemplary embodiment of the present disclosure, the login device information includes:
any one or any combination of a MAC address, an IP address, a device identifier, an IMEI, an MEID, and networking.
According to a second aspect of the embodiments of the present disclosure, there is provided an abnormal user identification apparatus based on big data analysis, including:
the basic data acquisition module is used for acquiring virtual asset acquisition data of each user in a user set in a preset period;
the time tag vector establishing module is used for establishing a time tag vector for the user according to the virtual asset acquisition data;
the similarity calculation module is used for randomly selecting one user as a target user and calculating the similarity between the target user and other users by using the time tag vector;
and the abnormal user screening module is used for identifying users whose similarity to the target user is greater than a first preset threshold as abnormal users.
According to a third aspect of the present disclosure, there is provided a storage medium having stored thereon a computer program which, when executed by a processor, implements the above-described abnormal user identification method based on big data analysis.
According to a fourth aspect of the present disclosure, there is provided an electronic terminal, including:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform any one of the above-described abnormal user identification methods based on big data analysis.
The technical solutions provided by the embodiments of the present disclosure may have the following beneficial effects:
In the abnormal user identification method based on big data analysis according to one embodiment of the present disclosure, a time tag vector is established from the virtual asset acquisition data of each user, and the similarity between users is calculated by using the time tag vectors, so that abnormal users with high mutual similarity are obtained. Effective identification of abnormal users who maliciously acquire virtual assets in batches is thereby achieved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure. It will be apparent to those of ordinary skill in the art that the drawings in the following description are merely examples of the disclosure and that other drawings may be derived from them without undue effort.
FIG. 1 schematically illustrates a method for identifying abnormal users based on big data analysis in an exemplary embodiment of the present disclosure;
FIG. 2 schematically illustrates a flow chart of a method for judging whether an abnormal user has been misjudged in an exemplary embodiment of the present disclosure;
FIG. 3 schematically illustrates a schematic composition of an abnormal user identification device based on big data analysis in an exemplary embodiment of the present disclosure;
FIG. 4 schematically illustrates another schematic diagram of an abnormal user identification apparatus based on big data analysis in an exemplary embodiment of the present disclosure;
FIG. 5 schematically illustrates a schematic diagram of a program product implementing an abnormal user identification method based on big data analysis in an exemplary embodiment of the present disclosure.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. However, the exemplary embodiments may be embodied in many forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
Furthermore, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus a repetitive description thereof will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in software or in one or more hardware modules or integrated circuits or in different networks and/or processor devices and/or microcontroller devices.
In this exemplary embodiment, an abnormal user identification method based on big data analysis is first provided. Referring to fig. 1, the method may include the following steps:
Step S101, acquiring virtual asset acquisition data of each user in a user set within a preset period;
Step S102, establishing a time tag vector for each user according to the virtual asset acquisition data;
Step S103, randomly selecting one user as a target user, and calculating the similarity between the target user and the other users by using the time tag vectors;
Step S104, identifying users whose similarity to the target user is greater than a first preset threshold as abnormal users.
In the abnormal user identification method based on big data analysis provided above, a time tag vector is established from the virtual asset acquisition data of each user, and the similarity between users is calculated by using the time tag vectors, so that abnormal users with high mutual similarity are obtained. Effective identification of abnormal users who maliciously acquire virtual assets in batches is thereby achieved.
Hereinafter, each step of the above-described method in the present exemplary embodiment will be described in more detail with reference to the accompanying drawings and examples.
To promote user spending and maintain user stickiness, e-commerce operators may from time to time set up marketing strategies that grant virtual assets to users. For example, a user may obtain a certain number of virtual assets after logging in and signing in on each natural day; or a certain number of virtual assets may be granted to a newly registered user; or a user may obtain a certain number of virtual assets by taking part in a designated activity, and so on. The virtual assets may be, for example, points, coupons, vouchers, and the like; after obtaining them, the user can use them to make purchases, redeem gifts, offset part of a payment, and so on. Because black-market operators register users in batches in order to acquire large numbers of virtual assets, the abnormal user identification method described above may be applied to identify such maliciously registered abnormal users.
Step S101, acquiring virtual asset acquisition data of each user in a user set within a preset period.
In this example embodiment, virtual asset acquisition data of each user in the user set over a period of time may first be acquired. For example, the user set may include all users, or may include only users selected according to a certain rule. The present disclosure does not specifically limit the number of users in the user set or the way in which they are selected.
Further, the preset period may be 90 days from the date the user account was created, or 90 days from the date on which the user first acquired a virtual asset, or 80 days or 70 days counted from any day in the user's historical data, and so on. The specific duration of the preset period may be set according to the needs of the data calculation, and is not specifically limited in the present disclosure.
In addition, the virtual asset acquisition data of a user may include the user's ID, the time at which virtual assets were acquired and the number acquired, sign-in tag data, and the like. The sign-in tag data may be a sign-in tag generated after the user logs in and acquires a virtual asset, or a sign-in tag generated after the user logs in. The sign-in tag may be set to 1 when the user signs in and to 0 when the user does not. Of course, in other exemplary embodiments of the present disclosure, the virtual asset acquisition data may also include the user's current total amount of virtual assets, and so on.
For example, the virtual asset acquisition data of users may be: user A received 15 points on May 1, 2018, bringing the current accumulated total to 1542 points; user B received 15 points on May 2, 2018, bringing the current accumulated total to 1422 points; and so on.
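As an illustrative, non-limiting sketch, the acquisition data described above might be held in a record such as the following. The class name `AcquisitionRecord`, its field names, and the helper `in_period` are assumptions introduced here for illustration and do not appear in the disclosure.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class AcquisitionRecord:
    """One virtual asset acquisition event (field names are illustrative)."""
    user_id: str
    acquired_on: date
    amount: int          # virtual assets received in this event
    total_balance: int   # accumulated total after the event
    checked_in: int = 1  # sign-in tag: 1 if the user signed in, 0 otherwise


# The two example records from the text.
records = [
    AcquisitionRecord("A", date(2018, 5, 1), 15, 1542),
    AcquisitionRecord("B", date(2018, 5, 2), 15, 1422),
]


def in_period(record, period_start, period_days):
    """Return True if the record falls within [period_start, period_start + period_days)."""
    offset = (record.acquired_on - period_start).days
    return 0 <= offset < period_days
```

Filtering records with `in_period` is one possible way to restrict the data to the preset period before any vector is built.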
Step S102, a time tag vector is established for the user according to the virtual asset acquisition data.
In this example embodiment, taking a natural day as the unit, within the preset period a value of 1 may be set for each day on which the user acquires a virtual asset, and a value of 0 for each natural day on which no virtual asset is acquired. A time tag vector can then be established for the user from the acquisition times. Of course, the time tag vector may also be established by using the user's check-in tag data described above.
For example, if the preset period is seven days and user C acquires virtual points or checks in on the first, second, fourth, fifth, and sixth days, the time tag vector of user C may be (1,1,0,1,1,1,0).
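The construction of the time tag vector can be sketched as follows. This is a minimal illustration under the assumption that acquisition times are available as calendar dates; the function name `build_time_tag_vector` and the chosen start date are hypothetical.

```python
from datetime import date, timedelta


def build_time_tag_vector(acquisition_dates, period_start, period_days):
    """Return a 0/1 list with one entry per natural day of the preset period."""
    acquired = set(acquisition_dates)
    return [
        1 if period_start + timedelta(days=offset) in acquired else 0
        for offset in range(period_days)
    ]


start = date(2018, 5, 1)  # assumed start of a seven-day preset period
# User C acquires assets (or checks in) on days 1, 2, 4, 5 and 6.
user_c_dates = [start + timedelta(days=d) for d in (0, 1, 3, 4, 5)]
vector_c = build_time_tag_vector(user_c_dates, start, 7)
print(vector_c)  # [1, 1, 0, 1, 1, 1, 0]
```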
Based on the foregoing, in the present exemplary embodiment, after acquiring the virtual asset acquisition data of the user within a preset period, the method further includes:
Step S1021, screening the users according to their virtual asset acquisition data to obtain a suspected user set.
Specifically, this step may include: screening users whose number of virtual assets is greater than a second preset threshold to obtain a suspected user set; or screening users whose virtual asset acquisition time and/or sign-in tag data is greater than a third preset threshold to obtain a suspected user set.
Because abnormal users actively and intensively log in and check in every day to acquire virtual assets, suspected users can be screened according to the number of virtual assets held by each user and/or the user's check-in tag data. For example, with a preset period of 80 days, users who logged in on more than 65 days may be selected to form a suspected user set. Alternatively, the current number of points of each user may be examined, and users with more than 2000, 2500, or 2800 points may be selected to form a suspected user set.
Of course, in other exemplary embodiments of the present disclosure, users whose check-in tag data is below a threshold, or whose number of virtual assets is below a preset threshold within the preset period, may be filtered out, and the remaining users form the suspected user set. For example, users with fewer than 100 virtual assets, or users with fewer than 15 check-in days within the preset period, are filtered out, and the remaining users form the suspected user set.
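The threshold-based screening of step S1021 might look like the following sketch. The function name, the tuple layout `(asset_count, checkin_days)`, and the sample figures are assumptions for illustration; a user is kept when either supplied threshold is exceeded, which is one reading of the "or" in the text.

```python
def screen_suspected_users(stats, asset_threshold=None, checkin_threshold=None):
    """Return the ids of users exceeding any threshold that was supplied.

    stats maps user id -> (asset_count, checkin_days) for the preset period.
    """
    suspected = set()
    for user_id, (asset_count, checkin_days) in stats.items():
        over_assets = asset_threshold is not None and asset_count > asset_threshold
        over_checkins = (
            checkin_threshold is not None and checkin_days > checkin_threshold
        )
        if over_assets or over_checkins:
            suspected.add(user_id)
    return suspected


# Hypothetical statistics over an 80-day preset period.
stats = {"u1": (2600, 70), "u2": (120, 12), "u3": (1800, 66), "u4": (2100, 40)}
by_checkins = screen_suspected_users(stats, checkin_threshold=65)
by_assets = screen_suspected_users(stats, asset_threshold=2000)
```

Here `by_checkins` contains `u1` and `u3`, while `by_assets` contains `u1` and `u4`.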
In addition, in the present exemplary embodiment, after the suspected user set is acquired, the users in the suspected user set may also be classified. Specifically, the method may further include:
Step S1022, classifying the users in the suspected user set according to the virtual asset acquisition time and/or the sign-in tag data to obtain a plurality of similar user sets.
For example, the users in the suspected user set may be classified according to sign-in tag data: when the preset period is 90 days, users with more than 75 check-in days may be placed in one user set, users with between 60 and 75 check-in days in another user set, and so on.
Of course, in other exemplary embodiments of the present disclosure, the similar user sets may also be divided according to the number of virtual assets. For example, users with more than 3000 virtual assets are placed in one similar user set, users with 2200 to 2999 virtual assets in another, and so on. The similar user sets may also be divided by combining the number of virtual assets with the sign-in tag data; for example, users with more than 3000 virtual assets and more than 70 check-in days are placed in one similar user set, users with 2000 to 2500 virtual assets and 40 to 65 check-in days in another, and so on. The present disclosure does not specifically limit the number of similar user sets.
By dividing the users in the suspected user set into similar user sets, users with similar login states and virtual asset states can be grouped more accurately, which facilitates the subsequent acquisition of the abnormal user set.
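The division into similar user sets in step S1022 can be sketched as a simple banding of check-in days. The band labels, the inclusive bounds, and the function name are assumptions made for illustration; users that fall outside every band are left unassigned.

```python
from collections import defaultdict


def group_similar_users(checkin_days, bands):
    """Group user ids by the first band whose inclusive range contains their days.

    checkin_days maps user id -> number of check-in days in the preset period.
    bands maps a label -> (low, high), both bounds inclusive.
    """
    groups = defaultdict(list)
    for user_id, days in checkin_days.items():
        for label, (low, high) in bands.items():
            if low <= days <= high:
                groups[label].append(user_id)
                break
    return dict(groups)


# Bands for a 90-day preset period, following the example in the text.
bands = {"76-90": (76, 90), "60-75": (60, 75)}
checkin_days = {"u1": 88, "u2": 79, "u3": 61, "u4": 75, "u5": 30}
groups = group_similar_users(checkin_days, bands)
```

With these inputs, `u1` and `u2` share one set, `u3` and `u4` share another, and `u5` is not assigned to any set.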
Step S103, randomly selecting one user as a target user, and calculating the similarity between the target user and other users by using the time tag vector.
In this exemplary embodiment, after the time tag vector of the user is obtained, a target user may be selected randomly from all the users, and the similarity between the target user and other users may be calculated by using a cosine similarity algorithm.
In addition, based on the above, after the suspected user set and the similar user sets are obtained, the similarity between the target user and the other users in the same similar user set may also be calculated by using the time tag vectors. The cosine similarity of two known vectors may be calculated by conventional methods, which are not described in detail in this disclosure.
In addition, the process may be repeated so that the similarity between any two users may be obtained.
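For reference, the conventional cosine similarity of two vectors a and b is their dot product divided by the product of their Euclidean norms. The sketch below applies it to time tag vectors and repeats the calculation over every pair of users; the function names and the sample vectors are illustrative assumptions, and returning 0.0 for an all-zero vector is a choice made here rather than something the disclosure specifies.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: an all-zero vector is similar to nothing
    return dot / (norm_a * norm_b)


def pairwise_similarity(vectors):
    """Map each unordered pair of user ids to the similarity of their vectors."""
    return {
        (u, v): cosine_similarity(vectors[u], vectors[v])
        for u, v in combinations(sorted(vectors), 2)
    }


vectors = {
    "C": [1, 1, 0, 1, 1, 1, 0],
    "D": [1, 1, 0, 1, 1, 1, 0],
    "E": [0, 0, 1, 0, 0, 0, 1],
}
similarities = pairwise_similarity(vectors)
```

Users C and D have identical vectors, so their similarity is approximately 1, while E shares no active day with either and scores 0.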
Step S104, identifying users whose similarity to the target user is greater than a first preset threshold as abnormal users.
In this exemplary embodiment, after the similarity data between users is calculated, users whose similarity is greater than the first preset threshold are clustered to obtain an abnormal user set. For example, when the similarity between two users is greater than the preset value, the time tag vectors of the two users are highly similar, which in turn indicates that their login and sign-in times are highly similar; the login behavior of the two users can therefore be considered abnormal, and both users are determined to be abnormal users.
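Steps S103 and S104 together might be sketched as below. The function names, the sample vectors, and the threshold values are assumptions; including the target user in the result whenever at least one other user matches is one interpretation of the passage above.

```python
import math
import random


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length numeric vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def find_abnormal_users(vectors, threshold, target=None, rng=None):
    """Pick a target user and flag users whose similarity to it exceeds threshold."""
    if target is None:
        rng = rng or random.Random()
        target = rng.choice(sorted(vectors))  # random selection of the target user
    matches = {
        user_id
        for user_id, vector in vectors.items()
        if user_id != target
        and cosine_similarity(vectors[target], vector) > threshold
    }
    # Assumption: the target is abnormal too when it matches at least one user.
    return matches | {target} if matches else set()


vectors = {
    "u1": [1, 1, 0, 1, 1, 1, 0],
    "u2": [1, 1, 0, 1, 1, 1, 0],
    "u3": [1, 1, 0, 1, 1, 0, 0],
    "u4": [0, 0, 1, 0, 0, 0, 1],
}
loose = find_abnormal_users(vectors, threshold=0.85, target="u1")
strict = find_abnormal_users(vectors, threshold=0.95, target="u1")
```

Passing `target` explicitly makes the example reproducible; omitting it selects the target at random, as the method describes.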
Because abnormal users are mostly registered in batches, such users log in regularly and in batches to acquire virtual assets. Therefore, establishing the time tag vector from the check-in data recorded within a preset period ensures the validity of the data in the time tag vector. Identifying similar users by means of their time tag vectors in turn ensures that the abnormal user results are highly accurate.
Based on the foregoing, in order to further refine the abnormal user determination, referring to fig. 2, the method may further include: step S105, judging whether an abnormal user has been misjudged. Specifically, this may include:
S1051, judging whether the abnormal user has a whitelist behavior, and if so, judging the abnormal user to be a non-abnormal user; or
S1052, extracting login device information corresponding to the virtual asset acquisition time and/or sign-in tag data of the abnormal user, and judging the abnormal user to be a non-abnormal user if the replacement frequency of the login device information is less than a fourth preset threshold.
For example, the login device information may include any one or any combination of a MAC address, an IP address, a device identifier, an IMEI, an MEID, and networking. Some normal users also sign in very frequently, but the device and IP address they log in from are relatively fixed; genuinely abnormal users likewise sign in frequently, but the device, IP address, and similar information they log in from are generally not fixed and change often. When abnormal users are screened by time tag vector, the time tags of such normal users are highly similar to those of abnormal users, so normal users are easily misjudged as abnormal; information such as device and IP address can therefore be used to remove the normal users from the abnormal user set.
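The device-based check of step S1052 can be sketched as follows. The disclosure does not define how the replacement frequency is measured; the sketch assumes it is the fraction of consecutive logins on which the device identifier changed. The function names, identifiers, and threshold are hypothetical.

```python
def device_change_frequency(devices):
    """Fraction of consecutive logins whose device identifier differs (assumed metric)."""
    if len(devices) < 2:
        return 0.0
    changes = sum(1 for prev, cur in zip(devices, devices[1:]) if prev != cur)
    return changes / (len(devices) - 1)


def drop_stable_device_users(abnormal_logins, threshold):
    """Keep only users whose device change frequency is not below the threshold.

    abnormal_logins maps user id -> device identifiers in login order.
    Users below the threshold are treated as misjudged normal users.
    """
    return {
        user_id
        for user_id, devices in abnormal_logins.items()
        if device_change_frequency(devices) >= threshold
    }


abnormal_logins = {
    "u1": ["imei-1", "imei-1", "imei-1", "imei-1", "imei-1"],
    "u2": ["imei-1", "imei-2", "imei-3", "imei-2", "imei-4"],
}
still_abnormal = drop_stable_device_users(abnormal_logins, threshold=0.5)
```

In this example `u1` always logs in from the same device and is removed from the abnormal set, while `u2` changes device on every login and remains.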
In addition, after the registered IP address or device information of each abnormal user in the abnormal user set is acquired, a multidimensional vector may be established for each abnormal user, and cosine similarity may be calculated between the users in the abnormal user set to find users with the same or similar device information, so that groups of abnormal users are screened out more accurately.
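One possible form of such a multidimensional vector is sketched below. The disclosure does not state how the vector is encoded; this sketch assumes one dimension per distinct IP address or device identifier observed in the abnormal user set, holding the number of logins from it. All names and sample values are illustrative.

```python
import math
from collections import Counter


def device_vectors(logins):
    """Encode each user's logins as counts over the shared identifier vocabulary."""
    vocabulary = sorted({d for devices in logins.values() for d in devices})
    vectors = {}
    for user_id, devices in logins.items():
        counts = Counter(devices)
        vectors[user_id] = [counts[d] for d in vocabulary]
    return vectors


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length numeric vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


logins = {
    "u1": ["ip-a", "ip-a", "ip-b"],
    "u2": ["ip-a", "ip-b", "ip-b"],
    "u3": ["ip-c"],
}
vectors = device_vectors(logins)
shared = cosine_similarity(vectors["u1"], vectors["u2"])
unrelated = cosine_similarity(vectors["u1"], vectors["u3"])
```

Users `u1` and `u2` log in from the same two addresses and score about 0.8, whereas `u3` shares no address with them and scores 0.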
In addition, the whitelist behavior described above may include conventional monetary transaction records, such as a recharge record or an RMB spending record, or the updating or completion of personal information, and so on. When whitelist behavior exists, the user can be considered a normal user and is removed from the abnormal user set.
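The whitelist check of step S1051 can be sketched as a set intersection. The event labels in `WHITELIST_EVENTS` and the function name are hypothetical stand-ins for the recharge, RMB spending, and personal information records mentioned above.

```python
# Hypothetical labels for behaviors treated as whitelist behavior.
WHITELIST_EVENTS = {"recharge", "rmb_purchase", "profile_update"}


def remove_whitelisted(abnormal_users, user_events):
    """Return the abnormal users that show no whitelist behavior.

    user_events maps user id -> iterable of event labels recorded for that user.
    """
    return {
        user_id
        for user_id in abnormal_users
        if not WHITELIST_EVENTS & set(user_events.get(user_id, ()))
    }


abnormal_users = {"u1", "u2", "u3"}
user_events = {
    "u1": ["sign_in"],
    "u2": ["sign_in", "recharge"],
    "u3": [],
}
remaining = remove_whitelisted(abnormal_users, user_events)
```

User `u2` has a recharge record and is therefore treated as a normal user; `u1` and `u3` stay in the abnormal set.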
By establishing a mechanism for identifying misjudged abnormal users, abnormal users and normal users can be distinguished along multiple dimensions, which effectively improves the accuracy of abnormal user screening.
In summary, according to the method in the present exemplary embodiment, a time tag vector is established for each user, and the similarity between users is calculated by using the time tag vectors, so that abnormal users with high mutual similarity are obtained and abnormal users who maliciously acquire virtual assets in batches are effectively identified. In addition, the mechanism for identifying misjudged abnormal users effectively prevents normal users from being misjudged as abnormal users, which further improves the accuracy of abnormal user identification.
It should be noted that although the steps of the methods of the present disclosure are illustrated in the accompanying drawings in a particular order, this does not require or imply that the steps must be performed in that particular order or that all of the illustrated steps be performed in order to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step to perform, and/or one step decomposed into multiple steps to perform, etc. In addition, it is also readily understood that these steps may be performed synchronously or asynchronously, for example, in a plurality of modules/processes/threads.
Further, in the present exemplary embodiment, there is also provided an abnormal user identification apparatus 20 based on big data analysis. Referring to fig. 3, the apparatus 20 may include: a basic data acquisition module 201, a time tag vector establishment module 202, a similarity calculation module 203 and an abnormal user screening module 204. Wherein:
The basic data acquisition module 201 may be configured to acquire virtual asset acquisition data of each user in a user set within a preset period.
The time tag vector creation module 202 may be configured to create a time tag vector for the user based on the virtual asset acquisition data.
The similarity calculation module 203 may be configured to randomly select a user as a target user, and calculate a similarity between the target user and other users using the time stamp vector.
The abnormal user screening module 204 may be configured to screen users having a similarity with the target user greater than a first preset threshold as abnormal users.
The specific manner in which the various modules of the apparatus in the above embodiments perform their operations has been described in detail in connection with the embodiments of the method, and will not be described in detail here.
It should be noted that although several modules or units of a device for action execution are mentioned in the above detailed description, such a division is not mandatory. Indeed, the features and functionality of two or more modules or units described above may be embodied in one module or unit in accordance with embodiments of the present disclosure. Conversely, the features and functions of one module or unit described above may be further divided into a plurality of modules or units to be embodied. The components shown as modules or units may or may not be physical units, may be located in one place, or may be distributed across multiple network elements. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution of the present disclosure. Those of ordinary skill in the art can understand and implement the present disclosure without undue effort.
In an exemplary embodiment of the present disclosure, an electronic device capable of implementing the above-described abnormal user identification method based on big data analysis is also provided.
Those skilled in the art will appreciate that the various aspects of the invention may be implemented as a system, method, or program product. Accordingly, aspects of the invention may be embodied in the following forms: an entirely hardware embodiment, an entirely software embodiment (including firmware, micro-code, etc.), or an embodiment combining hardware and software aspects, which may be collectively referred to herein as a "circuit," "module," or "system."
An electronic device 600 according to this embodiment of the invention is described below with reference to fig. 4. The electronic device 600 shown in fig. 4 is merely an example, and should not be construed as limiting the functionality and scope of use of embodiments of the present invention.
As shown in fig. 4, the electronic device 600 is embodied in the form of a general purpose computing device. Components of the electronic device 600 may include, but are not limited to: at least one processing unit 610, at least one storage unit 620, and a bus 630 that connects the various system components, including the storage unit 620 and the processing unit 610.
The storage unit stores program code that is executable by the processing unit 610 such that the processing unit 610 performs steps according to various exemplary embodiments of the present invention described in the above-described "exemplary methods" section of the present specification. For example, the processing unit 610 may perform the steps shown in fig. 1: step S101, acquiring virtual asset acquisition data of each user in a user set within a preset period; step S102, establishing a time tag vector for each user according to the virtual asset acquisition data; step S103, randomly selecting one user as a target user, and calculating the similarity between the target user and the other users by using the time tag vectors; and step S104, identifying users whose similarity to the target user is greater than a first preset threshold as abnormal users.
The storage unit 620 may include readable media in the form of volatile storage units, such as a random access memory (RAM) unit 6201 and/or a cache memory unit 6202, and may further include a read-only memory (ROM) unit 6203.
The storage unit 620 may also include a program/utility 6204 having a set (at least one) of program modules 6205, such program modules 6205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment.
The bus 630 may represent one or more of several types of bus structures, including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.
The electronic device 600 may also communicate with one or more external devices 700 (e.g., a keyboard, a pointing device, a Bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 600, and/or with any device (e.g., a router, a modem, etc.) that enables the electronic device 600 to communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 650. The electronic device 600 may also communicate with one or more networks, such as a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet, through a network adapter 660. As shown, the network adapter 660 communicates with the other modules of the electronic device 600 over the bus 630. It should be appreciated that, although not shown, other hardware and/or software modules may be used in connection with the electronic device 600, including, but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.
From the above description of the embodiments, those skilled in the art will readily appreciate that the example embodiments described herein may be implemented in software, or in software combined with the necessary hardware. Thus, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (such as a CD-ROM, a USB flash drive, or a removable hard disk) or on a network, and which includes several instructions that cause a computing device (such as a personal computer, a server, a terminal device, or a network device) to perform the method according to the embodiments of the present disclosure.
In an exemplary embodiment of the present disclosure, a computer-readable storage medium having stored thereon a program product capable of implementing the method described above in the present specification is also provided. In some possible embodiments, the various aspects of the invention may also be implemented in the form of a program product comprising program code for causing a terminal device to carry out the steps according to the various exemplary embodiments of the invention as described in the "exemplary methods" section of this specification, when said program product is run on the terminal device.
Referring to fig. 5, a program product 800 for implementing the above-described method according to an embodiment of the present invention is described. It may take the form of a portable compact disc read-only memory (CD-ROM) that includes program code, and it may be run on a terminal device such as a personal computer. However, the program product of the present invention is not limited thereto. In this document, a readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. A readable medium may be a readable signal medium or a readable storage medium. The readable storage medium may be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The computer readable signal medium may include a data signal propagated in baseband or as part of a carrier wave with readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java or C++ and conventional procedural programming languages such as the "C" programming language or similar languages. The program code may execute entirely on the user's computing device, partly on the user's device as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. Where a remote computing device is involved, it may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (e.g., via the Internet using an Internet service provider).
Furthermore, the above-described drawings are only schematic illustrations of processes included in the method according to the exemplary embodiment of the present invention, and are not intended to be limiting. It will be readily appreciated that the processes shown in the above figures do not indicate or limit the temporal order of these processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, for example, among a plurality of modules.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure herein. This application is intended to cover any variations, uses, or adaptations of the disclosure that follow the general principles of the disclosure and include such departures from the present disclosure as come within known or customary practice in the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with the true scope and spirit of the disclosure being indicated by the following claims.

Claims (7)

1. An abnormal user identification method based on big data analysis, characterized by comprising the following steps:
acquiring virtual asset acquisition data of each user in a user set within a preset period; wherein the preset period is counted from the day the user was created or from the day the user first acquired a virtual asset;
establishing a time tag vector for the user according to the virtual asset acquisition data; wherein, taking a natural day as the unit, the entry for a given day in the preset period is set to 1 if the user acquired a virtual asset on that day, and is set to 0 if the user did not acquire a virtual asset on that day;
screening the users according to their virtual asset acquisition data to obtain a suspected user set; classifying the users in the suspected user set according to the virtual asset acquisition time and/or sign-in tag data to obtain a plurality of similar user sets;
randomly selecting one user as a target user, and calculating the similarity between the target user and other users by using the time tag vector, comprising: randomly selecting a user from the suspected user set as the target user; and calculating, by using the time tag vector, the similarity between the target user and the other users in the similar user set to which the target user belongs;
identifying users whose similarity to the target user is greater than a first preset threshold as abnormal users;
establishing an abnormal user set, acquiring the information of each sign-in, or the device information, of each abnormal user in the abnormal user set, establishing a multidimensional vector for each abnormal user, and performing cosine similarity calculation on the abnormal users in the abnormal user set to obtain different users having the same device information, thereby screening out a plurality of abnormal users;
the method further comprising:
determining whether an abnormal user has been misjudged, comprising:
determining whether the abnormal user has a whitelist behavior, and if so, determining the abnormal user to be a non-abnormal user; or
extracting the login device information corresponding to the virtual asset acquisition time and/or sign-in tag data of the abnormal user, and determining the abnormal user to be a non-abnormal user if the replacement frequency of the login device information is smaller than a fourth preset threshold.
2. The method of claim 1, wherein the virtual asset acquisition data comprises the virtual asset acquisition time and/or sign-in tag data; or the acquisition time and/or the number of sign-in tags.
3. The method of claim 2, wherein, after acquiring the virtual asset acquisition data of the user within the preset period, the method further comprises screening the users according to their virtual asset acquisition data to obtain a suspected user set, comprising:
screening users whose number of virtual assets is greater than a second preset threshold to obtain the suspected user set; or
screening users whose virtual asset acquisition time and/or sign-in tag data is greater than a third preset threshold to obtain the suspected user set.
4. The method of claim 1, wherein the login device information comprises:
any one of, or any combination of, a MAC address, an IP address, a device identifier, an IMEI, an MEID, and networking information.
5. An abnormal user identification device based on big data analysis, comprising:
a basic data acquisition module, configured to acquire virtual asset acquisition data of each user in a user set within a preset period; wherein the preset period is counted from the day the user was created or from the day the user first acquired a virtual asset;
a time tag vector establishing module, configured to establish a time tag vector for the user according to the virtual asset acquisition data; wherein, taking a natural day as the unit, the entry for a given day in the preset period is set to 1 if the user acquired a virtual asset on that day, and is set to 0 if the user did not acquire a virtual asset on that day;
a suspected user set calculation module, configured to screen the users according to their virtual asset acquisition data to obtain a suspected user set, and to classify the users in the suspected user set according to the virtual asset acquisition time and/or sign-in tag data to obtain a plurality of similar user sets;
a similarity calculation module, configured to randomly select one user as a target user and calculate the similarity between the target user and other users by using the time tag vector, comprising: randomly selecting a user from the suspected user set as the target user; and calculating, by using the time tag vector, the similarity between the target user and the other users in the similar user set to which the target user belongs;
an abnormal user screening module, configured to identify users whose similarity to the target user is greater than a first preset threshold as abnormal users;
the device being further configured to:
establish an abnormal user set, acquire the information of each sign-in, or the device information, of each abnormal user in the abnormal user set, establish a multidimensional vector for each abnormal user, and perform cosine similarity calculation on the abnormal users in the abnormal user set to obtain different users having the same device information, thereby screening out a plurality of abnormal users;
and determine whether an abnormal user has been misjudged, comprising:
determining whether the abnormal user has a whitelist behavior, and if so, determining the abnormal user to be a non-abnormal user; or
extracting the login device information corresponding to the virtual asset acquisition time and/or sign-in tag data of the abnormal user, and determining the abnormal user to be a non-abnormal user if the replacement frequency of the login device information is smaller than a fourth preset threshold.
6. A storage medium storing a computer program executable by a processor to perform the big data analysis based abnormal user identification method of any one of claims 1 to 4.
7. An electronic terminal, comprising:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform the big data analysis based abnormal user identification method of any one of claims 1 to 4.
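The second-stage screening and the misjudgment check recited in claims 1 and 5 can be sketched in Python as follows. This is an illustration under stated assumptions, not the claimed implementation: the device multidimensional vector is assumed to be a 0/1 indicator over all observed device identifiers, "replacement frequency" is assumed to mean the number of device changes between consecutive sign-ins, "whitelist behavior" is modeled as membership in a hypothetical whitelist set, and every name, threshold, and sample record is invented for the example.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 0.0 if norm_a == 0 or norm_b == 0 else dot / (norm_a * norm_b)


def device_vectors(sign_in_devices):
    """Build one 0/1 vector per user over the sorted set of all device IDs (assumed encoding)."""
    vocabulary = sorted({d for devices in sign_in_devices.values() for d in devices})
    vectors = {}
    for user, devices in sign_in_devices.items():
        used = set(devices)
        vectors[user] = [1 if d in used else 0 for d in vocabulary]
    return vectors


def users_sharing_devices(sign_in_devices, threshold):
    """Return user pairs whose device vectors have cosine similarity >= threshold."""
    vectors = device_vectors(sign_in_devices)
    return [
        (u, v)
        for u, v in combinations(sorted(vectors), 2)
        if cosine_similarity(vectors[u], vectors[v]) >= threshold
    ]


def device_replacement_count(devices):
    """Count device changes between consecutive sign-ins (assumed meaning of replacement frequency)."""
    return sum(1 for prev, cur in zip(devices, devices[1:]) if prev != cur)


def is_misjudged(user, sign_in_devices, whitelist, fourth_threshold):
    """A flagged user is treated as non-abnormal if whitelisted or if devices rarely change."""
    if user in whitelist:
        return True
    return device_replacement_count(sign_in_devices[user]) < fourth_threshold


# Hypothetical device ID recorded at each sign-in of three flagged users.
sign_in_devices = {
    "a": ["dev1", "dev1", "dev1"],
    "b": ["dev1", "dev1"],
    "c": ["dev2", "dev3", "dev2", "dev4"],
}
shared = users_sharing_devices(sign_in_devices, threshold=0.99)
```

In this sample, users `a` and `b` sign in only from `dev1`, so their device vectors coincide and the pair is returned. User `c` changes device three times, so with an assumed fourth threshold of 2 the misjudgment check clears `c` only when `c` is on the whitelist.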
CN201811025042.5A 2018-09-04 2018-09-04 Abnormal user identification method and device based on big data analysis Active CN110070383B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811025042.5A CN110070383B (en) 2018-09-04 2018-09-04 Abnormal user identification method and device based on big data analysis

Publications (2)

Publication Number Publication Date
CN110070383A (en) 2019-07-30
CN110070383B (en) 2024-04-05

Family

ID=67365828

Country Status (1)

Country Link
CN (1) CN110070383B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant