CN108734366B - User identification method and system, nonvolatile storage medium and computer system - Google Patents
- Publication number
- CN108734366B (application number CN201710274401.XA)
- Authority
- CN
- China
- Prior art keywords
- user
- information
- target
- user group
- group information
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0635—Risk analysis of enterprise or organisation activities
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/067—Enterprise or organisation modelling
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/06—Buying, selling or leasing transactions
- G06Q30/0601—Electronic shopping [e-shopping]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/06—Buying, selling or leasing transactions
- G06Q30/0601—Electronic shopping [e-shopping]
- G06Q30/0631—Item recommendations
Landscapes
- Business, Economics & Management (AREA)
- Engineering & Computer Science (AREA)
- Human Resources & Organizations (AREA)
- Strategic Management (AREA)
- Economics (AREA)
- Marketing (AREA)
- Accounting & Taxation (AREA)
- Physics & Mathematics (AREA)
- General Business, Economics & Management (AREA)
- General Physics & Mathematics (AREA)
- Finance (AREA)
- Theoretical Computer Science (AREA)
- Development Economics (AREA)
- Entrepreneurship & Innovation (AREA)
- Educational Administration (AREA)
- Game Theory and Decision Science (AREA)
- Operations Research (AREA)
- Quality & Reliability (AREA)
- Tourism & Hospitality (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The present disclosure provides a user identification method, including: acquiring first user information of a target user, wherein the target user belongs to a target user group, and a first similarity between transaction data of users in the target user group meets a similarity threshold; acquiring first user group information of a target user group, wherein the first user group information at least comprises information for describing the relationship between second user information of each user; and identifying the target user according to the first user information and the first user group information. The present disclosure also provides a user identification system, a non-volatile storage medium, and a computer program.
Description
Technical Field
The present disclosure relates to the field of data processing, and more particularly, to a user identification method and system, a non-volatile storage medium, and a computer system.
Background
With the rapid development of artificial intelligence, electronic commerce, and big data systems, competition among enterprises in the field of electronic commerce has become increasingly fierce, and malicious competition has even emerged, creating potential risks for competitors. For example, to attack or suppress a competitor, enterprise A may impersonate ordinary users and place malicious orders, maliciously occupy inventory, and the like for the commodities of enterprise B on an e-commerce platform, thereby posing a potential risk to enterprise B. To reduce or eliminate such risks, enterprises need to comprehensively and effectively identify malicious users who pose potential risks through malicious ordering, malicious inventory occupation, and the like.
Disclosure of Invention
In view of this, the present disclosure provides a user identification method and a system thereof capable of comprehensively and effectively identifying a malicious user.
One aspect of the present disclosure provides a user identification method, including: acquiring first user information of a target user, wherein the target user belongs to a target user group, and a first similarity between transaction data of users in the target user group meets a similarity threshold; acquiring first user group information of the target user group, wherein the first user group information at least comprises information for describing the relationship between second user information of each user; and identifying the target user according to the first user information and the first user group information.
According to an embodiment of the present disclosure, identifying the target user according to the first user information and the first user group information includes: loading a user identification model; and inputting the first user information and the first user group information into the user identification model so that the user identification model identifies the target user based on the first user information and the first user group information.
According to an embodiment of the present disclosure, before loading the user identification model, the method further includes: acquiring a user training sample; acquiring third user information of each user in the user training sample; acquiring second user group information of the user training sample, wherein the second user group information at least comprises information for describing the relationship between the third user information; and training according to the third user information and the second user group information to obtain the user identification model.
According to an embodiment of the present disclosure, after training is performed according to the third user information and the second user group information to obtain the user recognition model, the method further includes: obtaining a user test sample; obtaining fourth user information of each user in the user test sample; obtaining third user group information of the user test sample, wherein the third user group information at least comprises information for describing the relationship between the fourth user information; inputting the fourth user information and the third user group information into the user identification model, so that the user identification model identifies each user in the user test sample based on the fourth user information and the third user group information to obtain an identification result; and verifying whether the user identification model can accurately identify the user according to the identification result.
According to an embodiment of the present disclosure, before obtaining the first user group information of the target user group, the method further includes: acquiring first transaction data of the target user; acquiring second transaction data of at least one designated user; calculating a second similarity between the first transaction data and the second transaction data; judging whether the second similarity meets the similarity threshold value; and if so, taking the target user and the at least one designated user as users in the target user group to determine the target user group.
According to an embodiment of the disclosure, the obtaining of the first transaction data of the target user includes obtaining transaction data generated when the target user performs a transaction within a preset time period.
Another aspect of the present disclosure provides a user identification system, including: a first obtaining module, configured to obtain first user information of a target user, where the target user belongs to a target user group, and a first similarity between transaction data of users in the target user group meets a similarity threshold; a second obtaining module, configured to obtain first user group information of the target user group, where the first user group information at least includes information used to describe a relationship between second user information of each user; and an identification module, configured to identify the target user according to the first user information and the first user group information.
According to an embodiment of the present disclosure, the identification module includes: the loading unit is used for loading the user identification model; and an input unit configured to input the first user information and the first user group information into the user identification model so that the user identification model identifies the target user based on the first user information and the first user group information.
According to an embodiment of the present disclosure, the above system further includes: the third acquisition module is used for acquiring a user training sample before loading the user identification model; the fourth acquisition module is used for acquiring third user information of each user in the user training sample; a fifth obtaining module, configured to obtain second user group information of the user training sample, where the second user group information at least includes information used for describing a relationship between the third user information; and the training module is used for training according to the third user information and the second user group information to obtain the user identification model.
According to an embodiment of the present disclosure, the system further includes: a sixth obtaining module, configured to obtain a user test sample after training is performed according to the third user information and the second user group information to obtain the user identification model; a seventh obtaining module, configured to obtain fourth user information of each user in the user test sample; an eighth obtaining module, configured to obtain third user group information of the user test sample, where the third user group information at least includes information used for describing a relationship between the fourth user information; an input module, configured to input the fourth user information and the third user group information into the user identification model, so that the user identification model identifies each user in the user test sample based on the fourth user information and the third user group information, and obtains an identification result; and the verification module is used for verifying whether the user identification model can accurately identify the user according to the identification result.
According to an embodiment of the present disclosure, the above system further includes: a ninth obtaining module, configured to obtain first transaction data of the target user before obtaining first user group information of the target user group; the tenth acquisition module is used for acquiring second transaction data of at least one designated user; the calculation module is used for calculating a second similarity of the first transaction data and the second transaction data; the judging module is used for judging whether the second similarity meets the similarity threshold value; and a determining module, configured to determine the target user group by using the target user and the at least one designated user as users in the target user group if yes.
According to an embodiment of the disclosure, the ninth obtaining module is further configured to obtain transaction data generated when the target user performs a transaction within a preset time period.
Another aspect of the disclosure provides a non-volatile storage medium storing computer-executable instructions for implementing the method as described above when executed.
Another aspect of the disclosure provides a computer program comprising computer executable instructions for implementing the method as described above when executed.
Another aspect of the present disclosure provides a computer system, including: a memory; and a processor coupled to the memory, wherein the processor is configured to perform the user identification method as described above based on instructions stored in the memory.
According to the embodiments of the present disclosure, users are identified by combining the user information of the target user with the user group information of the similar group (i.e., the target user group) to which the target user belongs. On the one hand, this overcomes the defect that malicious users cannot be comprehensively and effectively identified when only single-dimension user information is used. On the other hand, it overcomes the defect that normal users with similar purchasing behaviors may be mistakenly identified as malicious users when only user group information is used. The technical problem that malicious ordering and malicious inventory occupation by malicious users bring potential risks to enterprise operation is thereby at least partially alleviated, and the technical effect of comprehensively and effectively identifying malicious users, so as to prevent such malicious ordering and inventory occupation, is achieved.
Drawings
The above and other objects, features and advantages of the present disclosure will become more apparent from the following description of the embodiments of the present disclosure with reference to the accompanying drawings, in which:
FIG. 1 schematically illustrates an application scenario of a user identification method and a system thereof according to an embodiment of the present disclosure;
FIG. 2 schematically illustrates a flow chart of a user identification method according to an embodiment of the present disclosure;
FIG. 3A schematically illustrates a flow diagram for identifying users based on user information and user group information, in accordance with an embodiment of the present disclosure;
FIG. 3B schematically shows a flow diagram for training a user recognition model according to an embodiment of the present disclosure;
FIG. 3C schematically illustrates a flow diagram for verifying a user identification model according to an embodiment of the present disclosure;
FIG. 3D schematically illustrates a flow chart for determining a target user group according to an embodiment of the present disclosure;
FIG. 4 schematically illustrates a block diagram of a user identification system according to an embodiment of the present disclosure; and
FIG. 5 schematically shows a block diagram of a user identification system according to another embodiment of the present disclosure.
Detailed Description
Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is illustrative only and is not intended to limit the scope of the present disclosure. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present disclosure.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The words "a", "an" and "the" and the like as used herein are also intended to include the plural forms unless the context clearly dictates otherwise. Furthermore, the terms "comprises," "comprising," and the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs, unless otherwise defined. It is noted that the terms used herein should be interpreted as having a meaning that is consistent with the context of this specification and should not be interpreted in an idealized or overly formal sense.
Some block diagrams and/or flow diagrams are shown in the figures. It will be understood that some blocks of the block diagrams and/or flowchart illustrations, or combinations thereof, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the instructions, which execute via the processor, create means for implementing the functions/acts specified in the block diagrams and/or flowchart block or blocks.
Accordingly, the techniques of this disclosure may be implemented in hardware and/or software (including firmware, microcode, etc.). In addition, the techniques of this disclosure may take the form of a computer program product on a computer-readable medium having instructions stored thereon for use by or in connection with an instruction execution system. In the context of this disclosure, a computer-readable medium may be any medium that can contain, store, communicate, propagate, or transport the instructions. For example, the computer readable medium can include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, device, or propagation medium. Specific examples of the computer readable medium include: magnetic storage devices such as magnetic tape or Hard Disk Drives (HDDs); optical storage devices, such as compact disks (CD-ROMs); a memory, such as a Random Access Memory (RAM) or a flash memory; and/or wired/wireless communication links.
The embodiments of the present disclosure provide a user identification method and a system thereof. The method comprises an information acquisition process and a user identification process. In the information acquisition process, it is necessary to obtain the user information of a target user (i.e., the user to be identified) and the user group information of the similar group (i.e., the target user group) to which the target user belongs. After the information acquisition is completed, the user identification process begins, in which the target user may be identified according to the two kinds of acquired information, for example, by determining whether the target user is a malicious user.
Fig. 1 schematically illustrates an application scenario of a user identification method and a system thereof according to an embodiment of the present disclosure.
As shown in FIG. 1, in this application scenario, a plurality of users, such as user A, user B, user C, and user D, purchase products using a shopping platform, where user A purchases products a, b, and c, user B purchases products a and c, user C purchases products b and c, and user D purchases products c, d, and e. In order to avoid risks and prevent some users from maliciously ordering, maliciously occupying inventory, and the like, enterprises generally need to identify which of these users are malicious users and which are normal users.
A malicious user is a user who conducts abnormal transactions in the field of electronic commerce, such as a potentially risky user who engages in malicious ordering, malicious inventory occupation, and similar behaviors.
Fig. 2 schematically shows a flow chart of a user identification method according to an embodiment of the present disclosure.
As shown in fig. 2, the method includes operations S201 to S203, in which:
operation S201 is performed to obtain first user information of a target user, where the target user belongs to a target user group, and a first similarity between transaction data of users in the target user group satisfies a similarity threshold.
It should be noted that the target user refers to a user to be identified, who may be a normal user or a malicious user. Each user has his or her own user information, which includes but is not limited to information related to registration, login, and coupons of the account the user uses to conduct transactions. Typically, different users have different user information. More specifically, different normal users generally show no regularity in registration time, login time, coupon claiming time, and the like, whereas different malicious users often show obvious regularity in these respects.
A user group is a group of similar users. For any user group, all users in the group tend to show some similarity in their transactions. For example, all users in a user group have similar transaction behavior: they tend to buy several of the same goods, and their purchase amounts may be similar.
Malicious users tend to have similar transaction behaviors, but since normal users may also have similar transaction behaviors, a user group may fall into one of three cases: in case 1, the group contains both normal users and malicious users; in case 2, the group contains only normal users; in case 3, the group contains only malicious users.
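The grouping idea can be sketched as follows. The disclosure does not fix a similarity measure at this point, so this sketch assumes Jaccard similarity over sets of purchased products. The function names, the threshold of 0.5, and the product sets (which mirror the FIG. 1 scenario) are illustrative assumptions rather than part of the disclosed method.

```python
def jaccard_similarity(items_a, items_b):
    """Share of distinct products that both users bought (assumed measure)."""
    set_a, set_b = set(items_a), set(items_b)
    union = set_a | set_b
    if not union:
        return 0.0
    return len(set_a & set_b) / len(union)


def determine_target_group(target, purchases, threshold):
    """Return the target user plus every other user whose purchases are similar enough."""
    group = [target]
    for user, items in purchases.items():
        if user == target:
            continue
        if jaccard_similarity(purchases[target], items) >= threshold:
            group.append(user)
    return group


# Hypothetical purchases mirroring the FIG. 1 scenario.
purchases = {
    "A": {"a", "b", "c"},
    "B": {"a", "c"},
    "C": {"b", "c"},
    "D": {"c", "d", "e"},
}
target_group = determine_target_group("A", purchases, threshold=0.5)
```

With these inputs, users B and C each share two of three distinct products with user A and join the group, while user D shares only one of five distinct products and does not.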
In operation S202, first user group information of the target user group is obtained, where the first user group information at least includes information for describing a relationship between second user information of each user.
It should be noted that a user group may include a plurality of users, each of whom has his or her own user information. The user group information may be information describing the relationships among the user information of all users in the user group, and may be generated from that user information. The second user information refers to the collection of the user information of all users in the target user group.
When the users in a user group match case 1 above, the user information of the users in the group shows a certain degree of regularity. For example, some users use the same IP address when registering accounts, their registration times are the same or similar, their login times are the same or follow a certain period, and so on.
When the users in a user group match case 2 above, the user information of the users in the group does not show any regularity. For example, the users register their accounts from different IP addresses, and their registration times and login times differ.
When the users in a user group match case 3 above, there is obvious regularity in the user information of the users in the group. For example, all users register accounts using the same IP address and register in batches within a very short time, their login times are generally the same or follow a certain period, and so on.
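As a minimal sketch of how such regularity could be turned into user group information, the following code summarizes a group by how concentrated its registration IP addresses are and how tightly its registration times cluster. The field names (`register_ip`, `registered_at`), the two summary features, and the sample data are assumptions made for illustration; the disclosure does not prescribe specific features.

```python
from collections import Counter
from datetime import datetime


def group_information(users):
    """Summarize relationships between the user information of a group's members."""
    ip_counts = Counter(u["register_ip"] for u in users)
    times = sorted(u["registered_at"] for u in users)
    return {
        "group_size": len(users),
        # Fraction of members registered from the most common IP address.
        "top_ip_share": ip_counts.most_common(1)[0][1] / len(users),
        # Seconds between the earliest and latest registration in the group.
        "register_span_seconds": (times[-1] - times[0]).total_seconds(),
    }


# Hypothetical group resembling case 3: same IP, registered within one minute.
batch_group = [
    {"register_ip": "192.0.2.1", "registered_at": datetime(2017, 4, 1, 9, 0, 0)},
    {"register_ip": "192.0.2.1", "registered_at": datetime(2017, 4, 1, 9, 0, 20)},
    {"register_ip": "192.0.2.1", "registered_at": datetime(2017, 4, 1, 9, 0, 40)},
]
info = group_information(batch_group)
```

A group resembling case 2 would instead be expected to produce a low `top_ip_share` and a large `register_span_seconds`.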
In operation S203, a target user is identified according to the first user information and the first user group information.
In the course of implementing the inventive concept, the inventors found that a user identification scheme is provided in the related art, that is, identification is performed only according to user information. The dimensionality of the identification basis is single, regularity shown on user information among malicious users is ignored, and therefore the malicious users cannot be identified comprehensively and effectively.
In the course of implementing the inventive concept, the inventors also found that the related art provides another user identification scheme, in which identification is performed only according to user group information. This scheme is not sufficient to accurately identify potentially malicious users: normal users may also form groups with similar transaction behavior, so normal users may be mistakenly identified as malicious users when only group-related information is used.
Compared with the foregoing related technologies, the embodiments of the present disclosure identify a user by combining the user information of the target user with the user group information of the similar group (i.e., the target user group) to which the target user belongs. On the one hand, this overcomes the defect that malicious users cannot be identified comprehensively and effectively when only single-dimension user information is used. On the other hand, it overcomes the defect that normal users with similar purchasing behavior may be mistakenly identified as malicious users when only user group information is used. The technical problem that malicious ordering, malicious inventory occupation, and the like by malicious users bring potential risks to enterprise operations is thereby at least partially alleviated, and the technical effect of comprehensively and effectively identifying malicious users, so as to prevent such behavior, is achieved.
The method shown in fig. 2 is further described with reference to fig. 3A-3D in conjunction with specific embodiments.
As an alternative embodiment, the target user may be identified according to the first user information and the first user group information in various ways. For example, the two kinds of information may be compared directly with the relevant information of known malicious users. As another example, a model for identifying malicious users may be trained in advance and loaded directly whenever an identification task arises. Compared with the former, the latter has the advantage that, because the user identification model is trained in advance, each identification only requires loading the model and inputting the corresponding user information and user group information, which improves identification efficiency.
Fig. 3A schematically illustrates a flow chart for identifying a user based on user information and user group information according to an embodiment of the disclosure.
As shown in fig. 3A, identifying the target user according to the first user information and the first user group information may include operations S301 to S302, in which:
operation S301, load a user identification model; and
in operation S302, the first user information and the first user group information are input into the user identification model, so that the user identification model identifies the target user based on the first user information and the first user group information.
It should be noted that the user recognition model is trained in advance on a dedicated user training sample, stored locally or in the cloud, and loaded directly when needed. In the embodiments of the present disclosure, the recognition basis of the user recognition model may include, but is not limited to: the user information of the user and the user group information of the user group to which the user belongs.
In addition, for each user to be identified, the user information and the corresponding user group information may be stored in advance in a wide table.
With this scheme, the first user information and the first user group information of the target user to be identified can be loaded from the wide table and input into the user identification model. The model then classifies the target user based on this input, thereby determining whether the target user is a normal user or a malicious user.
According to the embodiments of the present disclosure, because a user identification model is used for user identification, with user information and user group information serving as the basis for identifying malicious users, large-scale data can be processed, and the technical effects of simplifying the identification flow and improving identification efficiency are achieved.
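The flow of loading a row from the wide table and passing it to the model can be sketched as below. Everything here is hypothetical: the wide table is modeled as a plain dictionary, the column names are invented, and `load_model` returns a hand-written rule that merely stands in for a trained model loaded from local or cloud storage.

```python
# Hypothetical wide table: one row per user, holding the user's own
# information next to the information of the group the user belongs to.
wide_table = {
    "u1": {"coupons_claimed": 12, "top_ip_share": 1.0, "register_span_seconds": 40.0},
    "u2": {"coupons_claimed": 1, "top_ip_share": 0.25, "register_span_seconds": 86400.0},
}


def load_model():
    """Stand-in for loading a trained model from local or cloud storage."""
    def model(row):
        # Illustrative rule only: both the individual signal and the
        # group signal must look suspicious.
        suspicious_user = row["coupons_claimed"] >= 10
        suspicious_group = (
            row["top_ip_share"] >= 0.8 and row["register_span_seconds"] <= 300
        )
        return "malicious" if suspicious_user and suspicious_group else "normal"
    return model


def identify(user_id, table, model):
    """Feed the user's wide-table row (user and group information) to the model."""
    return model(table[user_id])


model = load_model()
results = {user_id: identify(user_id, wide_table, model) for user_id in wide_table}
```

Keeping both kinds of information in one row means that each identification needs a single lookup, which is one plausible reading of why a wide table is used here.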
As an optional embodiment, in order to simplify the recognition process and improve recognition efficiency, before loading the user recognition model, the method may further include training a user identification model capable of comprehensively and effectively identifying users. The user recognition model may be trained in various ways, which are not limited herein. For example, the Apache Spark computing engine may be used, and training may be performed with the random forest algorithm provided by the machine learning package (ml) of the Spark framework together with some of the classical scalable machine learning algorithms provided by Mahout. Wherein:
Apache Spark is a fast, general-purpose computing engine designed for large-scale data processing, and can be used for distributed stream computing, machine learning model training, and graph computing. The Spark ml package is its machine learning model training module.
The classical scalable machine learning algorithms provided by Mahout may include, but are not limited to: clustering, classification, recommendation filtering, frequent itemset mining, etc.
A random forest is a classifier that contains a plurality of decision trees, and its output class is the mode of the classes output by the individual trees. Data modeling with the random forest algorithm provided by the Spark ml package includes: reading the model training data into Spark memory; setting the target variable, selecting feature data, selecting the random forest algorithm, and setting its parameters appropriately; performing model training on the training set to fit a random forest model; and outputting the identification model for potential malicious users.
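The voting rule of a random forest, where the output class is the mode of the individual tree outputs, can be illustrated in a few lines of plain Python. This is not Spark's implementation: the three "trees" are hypothetical one-split rules standing in for trained decision trees, and the feature names are assumptions.

```python
from collections import Counter


def forest_predict(trees, features):
    """Return the mode (most frequent class) of the individual tree outputs."""
    votes = Counter(tree(features) for tree in trees)
    return votes.most_common(1)[0][0]


# Three hypothetical one-split "trees", standing in for trained decision trees.
trees = [
    lambda f: "malicious" if f["top_ip_share"] >= 0.8 else "normal",
    lambda f: "malicious" if f["register_span_seconds"] <= 300 else "normal",
    lambda f: "malicious" if f["coupons_claimed"] >= 10 else "normal",
]

prediction = forest_predict(
    trees,
    {"top_ip_share": 0.9, "register_span_seconds": 120.0, "coupons_claimed": 2},
)
```

Here two of the three trees vote "malicious", so the forest outputs "malicious". In the Spark ml package, the corresponding estimator is `RandomForestClassifier`, which also handles the training of the trees that this sketch omits.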
FIG. 3B schematically shows a flow diagram for training a user recognition model according to an embodiment of the present disclosure.
As shown in fig. 3B, before loading the user recognition model, the method may further include operations S401 to S404, where:
operation S401, acquiring a user training sample;
operation S402, acquiring third user information of each user in the user training sample;
operation S403, acquiring second user group information of the user training sample, where the second user group information at least includes information for describing a relationship between third user information; and
operation S404, training according to the third user information and the second user group information to obtain a user identification model.
It should be noted that the user training sample may also be referred to as a training set. All users in the training set may be malicious users, in which case the user group information corresponding to the training set (i.e., the information derived from the user information of all users in the training set) follows a specific rule. Alternatively, all users in the training set may be normal users, in which case the user group information corresponding to the training set does not follow any rule. The third user information is the totality of the user information of the users in the training set.
According to the embodiment of the disclosure, because a technical means of training the user identification model by using a special training set is adopted, the technical effect that the user identification model can comprehensively and effectively identify malicious users based on user information and user group information is achieved.
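As a rough illustration of operations S401 to S404, the sketch below assembles one training row per user by concatenating that user's own features with the features shared by the whole training set. The function name, the dictionary layout and the use of an integer label are assumptions made for illustration; the disclosure does not prescribe a feature encoding.

```python
from typing import Dict, List, Tuple


def build_training_rows(
    user_info: Dict[str, Dict[str, float]],
    group_info: Dict[str, float],
    labels: Dict[str, int],
) -> Tuple[List[List[float]], List[int]]:
    """Concatenate each user's features with the shared group features.

    user_info  : per-user features (the "third user information")
    group_info : features describing the whole sample (the "second user group information")
    labels     : 1 for a malicious user, 0 for a normal user (assumed encoding)
    """
    # Fix a column order so every row has the same layout.
    user_keys = sorted({key for info in user_info.values() for key in info})
    group_keys = sorted(group_info)

    rows: List[List[float]] = []
    targets: List[int] = []
    for user_id in sorted(user_info):
        info = user_info[user_id]
        row = [info.get(key, 0.0) for key in user_keys]
        row += [group_info[key] for key in group_keys]
        rows.append(row)
        targets.append(labels[user_id])
    return rows, targets
```

The resulting rows and targets could then be passed to any classifier trainer, such as a random forest implementation.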
As an alternative embodiment, in order to ensure that the user can be identified without error, after training is performed according to the third user information and the second user group information to obtain the user identification model, the method may further include a relevant manner/means for verifying whether the user identification model is valid (i.e., whether the user can be identified accurately). The verifying whether the user identification model is valid may include various ways/means, which are not limited herein.
FIG. 3C schematically shows a flow diagram for validating a user identification model according to an embodiment of the disclosure.
As shown in fig. 3C, after training is performed according to the third user information and the second user group information to obtain the user recognition model, the method may further include operations S501 to S505, where:
operation S501, acquiring a user test sample;
operation S502, obtaining fourth user information of each user in the user test sample;
operation S503, acquiring third user group information of the user test sample, where the third user group information at least includes information for describing a relationship between fourth user information;
operation S504, inputting the fourth user information and the third user group information into the user identification model, so that the user identification model identifies each user in the user test sample based on the fourth user information and the third user group information, and obtains an identification result; and
in operation S505, it is verified whether the user recognition model can accurately recognize the user according to the recognition result.
The user test sample may also be referred to as a test set. The users in the test set may all be malicious users, may all be normal users, or may include both malicious and normal users. However, the identity of each user in the test set, whether a normal user or a malicious user, is known. The fourth user information is the totality of the user information of the users in the test set.
Specifically, assume that the test set contains user A and user B, and that both are malicious users. During testing, the user information of user A together with the user group information of the test set, and the user information of user B together with the user group information of the test set, are respectively input into the user identification model under test. After identification, if the output shows that both user A and user B are malicious users, the user identification model can identify users accurately; if the output shows that only one of user A and user B is a malicious user and the other is a normal user, the accuracy of the user identification model is not high; and if the output shows that both user A and user B are normal users, the user identification model essentially cannot identify users accurately.
Further, after the accuracy of the user identification model has been verified, if the model is found to be accurate, it can subsequently be used to identify users; if not, it needs to be corrected or retrained before use, so that the accuracy of user identification can be improved.
According to the embodiment of the disclosure, because a technical means of testing the user identification model by using a special test set is adopted, the technical effect of verifying whether the user identification model identifies the user accurately so as to ensure that the user identification model can comprehensively and effectively identify the user is achieved.
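The verification in operation S505 can be sketched as a comparison between the recognition result and the known identities of the users in the test set. The function name and the use of plain accuracy as the measure are assumptions; the disclosure does not fix a particular metric.

```python
from typing import Dict


def recognition_accuracy(predicted: Dict[str, str], actual: Dict[str, str]) -> float:
    """Fraction of test users whose predicted label matches the known label."""
    if not actual:
        raise ValueError("the test set must contain at least one user")
    correct = sum(1 for user, label in actual.items() if predicted.get(user) == label)
    return correct / len(actual)
```

For a test set in which users A and B are both malicious, the three outcomes discussed in the text correspond to accuracies of 1.0 (both recognized), 0.5 (one recognized) and 0.0 (neither recognized).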
As an alternative embodiment, before obtaining the first user group information of the target user group, the method may further include determining the target user group (i.e., determining the similar group).
Fig. 3D schematically shows a flow chart for determining a target user group according to an embodiment of the present disclosure.
As shown in fig. 3D, before acquiring the first user group information of the target user group, the method may further include operations S601 to S605, where:
operation S601, acquiring first transaction data of a target user;
operation S602, acquiring second transaction data of at least one designated user;
operation S603 of calculating a second similarity between the first transaction data and the second transaction data;
operation S604, determining whether the second similarity satisfies a similarity threshold; and
in operation S605, if yes, the target user and the at least one designated user are taken as users in the target user group to determine the target user group.
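Operations S601 to S605 can be sketched as follows. This is a minimal illustration that assumes the transaction data has been reduced to the set of items each user purchased, that Jaccard similarity is the similarity measure, and that "satisfies the similarity threshold" means greater than or equal to the threshold; the function names are hypothetical.

```python
from typing import Dict, Set


def jaccard(items_u: Set[str], items_v: Set[str]) -> float:
    """Jaccard similarity of two item sets; defined as 0.0 when both are empty."""
    union = items_u | items_v
    if not union:
        return 0.0
    return len(items_u & items_v) / len(union)


def determine_target_group(
    target: str, purchases: Dict[str, Set[str]], threshold: float
) -> Set[str]:
    """Return the target user plus every other user whose similarity meets the threshold."""
    group = {target}
    for user, items in purchases.items():
        if user != target and jaccard(purchases[target], items) >= threshold:
            group.add(user)
    return group
```

For example, with purchases `{"A": {"a", "b", "d"}, "B": {"a", "c"}, "C": {"a", "b"}}` and a threshold of 0.5, only user C joins user A's group, because the Jaccard similarity of A and B is 1/4 while that of A and C is 2/3.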
It should be noted that the transaction data may include, but is not limited to, the name of the purchased goods, the purchase amount, and the like. In the embodiment of the present disclosure, the second similarity between the two sets of transaction data may be calculated in various ways, which are not limited herein. For example, the similarity between users' transaction data can be calculated by the RowSimilarity method provided in Mahout, or by a user-based collaborative filtering model. Further, calculating the similarity through the user-based collaborative filtering model may use at least one of the following formulas: the Jaccard formula, the cosine similarity formula, etc.
User-based collaborative filtering algorithms are typically applied in recommendation scenarios, for example, recommending to user A the item lists of other users whose preferences are similar to those of user A. In the present disclosure, the algorithm is mainly used to obtain a user group with high similarity to a given user; the user group information of that user group is then generated from the user information of each user in the group, so as to identify malicious users in the user group.
In general, if a user has recently purchased a certain item, the user's score for that item is set to 1; otherwise it is set to 0. The similarity between the target user and other users is then calculated to find a target user group with high similarity to the target user. For example, given users u and v, let Nu denote the set of items that user u has purchased, and let Nv denote the set of items that user v has purchased.
The similarity of users u and v calculated by the Jaccard formula is shown in formula (1).
The similarity calculated by cosine similarity is shown in formula (2).
The following describes the calculation of user similarity with the user-based collaborative filtering algorithm, taking the user transaction record shown in fig. 1 as an example. As shown in fig. 1, user A purchased the goods {a, b, d} and user B purchased the goods {a, c}; the similarity between users A and B calculated using cosine similarity is shown in formula (3).
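Formulas (1) to (3) are not reproduced in this text. Assuming the standard set-based definitions of Jaccard similarity and cosine similarity over the purchased-item sets Nu and Nv (written N_u and N_v below; the symbol w for the similarity is introduced here and is not taken from the source), they would read:

```latex
\begin{align}
w_{uv} &= \frac{|N_u \cap N_v|}{|N_u \cup N_v|} \tag{1} \\
w_{uv} &= \frac{|N_u \cap N_v|}{\sqrt{|N_u|\,|N_v|}} \tag{2} \\
w_{AB} &= \frac{|\{a,b,d\} \cap \{a,c\}|}{\sqrt{|\{a,b,d\}|\,|\{a,c\}|}}
        = \frac{1}{\sqrt{3 \cdot 2}} = \frac{1}{\sqrt{6}} \approx 0.41 \tag{3}
\end{align}
```

Under the same assumption, the Jaccard similarity of users A and B would be 1/4, since the two users share one item out of four distinct items.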
According to the embodiment of the disclosure, similarity relationships among users are calculated using algorithms such as collaborative filtering to determine a similar user group, and the corresponding user group information is then generated for that group based on the user information of each user in it, so that malicious users can be identified more effectively.
As an alternative embodiment, the obtaining of the first transaction data of the target user includes obtaining transaction data generated when the target user performs a transaction within a preset time period.
According to the embodiment of the disclosure, because a technical means of acquiring the transaction data in the preset time period is adopted, the technical problems of large data volume, redundant data and the like caused by acquiring the transaction data in all time periods are at least partially overcome, and the technical effect of reasonably acquiring meaningful transaction data is further achieved.
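Restricting the transaction data to a preset time period can be sketched as a simple filter. The `Transaction` record layout, the function name and the choice of a window that ends at a given time are assumptions made for illustration.

```python
from datetime import datetime, timedelta
from typing import List, NamedTuple


class Transaction(NamedTuple):
    user_id: str
    item: str
    quantity: int
    time: datetime


def transactions_in_window(
    transactions: List[Transaction], user_id: str, end: datetime, window: timedelta
) -> List[Transaction]:
    """Keep the user's transactions whose time falls within [end - window, end]."""
    start = end - window
    return [t for t in transactions if t.user_id == user_id and start <= t.time <= end]
```

A seven-day window ending on the day of identification, for instance, would ignore older purchases and keep the data volume bounded.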
Fig. 4 schematically shows a block diagram of a user identification system according to an embodiment of the present disclosure.
As shown in fig. 4, the user identification system includes: a first obtaining module 410, a second obtaining module 420, and an identifying module 430.
The first obtaining module 410 is configured to obtain first user information of a target user, where the target user belongs to a target user group, and a first similarity between transaction data of users in the target user group satisfies a similarity threshold.
It should be noted that the target user refers to a user to be identified, who may be a normal user or a malicious user. Each user has his or her own user information, which includes but is not limited to the registration, login and coupon-claiming information of the account the user uses to conduct transactions. Typically, different users have different user information. Specifically, different normal users generally show no regularity in registration time, login time, coupon-claiming time and the like, whereas different malicious users often show obvious regularity in these respects.
The user groups are similar groups. For any user group, all users in the user group tend to show some similarity in transactions. For example, all users in a group of users have similar trading behavior, e.g., they tend to buy several of the same goods, and the amount of purchases may be similar.
Malicious users tend to have similar transaction behaviors, but since normal users may also have similar transaction behaviors, users in one user group may have 3 cases: in case 1, there may be both normal users and malicious users; case 2, there may be only normal users; in case 3, there may be only malicious users.
The second obtaining module 420 is configured to obtain first user group information of the target user group, where the first user group information at least includes information for describing a relationship between second user information of each user.
It should be noted that a user group may include a plurality of users, each user has its own user information, and the user group information may be information of the relationship between the user information of all users in the user group, and may be generated from the user information. The second user information refers to a sum of user information of all users in the target user group.
When the users in a user group fall under case 1 above, the user information of the users in the group shows a certain regularity. For example, some users use the same IP address when registering accounts, their registration times are the same or close, their login times are the same or follow a certain period, and so on.
When the users in a user group conform to the above case 2, the user information of the users in the user group does not have any regularity. For example, different IP addresses are used when a user registers an account, and registration time and login time are different.
When the users in a user group fall under case 3 above, the user information of the users in the group shows obvious regularity. For example, all users register accounts using the same IP address and register in batches within a very short time, their login times are generally the same or follow a certain period, and so on.
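The kinds of regularity described in the three cases can be turned into user group information in many ways. The sketch below computes two hypothetical group-level features, the share of users registered from the most common IP address and the spread of registration times; both the feature choice and the record layout are assumptions and are not specified by the disclosure.

```python
from collections import Counter
from typing import Dict, List, NamedTuple


class UserInfo(NamedTuple):
    ip: str               # IP address used at registration
    registered_at: float  # registration time in seconds since the epoch


def group_regularity(users: List[UserInfo]) -> Dict[str, float]:
    """Summarize how regular the registration data of a user group is."""
    if not users:
        raise ValueError("the user group must contain at least one user")
    ip_counts = Counter(user.ip for user in users)
    top_ip_share = ip_counts.most_common(1)[0][1] / len(users)
    times = [user.registered_at for user in users]
    return {
        "top_ip_share": top_ip_share,                   # 1.0 when every user shares one IP
        "registration_span_s": max(times) - min(times), # small for batch registration
    }
```

A group in case 3 would tend toward a `top_ip_share` near 1.0 and a small `registration_span_s`, while a group in case 2 would tend toward the opposite.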
The identifying module 430 is configured to identify a target user according to the first user information and the first user group information.
In the process of implementing the concept of the present invention, the inventor finds that a user identification scheme is provided in the related art, that is, identification is performed only according to user information. The dimensionality of the identification basis is single, regularity shown on user information among malicious users is ignored, and therefore the malicious users cannot be identified comprehensively and effectively.
In the process of implementing the concept of the present invention, the inventor finds that another user identification scheme is provided in the related art, that is, identification is performed only according to user group information. This scheme is not sufficient to accurately identify potentially malicious users because normal users may also have similar user groups, and normal users may be mistakenly identified as malicious users based only on group-related information.
Compared with the related technologies, the embodiment of the disclosure adopts a technical means of identifying the user by combining the user information of the target user and the user group information of the similar group (i.e. the target user group) where the target user is located, so that on one hand, the defect that malicious users cannot be identified comprehensively and effectively due to the fact that only the user information with a single dimension is used for identification is overcome, and on the other hand, the defect that normal users with similar purchasing behaviors are mistakenly identified as malicious users due to the fact that only the user group information is used for identification is overcome, so that the technical problem that potential risks are brought to enterprise operation due to malicious ordering, malicious inventory occupation and the like of the malicious users is at least partially reduced, and the technical effect of comprehensively and effectively identifying the malicious users to prevent the malicious ordering, malicious inventory occupation and the like of the malicious users is achieved.
As an alternative embodiment, the identification module may include: the loading unit is used for loading the user identification model; and an input unit for inputting the first user information and the first user group information into a user identification model to cause the user identification model to identify the target user based on the first user information and the first user group information.
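The split of the identification module into a loading unit and an input unit can be sketched as below. The class names, the feature names and the rule inside `ToyModel` are hypothetical; `ToyModel` merely stands in for a trained user identification model so that the example is self-contained.

```python
from typing import Callable, Dict, Protocol


class Model(Protocol):
    def predict(self, features: Dict[str, float]) -> str: ...


class ToyModel:
    """Hypothetical stand-in for a trained model; the rule below is made up."""

    def predict(self, features: Dict[str, float]) -> str:
        suspicious_group = features["group.top_ip_share"] >= 0.8
        heavy_coupon_use = features["user.coupons_claimed"] >= 5
        return "malicious" if suspicious_group and heavy_coupon_use else "normal"


class IdentificationModule:
    def __init__(self, load_model: Callable[[], Model]) -> None:
        self._load_model = load_model

    def identify(self, user_info: Dict[str, float], group_info: Dict[str, float]) -> str:
        # Loading unit: obtain the user identification model.
        model = self._load_model()
        # Input unit: feed user information and user group information together.
        features = {f"user.{key}": value for key, value in user_info.items()}
        features.update({f"group.{key}": value for key, value in group_info.items()})
        return model.predict(features)
```

In this sketch a user with heavy coupon use is flagged only when the group information is also suspicious, which mirrors the idea of combining both sources rather than relying on either alone.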
As an alternative embodiment, the system may further include: the third acquisition module is used for acquiring a user training sample before loading the user identification model; the fourth acquisition module is used for acquiring third user information of each user in the user training sample; the fifth acquisition module is used for acquiring second user group information of the user training sample, wherein the second user group information at least comprises information for describing the relationship between third user information; and the training module is used for training according to the third user information and the second user group information to obtain a user identification model.
As an alternative embodiment, the system may further include: the sixth acquisition module is used for acquiring a user test sample after training is carried out according to the third user information and the second user group information to obtain a user identification model; the seventh obtaining module is used for obtaining fourth user information of each user in the user test sample; the eighth obtaining module is configured to obtain third user group information of the user test sample, where the third user group information at least includes information used for describing a relationship between fourth user information; the input module is used for inputting the fourth user information and the third user group information into the user identification model so that the user identification model identifies each user in the user test sample based on the fourth user information and the third user group information to obtain an identification result; and the verification module is used for verifying whether the user identification model can accurately identify the user according to the identification result.
As an alternative embodiment, the system may further include: the ninth acquisition module is used for acquiring first transaction data of the target user before acquiring the first user group information of the target user group; the tenth acquisition module is used for acquiring second transaction data of at least one designated user; the calculation module is used for calculating a second similarity of the first transaction data and the second transaction data; the judging module is used for judging whether the second similarity meets a similarity threshold value; and the determining module is used for taking the target user and at least one designated user as the users in the target user group under the condition of yes so as to determine the target user group.
As an optional embodiment, the ninth obtaining module may be further configured to obtain transaction data generated when the target user performs a transaction within a preset time period.
It should be noted that the implementation manner/means, the implemented functions, the solved technical problems, and the achieved technical effects of the modules/units/sub-units in the embodiments of the apparatus part are the same as or similar to the implementation manner/means, the implemented functions, the solved technical problems, and the achieved technical effects of the operations corresponding to the embodiments of the method part, and are not described herein again.
Another aspect of the disclosure provides a non-volatile storage medium storing computer-executable instructions for implementing the method as described above when executed.
Another aspect of the disclosure provides a computer program comprising computer executable instructions for implementing the method as described above when executed.
Fig. 5 schematically shows a block diagram of a user identification system according to another embodiment of the present disclosure.
As shown in fig. 5, the user identification system includes a processor 510 and a computer-readable storage medium 520. The user identification system may perform the methods described above with reference to fig. 2-3D for the purpose of identifying whether a user is a malicious user.
In particular, processor 510 may include, for example, a general purpose microprocessor, an instruction set processor and/or related chip set and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), and/or the like. The processor 510 may also include on-board memory for caching purposes. Processor 510 may be a single processing unit or a plurality of processing units for performing the different actions of the method flows according to embodiments of the present disclosure described with reference to fig. 2-3D.
Computer-readable storage medium 520 may be, for example, any medium that can contain, store, communicate, propagate, or transport the instructions. For example, a readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, device, or propagation medium. Specific examples of the readable storage medium include: magnetic storage devices, such as magnetic tape or Hard Disk Drives (HDDs); optical storage devices, such as compact disks (CD-ROMs); a memory, such as a Random Access Memory (RAM) or a flash memory; and/or wired/wireless communication links.
The computer-readable storage medium 520 may include a computer program 521, which computer program 521 may include code/computer-executable instructions that, when executed by the processor 510, cause the processor 510 to perform a method flow such as that described above in connection with fig. 2-3D and any variations thereof.
The computer program 521 may be configured with, for example, computer program code comprising computer program modules. For example, in an example embodiment, the code in computer program 521 may include one or more program modules, for example, module 521A, module 521B, … …. It should be noted that the division and number of modules are not fixed, and those skilled in the art may use suitable program modules or program module combinations according to actual situations, which, when executed by processor 510, enable processor 510 to perform, for example, the method flows described above in conjunction with fig. 2-3D and any variations thereof.
The embodiments of the present disclosure have been described above. However, these examples are for illustrative purposes only and are not intended to limit the scope of the present disclosure. Although the embodiments are described separately above, this does not mean that the measures in the embodiments cannot be used advantageously in combination. The scope of the disclosure is defined by the appended claims and equivalents thereof. Various alternatives and modifications can be devised by those skilled in the art without departing from the scope of the present disclosure, and such alternatives and modifications are intended to be within the scope of the present disclosure.
Claims (8)
1. A user identification method, comprising:
acquiring first user information of a target user, wherein the target user belongs to a target user group, a first similarity between transaction data of users in the target user group meets a similarity threshold, the transaction data comprises names and purchase quantities of purchased commodities, and the target user group is determined by the following operations: acquiring first transaction data of the target user and second transaction data of at least one designated user, calculating second similarity of the first transaction data and the second transaction data, and then taking the target user and the at least one designated user as users in the target user group under the condition that the second similarity meets the similarity threshold value so as to determine the target user group;
acquiring first user group information of the target user group, wherein the first user group information at least comprises information for describing the relationship between second user information of each user, and the first user information and the first user group information are stored in a wide table in advance; and
identifying the target user according to the first user information and the first user group information,
wherein the acquiring of the first transaction data of the target user comprises acquiring transaction data generated when the target user performs a transaction within a preset time period,
wherein identifying the target user according to the first user information and the first user group information comprises:
loading a user identification model; and
inputting the first user information and the first user group information into the user identification model to enable the user identification model to identify the target user based on the first user information and the first user group information.
2. The method of claim 1, prior to loading the user identification model, the method further comprising:
acquiring a user training sample;
acquiring third user information of each user in the user training sample;
acquiring second user group information of the user training sample, wherein the second user group information at least comprises information for describing the relationship among the third user information; and
training according to the third user information and the second user group information to obtain the user identification model.
3. The method of claim 2, after training based on the third user information and the second user group information to obtain the user recognition model, the method further comprising:
obtaining a user test sample;
obtaining fourth user information of each user in the user test sample;
obtaining third user group information of the user test sample, wherein the third user group information at least comprises information for describing the relationship between the fourth user information;
inputting the fourth user information and the third user group information into the user identification model, so that the user identification model identifies each user in the user test sample based on the fourth user information and the third user group information to obtain an identification result; and
verifying whether the user identification model can accurately identify the user according to the identification result.
4. A user identification system, comprising:
a first obtaining module, configured to obtain first user information of a target user, wherein the target user belongs to a target user group, a first similarity between transaction data of users in the target user group meets a similarity threshold, the transaction data comprises names and purchase quantities of purchased commodities, and the target user group is determined by the following operations: acquiring first transaction data of the target user and second transaction data of at least one designated user, calculating second similarity of the first transaction data and the second transaction data, and then taking the target user and the at least one designated user as users in the target user group under the condition that the second similarity meets the similarity threshold value so as to determine the target user group;
a second obtaining module, configured to obtain first user group information of the target user group, where the first user group information at least includes information used to describe a relationship between second user information of each user, and the first user information and the first user group information are stored in a wide table in advance; and
an identification module for identifying the target user according to the first user information and the first user group information,
wherein the acquiring of the first transaction data of the target user comprises acquiring transaction data generated when the target user performs a transaction within a preset time period;
wherein the identification module comprises:
a loading unit configured to load the user identification model; and
an input unit configured to input the first user information and the first user group information into the user identification model so that the user identification model identifies the target user based on the first user information and the first user group information.
5. The system of claim 4, further comprising:
a third obtaining module, configured to obtain a user training sample before the user identification model is loaded;
a fourth obtaining module, configured to obtain third user information of each user in the user training sample;
a fifth obtaining module, configured to obtain second user group information of the user training sample, where the second user group information at least includes information used for describing a relationship between the third user information; and
a training module, configured to train according to the third user information and the second user group information to obtain the user identification model.
6. The system of claim 5, further comprising:
a sixth obtaining module, configured to obtain a user test sample after the user identification model is obtained by training according to the third user information and the second user group information;
a seventh obtaining module, configured to obtain fourth user information of each user in the user test sample;
an eighth obtaining module, configured to obtain third user group information of the user test sample, where the third user group information at least includes information used for describing a relationship between the fourth user information;
an input module, configured to input the fourth user information and the third user group information into the user identification model, so that the user identification model identifies, based on the fourth user information and the third user group information, each user in the user test sample to obtain an identification result; and
a verification module, configured to verify whether the user identification model can accurately identify the user according to the identification result.
7. A non-volatile storage medium storing computer-executable instructions for implementing the user identification method of any one of claims 1 to 3 when executed.
8. A computer program comprising computer executable instructions for implementing the user identification method of any one of claims 1 to 3 when executed.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710274401.XA CN108734366B (en) | 2017-04-24 | 2017-04-24 | User identification method and system, nonvolatile storage medium and computer system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108734366A (en) | 2018-11-02 |
CN108734366B (en) | 2022-09-30 |
Family
ID=63934213
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710274401.XA Active CN108734366B (en) | 2017-04-24 | 2017-04-24 | User identification method and system, nonvolatile storage medium and computer system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108734366B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109587248B (en) * | 2018-12-06 | 2023-08-29 | 腾讯科技(深圳)有限公司 | User identification method, device, server and storage medium |
CN109840778A (en) * | 2018-12-21 | 2019-06-04 | 上海拍拍贷金融信息服务有限公司 | The recognition methods of fraudulent user and device, readable storage medium storing program for executing |
CN110675196A (en) * | 2019-09-27 | 2020-01-10 | 中国工商银行股份有限公司 | User identification method and device, electronic equipment and storage medium |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103853948A (en) * | 2012-11-28 | 2014-06-11 | 阿里巴巴集团控股有限公司 | User identity recognizing and information filtering and searching method and server |
CN104917739A (en) * | 2014-03-14 | 2015-09-16 | 腾讯科技(北京)有限公司 | False account identification method and device |
CN105391594A (en) * | 2014-09-03 | 2016-03-09 | 阿里巴巴集团控股有限公司 | Method and device for recognizing characteristic account number |
CN106022834A (en) * | 2016-05-24 | 2016-10-12 | 腾讯科技(深圳)有限公司 | Advertisement against cheating method and device |
CN106127505A (en) * | 2016-06-14 | 2016-11-16 | 北京众成汇通信息技术有限公司 | Order-brushing identification method and device |
CN106330837A (en) * | 2015-06-30 | 2017-01-11 | 阿里巴巴集团控股有限公司 | Suspicious network user identification method and device |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160203485A1 (en) * | 2015-01-08 | 2016-07-14 | Ca, Inc. | Selective authentication based on similarities of ecommerce transactions from a same user terminal across financial accounts |
Also Published As
Publication number | Publication date |
---|---|
CN108734366A (en) | 2018-11-02 |
Similar Documents
Publication | Title |
---|---|
US12079841B2 (en) | Configurable relevance service platform incorporating a relevance test driver |
JP6940646B2 (en) | Information recommendation method, information recommendation device, equipment and medium |
US11093908B2 (en) | Routing transactions to a priority processing network based on routing rules |
US20200234324A1 (en) | Matching a Coupon to A Specific Product |
US9800678B2 (en) | Apparatus and method for processing information and program for the same |
US10430727B1 (en) | Systems and methods for privacy-preserving generation of models for estimating consumer behavior |
US20150019394A1 (en) | Merchant information correction through transaction history or detail |
US20160328759A1 (en) | Method, apparatus, and a non-transitory computer-readable recording medium for providing review sharing service |
US8396935B1 (en) | Discovering spam merchants using product feed similarity |
US20200334694A1 (en) | Behavioral data analytics platform |
CN108734366B (en) | User identification method and system, nonvolatile storage medium and computer system |
WO2019179030A1 (en) | Product purchasing prediction method, server and storage medium |
KR102083624B1 (en) | System and method for analyzing interest object, and apparatus applied to the same |
CN111768258A (en) | Method, device, electronic equipment and medium for identifying abnormal order |
US20190333077A1 (en) | Purchase information utilization system, purchase information utilization method, and program |
WO2013173194A1 (en) | A user recommendation method and device |
US10163144B1 (en) | Extracting data from a catalog |
US11392919B2 (en) | Credit data analysis |
CN108985755A (en) | Account state identification method, device and server |
CN111275071A (en) | Prediction model training method, prediction device and electronic equipment |
CN110288365B (en) | Data processing method and system, computer system and computer readable storage medium |
US20170091792A1 (en) | Methods and apparatus for estimating potential demand at a prospective merchant location |
KR20220117676A (en) | Review reliability validation device and method of thereof |
US20210056561A1 (en) | Method and system for identifying electronic devices of genuine customers of organizations |
CN112131465A (en) | Activity information matching method, device, medium and terminal equipment |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |