CN109714636B - User identification method, device, equipment and medium - Google Patents
- Publication number
- CN109714636B (application CN201811573563.4A)
- Authority
- CN
- China
- Prior art keywords
- user
- probability
- behavior data
- cheating
- standard value
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Landscapes
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The embodiment of the invention discloses a user identification method, a device, equipment and a medium, wherein the method comprises the following steps: determining online behavior data of a user; determining the probability that the user is a target user based on the online behavior data and a preset identification model; wherein, the target users include users who perform specific online behaviors, and the determining the online behavior data of the users includes: counting the number of used IP addresses when the user logs in the live broadcast platform within a set time period; or counting the number of used devices when the user logs in the live broadcast platform within a set time period. By adopting the technical scheme, the identification accuracy of the target user can be improved.
Description
Technical Field
Embodiments of the present invention relate to the field of computers, and in particular, to a user identification method, apparatus, device, and medium.
Background
On live broadcast platforms, cheating behaviors carried out for profit, such as fake bullet screen brushing and attention brushing, are common.
Such cheating can cause network congestion and excessive load on the live broadcast platform server, and it seriously harms the live broadcast ecosystem of the platform. To reduce these negative effects, it is important to find suspected cheating users by a reasonable method. Existing methods for identifying cheating users usually rely on hard rules set according to the experience of business staff, and the thresholds of some indicators are chosen without a principled basis, so they are fairly arbitrary. For example, if experience says that a user who has used 10 or more IP addresses is a suspected cheating user, then a user who has used 9 IP addresses is judged not to be one. A rule that reverses its verdict on so small a difference is clearly unreasonable.
Disclosure of Invention
The invention provides a user identification method, a user identification device, user identification equipment and a user identification medium.
In order to achieve the above purpose, the embodiment of the invention adopts the following technical scheme:
in a first aspect, an embodiment of the present invention provides a user identification method, where the method includes:
determining online behavior data of a user;
determining the probability that the user is a target user based on the online behavior data and a preset identification model;
wherein the target users comprise users performing specific online behaviors;
the determining the online behavior data of the user comprises the following steps:
counting the number of IP addresses used when the user logs in to the live broadcast platform within a set time period; or
counting the number of devices used when the user logs in to the live broadcast platform within a set time period.
In a second aspect, an embodiment of the present invention provides a user identification apparatus, where the apparatus includes:
the determining module is used for determining online behavior data of the user;
the recognition module is used for determining the probability that the user is a target user based on the online behavior data and a preset recognition model;
wherein the target users comprise users performing specific online behaviors;
the determining module is specifically configured to: count the number of IP addresses used when the user logs in to the live broadcast platform within a set time period; or
count the number of devices used when the user logs in to the live broadcast platform within a set time period.
In a third aspect, an embodiment of the present invention provides an electronic device, including:
one or more processors;
a storage device for storing a plurality of programs;
when at least one of the plurality of programs is executed by the one or more processors, the one or more processors are caused to implement the user identification method of the first aspect described above.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the user identification method according to the first aspect.
In the user identification method provided by the embodiment of the invention, the online behavior data of the user is determined, specifically by counting the number of IP addresses used when the user logs in to the live broadcast platform within a set time period, or by counting the number of devices used when the user logs in to the live broadcast platform within a set time period. The probability that the user is a target user is then determined based on the online behavior data and a preset recognition model, where the target users comprise users performing specific online behaviors. By these technical means, the identification accuracy of the target user is improved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings used in the description of the embodiments of the present invention will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the contents of the embodiments of the present invention and the drawings without creative efforts.
Fig. 1 is a schematic flow chart of a user identification method according to an embodiment of the present invention;
Fig. 2 is a schematic flow chart of a method for determining values of constants a and b according to a first embodiment of the present invention;
Fig. 3 is a diagram illustrating linear division of a log probability ratio curve according to an embodiment of the present invention;
Fig. 4 is a diagram illustrating linear division of another log probability ratio curve according to an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of a user identification device according to a second embodiment of the present invention;
Fig. 6 is a schematic structural diagram of an electronic device according to a third embodiment of the present invention.
Detailed Description
In order to make the technical problems solved, technical solutions adopted and technical effects achieved by the present invention clearer, the technical solutions of the embodiments of the present invention will be described in further detail below with reference to the accompanying drawings, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example one
Fig. 1 is a schematic flow chart of a user identification method according to an embodiment of the present invention. The user identification method disclosed in this embodiment may be suitable for identifying a user engaged in a certain online behavior, for example, identifying a user brushing a bullet screen or a user brushing attention in a live broadcast room, and may be executed by a user identification device, where the device may be implemented by software and/or hardware and is generally integrated in a terminal, such as a smart phone or a computer. Referring specifically to fig. 1, the method may include the steps of:
Step 110, determining the online behavior data of the user.
Specifically, the number of IP addresses used when the user logs in to the live broadcast platform within a set time period is counted, or the number of devices used when the user logs in to the live broadcast platform within a set time period is counted. That is, the online behavior data includes the number of IP addresses or the number of devices. The set time period may be a specific day, a specific week or a specific month. The online behavior data can be collected through behavior dotting (event tracking): tracking code is inserted wherever a user action needs to be counted (such as a click event or a page jump), and the online behavior of the user is then recorded in a user behavior log. By collecting the user behavior log and querying the user behaviors in it, the users who perform a specific online behavior can be determined, such as users who send bullet screen messages to anchor A. At the same time, the network environment information (such as the IP address) and the terminal device information (such as the terminal device ID) used for the online behavior are recorded in the user behavior log. On a mobile terminal (such as a smart phone), the user behavior log can be obtained directly through a data acquisition interface.
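The counting described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the record layout `(user_id, login_time, ip_address, device_id)`, the sample data and the helper name `count_distinct` are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical log records: (user_id, login_time, ip_address, device_id).
LOGIN_LOG = [
    ("u1", datetime(2018, 12, 1, 9, 0), "10.0.0.1", "dev-a"),
    ("u1", datetime(2018, 12, 1, 9, 5), "10.0.0.2", "dev-a"),
    ("u1", datetime(2018, 12, 1, 21, 0), "10.0.0.2", "dev-b"),
    ("u2", datetime(2018, 12, 1, 10, 0), "10.0.0.9", "dev-c"),
    ("u2", datetime(2018, 12, 2, 10, 0), "10.0.0.8", "dev-c"),  # outside the window
]


def count_distinct(log, start, end, field):
    """Count distinct IP addresses or devices per user for logins in [start, end)."""
    index = {"ip": 2, "device": 3}[field]
    seen = defaultdict(set)
    for record in log:
        if start <= record[1] < end:
            seen[record[0]].add(record[index])
    return {user: len(values) for user, values in seen.items()}


# The set time period here is one specific day.
window = (datetime(2018, 12, 1), datetime(2018, 12, 2))
ip_counts = count_distinct(LOGIN_LOG, *window, field="ip")
device_counts = count_distinct(LOGIN_LOG, *window, field="device")
```

Using a set per user means repeated logins from the same IP address or device are counted once, which matches "the number of IP addresses used" rather than the number of logins.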
Step 120, determining the probability that the user is the target user based on the online behavior data and a preset identification model.
Wherein the target users comprise users performing specific online behaviors.
The specific online behavior may be a positive behavior worth advocating, such as online donation, or a negative behavior to be resisted, such as bullet screen brushing or attention brushing aimed at the same anchor through the live broadcast platform. Negative behaviors often have harmful effects; for example, the bullet screen brushing and attention brushing just mentioned often cause network congestion and excessive load on the live broadcast platform server. Therefore, to reduce the negative impact of such behavior, or to actively encourage a beneficial behavior, this embodiment discloses a user identification method that can identify target users engaged in bullet screen brushing or attention brushing, so that they can be warned or otherwise restrained, or identify groups engaged in public welfare behaviors such as donation, so as to foster a good social atmosphere. This embodiment is described using the example of identifying online cheating users engaged in bullet screen brushing or attention brushing.
Further, determining the probability that the user is the target user based on the online behavior data and a preset recognition model, includes:
determining the probability that the user is the target user according to the following preset identification function:
where A(x) represents the probability that the user is the target user, x represents the online behavior data of the user, and a and b are set constants.
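The text defines A(x) only through its input x and the constants a and b. Purely as an illustration, and as an assumption that is not stated in the source, the sketch below takes A(x) to rise linearly from 0 at x = a to 1 at x = b. The function name `target_probability` is hypothetical.

```python
def target_probability(x, a, b):
    """Assumed piecewise-linear form of A(x); not taken from the source text."""
    if a >= b:
        raise ValueError("expected a < b")
    if x <= a:
        return 0.0  # at or below a: treated as not a target user
    if x >= b:
        return 1.0  # at or above b: treated as a target user
    return (x - a) / (b - a)  # in between: graded probability


# With hypothetical constants a = 4 and b = 14, users with 9 and 10
# IP addresses receive close scores instead of opposite verdicts.
score_9 = target_probability(9, 4, 14)
score_10 = target_probability(10, 4, 14)
```

Under this assumed form, a small change in x produces a small change in the score, which is the property the hard threshold rule in the Background lacks.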
Because the accuracy with which the identification function identifies the target user depends on the values set for the constants a and b, this embodiment provides an algorithm for determining the values of the constants a and b. As shown in fig. 2, the algorithm specifically includes the following steps:
Step 210, determining a cheating suspected user set and a non-cheating suspected user set.
Specifically, the cheating suspected user set can be determined from complaints: users complained about by the audience are cheating suspected users, and users not complained about by the audience are non-cheating suspected users. Alternatively, auditors can review the behavior of each user logging in to live broadcast rooms to determine the cheating suspected users and the non-cheating suspected users; for example, if an auditor finds that a user logged in to live broadcast rooms through 10 different devices in one day and followed different anchors, the user can be regarded as a cheating suspected user. The cheating suspected users and the non-cheating suspected users can also be obtained through expert experience.
Step 220, counting the online behavior data of each user in the cheating suspected user set and the non-cheating suspected user set within a set time period.
Wherein the online behavior data may include: the number of used IP addresses when logging in a live broadcast platform in a set time period; or the number of used devices when logging in the live platform in a set time period.
Step 230, for each of the standard values, calculating a first comprehensive probability that the online behavior data of each user in the cheating suspected user set is greater than the current standard value, and a second comprehensive probability that the online behavior data of each user in the non-cheating suspected user set is greater than the current standard value.
Each standard value is a possible value of the online behavior data of the user. For example, when the online behavior data of the user is the number of IP addresses used when logging in to the live broadcast platform within a set time period, the possible values are 1, 2, 3, 4, ..., n, so the standard values are 1, 2, 3, 4, ..., n; the value of n can be set according to business experience and is usually set to 30. Likewise, when the online behavior data of the user is the number of devices used when logging in to the live broadcast platform within the set time period, the standard values are the possible values of that number of devices.
Specifically, the first comprehensive probability that the online behavior data of each user in the cheating suspected user set is greater than the current standard value is calculated according to the following formula (1):
$$p(S, v_i) = \frac{N(S, v > v_i)}{N(S)} \qquad (1)$$
where $p(S, v_i)$ represents the first comprehensive probability that the online behavior data of each user in the cheating suspected user set $S$ is greater than the current standard value $v_i$, $N(S, v > v_i)$ represents the number of users in the cheating suspected user set $S$ whose online behavior data $v$ is greater than the current standard value $v_i$, and $N(S)$ represents the total number of users in the cheating suspected user set $S$.
And the calculation mode of the second comprehensive probability that the online behavior data of each user in the non-cheating suspected user set is greater than the current standard value is the same as the calculation mode of the first comprehensive probability that the online behavior data of each user in the cheating suspected user set is greater than the current standard value.
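The two comprehensive probabilities can be sketched as below. Each is the fraction of users in a set whose online behavior data exceeds the current standard value. The sample counts and the names `exceed_probability`, `suspect_counts` and `normal_counts` are assumptions for illustration only.

```python
def exceed_probability(values, standard_value):
    """Fraction of users whose online behavior data is strictly greater than standard_value."""
    if not values:
        raise ValueError("user set must not be empty")
    return sum(1 for v in values if v > standard_value) / len(values)


# Hypothetical per-user IP-address counts for the two labelled sets.
suspect_counts = [12, 9, 15, 3, 8]
normal_counts = [1, 2, 1, 3, 5, 2, 1, 4]

# Standard values 1..5 here; the text suggests n is usually set to 30.
standard_values = range(1, 6)
first = {v: exceed_probability(suspect_counts, v) for v in standard_values}
second = {v: exceed_probability(normal_counts, v) for v in standard_values}
```

The same function serves both sets, reflecting the statement that the second comprehensive probability is calculated in the same way as the first.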
Step 240, calculating the log probability ratio of the first comprehensive probability and the second comprehensive probability to obtain the log probability ratio corresponding to each standard value.
Specifically, the log probability ratio corresponding to each standard value is calculated according to the following formula (2):
$$r(v_i) = \log \frac{p(S, v_i)}{p(N, v_i)} \qquad (2)$$
where $r(v_i)$ represents the log probability ratio corresponding to the standard value $v_i$, $p(S, v_i)$ represents the first comprehensive probability that the online behavior data of each user in the cheating suspected user set $S$ is greater than the current standard value $v_i$, and $p(N, v_i)$ represents the second comprehensive probability that the online behavior data of each user in the non-cheating suspected user set $N$ is greater than the current standard value $v_i$.
Step 250, constructing a log probability ratio curve based on the log probability ratios corresponding to the standard values, wherein the horizontal axis represents each standard value and the vertical axis represents the log probability ratio corresponding to each standard value.
Each standard value $v_i$ corresponds to a log probability ratio $r(v_i)$, so a $v_i \sim r(v_i)$ log probability ratio curve can be obtained, in which the horizontal axis represents each standard value $v_i$ and the vertical axis represents the log probability ratio $r(v_i)$ corresponding to each standard value.
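A minimal sketch of building the curve points from the two sets of probabilities follows. Two assumptions are made that the source does not state: the natural logarithm is used, and standard values where either probability is zero are skipped because the logarithm is undefined there. The function name and sample probabilities are hypothetical.

```python
import math


def log_probability_ratio_curve(first, second):
    """Return (v_i, r(v_i)) points with r = ln(p(S, v_i) / p(N, v_i)).

    Assumption: standard values where either probability is zero are
    skipped; the source does not say how such values are handled.
    """
    points = []
    for v in sorted(first):
        p_s, p_n = first[v], second[v]
        if p_s > 0 and p_n > 0:
            points.append((v, math.log(p_s / p_n)))
    return points


# Hypothetical probabilities for standard values 1..4.
first = {1: 1.0, 2: 0.8, 3: 0.8, 4: 0.6}
second = {1: 0.5, 2: 0.4, 3: 0.2, 4: 0.0}
curve = log_probability_ratio_curve(first, second)
```

The resulting list of points is ordered by standard value, so it can be plotted directly with the standard value on the horizontal axis and the log probability ratio on the vertical axis.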
Step 260, performing piecewise linear fitting on the log probability ratio curve, wherein the endpoint values between the linear division regions at the minimum fitting loss correspond to the constants a and b respectively.
Specifically, the inflection points of the log probability ratio curve are determined by a broken line method. To limit computational complexity, the log probability ratio curve can be divided into three segments at the inflection points; each segment is fitted linearly and its fitting loss is calculated. The fitting loss is computed for each of the different division modes, and the optimal division mode is the one with the minimum fitting loss; the endpoint values between the linear division regions under the optimal division mode correspond to the constants a and b respectively. Figs. 3 and 4 are schematic diagrams of linear division of the log probability ratio curve. In fig. 3, the two inflection points are g1 and g2, and the curve is divided into three segments: curve segment 31, curve segment 32 and curve segment 33. In fig. 4, the two inflection points are g3 and g4, and the curve is divided into three segments: curve segment 34, curve segment 35 and curve segment 36. If the fitting loss is smallest under the division shown in fig. 4, the value of the constant a is the horizontal axis value at the inflection point g3, and the value of the constant b is the horizontal axis value at the inflection point g4.
Further, the fitting loss in a certain division manner is calculated according to formula (3), in which: $S$ represents a division of the log probability ratio curve; $Q(S)$ represents the fitting loss under the division $S$; $r_m$ represents the horizontal axis region corresponding to the $m$-th segment of the divided curve (for example, assume that in fig. 3 the region $r_1$ of the first curve segment 31 is $[0, 1]$, the region $r_2$ of the second curve segment 32 is $[1, 3]$, and the region $r_3$ of the third curve segment 33 is $[3, 4]$); $x_i$ represents a value in the region $r_m$, and $y_i$ represents the vertical axis coordinate on the curve corresponding to $x_i$; $M$ represents the total number of curve segments, and in this embodiment the log probability ratio curve is divided into 3 segments; $l(x)$ represents the fitting loss function, for which the square function is chosen, that is, $l(x) = x^2$; and $\alpha_m$ and $\beta_m$ represent the parameters obtained by estimating the $m$-th curve segment by the least squares method.
All possible division modes are traversed to obtain the minimum value of $Q(S)$; the endpoint values between the linear division regions at that minimum correspond to the values of the constants a and b respectively.
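One reading consistent with these definitions is $Q(S) = \sum_{m=1}^{M} \sum_{x_i \in r_m} l(y_i - \alpha_m x_i - \beta_m)$, with $\alpha_m$ taken as the slope and $\beta_m$ as the intercept of the $m$-th fitted line; that assignment of roles is an assumption. The sketch below traverses every pair of interior breakpoints under that reading. The function names and the sample curve are hypothetical, and adjacent segments are assumed to share their breakpoint, as in the example regions [0, 1], [1, 3], [3, 4].

```python
from itertools import combinations


def line_fit_loss(points):
    """Squared-error loss of the least-squares line through points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx if sxx else 0.0
    intercept = mean_y - slope * mean_x
    return sum((y - slope * x - intercept) ** 2 for x, y in points)


def best_three_segment_split(points):
    """Try every pair of interior breakpoints and keep the lowest total loss.

    Assumption: adjacent segments share their breakpoint, and each
    segment holds at least two points.
    """
    points = sorted(points)
    best = None
    for i, j in combinations(range(1, len(points) - 1), 2):
        segments = (points[: i + 1], points[i : j + 1], points[j:])
        loss = sum(line_fit_loss(seg) for seg in segments)
        if best is None or loss < best[0]:
            best = (loss, points[i][0], points[j][0])
    if best is None:
        raise ValueError("need at least four points")
    return best  # (Q(S), a, b)


# Hypothetical curve: flat up to x = 3, rising until x = 7, flat afterwards.
curve = [(x, float(min(max(x - 3, 0), 4))) for x in range(10)]
loss, a, b = best_three_segment_split(curve)
```

The exhaustive search costs on the order of the square of the number of standard values, which stays small when n is around 30.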
According to the user identification method provided by the embodiment, the set values of the constants a and b in the preset identification function are determined by adopting a specific algorithm by combining the online behavior data of each user in the cheating suspected user set and the non-cheating suspected user set within the set time period, so that the identification accuracy of the identification function on the cheating users is improved.
Example two
Fig. 5 is a schematic structural diagram of a user identification device according to a second embodiment of the present invention. Referring to fig. 5, the apparatus comprises: a determination module 510 and an identification module 520;
the determining module 510 is configured to determine online behavior data of a user; the identification module 520 is configured to determine the probability that the user is a target user based on the online behavior data and a preset identification model; wherein the target users comprise users performing specific online behaviors.
Further, the determining module 510 is specifically configured to: counting the number of used IP addresses when the user logs in the live broadcast platform within a set time period; or counting the number of used devices when the user logs in the live broadcast platform within a set time period.
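The division of work between the two modules can be sketched as below. The class names, the `(user_id, value)` input layout and the placeholder model are assumptions for illustration; in particular, the lambda is a stand-in and not the preset identification function of the embodiment.

```python
from typing import Callable, Iterable, Tuple


class DeterminationModule:
    """Stands in for module 510: derives online behavior data for one user."""

    def determine(self, logins: Iterable[Tuple[str, str]], user_id: str) -> int:
        # logins holds hypothetical (user_id, ip_or_device) pairs already
        # restricted to the set time period.
        return len({value for uid, value in logins if uid == user_id})


class IdentificationModule:
    """Stands in for module 520: applies a preset recognition model."""

    def __init__(self, model: Callable[[int], float]):
        self.model = model

    def identify(self, behavior_data: int) -> float:
        return self.model(behavior_data)


logins = [("u1", "10.0.0.1"), ("u1", "10.0.0.2"), ("u2", "10.0.0.9")]
# Placeholder model, not the patent's function: saturates at 10 distinct values.
module_520 = IdentificationModule(lambda x: min(x / 10, 1.0))
probability = module_520.identify(DeterminationModule().determine(logins, "u1"))
```

Passing the model in as a callable keeps the identification module independent of how the constants a and b are obtained.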
Further, the identifying module 520 is specifically configured to: determining the probability that the user is the target user according to the following preset identification function:
where A(x) represents the probability that the user is the target user, x represents the online behavior data of the user, and a and b are set constants.
Further, the device further comprises a calculation module, configured to determine values of the constants a and b in the preset identification function.
Further, the calculation module comprises:
the determining unit is used for determining a cheating suspected user set and a non-cheating suspected user set;
the statistical unit is used for counting the online behavior data of each user in the cheating suspected user set and the non-cheating suspected user set within a set time period;
the first calculation unit is used for respectively calculating a first comprehensive probability that the online behavior data of each user in the cheating suspected user set is larger than a current standard value and a second comprehensive probability that the online behavior data of each user in the non-cheating suspected user set is larger than the current standard value according to each standard value in the standard values;
the second calculation unit is used for calculating the log probability ratio of the first comprehensive probability and the second comprehensive probability to obtain the log probability ratio corresponding to each standard value;
the construction unit is used for constructing a log probability ratio curve based on the log probability ratio corresponding to each standard value, wherein the horizontal axis represents each standard value, and the vertical axis represents the log probability ratio corresponding to each standard value;
the fitting unit is used for performing linear fitting on the log probability ratio curve, and the end point values between the corresponding linear division regions respectively correspond to the constants a and b when the fitting loss is minimum;
and the standard values comprise possible values of the online behavior data of the user.
Further, the first calculating unit is specifically configured to: calculate a first comprehensive probability that the online behavior data of each user in the cheating suspected user set is greater than a current standard value according to the following formula:
$$p(S, v_i) = \frac{N(S, v > v_i)}{N(S)}$$
where $p(S, v_i)$ represents the first comprehensive probability that the online behavior data of each user in the cheating suspected user set $S$ is greater than the current standard value $v_i$, $N(S, v > v_i)$ represents the number of users in the cheating suspected user set $S$ whose online behavior data $v$ is greater than the current standard value $v_i$, and $N(S)$ represents the total number of users in the cheating suspected user set $S$.
Further, the second calculating unit is specifically configured to: calculate the log probability ratio corresponding to each standard value according to the following formula:
$$r(v_i) = \log \frac{p(S, v_i)}{p(N, v_i)}$$
where $r(v_i)$ represents the log probability ratio corresponding to the standard value $v_i$, $p(S, v_i)$ represents the first comprehensive probability that the online behavior data of each user in the cheating suspected user set $S$ is greater than the current standard value $v_i$, and $p(N, v_i)$ represents the second comprehensive probability that the online behavior data of each user in the non-cheating suspected user set $N$ is greater than the current standard value $v_i$.
The user identification device provided by this embodiment determines the set values of the constants a and b in the preset identification function by using a specific algorithm by combining the online behavior data of each user in the cheating suspected user set and the non-cheating suspected user set within a set time period, so as to improve the identification accuracy of the identification function on the cheating users.
EXAMPLE III
Fig. 6 is a schematic structural diagram of an electronic device according to a third embodiment of the present invention. FIG. 6 illustrates a block diagram of an exemplary electronic device 12 suitable for use in implementing embodiments of the present invention. The electronic device 12 shown in fig. 6 is only an example and should not bring any limitation to the function and the scope of use of the embodiment of the present invention.
As shown in FIG. 6, electronic device 12 is embodied in the form of a general purpose computing device. The components of electronic device 12 may include, but are not limited to: one or more processors or processing units 16, a system memory 28, and a bus 18 that couples various system components including the system memory 28 and the processing unit 16.
The system memory 28 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 30 and/or cache memory 32. The electronic device 12 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 34 may be used to read from and write to non-removable, nonvolatile magnetic media (not shown in FIG. 6, and commonly referred to as a "hard drive"). Although not shown in FIG. 6, a magnetic disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In these cases, each drive may be connected to bus 18 by one or more data media interfaces. Memory 28 may include at least one program product having a set of program modules (e.g., the determining module 510 and the identification module 520 of the user identification device) configured to perform the functions of embodiments of the present invention.
A program/utility 40 having a set of program modules 42 (e.g., the determining module 510 and the identification module 520 of the user identification device) may be stored, for example, in memory 28. Such program modules 42 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination thereof, may include an implementation of a network environment. Program modules 42 generally carry out the functions and/or methodologies of the described embodiments of the invention.
The processing unit 16 executes various functional applications and data processing by executing programs stored in the system memory 28, for example, implementing a user identification method provided by an embodiment of the present invention, the method including:
determining online behavior data of a user;
determining the probability that the user is a target user based on the online behavior data and a preset identification model;
wherein the target users comprise users performing specific online behaviors;
the determining the online behavior data of the user comprises the following steps:
counting the number of IP addresses used when the user logs in to the live broadcast platform within a set time period; or
counting the number of devices used when the user logs in to the live broadcast platform within a set time period.
The processing unit 16 executes various functional applications and data processing, such as implementing a user identification method provided by an embodiment of the present invention, by executing programs stored in the system memory 28.
Of course, those skilled in the art can understand that the processor can also implement the technical solution of the user identification method provided by any embodiment of the present invention.
Example four
The fourth embodiment of the present invention further provides a computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, implements the user identification method provided by the embodiments of the present invention, the method including:
determining online behavior data of a user;
determining the probability that the user is a target user based on the online behavior data and a preset identification model;
wherein the target users comprise users performing specific online behaviors;
the determining the online behavior data of the user comprises the following steps:
counting the number of IP addresses used when the user logs in to the live broadcast platform within a set time period; or
counting the number of devices used when the user logs in to the live broadcast platform within a set time period.
Of course, the computer program stored on the computer-readable storage medium provided by the embodiments of the present invention is not limited to the method operations described above, and may also perform related operations in the user identification method provided by any embodiments of the present invention.
Computer storage media for embodiments of the invention may employ any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object-oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.
Claims (6)
1. A method for identifying a user, comprising:
determining online behavior data of a user;
determining the probability that the user is a target user based on the online behavior data and a preset identification model;
wherein the target users comprise users performing specific online behaviors; the determining the online behavior data of the user comprises the following steps:
counting the number of IP addresses used when the user logs in to the live broadcast platform within a set time period; or
counting the number of used devices when the user logs in the live broadcast platform within a set time period;
the determining the probability that the user is the target user based on the online behavior data and the preset identification model comprises:
determining the probability that the user is the target user according to the following preset identification function:
wherein A(x) represents the probability that the user is the target user, x represents the online behavior data of the user, and a and b are set constants;
wherein, determining the values of the constants a and b in the preset identification function comprises:
determining a cheating suspected user set and a non-cheating suspected user set;
counting online behavior data of each user in the cheating suspected user set and the non-cheating suspected user set within a set time period;
for each standard value among the standard values, calculating a first comprehensive probability that the online behavior data of each user in the cheating suspected user set is greater than the current standard value, and a second comprehensive probability that the online behavior data of each user in the non-cheating suspected user set is greater than the current standard value;
calculating the log probability ratio of the first comprehensive probability and the second comprehensive probability to obtain the log probability ratio corresponding to each standard value;
constructing a log probability ratio curve based on the log probability ratios corresponding to each standard value, wherein the horizontal axis represents each standard value, and the vertical axis represents the log probability ratio corresponding to each standard value;
performing linear fitting on the log probability ratio curve, wherein, when the fitting loss is minimized, the endpoint values between the corresponding linear segments correspond to the constants a and b, respectively;
wherein the standard values comprise possible values of the online behavior data of the user.
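The fitting step in claim 1 does not spell out the fitting procedure or the loss function. One possible reading, sketched here under stated assumptions, treats the fitted shape as flat below a, linear between a and b, and flat above b, and searches all pairs of curve points for the pair with the smallest squared error. The clamped shape, the squared-error loss, the exhaustive search, and the names `clamped_ramp` and `fit_breakpoints` are illustrative choices, not details taken from the patent.

```python
from typing import Optional, Sequence, Tuple


def clamped_ramp(x: float, a: float, b: float, lo: float, hi: float) -> float:
    """Flat at lo for x <= a, flat at hi for x >= b, linear in between."""
    if x <= a:
        return lo
    if x >= b:
        return hi
    return lo + (hi - lo) * (x - a) / (b - a)


def fit_breakpoints(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Try every pair of curve points as breakpoints (a, b) with a < b and
    return the pair whose clamped ramp has the smallest squared error.

    Assumption: xs is sorted in increasing order and the ramp passes through
    the curve at both breakpoints.
    """
    best: Optional[Tuple[float, float, float]] = None
    for i in range(len(xs)):
        for j in range(i + 1, len(xs)):
            a, b = xs[i], xs[j]
            lo, hi = ys[i], ys[j]
            loss = sum(
                (clamped_ramp(x, a, b, lo, hi) - y) ** 2 for x, y in zip(xs, ys)
            )
            if best is None or loss < best[0]:
                best = (loss, a, b)
    if best is None:
        raise ValueError("need at least two curve points")
    return best[1], best[2]


# Made-up curve: flat, then rising, then flat again.
standard_values = [0, 1, 2, 3, 4, 5]
log_ratios = [0.0, 0.0, 1.0, 2.0, 2.0, 2.0]
a, b = fit_breakpoints(standard_values, log_ratios)
```

The search does cubic work in the number of curve points, which is acceptable when the candidate standard values are few; a larger curve would call for a proper segmented-regression routine.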
2. The method of claim 1, wherein calculating, for each of the standard values, the first comprehensive probability that the online behavior data of each user in the cheating suspected user set is greater than the current standard value comprises:
calculating the first comprehensive probability that the online behavior data of each user in the cheating suspected user set is greater than the current standard value according to the following formula:
wherein p(S, v_i) represents the first comprehensive probability that the online behavior data of each user in the cheating suspected user set S is greater than the current standard value v_i, N(S, v > v_i) represents the number of users in the cheating suspected user set S whose online behavior data v is greater than the current standard value v_i, and N(S) represents the total number of users in the cheating suspected user set S.
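Read from the symbol definitions in claim 2, the first comprehensive probability appears to be the empirical fraction N(S, v > v_i) / N(S). The sketch below computes that fraction for one user set and one standard value. The function name and the sample values are hypothetical, and the fraction itself is an interpretation of the definitions rather than a formula quoted from the patent.

```python
from typing import Sequence


def comprehensive_probability(values: Sequence[float], standard_value: float) -> float:
    """Empirical probability that a user's behavior value exceeds standard_value.

    values holds one online-behavior value per user in the set (for example,
    the number of distinct login IP addresses in the period).
    """
    if not values:
        raise ValueError("the user set must not be empty")
    exceeding = sum(1 for v in values if v > standard_value)  # N(S, v > v_i)
    return exceeding / len(values)  # divided by N(S)


suspect_ip_counts = [3, 5, 8, 9]  # made-up values for illustration
p = comprehensive_probability(suspect_ip_counts, standard_value=4)
```

The same function applies unchanged to the non-cheating suspected user set to obtain the second comprehensive probability.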
3. The method of claim 2, wherein calculating the log probability ratio of the first comprehensive probability and the second comprehensive probability to obtain the log probability ratio corresponding to each standard value comprises:
calculating the log probability ratio corresponding to each standard value according to the following formula:
wherein r(v_i) represents the log probability ratio corresponding to the standard value v_i, p(S, v_i) represents the first comprehensive probability that the online behavior data of each user in the cheating suspected user set S is greater than the current standard value v_i, and p(N, v_i) represents the second comprehensive probability that the online behavior data of each user in the non-cheating suspected user set N is greater than the current standard value v_i.
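A minimal sketch of the log probability ratio curve follows, assuming r(v_i) is the logarithm of p(S, v_i) divided by p(N, v_i). The natural logarithm, the small constant `eps` that avoids taking the logarithm of zero, the use of the distinct observed values as candidate standard values, and all names are assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple


def fraction_above(values: Sequence[float], threshold: float) -> float:
    """Fraction of values strictly greater than threshold."""
    return sum(1 for v in values if v > threshold) / len(values)


def log_ratio_curve(
    suspect: Sequence[float], normal: Sequence[float], eps: float = 1e-9
) -> List[Tuple[float, float]]:
    """Return (v_i, r(v_i)) pairs, one per candidate standard value v_i.

    Assumptions: candidate values are the distinct observed values, the
    logarithm is natural, and eps guards against log(0) and division by zero.
    """
    candidates = sorted(set(suspect) | set(normal))
    curve = []
    for v_i in candidates:
        p_s = fraction_above(suspect, v_i)  # first comprehensive probability
        p_n = fraction_above(normal, v_i)   # second comprehensive probability
        curve.append((v_i, math.log((p_s + eps) / (p_n + eps))))
    return curve


# Made-up behavior values for the two user sets.
curve = log_ratio_curve(suspect=[3, 5, 8, 9], normal=[1, 1, 2, 3])
```

Each pair gives one point of the curve, with the standard value on the horizontal axis and the log probability ratio on the vertical axis.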
4. A user identification device, the device comprising:
the determining module is used for determining online behavior data of the user;
the identification module is used for determining the probability that the user is a target user based on the online behavior data and a preset identification model;
wherein the target users comprise users performing specific online behaviors;
the determining module is specifically configured to: counting the number of IP addresses used when the user logs in to the live broadcast platform within a set time period; or
counting the number of used devices when the user logs in the live broadcast platform within a set time period;
the identification module is specifically configured to: determining the probability that the user is the target user according to the following preset identification function:
wherein A(x) represents the probability that the user is the target user, x represents the online behavior data of the user, and a and b are set constants;
the user identification device also comprises a calculation module used for determining the numerical values of the constants a and b in the preset identification function;
the calculation module comprises:
the determining unit is used for determining a cheating suspected user set and a non-cheating suspected user set;
the statistical unit is used for counting the online behavior data of each user in the cheating suspected user set and the non-cheating suspected user set within a set time period;
the first calculation unit is used for calculating, for each standard value among the standard values, a first comprehensive probability that the online behavior data of each user in the cheating suspected user set is greater than the current standard value, and a second comprehensive probability that the online behavior data of each user in the non-cheating suspected user set is greater than the current standard value;
the second calculation unit is used for calculating the log probability ratio of the first comprehensive probability and the second comprehensive probability to obtain the log probability ratio corresponding to each standard value;
the construction unit is used for constructing a log probability ratio curve based on the log probability ratio corresponding to each standard value, wherein the horizontal axis represents each standard value, and the vertical axis represents the log probability ratio corresponding to each standard value;
the fitting unit is used for performing linear fitting on the log probability ratio curve, wherein, when the fitting loss is minimized, the endpoint values between the corresponding linear segments correspond to the constants a and b, respectively;
wherein the standard values comprise possible values of the online behavior data of the user.
5. An electronic device, characterized in that the electronic device further comprises:
one or more processors;
storage means for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the user identification method of any of claims 1-3.
6. A computer-readable storage medium, on which a computer program is stored, which program, when being executed by a processor, carries out the user identification method according to any one of claims 1-3.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811573563.4A CN109714636B (en) | 2018-12-21 | 2018-12-21 | User identification method, device, equipment and medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109714636A CN109714636A (en) | 2019-05-03 |
CN109714636B CN109714636B (en) | 2021-04-23 |
Family
ID=66256003
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811573563.4A Active CN109714636B (en) | 2018-12-21 | 2018-12-21 | User identification method, device, equipment and medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109714636B (en) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112788351B (en) * | 2019-11-01 | 2022-08-05 | 武汉斗鱼鱼乐网络科技有限公司 | Target live broadcast room identification method, device, equipment and storage medium |
CN110990729B (en) * | 2019-12-05 | 2023-11-03 | 秒针信息技术有限公司 | Job analysis method, device, electronic equipment and readable storage medium |
CN114071196A (en) * | 2020-08-03 | 2022-02-18 | 武汉斗鱼鱼乐网络科技有限公司 | Method, system, medium and equipment for identifying target live broadcast room |
CN112995686B (en) * | 2021-02-03 | 2022-04-19 | 上海哔哩哔哩科技有限公司 | Data processing method, live broadcast method, authentication server and live broadcast data server |
CN113347497B (en) * | 2021-08-02 | 2021-11-26 | 武汉斗鱼鱼乐网络科技有限公司 | Target user identification method and device, electronic equipment and storage medium |
CN114679600A (en) * | 2022-03-24 | 2022-06-28 | 上海哔哩哔哩科技有限公司 | Data processing method and device |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102523126A (en) * | 2011-12-29 | 2012-06-27 | 深圳市同洲视讯传媒有限公司 | Method and device for sending alarm event |
CN103793484A (en) * | 2014-01-17 | 2014-05-14 | 五八同城信息技术有限公司 | Fraudulent conduct identification system based on machine learning in classified information website |
CN104424433A (en) * | 2013-08-22 | 2015-03-18 | 腾讯科技(深圳)有限公司 | Anti-cheating method and anti-cheating system of application program |
CN104852886A (en) * | 2014-02-14 | 2015-08-19 | 腾讯科技(深圳)有限公司 | Protection method and device for user account |
CN105100032A (en) * | 2014-05-23 | 2015-11-25 | 腾讯科技(北京)有限公司 | Method and apparatus for preventing resource steal |
CN105808639A (en) * | 2016-02-24 | 2016-07-27 | 平安科技(深圳)有限公司 | Network access behavior recognizing method and device |
CN106209862A (en) * | 2016-07-14 | 2016-12-07 | 微梦创科网络科技(中国)有限公司 | A kind of steal-number defence implementation method and device |
WO2017027003A1 (en) * | 2015-08-10 | 2017-02-16 | Hewlett Packard Enterprise Development Lp | Evaluating system behaviour |
CN108055281A (en) * | 2017-12-27 | 2018-05-18 | 百度在线网络技术(北京)有限公司 | Account method for detecting abnormality, device, server and storage medium |
CN108763359A (en) * | 2018-05-16 | 2018-11-06 | 武汉斗鱼网络科技有限公司 | A kind of usage mining method, apparatus and electronic equipment with incidence relation |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11005864B2 (en) * | 2017-05-19 | 2021-05-11 | Salesforce.Com, Inc. | Feature-agnostic behavior profile based anomaly detection |
Also Published As
Publication number | Publication date |
---|---|
CN109714636A (en) | 2019-05-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109714636B (en) | User identification method, device, equipment and medium | |
CN110177094B (en) | User group identification method and device, electronic equipment and storage medium | |
CN110351572B (en) | Method, device and equipment for updating live broadcast room information and storage medium | |
CN108197202B (en) | Data verification method and device for crowdsourcing task, server and storage medium | |
CN112417274A (en) | Message pushing method and device, electronic equipment and storage medium | |
CN110502697B (en) | Target user identification method and device and electronic equipment | |
CN110826036A (en) | User operation behavior safety identification method and device and electronic equipment | |
US8396877B2 (en) | Method and apparatus for generating a fused view of one or more people | |
CN111090729A (en) | Method, device, server and storage medium for identifying fraudulent group | |
CN107729944B (en) | Identification method and device of popular pictures, server and storage medium | |
CN112788351B (en) | Target live broadcast room identification method, device, equipment and storage medium | |
US20100198633A1 (en) | Method and System for Obtaining Social Network Information | |
CN113347497B (en) | Target user identification method and device, electronic equipment and storage medium | |
CN116805012A (en) | Quality assessment method and device for multi-mode knowledge graph, storage medium and equipment | |
CN110659280A (en) | Road blocking abnormity detection method and device, computer equipment and storage medium | |
CN113365113B (en) | Target node identification method and device | |
CN109257648B (en) | Method, device, terminal and storage medium for correcting similarity between live broadcasts | |
CN114202409A (en) | Guarantee map construction method, device, equipment and storage medium | |
CN110458743B (en) | Community management method, device, equipment and storage medium based on big data analysis | |
CN115016890A (en) | Virtual machine resource allocation method and device, electronic equipment and storage medium | |
CN110688610B (en) | Weight calculation method and device for graph data and electronic equipment | |
CN115810228A (en) | Face recognition access control management method and device, electronic equipment and storage medium | |
CN110648208B (en) | Group identification method and device and electronic equipment | |
CN114630185B (en) | Target user identification method and device, electronic equipment and storage medium | |
CN112261484B (en) | Target user identification method and device, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||