CN109714636B - User identification method, device, equipment and medium - Google Patents

Publication number
CN109714636B
Authority
CN
China
Prior art keywords
user
probability
behavior data
cheating
standard value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811573563.4A
Other languages
Chinese (zh)
Other versions
CN109714636A (en)
Inventor
王璐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Ouyuan Network Video Co ltd
Original Assignee
Wuhan Ouyuan Network Video Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Ouyuan Network Video Co ltd
Priority to CN201811573563.4A
Publication of CN109714636A
Application granted
Publication of CN109714636B

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The embodiment of the invention discloses a user identification method, a device, equipment and a medium, wherein the method comprises the following steps: determining online behavior data of a user; determining the probability that the user is a target user based on the online behavior data and a preset identification model; wherein, the target users include users who perform specific online behaviors, and the determining the online behavior data of the users includes: counting the number of used IP addresses when the user logs in the live broadcast platform within a set time period; or counting the number of used devices when the user logs in the live broadcast platform within a set time period. By adopting the technical scheme, the identification accuracy of the target user can be improved.

Description

User identification method, device, equipment and medium
Technical Field
Embodiments of the present invention relate to the field of computers, and in particular, to a user identification method, apparatus, device, and medium.
Background
On live broadcast platforms, cheating for profit is widespread, for example posting fake bullet screen comments in bulk (bullet screen brushing) and artificially inflating follow counts (attention brushing).
Such cheating can cause network congestion and excessive load on the live broadcast platform's servers, and it seriously harms the platform's live broadcast ecosystem. To reduce this negative impact, it is therefore important to identify suspected cheating users by a reasonable method. Existing methods for identifying cheating users usually rely on hard rules set from the experience of business staff, and the thresholds of some indicators are not set on any principled basis, so they are fairly arbitrary. For example, if experience dictates that a user who has used 10 or more IP addresses is a suspected cheating user, then a user who has used 9 IP addresses is judged not to be one. Such a rule for identifying cheating users is clearly unreasonable.
Disclosure of Invention
The invention provides a user identification method, a user identification device, user identification equipment and a user identification medium.
In order to achieve the above purpose, the embodiment of the invention adopts the following technical scheme:
in a first aspect, an embodiment of the present invention provides a user identification method, where the method includes:
determining online behavior data of a user;
determining the probability that the user is a target user based on the online behavior data and a preset identification model;
wherein the target users comprise users performing specific online behaviors;
the determining the online behavior data of the user comprises the following steps:
counting the number of IP addresses used when the user logs in to the live broadcast platform within a set time period; or
counting the number of devices used when the user logs in to the live broadcast platform within a set time period.
In a second aspect, an embodiment of the present invention provides a user identification apparatus, where the apparatus includes:
the determining module is used for determining online behavior data of the user;
the recognition module is used for determining the probability that the user is a target user based on the online behavior data and a preset recognition model;
wherein the target users comprise users performing specific online behaviors;
the determining module is specifically configured to: count the number of IP addresses used when the user logs in to the live broadcast platform within a set time period; or
count the number of devices used when the user logs in to the live broadcast platform within a set time period.
In a third aspect, an embodiment of the present invention provides an electronic device, including:
one or more processors;
a storage device for storing a plurality of programs;
when at least one of the plurality of programs is executed by the one or more processors, the one or more processors are caused to implement the user identification method of the first aspect described above.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the user identification method according to the first aspect.
In the user identification method provided by the embodiment of the invention, the online behavior data of the user is determined by counting the number of IP addresses used when the user logs in to the live broadcast platform within a set time period, or by counting the number of devices used when the user logs in to the live broadcast platform within a set time period. The probability that the user is a target user, where the target users include users performing specific online behaviors, is then determined based on the online behavior data and a preset identification model. This technical means improves the accuracy with which target users are identified.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings used in the description of the embodiments of the present invention will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the contents of the embodiments of the present invention and the drawings without creative efforts.
Fig. 1 is a schematic flow chart of a user identification method according to an embodiment of the present invention;
Fig. 2 is a schematic flow chart of a method for determining values of constants a and b according to a first embodiment of the present invention;
Fig. 3 is a diagram illustrating linear division of a log probability ratio curve according to an embodiment of the present invention;
Fig. 4 is a diagram illustrating linear division of another log probability ratio curve according to an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of a user identification device according to a second embodiment of the present invention;
Fig. 6 is a schematic structural diagram of an electronic device according to a third embodiment of the present invention.
Detailed Description
In order to make the technical problems solved, technical solutions adopted and technical effects achieved by the present invention clearer, the technical solutions of the embodiments of the present invention will be described in further detail below with reference to the accompanying drawings, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example one
Fig. 1 is a schematic flow chart of a user identification method according to an embodiment of the present invention. The user identification method disclosed in this embodiment may be suitable for identifying a user engaged in a certain online behavior, for example, identifying a user brushing a bullet screen or a user brushing attention in a live broadcast room, and may be executed by a user identification device, where the device may be implemented by software and/or hardware and is generally integrated in a terminal, such as a smart phone or a computer. Referring specifically to fig. 1, the method may include the steps of:
Step 110: count the number of IP addresses used when the user logs in to the live broadcast platform within a set time period; or count the number of devices used when the user logs in to the live broadcast platform within the set time period.
That is, the online behavior data comprises the number of IP addresses or the number of devices. The set time period may be a specific day, week, or month. The user's online behavior data can be collected through behavior tracking (event instrumentation): tracking code is inserted at the points in the project where user behavior needs to be counted (such as click events and page jumps), so that the user's online behavior is recorded in a user behavior log. Users who performed a specific online behavior, for example sending bullet screen messages to anchor A, are then determined by collecting the user behavior logs and querying them. The network environment information (such as the IP address) and the terminal device information (such as the terminal device ID) used for the online behavior are recorded in the user behavior log at the same time. On a mobile terminal (such as a smartphone), the user behavior log can be obtained directly through a data acquisition interface.
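The counting in step 110 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the record layout (`user_id`, `ts`, `ip`, `device_id`) and the function name `count_distinct_logins` are assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime


def count_distinct_logins(log_records, start, end):
    """Return {user_id: (n_ips, n_devices)} for logins with start <= ts < end.

    The record fields used here are hypothetical; a real behavior log
    would have its own schema.
    """
    ips = defaultdict(set)
    devices = defaultdict(set)
    for rec in log_records:
        if start <= rec["ts"] < end:
            ips[rec["user_id"]].add(rec["ip"])
            devices[rec["user_id"]].add(rec["device_id"])
    return {user: (len(ips[user]), len(devices[user])) for user in ips}


records = [
    {"user_id": "u1", "ts": datetime(2018, 12, 1, 9), "ip": "10.0.0.1", "device_id": "d1"},
    {"user_id": "u1", "ts": datetime(2018, 12, 1, 10), "ip": "10.0.0.2", "device_id": "d1"},
    {"user_id": "u2", "ts": datetime(2018, 12, 1, 11), "ip": "10.0.0.3", "device_id": "d2"},
    # This login falls outside the one-day window below and is ignored.
    {"user_id": "u1", "ts": datetime(2018, 12, 2, 9), "ip": "10.0.0.9", "device_id": "d3"},
]
counts = count_distinct_logins(records, datetime(2018, 12, 1), datetime(2018, 12, 2))
```

Using sets means that repeated logins from the same IP address or device are counted once, which matches the "number of used IP addresses" and "number of used devices" wording.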
Step 120: determine the probability that the user is the target user based on the online behavior data and a preset identification model.
Wherein the target users comprise users performing specific online behaviors.
The specific online behavior may be a positive behavior worth encouraging, such as donating online, or a negative behavior that needs to be resisted, such as brushing bullet screens or brushing attention for the same anchor through the live broadcast platform. Negative behaviors often have harmful effects; for example, brushing bullet screens or attention for the same anchor often causes network congestion, excessive load on the live broadcast platform servers, and similar problems. Therefore, to reduce the negative impact of such behavior, or to actively encourage a beneficial behavior, this embodiment discloses a user identification method that can identify target users engaged in bullet screen brushing or attention brushing so that they can be warned or otherwise restrained, or identify groups engaged in public welfare behaviors such as donation so as to foster a good social atmosphere. The description below takes the identification of online cheating users engaged in bullet screen brushing or attention brushing as an example.
Further, determining the probability that the user is the target user based on the online behavior data and a preset recognition model, includes:
determining the probability that the user is the target user according to the following preset identification function:
[Formula image not reproduced in the source text: the preset identification function A(x), parameterized by the constants a and b]
where A(x) denotes the probability that the user is the target user, x denotes the online behavior data of the user, and a and b are set constants.
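The exact form of A(x) is not recoverable from this text. As a purely illustrative sketch, the code below assumes a piecewise-linear ramp that is 0 up to a, rises linearly between a and b, and is 1 from b onward. That shape is an assumption suggested by the later piecewise-linear fitting, not something the source confirms, and `target_probability` is a hypothetical name.

```python
def target_probability(x, a, b):
    """Assumed form of A(x): 0 for x <= a, 1 for x >= b, linear in between.

    This shape is an illustrative assumption; the source's formula
    image is not available.
    """
    if b <= a:
        raise ValueError("expected a < b")
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)


# Hypothetical constants, chosen only for the example.
p_nine = target_probability(9, a=4, b=14)
p_ten = target_probability(10, a=4, b=14)
```

Under this assumed form, users with 9 and 10 IP addresses receive nearby probabilities rather than opposite verdicts, which is the behavior the background section argues a hard threshold lacks.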
Because the accuracy with which the identification function identifies the target user depends on the values set for the constants a and b, this embodiment provides an algorithm for determining those values. As shown in fig. 2, the algorithm comprises the following steps:
Step 210: determine a cheating suspected user set and a non-cheating suspected user set.
Specifically, the cheating suspected user set can be determined from complaints: users who have been complained about by viewers are cheating suspected users, and users who have not been complained about are non-cheating suspected users. Alternatively, auditors can review each user's behavior when logging in to live broadcast rooms to determine the cheating suspected users and non-cheating suspected users; for example, if an auditor finds that a user logged in to live broadcast rooms from 10 different devices in one day and followed different anchors, that user can be considered a cheating suspected user. The cheating suspected users and non-cheating suspected users can also be determined from expert experience.
Step 220: count the online behavior data of each user in the cheating suspected user set and the non-cheating suspected user set within a set time period.
Wherein the online behavior data may include: the number of used IP addresses when logging in a live broadcast platform in a set time period; or the number of used devices when logging in the live platform in a set time period.
Step 230: for each of the standard values, calculate a first comprehensive probability that the online behavior data of the users in the cheating suspected user set is greater than the current standard value, and a second comprehensive probability that the online behavior data of the users in the non-cheating suspected user set is greater than the current standard value.
The standard values are the possible values of the user's online behavior data. For example, when the online behavior data is the number of IP addresses used when logging in to the live broadcast platform within the set time period, the possible values are 1, 2, 3, 4, ..., n, so the standard values are 1, 2, 3, 4, ..., n; n can be set according to business experience and is usually set to 30. That is, when the online behavior data is the number of IP addresses used when logging in to the live broadcast platform within the set time period, the standard values are the possible values of that number of IP addresses; when the online behavior data is the number of devices used when logging in to the live broadcast platform within the set time period, the standard values are the possible values of that number of devices.
Specifically, a first comprehensive probability that online behavior data of each user in the cheating suspect user set is greater than a current standard value is calculated according to the following formula (1):
$$p(S, v_i) = \frac{N(S, v > v_i)}{N(S)} \tag{1}$$
where $p(S, v_i)$ denotes the first comprehensive probability that the online behavior data of the users in the cheating suspected user set S is greater than the current standard value $v_i$, $N(S, v > v_i)$ denotes the number of users in the cheating suspected user set S whose online behavior data v is greater than the current standard value $v_i$, and $N(S)$ denotes the total number of users in the cheating suspected user set S.
And the calculation mode of the second comprehensive probability that the online behavior data of each user in the non-cheating suspected user set is greater than the current standard value is the same as the calculation mode of the first comprehensive probability that the online behavior data of each user in the cheating suspected user set is greater than the current standard value.
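Formula (1), applied to either user set, can be sketched as a simple fraction. The function name `exceed_probability` and the sample IP address counts below are made up for illustration.

```python
def exceed_probability(values, standard_value):
    """p(set, v_i): fraction of users in the set whose data v exceeds v_i."""
    if not values:
        raise ValueError("user set must not be empty")
    return sum(1 for v in values if v > standard_value) / len(values)


# Hypothetical per-user IP address counts for the two sets.
suspect_ip_counts = [12, 8, 15, 3, 9]
normal_ip_counts = [1, 2, 1, 3, 6]

# One probability per standard value v_i = 1, 2, ..., 5.
p_suspect = [exceed_probability(suspect_ip_counts, v) for v in range(1, 6)]
p_normal = [exceed_probability(normal_ip_counts, v) for v in range(1, 6)]
```

The comparison is strict (`v > v_i`), following the definition of $N(S, v > v_i)$ in the text.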
Step 240: calculate the log probability ratio of the first comprehensive probability and the second comprehensive probability to obtain the log probability ratio corresponding to each standard value.
Specifically, the log probability ratio corresponding to each standard value is calculated according to the following formula (2):
$$r(v_i) = \log \frac{p(S, v_i)}{p(N, v_i)} \tag{2}$$
where $r(v_i)$ denotes the log probability ratio corresponding to the standard value $v_i$, $p(S, v_i)$ denotes the first comprehensive probability that the online behavior data of the users in the cheating suspected user set S is greater than the current standard value $v_i$, and $p(N, v_i)$ denotes the second comprehensive probability that the online behavior data of the users in the non-cheating suspected user set N is greater than the current standard value $v_i$.
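The ratio in formula (2) can be sketched as below. Two points are assumptions of this sketch rather than statements from the source: the natural logarithm is used (the base is not specified), and standard values at which either probability is zero are reported as `None` because the ratio is undefined there.

```python
import math


def log_probability_ratio(p_suspect, p_normal):
    """r(v_i) = log(p(S, v_i) / p(N, v_i)), or None when undefined.

    Natural log and the None convention are assumptions of this sketch.
    """
    if p_suspect <= 0.0 or p_normal <= 0.0:
        return None
    return math.log(p_suspect / p_normal)


# Hypothetical probabilities for three standard values.
standard_values = [1, 2, 3]
p_s = [1.0, 0.8, 0.8]
p_n = [0.5, 0.2, 0.0]

# Points (v_i, r(v_i)) of the log probability ratio curve.
curve = [
    (v, log_probability_ratio(ps, pn))
    for v, ps, pn in zip(standard_values, p_s, p_n)
]
```

In practice the upper bound n on the standard values would have to be chosen so that enough users in both sets exceed it, otherwise the tail of the curve is undefined.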
Step 250: construct a log probability ratio curve based on the log probability ratios corresponding to the standard values, in which the horizontal axis represents the standard values and the vertical axis represents the log probability ratio corresponding to each standard value.
Each standard value $v_i$ corresponds to a log probability ratio $r(v_i)$, so a curve of $r(v_i)$ against $v_i$ can be obtained, in which the horizontal axis represents the standard values $v_i$ and the vertical axis represents the corresponding log probability ratios $r(v_i)$.
Step 260: perform piecewise linear fitting on the log probability ratio curve; the end point values between the linear segments of the division with the minimum fitting loss correspond to the constants a and b, respectively.
Specifically, the inflection points of the log probability ratio curve are determined by a broken line method. To limit computational complexity, the curve can be divided into three segments at the inflection points; each segment is fitted linearly and the fitting loss is calculated. The fitting loss is computed for each of the different divisions, and the optimal division with the minimum fitting loss is found; the end point values between the linear segments of the optimal division correspond to the constants a and b, respectively. Fig. 3 and fig. 4 are schematic diagrams of linear divisions of the log probability ratio curve. In fig. 3 the two inflection points are g1 and g2, which divide the curve into three segments: curve segment 31, curve segment 32, and curve segment 33. In fig. 4 the two inflection points are g3 and g4, which divide the curve into curve segment 34, curve segment 35, and curve segment 36. If the division shown in fig. 4 has the minimum fitting loss, the constant a takes the horizontal axis value at inflection point g3 and the constant b takes the horizontal axis value at inflection point g4.
Further, the fitting loss in a certain division manner is calculated according to the following formula (3):
[Formula (3): equation image not reproduced in the source text]
where S denotes a division of the log probability ratio curve; Q(S) denotes the fitting loss under the division S; $r_m$ denotes the horizontal axis region corresponding to the m-th curve segment of the division (for example, for the division shown in fig. 3, assume that the horizontal axis region $r_1$ of the first curve segment 31 is [0, 1], the region $r_2$ of the second curve segment 32 is [1, 3], and the region $r_3$ of the third curve segment 33 is [3, 4]); $x_i$ denotes a value in the region $r_m$, and $y_i$ denotes the vertical axis coordinate on the curve corresponding to $x_i$; M denotes the total number of curve segments, which is 3 in this embodiment; $l(x)$ denotes the fitting loss function, chosen here as the square function, i.e. $l(x) = x^2$; and $\alpha_m$ and $\beta_m$ denote the parameters obtained by least squares estimation on the m-th curve segment, calculated as follows:
[Formulas for $\alpha_m$ and $\beta_m$: equation images not reproduced in the source text]
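The formula images for (3) and for the two least squares parameters did not survive in this text. A reconstruction consistent with the surrounding definitions is sketched here, under the assumption (not confirmed by the source) that $\alpha_m$ is the intercept and $\beta_m$ the slope of the line fitted to segment m; $\bar{x}_m$ and $\bar{y}_m$ are introduced here as the means of $x_i$ and $y_i$ over the region $r_m$:

```latex
Q(S) = \sum_{m=1}^{M} \sum_{x_i \in r_m} l\left(y_i - \alpha_m - \beta_m x_i\right),
\qquad l(x) = x^2

\beta_m = \frac{\sum_{x_i \in r_m} (x_i - \bar{x}_m)(y_i - \bar{y}_m)}
               {\sum_{x_i \in r_m} (x_i - \bar{x}_m)^2},
\qquad
\alpha_m = \bar{y}_m - \beta_m \bar{x}_m
```

These are the standard closed-form ordinary least squares estimates for a straight line; if the source assigns the two symbols the other way round, the roles of $\alpha_m$ and $\beta_m$ simply swap.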
All possible divisions are then traversed to find the minimum of Q(S); the end point values between the linear segments of the division that minimizes Q(S) correspond to the values of the constants a and b, respectively.
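The traversal can be sketched as an exhaustive search over every pair of interior breakpoints, fitting a least squares line to each of the three segments and summing the squared residuals. This is a hedged illustration: the function names are hypothetical, adjacent segments are assumed to share their end points (as in the [0, 1], [1, 3], [3, 4] example), and the preliminary inflection point detection by the broken line method is skipped.

```python
def fit_line(xs, ys):
    """Least squares line y = alpha + beta * x; returns (alpha, beta, sse)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx if sxx else 0.0
    alpha = mean_y - beta * mean_x
    sse = sum((y - alpha - beta * x) ** 2 for x, y in zip(xs, ys))
    return alpha, beta, sse


def best_three_segment_split(xs, ys):
    """Try every pair of interior breakpoints; return (loss, a, b).

    xs must be sorted and contain at least four points.
    """
    n = len(xs)
    best = None
    for i in range(1, n - 2):
        for j in range(i + 1, n - 1):
            # Segments share end points: [0..i], [i..j], [j..n-1].
            loss = (
                fit_line(xs[: i + 1], ys[: i + 1])[2]
                + fit_line(xs[i : j + 1], ys[i : j + 1])[2]
                + fit_line(xs[j:], ys[j:])[2]
            )
            if best is None or loss < best[0]:
                best = (loss, xs[i], xs[j])
    return best


# Synthetic curve: flat up to x = 3, rising to x = 7, flat afterwards.
xs = list(range(1, 11))
ys = [0, 0, 0, 1, 2, 3, 4, 4, 4, 4]
loss, a, b = best_three_segment_split(xs, ys)
```

With n standard values the search evaluates on the order of n² divisions, which is inexpensive for n around 30.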
According to the user identification method provided by the embodiment, the set values of the constants a and b in the preset identification function are determined by adopting a specific algorithm by combining the online behavior data of each user in the cheating suspected user set and the non-cheating suspected user set within the set time period, so that the identification accuracy of the identification function on the cheating users is improved.
Example two
Fig. 5 is a schematic structural diagram of a user identification device according to a second embodiment of the present invention. Referring to fig. 5, the apparatus comprises: a determination module 510 and an identification module 520;
the determining module 510 is configured to determine online behavior data of a user; the identification module 520 is configured to determine the probability that the user is a target user based on the online behavior data and a preset identification model; wherein the target users comprise users performing specific online behaviors.
Further, the determining module 510 is specifically configured to: counting the number of used IP addresses when the user logs in the live broadcast platform within a set time period; or counting the number of used devices when the user logs in the live broadcast platform within a set time period.
Further, the identifying module 520 is specifically configured to: determining the probability that the user is the target user according to the following preset identification function:
[Formula image not reproduced in the source text: the preset identification function A(x), parameterized by the constants a and b]
where A(x) denotes the probability that the user is the target user, x denotes the online behavior data of the user, and a and b are set constants.
Further, the device further comprises a calculation module, configured to determine values of the constants a and b in the preset identification function.
Further, the calculation module comprises:
the determining unit is used for determining a cheating suspected user set and a non-cheating suspected user set;
the statistical unit is used for counting the online behavior data of each user in the cheating suspected user set and the non-cheating suspected user set within a set time period;
the first calculation unit is used for respectively calculating a first comprehensive probability that the online behavior data of each user in the cheating suspected user set is larger than a current standard value and a second comprehensive probability that the online behavior data of each user in the non-cheating suspected user set is larger than the current standard value according to each standard value in the standard values;
the second calculation unit is used for calculating the log probability ratio of the first comprehensive probability and the second comprehensive probability to obtain the log probability ratio corresponding to each standard value;
the construction unit is used for constructing a log probability ratio curve based on the log probability ratio corresponding to each standard value, wherein the horizontal axis represents each standard value, and the vertical axis represents the log probability ratio corresponding to each standard value;
the fitting unit is used for performing linear fitting on the log probability ratio curve, and the end point values between the corresponding linear division regions respectively correspond to the constants a and b when the fitting loss is minimum;
and the standard values comprise possible values of the online behavior data of the user.
Further, the first calculating unit is specifically configured to: calculating a first comprehensive probability that the online behavior data of each user in the cheating suspected user set is greater than a current standard value according to the following formula:
$$p(S, v_i) = \frac{N(S, v > v_i)}{N(S)}$$
where $p(S, v_i)$ denotes the first comprehensive probability that the online behavior data of the users in the cheating suspected user set S is greater than the current standard value $v_i$, $N(S, v > v_i)$ denotes the number of users in the cheating suspected user set S whose online behavior data v is greater than the current standard value $v_i$, and $N(S)$ denotes the total number of users in the cheating suspected user set S.
Further, the second calculating unit is specifically configured to: calculating the log probability ratio corresponding to each standard value according to the following formula:
$$r(v_i) = \log \frac{p(S, v_i)}{p(N, v_i)}$$
where $r(v_i)$ denotes the log probability ratio corresponding to the standard value $v_i$, $p(S, v_i)$ denotes the first comprehensive probability that the online behavior data of the users in the cheating suspected user set S is greater than the current standard value $v_i$, and $p(N, v_i)$ denotes the second comprehensive probability that the online behavior data of the users in the non-cheating suspected user set N is greater than the current standard value $v_i$.
The user identification device provided by this embodiment determines the set values of the constants a and b in the preset identification function by using a specific algorithm by combining the online behavior data of each user in the cheating suspected user set and the non-cheating suspected user set within a set time period, so as to improve the identification accuracy of the identification function on the cheating users.
EXAMPLE III
Fig. 6 is a schematic structural diagram of an electronic device according to a third embodiment of the present invention. FIG. 6 illustrates a block diagram of an exemplary electronic device 12 suitable for use in implementing embodiments of the present invention. The electronic device 12 shown in fig. 6 is only an example and should not bring any limitation to the function and the scope of use of the embodiment of the present invention.
As shown in FIG. 6, electronic device 12 is embodied in the form of a general purpose computing device. The components of electronic device 12 may include, but are not limited to: one or more processors or processing units 16, a system memory 28, and a bus 18 that couples various system components including the system memory 28 and the processing unit 16.
Bus 18 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, such architectures include, but are not limited to, Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, Enhanced ISA (EISA) bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus.
Electronic device 12 typically includes a variety of computer system readable media. Such media may be any available media that is accessible by electronic device 12 and includes both volatile and nonvolatile media, removable and non-removable media.
The system memory 28 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 30 and/or cache memory 32. The electronic device 12 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 34 may be used to read from and write to non-removable, nonvolatile magnetic media (not shown in FIG. 6, and commonly referred to as a "hard drive"). Although not shown in FIG. 6, a magnetic disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In these cases, each drive may be connected to bus 18 by one or more data media interfaces. Memory 28 may include at least one program product having a set of program modules (e.g., the determining module 510 and the identification module 520 of the user identification device) configured to perform the functions of embodiments of the present invention.
A program/utility 40 having a set of program modules 42 (e.g., the determining module 510 and the identification module 520 of the user identification device) may be stored, for example, in memory 28. Such program modules 42 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may include an implementation of a networking environment. Program modules 42 generally carry out the functions and/or methodologies of the described embodiments of the invention.
Electronic device 12 may also communicate with one or more external devices 14 (e.g., keyboard, pointing device, display 24, etc.), with one or more devices that enable a user to interact with electronic device 12, and/or with any devices (e.g., network card, modem, etc.) that enable electronic device 12 to communicate with one or more other computing devices. Such communication may be through an input/output (I/O) interface 22. Also, the electronic device 12 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the Internet) via the network adapter 20. As shown, the network adapter 20 communicates with other modules of the electronic device 12 via the bus 18. It should be understood that although not shown in the figures, other hardware and/or software modules may be used in conjunction with electronic device 12, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
The processing unit 16 executes various functional applications and data processing by executing programs stored in the system memory 28, for example, implementing a user identification method provided by an embodiment of the present invention, the method including:
determining online behavior data of a user;
determining the probability that the user is a target user based on the online behavior data and a preset identification model;
wherein the target users comprise users performing specific online behaviors;
the determining the online behavior data of the user comprises the following steps:
counting the number of IP addresses used when the user logs in to the live broadcast platform within a set time period; or
counting the number of devices used when the user logs in to the live broadcast platform within a set time period.
The processing unit 16 executes various functional applications and data processing, such as implementing a user identification method provided by an embodiment of the present invention, by executing programs stored in the system memory 28.
Of course, those skilled in the art can understand that the processor can also implement the technical solution of the user identification method provided by any embodiment of the present invention.
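As an illustration of the counting step described above, the following is a minimal sketch and not the implementation of the embodiment. The event layout (dictionaries with `user_id`, `time`, `ip`, and `device_id` keys) and the function name `count_distinct_per_user` are assumptions introduced only for this example.

```python
from collections import defaultdict
from datetime import datetime


def count_distinct_per_user(login_events, start, end, field):
    """Count distinct values of `field` per user for logins in [start, end).

    `field` is a hypothetical key name, e.g. "ip" or "device_id".
    """
    seen = defaultdict(set)
    for event in login_events:
        if start <= event["time"] < end:
            seen[event["user_id"]].add(event[field])
    return {user: len(values) for user, values in seen.items()}


# Hypothetical login records, used only to exercise the function.
events = [
    {"user_id": "u1", "time": datetime(2018, 12, 1, 9), "ip": "10.0.0.1", "device_id": "d1"},
    {"user_id": "u1", "time": datetime(2018, 12, 2, 9), "ip": "10.0.0.2", "device_id": "d1"},
    {"user_id": "u1", "time": datetime(2019, 1, 5, 9), "ip": "10.0.0.3", "device_id": "d2"},
    {"user_id": "u2", "time": datetime(2018, 12, 3, 9), "ip": "10.0.0.9", "device_id": "d7"},
]
window_start = datetime(2018, 12, 1)
window_end = datetime(2019, 1, 1)

ip_counts = count_distinct_per_user(events, window_start, window_end, "ip")
device_counts = count_distinct_per_user(events, window_start, window_end, "device_id")
```

Either count can then serve as the online behavior data x of a user; the login outside the set time period is ignored.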
Example four
The fourth embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the user identification method provided in the fourth embodiment of the present invention, where the method includes:
determining online behavior data of a user;
determining the probability that the user is a target user based on the online behavior data and a preset identification model;
wherein the target users comprise users performing specific online behaviors;
the determining the online behavior data of the user comprises the following steps:
counting the number of IP addresses used when the user logs in to the live broadcast platform within a set time period; or
counting the number of devices used when the user logs in to the live broadcast platform within a set time period.
Of course, the computer program stored on the computer-readable storage medium provided by the embodiments of the present invention is not limited to the method operations described above, and may also perform related operations in the user identification method provided by any embodiments of the present invention.
Computer storage media for embodiments of the invention may employ any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (6)

1. A method for identifying a user, comprising:
determining online behavior data of a user;
determining the probability that the user is a target user based on the online behavior data and a preset identification model;
wherein the target users comprise users performing specific online behaviors; the determining the online behavior data of the user comprises the following steps:
counting the number of used IP addresses when the user logs in the live broadcast platform within a set time period; or
counting the number of used devices when the user logs in the live broadcast platform within a set time period;
the determining the probability that the user is the target user based on the online behavior data and a preset recognition model comprises:
determining the probability that the user is the target user according to the following preset identification function:
Figure FDA0002928907850000011
wherein, A (x) represents the probability that the user is the target user, x represents the online behavior data of the user, and a and b are set constants;
wherein, determining the values of the constants a and b in the preset identification function comprises:
determining a cheating suspected user set and a non-cheating suspected user set;
counting online behavior data of each user in the cheating suspect user set and the non-cheating suspect user set within a set time period;
respectively calculating a first comprehensive probability that the online behavior data of each user in the cheating suspected user set is greater than a current standard value and a second comprehensive probability that the online behavior data of each user in the non-cheating suspected user set is greater than the current standard value according to each standard value in the standard values;
calculating the log probability ratio of the first comprehensive probability and the second comprehensive probability to obtain the log probability ratio corresponding to each standard value;
constructing a log probability ratio curve based on the log probability ratios corresponding to each standard value, wherein the horizontal axis represents each standard value, and the vertical axis represents the log probability ratio corresponding to each standard value;
performing linear fitting on the logarithmic probability ratio curve, wherein the end point values between corresponding linear division regions respectively correspond to the constants a and b when the fitting loss is minimum;
and the standard values comprise possible values of the online behavior data of the user.
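The claim does not spell out the fitting procedure, so the following is only a sketch under stated assumptions: the log probability ratio curve is fitted with three straight segments, the two breakpoints are restricted to standard values on the curve, each segment is fitted by ordinary least squares, and the pair of breakpoints with the smallest total squared error is taken as (a, b). The function names `line_sse` and `fit_breakpoints` and the sample curve are hypothetical.

```python
from itertools import combinations


def line_sse(points):
    """Sum of squared errors of the least-squares line through `points`."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx if sxx > 0 else 0.0
    intercept = mean_y - slope * mean_x
    return sum((y - (slope * x + intercept)) ** 2 for x, y in points)


def fit_breakpoints(curve, min_points=2):
    """Return (a, b, loss) for the best three-segment fit (assumed model)."""
    curve = sorted(curve)
    best = None
    for i, j in combinations(range(len(curve)), 2):
        # Adjacent segments share their breakpoint, so the fit stays connected.
        left, middle, right = curve[: i + 1], curve[i : j + 1], curve[j:]
        if min(len(left), len(middle), len(right)) < min_points:
            continue
        loss = line_sse(left) + line_sse(middle) + line_sse(right)
        if best is None or loss < best[2]:
            best = (curve[i][0], curve[j][0], loss)
    if best is None:
        raise ValueError("not enough points for a three-segment fit")
    return best


# Hypothetical curve: flat up to 3, rising between 3 and 7, flat afterwards.
sample_curve = [(x, 0.0 if x <= 3 else (x - 3.0 if x < 7 else 4.0)) for x in range(11)]
a, b, loss = fit_breakpoints(sample_curve)
```

The exhaustive search is quadratic in the number of standard values, which is acceptable when the behavior data takes a small number of distinct values, such as IP or device counts.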
2. The method of claim 1, wherein calculating, for each of the standard values, a first integrated probability that the online behavior data of each user in the set of suspected cheating users is greater than the current standard value comprises:
calculating a first comprehensive probability that the online behavior data of each user in the cheating suspected user set is greater than a current standard value according to the following formula:
$$p(S, v_i) = \frac{N(S, v > v_i)}{N(S)}$$
wherein $p(S, v_i)$ represents the first comprehensive probability that the online behavior data of each user in the cheating suspected user set $S$ is greater than the current standard value $v_i$; $N(S, v > v_i)$ represents the number of users in the cheating suspected user set $S$ whose online behavior data $v$ is greater than the current standard value $v_i$; and $N(S)$ represents the total number of users in the cheating suspected user set $S$.
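The comprehensive probability of claim 2 is the fraction of users in a set whose behavior value exceeds the current standard value. A minimal sketch follows; the function name `exceed_probability` and the sample values are assumptions made for illustration.

```python
def exceed_probability(values, standard_value):
    """Fraction of users whose behavior value is strictly greater than
    `standard_value`, i.e. N(S, v > v_i) / N(S)."""
    if not values:
        raise ValueError("the user set must not be empty")
    return sum(1 for v in values if v > standard_value) / len(values)


# Hypothetical per-user IP counts for a cheating suspected user set S.
suspect_ip_counts = [1, 3, 5, 8]
p_at_3 = exceed_probability(suspect_ip_counts, 3)
```

The same function applied to the non-cheating suspected user set gives the second comprehensive probability.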
3. The method of claim 2, wherein calculating a log probability ratio of the first comprehensive probability to the second comprehensive probability to obtain a log probability ratio corresponding to each standard value comprises:
calculating the log probability ratio corresponding to each standard value according to the following formula:
$$r(v_i) = \log \frac{p(S, v_i)}{p(N, v_i)}$$
wherein $r(v_i)$ represents the log probability ratio corresponding to the standard value $v_i$; $p(S, v_i)$ represents the first comprehensive probability that the online behavior data of each user in the cheating suspected user set $S$ is greater than the current standard value $v_i$; and $p(N, v_i)$ represents the second comprehensive probability that the online behavior data of each user in the non-cheating suspected user set $N$ is greater than the current standard value $v_i$.
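The log probability ratio of claim 3 can be sketched as follows. Two points are assumptions not stated in the claim: the natural logarithm is used, and standard values at which either probability is zero are skipped because the ratio is undefined there. The function name and the sample probabilities are hypothetical.

```python
import math


def log_probability_ratio(p_suspect, p_normal):
    """Return ln(p_suspect / p_normal), or None when the ratio is undefined."""
    if p_suspect <= 0 or p_normal <= 0:
        # Assumption: skip standard values where either probability is zero.
        return None
    return math.log(p_suspect / p_normal)


# Hypothetical (standard value, first probability, second probability) triples.
probabilities = [(1, 0.9, 0.3), (2, 0.6, 0.1), (3, 0.2, 0.0)]

# Build the curve: horizontal axis = standard value, vertical axis = log ratio.
curve = []
for v_i, p_s, p_n in probabilities:
    r = log_probability_ratio(p_s, p_n)
    if r is not None:
        curve.append((v_i, r))
```

A positive ratio means that exceeding the standard value is more common among cheating suspected users than among the others.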
4. A user identification device, the device comprising:
the determining module is used for determining online behavior data of the user;
the recognition module is used for determining the probability that the user is a target user based on the online behavior data and a preset recognition model;
wherein the target users comprise users performing specific online behaviors;
the determining module is specifically configured to: counting the number of used IP addresses when the user logs in the live broadcast platform within a set time period; or
counting the number of used devices when the user logs in the live broadcast platform within a set time period;
the identification module is specifically configured to: determining the probability that the user is the target user according to the following preset identification function:
Figure FDA0002928907850000032
wherein, A (x) represents the probability that the user is the target user, x represents the online behavior data of the user, and a and b are set constants;
the user identification device also comprises a calculation module used for determining the numerical values of the constants a and b in the preset identification function;
the calculation module comprises:
the determining unit is used for determining a cheating suspected user set and a non-cheating suspected user set;
the statistical unit is used for counting the online behavior data of each user in the cheating suspected user set and the non-cheating suspected user set within a set time period;
the first calculation unit is used for respectively calculating a first comprehensive probability that the online behavior data of each user in the cheating suspected user set is larger than a current standard value and a second comprehensive probability that the online behavior data of each user in the non-cheating suspected user set is larger than the current standard value according to each standard value in the standard values;
the second calculation unit is used for calculating the log probability ratio of the first comprehensive probability and the second comprehensive probability to obtain the log probability ratio corresponding to each standard value;
the construction unit is used for constructing a log probability ratio curve based on the log probability ratio corresponding to each standard value, wherein the horizontal axis represents each standard value, and the vertical axis represents the log probability ratio corresponding to each standard value;
the fitting unit is used for performing linear fitting on the log probability ratio curve, and the end point values between the corresponding linear division regions respectively correspond to the constants a and b when the fitting loss is minimum;
and the standard values comprise possible values of the online behavior data of the user.
5. An electronic device, characterized in that the electronic device further comprises:
one or more processors;
storage means for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the user identification method of any of claims 1-3.
6. A computer-readable storage medium, on which a computer program is stored, which program, when being executed by a processor, carries out the user identification method according to any one of claims 1-3.
CN201811573563.4A 2018-12-21 2018-12-21 User identification method, device, equipment and medium Active CN109714636B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811573563.4A CN109714636B (en) 2018-12-21 2018-12-21 User identification method, device, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811573563.4A CN109714636B (en) 2018-12-21 2018-12-21 User identification method, device, equipment and medium

Publications (2)

Publication Number Publication Date
CN109714636A CN109714636A (en) 2019-05-03
CN109714636B true CN109714636B (en) 2021-04-23

Family

ID=66256003

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811573563.4A Active CN109714636B (en) 2018-12-21 2018-12-21 User identification method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN109714636B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112788351B (en) * 2019-11-01 2022-08-05 武汉斗鱼鱼乐网络科技有限公司 Target live broadcast room identification method, device, equipment and storage medium
CN110990729B (en) * 2019-12-05 2023-11-03 秒针信息技术有限公司 Job analysis method, device, electronic equipment and readable storage medium
CN114071196A (en) * 2020-08-03 2022-02-18 武汉斗鱼鱼乐网络科技有限公司 Method, system, medium and equipment for identifying target live broadcast room
CN112995686B (en) * 2021-02-03 2022-04-19 上海哔哩哔哩科技有限公司 Data processing method, live broadcast method, authentication server and live broadcast data server
CN113347497B (en) * 2021-08-02 2021-11-26 武汉斗鱼鱼乐网络科技有限公司 Target user identification method and device, electronic equipment and storage medium
CN114679600A (en) * 2022-03-24 2022-06-28 上海哔哩哔哩科技有限公司 Data processing method and device

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102523126A (en) * 2011-12-29 2012-06-27 深圳市同洲视讯传媒有限公司 Method and device for sending alarm event
CN103793484A (en) * 2014-01-17 2014-05-14 五八同城信息技术有限公司 Fraudulent conduct identification system based on machine learning in classified information website
CN104424433A (en) * 2013-08-22 2015-03-18 腾讯科技(深圳)有限公司 Anti-cheating method and anti-cheating system of application program
CN104852886A (en) * 2014-02-14 2015-08-19 腾讯科技(深圳)有限公司 Protection method and device for user account
CN105100032A (en) * 2014-05-23 2015-11-25 腾讯科技(北京)有限公司 Method and apparatus for preventing resource steal
CN105808639A (en) * 2016-02-24 2016-07-27 平安科技(深圳)有限公司 Network access behavior recognizing method and device
CN106209862A (en) * 2016-07-14 2016-12-07 微梦创科网络科技(中国)有限公司 A kind of steal-number defence implementation method and device
WO2017027003A1 (en) * 2015-08-10 2017-02-16 Hewlett Packard Enterprise Development Lp Evaluating system behaviour
CN108055281A (en) * 2017-12-27 2018-05-18 百度在线网络技术(北京)有限公司 Account method for detecting abnormality, device, server and storage medium
CN108763359A (en) * 2018-05-16 2018-11-06 武汉斗鱼网络科技有限公司 A kind of usage mining method, apparatus and electronic equipment with incidence relation

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11005864B2 (en) * 2017-05-19 2021-05-11 Salesforce.Com, Inc. Feature-agnostic behavior profile based anomaly detection

Also Published As

Publication number Publication date
CN109714636A (en) 2019-05-03

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant