WO2019134284A1 - Method and apparatus for recognizing user, and computer device - Google Patents

Method and apparatus for recognizing user, and computer device

Info

Publication number
WO2019134284A1
Authority
WO
WIPO (PCT)
Application number
PCT/CN2018/082163
Other languages
French (fr)
Chinese (zh)
Inventor
王璐
陈少杰
张文明
Original Assignee
武汉斗鱼网络科技有限公司
Application filed by 武汉斗鱼网络科技有限公司

Classifications

    • H04N21/44224 Monitoring of user activity on external systems, e.g. Internet browsing
    • H04N21/25808 Management of client data
    • H04N21/25866 Management of end-user data
    • H04N21/25875 Management of end-user data involving end-user authentication
    • H04N21/25891 Management of end-user data being end-user preferences
    • H04N21/44213 Monitoring of end-user related data

Definitions

  • The invention belongs to the field of live broadcast technology, and in particular relates to a method, an apparatus, and a computer device for identifying a user.
  • The present invention provides a method, an apparatus, and a computer device for identifying a user, which address the technical problem in the prior art that abnormal users who flood the screen with barrage messages on a live broadcast platform cannot be identified, so that the order of the live broadcast platform cannot be guaranteed.
  • An embodiment of the present invention provides a method for identifying a user, which is applied to a live broadcast platform, where the method includes:
  • the users in the user samples include: normal users, abnormal users, and remaining users;
  • Each user is identified based on the user's tag.
  • the normal user and the abnormal user in the user sample are respectively marked with different labels, including:
  • If the user's behavior log record is normal or the user's level is high, the user is determined to be a normal user;
  • the normal user and the abnormal user in the user sample are respectively marked with different labels, including:
  • Determining the similarity value between any two users in the user sample according to the number of live rooms corresponding to each user's behavior, the device information used, and each user's login IP information on the live broadcast platform includes:
  • R_u is the number of live rooms in which user u sends barrage within a preset time, and R_v is the number of live rooms in which user v sends barrage within the preset time;
  • I_u is the login IP information of user u on the live broadcast platform, and I_v is the login IP information of user v on the live broadcast platform;
  • D_u is the device information used when user u sends barrage, and D_v is the device information used when user v sends barrage.
  • Determining, according to a preset K value and the weight value of the edge formed by any two users in the K-nearest neighbor graph, the first weight sum of the edges connecting each remaining user to normal users and the second weight sum of the edges connecting each remaining user to abnormal users includes:
  • determining, according to the similarity value between any two users in the user sample, a weight value of an edge formed by any two users in the K-neighbor graph including:
  • the determining, according to the first weight and the second weight, a label of each remaining user including:
  • The label of each remaining user is set to the label of the user type that corresponds to the larger weight sum.
  • An embodiment of the present invention further provides an apparatus for identifying a user, where the apparatus includes:
  • a marking unit configured to mark different labels for the normal user and the abnormal user in the user sample, where the users in the user sample include: a normal user, an abnormal user, and a remaining user;
  • a first determining unit configured to determine a similarity value between any two users in the user sample according to the number of live rooms corresponding to each user's behavior, the device information used, and the IP information with which each user logs in to the live broadcast platform;
  • a second determining unit configured to determine, according to a similarity value between any two users in the user sample, a weight value of an edge formed by any two users in the K-nearest neighbor graph
  • a third determining unit configured to determine, according to a preset K value, a first weight sum of edges of each remaining user in the K-neighbor graph connected to the normal user, and each remaining user and the abnormal user a second weight sum of the connected edges;
  • a fourth determining unit configured to determine a label of each remaining user according to the first weight sum and the second weight sum;
  • the identification unit is configured to identify each user according to the user's label.
  • the present invention also provides a computer readable storage medium having stored thereon a computer program that, when executed by a processor, implements the following steps:
  • the user samples include: normal users, abnormal users, and remaining users;
  • Each user is identified based on the user's tag.
  • the invention also provides a computer device for identifying a user of a barrage, comprising:
  • At least one processor; and
  • At least one memory communicatively coupled to the processor, wherein
  • the memory stores program instructions executable by the processor, the processor invoking the program instructions to perform the method of any of the above.
  • the embodiment of the present invention provides a method, an apparatus, and a computer device for identifying a user.
  • The method includes: marking normal users and abnormal users in a user sample with different labels, where the users in the user sample include normal users, abnormal users, and remaining users; determining a similarity value between any two users in the user sample according to the number of live rooms corresponding to each user's behavior, the device information used, and the IP information with which each user logs in to the live platform;
  • constructing a K-nearest neighbor graph for all users in the user sample by using a K-nearest neighbor algorithm; determining the weight value of the edge formed by any two users in the K-nearest neighbor graph according to the similarity value between them; determining, according to a preset K value and those edge weight values, a first weight sum of the edges connecting each remaining user to normal users and a second weight sum of the edges connecting each remaining user to abnormal users; determining a label for each remaining user according to the first weight sum and the second weight sum; and identifying each user according to the user's label. Thus, normal users and abnormal users in the user sample are marked with different labels; since the remaining users have not yet been identified as normal or abnormal, a K-nearest neighbor graph is constructed for all users in the user sample according to the K-nearest neighbor algorithm, and each remaining in the K-neigh
  • each user can be identified according to the label of each user; thus, the abnormal user in the live broadcast platform can be accurately identified, and the order of the live broadcast platform can be guaranteed.
  • FIG. 1 is a schematic flowchart of a method for identifying a user according to an embodiment of the present invention
  • FIG. 2 is a schematic structural diagram of an apparatus for identifying a user according to an embodiment of the present invention
  • FIG. 3 is a schematic structural diagram of a computer device for identifying a user according to an embodiment of the present invention.
  • the embodiment provides a method for identifying a user, which is applied to a live broadcast platform. As shown in FIG. 1 , the method includes:
  • the users in the user sample include: a normal user, an abnormal user, and a remaining user.
  • the user sample may be a user that logs in within a preset time in the live broadcast platform.
  • the user sample in this embodiment is a user sample that is counted daily in the live broadcast platform.
  • the normal user and the abnormal user may be identified from the user sample by using a preset identification rule, and different labels are respectively marked for the normal user and the abnormal user in the user sample.
  • the identification rule cannot identify all normal users and abnormal users in the user sample, and the identified normal users and abnormal users are only a part of users in the user sample, and thus there are some remaining users.
  • To identify and mark normal users, the behavior log record and the user level of each user in the user sample on the live broadcast platform are obtained; if a user's behavior log record is normal or the user's level is high, the user is determined to be a normal user, and the normal user is marked with a first label. The first label may be a number, a letter, or another identifier, and is not limited herein.
  • The first label in this embodiment is represented by a number, such as 1.
  • When the behavior log record is used for identification: when a user sends a barrage or performs another trigger action, a corresponding dot record is written to the behavior log. If the log contains a dot record and the corresponding action, such as sending a barrage message, was also triggered, the user is a normal user.
  • When the user level is used for identification, level 5 serves as the reference level: a user whose level is greater than 5 is determined to be a high-level user, and a user whose level is less than 5 is determined to be a low-level user. For example, a user at level 10 is a high-level user, and a user at level 1 is a low-level user.
  • To identify and mark abnormal users, the login account of each user on the live broadcast platform and the device ID corresponding to the account are obtained; if the login account corresponds to multiple identical device IDs, the user corresponding to the login account is determined to be an abnormal user, and the abnormal user is marked with a second label.
  • the second label may be identified by a number or a letter or other, and is not limited herein.
  • the second label in this embodiment is represented by a number, such as zero.
  • If multiple users use different devices to send barrage information from the same account, the user corresponding to that account is determined to be an abnormal user.
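  • The marking rules above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the record fields are hypothetical, the abnormal rule is read as "one account seen on more than one device ID" (an assumption based on the shared-account case described above), the order in which the rules are checked is an assumption, and a user at exactly level 5 is treated as not high-level because the text only defines levels above and below 5.

```python
from dataclasses import dataclass, field
from typing import Optional, Set

NORMAL_LABEL = 1     # first label in this embodiment
ABNORMAL_LABEL = 0   # second label in this embodiment
REFERENCE_LEVEL = 5  # reference level named in the text


@dataclass
class UserRecord:
    # Hypothetical schema; the source does not define field names.
    account: str
    level: int
    log_is_normal: bool  # trigger actions have matching dot records
    device_ids: Set[str] = field(default_factory=set)


def seed_label(user: UserRecord) -> Optional[int]:
    """Return 1 (normal), 0 (abnormal), or None for a remaining user."""
    # Assumed reading of the abnormal rule: one account used on several devices.
    if len(user.device_ids) > 1:
        return ABNORMAL_LABEL
    # Normal rule from the text: normal behavior log OR high user level.
    if user.log_is_normal or user.level > REFERENCE_LEVEL:
        return NORMAL_LABEL
    # Neither rule fired: the user stays unlabeled for the graph step.
    return None
```

  • Users for which `seed_label` returns `None` are the remaining users that the later graph-based steps are meant to label.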
  • S111. Determine a similarity value between any two users in the user sample according to the number of live rooms corresponding to each user's behavior, the device information used, and each user's login Internet Protocol (IP) information on the live broadcast platform;
  • The similarity value between any two users in the user sample is determined from the number of live rooms corresponding to each user's behavior, the device information used, and each user's login IP information on the live broadcast platform.
  • When a user logs in to the live platform on a device, a unique device identifier is generated for that device.
  • R_u is the number of live rooms in which user u sends barrage within the preset time, and R_v is the number of live rooms in which user v sends barrage within the preset time;
  • I_u is the login IP information of user u on the live broadcast platform, and I_v is the login IP information of user v on the live broadcast platform;
  • D_u is the device information used when user u sends barrage, and D_v is the device information used when user v sends barrage;
  • the N is the number of the feature indicators;
  • the feature indicators related to the sending of the barrage may include: the number of times the barrage is transmitted within the preset time period, the interval between the transmission of the barrage, and the like.
  • the device information is an ID of each device. When each device accesses the live broadcast platform, the live broadcast platform generates a unique ID for each device.
  • the above parameters may be adapted to other types of users.
  • other types of users may also include: users who send gifts, and the like.
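  • Formula (1) is not reproduced in this text, so the sketch below is a stand-in and not the patent's formula. It only illustrates how the listed inputs (room count R, login IP I, device information D, and N barrage-related feature indicators) could be folded into a single pairwise score. The equal-weight blend, the individual terms, and all names are assumptions.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class UserFeatures:
    rooms: int          # R: live rooms the user sent barrage in
    ip: str             # I: login IP information
    device: str         # D: device ID used when sending barrage
    x: Sequence[float]  # N barrage-related feature indicators


def stand_in_similarity(u: UserFeatures, v: UserFeatures) -> float:
    """Illustrative score in [0, 1]; NOT formula (1) of the source."""
    largest = max(u.rooms, v.rooms)
    # Ratio of room counts: 1.0 when both users were active in equally many rooms.
    room_term = min(u.rooms, v.rooms) / largest if largest else 1.0
    ip_term = 1.0 if u.ip == v.ip else 0.0
    device_term = 1.0 if u.device == v.device else 0.0
    # Euclidean distance between indicator vectors, mapped into (0, 1].
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(u.x, v.x)))
    feature_term = 1.0 / (1.0 + dist)
    # Equal weighting of the four terms is an arbitrary choice for illustration.
    return (room_term + ip_term + device_term + feature_term) / 4.0
```

  • Any symmetric score with larger values for more similar users could take the place of this function in the graph-construction step.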
  • S112. Construct a K-nearest neighbor graph for all users in the user sample by using a K-nearest neighbor algorithm, and determine the weight value of the edge formed by any two users in the K-nearest neighbor graph according to the similarity value between them;
  • The K-nearest neighbor algorithm is used to construct a K-nearest neighbor graph for all users in the user sample.
  • In the K-nearest neighbor graph, each user is equivalent to one node, an edge is formed between two nodes, and the similarity value between users represents the relationship between nodes.
  • The weight value of the edge formed by any two users in the K-nearest neighbor graph can be determined according to the similarity value between the two users.
  • the specific implementation method is shown in formula (2):
  • the s(u, v) is a similar value between any user u and user v
  • the a is a preset constant
  • a takes a value in the interval (0, 1).
  • S113. Determine, according to a preset K value and the weight value of the edge formed by any two users in the K-nearest neighbor graph, a first weight sum of the edges connecting each remaining user to normal users, and a second weight sum of the edges connecting each remaining user to abnormal users;
  • After the weight values of the edges in the K-nearest neighbor graph are determined, the first weight sum of the edges connecting each remaining user to normal users and the second weight sum of the edges connecting each remaining user to abnormal users are determined according to the preset K value and those weight values.
  • Because the K-nearest neighbor graph would otherwise be too complicated, a K value is preset, and the K edges connecting each remaining user to normal users and to abnormal users are determined according to the preset K value;
  • The K value is set to 30, that is, each node is connected only to the 30 nodes with the largest similarity values, forming 30 edges.
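  • Keeping only the K most similar neighbors per node can be sketched as below. This is a brute-force illustration under assumptions: `build_knn_edges` and its parameters are hypothetical names, the similarity function is supplied by the caller, and the edge weight of formula (2), which is not reproduced in this text, would still have to be derived from the stored similarity value.

```python
import heapq
from typing import Callable, Dict, Hashable, List, Sequence, Tuple


def build_knn_edges(
    users: Sequence[Hashable],
    similarity: Callable[[Hashable, Hashable], float],
    k: int = 30,
) -> Dict[Hashable, List[Tuple[Hashable, float]]]:
    """Map each user to its k most similar other users as (neighbor, similarity)."""
    edges: Dict[Hashable, List[Tuple[Hashable, float]]] = {}
    for u in users:
        # Score every other user against u (quadratic in the number of users).
        scored = [(v, similarity(u, v)) for v in users if v != u]
        # Keep the k pairs with the largest similarity, highest first.
        edges[u] = heapq.nlargest(k, scored, key=lambda pair: pair[1])
    return edges
```

  • With the default `k=30` each node keeps at most 30 edges, which matches the K value used in this embodiment; a daily user sample of realistic size would likely need an indexed or approximate neighbor search instead of this all-pairs loop.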
  • For example, the remaining nodes include node A, and nodes B, C, and D are the nodes with the largest similarity values to A. The edge between A and B has a weight of 0.4, the edge between A and C has a weight of 0.3, and the edge between A and D has a weight of 0.1. Nodes B and D carry label 1 and node C carries label 0, so the first weight sum is 0.4 + 0.1 = 0.5 and the second weight sum is 0.3.
  • a label of each remaining user is determined according to the first weight sum and the second weight sum.
  • The first weight sum and the second weight sum are compared to determine the larger of the two, and the label of each remaining user is set to the label of the user type that corresponds to the larger weight sum.
  • the label of the A node should be the same as the label of the normal user, that is, the label of the A node is 1.
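  • The weight-sum comparison can be sketched as follows, using the node A example from the text as input. The function name and the input layout are hypothetical. Two points are assumptions because the text does not address them: neighbors that are themselves unlabeled are ignored by both sums, and a tie leaves the user unlabeled.

```python
from typing import List, Optional, Tuple

NORMAL_LABEL = 1
ABNORMAL_LABEL = 0


def label_remaining_user(
    neighbors: List[Tuple[Optional[int], float]],
) -> Tuple[Optional[int], float, float]:
    """neighbors holds (neighbor_label, edge_weight) for the K kept edges."""
    # First weight sum: edges that connect to normal users.
    first_sum = sum(w for lab, w in neighbors if lab == NORMAL_LABEL)
    # Second weight sum: edges that connect to abnormal users.
    second_sum = sum(w for lab, w in neighbors if lab == ABNORMAL_LABEL)
    if first_sum > second_sum:
        label: Optional[int] = NORMAL_LABEL
    elif second_sum > first_sum:
        label = ABNORMAL_LABEL
    else:
        label = None  # tie handling is not specified in the source
    return label, first_sum, second_sum


# Node A from the example: B (label 1, 0.4), C (label 0, 0.3), D (label 1, 0.1).
label, first_sum, second_sum = label_remaining_user([(1, 0.4), (0, 0.3), (1, 0.1)])
```

  • For node A the first weight sum exceeds the second, so `label` is 1, in agreement with the example.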
  • the original node refers to a node corresponding to an abnormal user in a user sample.
  • each user is identified according to the user's label.
  • After the remaining users are labeled, each user can be identified according to the user's label. In this way, every possible abnormal user can be found and the order of the live broadcast platform can be ensured.
  • the embodiment provides a device for identifying a user.
  • The device includes: a marking unit 21, a first determining unit 22, a constructing unit 23, a second determining unit 24, a third determining unit 25, a fourth determining unit 26, and an identification unit 27.
  • the marking unit 21 is configured to respectively mark different labels for normal users and abnormal users in the user sample.
  • the user in the user sample further includes: remaining users.
  • the user sample may be a user that logs in within a preset time in the live broadcast platform.
  • the user sample in this embodiment is a user sample that is counted daily in the live broadcast platform.
  • the marking unit 21 may first identify a normal user and an abnormal user from the user sample by using a preset identification rule, and respectively mark different labels for the normal user and the abnormal user in the user sample.
  • the identification rule cannot identify all normal users and abnormal users in the user sample, and the identified normal users and abnormal users are only a part of users in the user sample, and thus there are some remaining users.
  • When the marking unit 21 identifies and marks a normal user, the behavior log record and the user level of each user in the user sample on the live broadcast platform are acquired; if a user's behavior log record is normal or the user's level is high, the user is determined to be a normal user, and the normal user is marked with a first label. The first label may be a number, a letter, or another identifier, and is not limited herein.
  • The first label in this embodiment is represented by a number, such as 1.
  • When the behavior log record is used for identification: when a user sends a barrage or performs another trigger action, a corresponding dot record is written to the behavior log. If the log contains a dot record and the corresponding action, such as sending a barrage message, was also triggered, the user is a normal user.
  • When the user level is used for identification, level 5 serves as the reference level: a user whose level is greater than 5 is determined to be a high-level user, and a user whose level is less than 5 is determined to be a low-level user. For example, a user at level 10 is a high-level user, and a user at level 1 is a low-level user.
  • When the marking unit 21 identifies and marks an abnormal user, the login account of each user on the live broadcast platform and the device ID corresponding to the account are obtained; if the login account corresponds to multiple identical device IDs, the user corresponding to the login account is determined to be an abnormal user, and the abnormal user is marked with a second label.
  • the second label may be identified by a number or a letter or other, and is not limited herein.
  • the second label in this embodiment is represented by a number, such as zero.
  • If multiple users use different devices to send barrage information from the same account, the user corresponding to that account is determined to be an abnormal user.
  • The first determining unit 22 determines the similarity value between any two users in the user sample according to the number of live rooms corresponding to each user's behavior, the device information used, and each user's login Internet Protocol (IP) information on the live broadcast platform.
  • When a user logs in to the live platform on a device, a unique device identifier is generated for that device.
  • the first determining unit 22 determines a similar value between any two users in the user sample according to formula (1):
  • R_u is the number of live rooms in which user u sends barrage within the preset time
  • R_v is the number of live rooms in which user v sends barrage within the preset time
  • I_u is the login IP information of user u on the live broadcast platform
  • I_v is the login IP information of user v on the live broadcast platform
  • D_u is the device information used when user u sends barrage
  • D_v is the device information used when user v sends barrage
  • the x ui is the i-th feature index related to the sending barrage of the user u
  • the x vi is the i-th feature index related to the sending barrage of the user v
  • the N is the number of the feature indicators
  • the feature indicators related to the sending of the barrage may include: the number of times the barrage is transmitted within the preset
  • the above parameters may be adapted to other types of users.
  • other types of users may also include: users who send gifts, and the like.
  • The constructing unit 23 is configured to construct a K-nearest neighbor graph for all users in the user sample by using the K-nearest neighbor algorithm. In the K-nearest neighbor graph, each user is equivalent to one node, and an edge is formed between two nodes.
  • the similarity value between users represents the relationship between nodes.
  • the second determining unit 24 may determine the weight value of the edge formed by any two users in the K-neighbor graph according to the similarity value between any two users in the user sample.
  • the specific implementation method is shown in formula (2):
  • the s(u, v) is a similar value between any user u and user v
  • the a is a preset constant
  • a takes a value in the interval (0, 1).
  • The third determining unit 25 is configured to determine, according to the preset K value and the weight value of the edge formed by any two users in the K-nearest neighbor graph, a first weight sum of the edges connecting each remaining user to normal users, and a second weight sum of the edges connecting each remaining user to abnormal users.
  • Because the K-nearest neighbor graph would otherwise be too complicated, a K value is preset, and the K edges connecting each remaining user to normal users and to abnormal users are determined according to the preset K value;
  • The K value is set to 30, that is, each node is connected only to the 30 nodes with the largest similarity values, forming 30 edges.
  • For example, the remaining nodes include node A, and nodes B, C, and D are the nodes with the largest similarity values to A. The edge between A and B has a weight of 0.4, the edge between A and C has a weight of 0.3, and the edge between A and D has a weight of 0.1. Nodes B and D carry label 1 and node C carries label 0, so the first weight sum is 0.4 + 0.1 = 0.5 and the second weight sum is 0.3.
  • the fourth determining unit 26 is configured to determine a label of each remaining user according to the first weight sum and the second weight sum.
  • The first weight sum and the second weight sum are compared to determine the larger of the two, and the label of each remaining user is set to the label of the user type that corresponds to the larger weight sum.
  • the label of the A node should be the same as the label of the normal user, that is, the label of the A node is 1.
  • the original node refers to a node corresponding to an abnormal user in a user sample.
  • the identification unit 27 can identify each user according to the user's tag. In this way, we can find out every possible abnormal user and ensure the order of the live broadcast platform.
  • the embodiment further provides a computer device for identifying a user.
  • The computer device includes: a radio frequency (RF) circuit 310, a memory 320, an input unit 330, a display unit 340, an audio circuit 350, a WiFi module 360, a processor 370, a power supply 380, and other components.
  • FIG. 3 does not constitute a limitation to a computer device, and may include more or fewer components than those illustrated, or some components may be combined, or different component arrangements.
  • The RF circuit 310 can be used for receiving and transmitting signals, and in particular for receiving downlink information from a base station and passing it to the processor 370 for processing.
  • The RF circuit 310 includes, but is not limited to, at least one amplifier, a transceiver, a coupler, a low noise amplifier (LNA), a duplexer, and the like.
  • the memory 320 can be used to store software programs and modules, and the processor 370 executes various functional applications and data processing of the computer devices by running software programs and modules stored in the memory 320.
  • the memory 320 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application required for at least one function, and the like; the storage data area may store data created according to usage of the computer device, and the like.
  • the memory 320 may include a high speed random access memory, and may also include a nonvolatile memory such as at least one magnetic disk storage device, flash memory device, or other volatile solid state storage device.
  • the input unit 330 can be configured to receive input numeric or character information and to generate key signal inputs related to user settings and function control of the computer device.
  • the input unit 330 may include a touch panel 331 and other input devices 332.
  • the touch panel 331 can collect input operations of the user and drive the corresponding connecting device according to a preset program.
  • the touch panel 331 collects the output information and sends it to the processor 370.
  • the input unit 330 may also include other input devices 332.
  • other input devices 332 may include, but are not limited to, one or more of a touch panel, function keys (such as volume control buttons, switch buttons, etc.), trackballs, mice, joysticks, and the like.
  • the display unit 340 can be used to display information input by the user or information provided to the user as well as various menus of the computer device.
  • the display unit 340 can include a display panel 341.
  • the display panel 341 can be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED), or the like.
  • the touch panel 331 can cover the display panel 341. When the touch panel 331 detects a touch operation on or near it, it passes the operation to the processor 370 to determine the type of the touch event, and the processor 370 then provides a corresponding visual output on the display panel 341 according to the type of the touch event.
  • although the touch panel 331 and the display panel 341 are shown in FIG. 3 as two separate components implementing the input and output functions of the computer device, in some embodiments the touch panel 331 may be integrated with the display panel 341 to implement those input and output functions.
  • An audio circuit 350, a speaker 351, and a microphone 352 can provide an audio interface between the user and the computer device.
  • the audio circuit 350 can transmit the electrical signal converted from the received audio data to the speaker 351, which converts it into a sound signal for output.
  • WiFi is a short-range wireless transmission technology.
  • the computer device can help users to send and receive emails, browse web pages and access streaming media through the WiFi module 360. It provides users with wireless broadband Internet access.
  • although FIG. 3 shows the WiFi module 360, it can be understood that the module is not an essential part of the computer device and may be omitted as needed without changing the essence of the invention.
  • the processor 370 is the control center of the computer device. It connects the various parts of the entire computer device using various interfaces and lines, and performs the various functions of the computer device and processes data by running or executing software programs and/or modules stored in the memory 320 and calling data stored in the memory 320, thereby monitoring the computer device as a whole.
  • the processor 370 may include one or more processing units; preferably, the processor 370 may integrate an application processor, wherein the application processor mainly processes an operating system, a user interface, an application, and the like.
  • the computer device also includes a power source 380 (such as a power adapter) that supplies power to the various components.
  • the power source can be logically coupled to the processor 370 via a power management system.
  • the embodiment of the present invention provides a method, an apparatus, and a computer device for identifying a user.
  • the method includes: marking normal users and abnormal users in a user sample with different labels, where the users in the user sample include normal users, abnormal users, and remaining users; determining a similarity value between any two users in the user sample according to the number of live rooms in which each user's behavior occurs, the device information used, and the Internet Protocol (IP) information with which each user logs in to the live broadcast platform;
  • constructing a K-nearest-neighbor graph for all users in the user sample by using a K-nearest-neighbor algorithm; determining, according to the similarity value between any two users in the user sample, the weight value of the edge formed by those two users in the K-nearest-neighbor graph; determining, according to a preset K value and the weight values of the edges in the K-nearest-neighbor graph, a first weight sum of the edges connecting each remaining user to the normal users and a second weight sum of the edges connecting each remaining user to the abnormal users; determining the label of each remaining user according to the first weight sum and the second weight sum; identifying each user according to the user's label; in this way, normal users and abnormal users in the user sample are marked with different labels; because the normal users and abnormal users among the remaining users cannot yet be identified, a K-nearest-neighbor graph is constructed for all users in the user sample according to the K-nearest-neighbor algorithm, and the K-nearest neighbor is determined
  • modules in the devices of the embodiments can be adaptively changed and placed in one or more devices different from the embodiment.
  • the modules or units or components of the embodiments may be combined into one module or unit or component, and further they may be divided into a plurality of sub-modules or sub-units or sub-components.
  • all features disclosed in this specification (including the accompanying claims, the abstract and the drawings), and all processes or units of any method or device so disclosed, may be combined in any combination.
  • Each feature disclosed in this specification (including the accompanying claims, the abstract and the drawings) may be replaced by alternative features that provide the same, equivalent or similar purpose.
  • the various component embodiments of the present invention may be implemented in hardware, or in a software module running on one or more processors, or in a combination thereof.
  • a microprocessor or digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of the gateways and proxy servers in accordance with embodiments of the present invention.
  • the invention can also be implemented as a device or device program (e.g., a computer program and a computer program product) for performing some or all of the methods described herein.
  • Such a program implementing the present invention may be stored on a computer readable storage medium or may be in the form of one or more signals.
  • Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form; when executed by the processor, the program implements the following steps: marking normal users and abnormal users in the user sample with different labels, where the users in the user sample include normal users, abnormal users, and remaining users;
  • determining a similarity value between any two users in the user sample according to the number of live rooms in which each user's behavior occurs, the device information used, and the Internet Protocol (IP) information with which each user logs in to the live broadcast platform;
  • constructing a K-nearest-neighbor graph for all users in the user sample by using a K-nearest-neighbor algorithm; determining, according to the similarity value between any two users in the user sample, the weight value of the edge formed by those two users in the K-nearest-neighbor graph; determining, according to a preset K value and the weight values of the edges in the K-nearest-neighbor graph, a first weight sum of edges of each of the remaining users connected to

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computer Graphics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Social Psychology (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Computer Security & Cryptography (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

Provided are a method and apparatus for recognizing a user, and a computer device. The method comprises: respectively marking different labels for normal users and abnormal users in user samples; determining the value of similarity between any two users in the user samples; constructing a K nearest neighbor graph for all the users in the user samples using a K nearest neighbor algorithm; determining, according to the value of similarity between any two users in the user sample, a weight value of an edge constituted by any two users in the K nearest neighbor graph; according to a preset value for K and the weight value of the edge constituted by any two users in the K nearest neighbor graph, determining a first weight sum of edges constituted by connecting each of the remaining users to the normal users in the K nearest neighbor graph and a second weight sum of edges constituted by connecting each of the remaining users to the abnormal users; determining the label of each of the remaining users according to the first weight sum and the second weight sum; and recognizing each of the users according to the labels of the users.

Description

Method, apparatus and computer device for identifying users
Technical Field
The invention belongs to the field of live broadcast technology, and in particular relates to a method, an apparatus and a computer device for identifying users.
Background
With the growth of live broadcast platforms, more and more users send barrage (bullet-screen comment) messages on them. To boost the apparent popularity of their live rooms, some anchors use machines to send large volumes of barrage messages in the live room, which seriously disrupts order on the live broadcast platform.
In general, the content of machine-sent barrage messages is largely indistinguishable from that of normal barrage messages, because the machines imitate the barrage text of normal users. As a result, the abnormal users sending these messages cannot be identified, and order on the live broadcast platform cannot be guaranteed.
Summary of the Invention
In view of the problems in the prior art, embodiments of the present invention provide a method, an apparatus and a computer device for identifying users, which address the technical problem that abnormal users sending barrage messages on a live broadcast platform cannot be identified in the prior art, so that order on the platform cannot be guaranteed.
An embodiment of the present invention provides a method for identifying users, applied to a live broadcast platform, the method comprising:
marking normal users and abnormal users in a user sample with different labels, where the users in the user sample include normal users, abnormal users and remaining users;
determining a similarity value between any two users in the user sample according to the number of live rooms in which each user's behavior occurs, the device information used, and the Internet Protocol (IP) information with which each user logs in to the live broadcast platform;
constructing a K-nearest-neighbor graph for all users in the user sample by using a K-nearest-neighbor algorithm;
determining, according to the similarity value between any two users in the user sample, the weight value of the edge formed by those two users in the K-nearest-neighbor graph;
determining, according to a preset K value and the weight values of the edges formed by any two users in the K-nearest-neighbor graph, a first weight sum of the edges connecting each remaining user to the normal users in the K-nearest-neighbor graph, and a second weight sum of the edges connecting each remaining user to the abnormal users;
determining the label of each remaining user according to the first weight sum and the second weight sum;
identifying each user according to the user's label.
In the above solution, marking normal users and abnormal users in the user sample with different labels comprises:
obtaining the behavior log records and the user level of each user in the user sample on the live broadcast platform;
if the user's behavior log records are normal or the user is a high-level user, determining that the user is a normal user;
marking the normal user with a first label.
In the above solution, marking normal users and abnormal users in the user sample with different labels comprises:
obtaining the login account of each user in the user sample on the live broadcast platform and the device identifier (ID) corresponding to the login account;
if the login account corresponds to multiple identical device IDs, determining that the user corresponding to the login account is an abnormal user;
marking the abnormal user with a second label.
In the above solution, determining the similarity value between any two users in the user sample according to the number of live rooms in which each user's behavior occurs, the device information used, and the login IP information of each user on the live broadcast platform comprises:
determining the similarity value s(u, v) between any user u and user v according to the formula
Figure PCTCN2018082163-appb-000001
where:
when users who send barrage messages are to be identified, R_u is the number of live rooms in which user u sends barrage messages within a preset time, and R_v is the number of live rooms in which user v sends barrage messages within the preset time; I_u is the login IP information of user u on the live broadcast platform, and I_v is the login IP information of user v on the live broadcast platform; D_u is the device information used by user u when sending barrage messages, and D_v is the device information used by user v when sending barrage messages; w_i (i = 1, 2, 3, 4) is
Figure PCTCN2018082163-appb-000002
the number of feature indicators.
In the above solution, determining, according to the preset K value and the weight values of the edges formed by any two users in the K-nearest-neighbor graph, the first weight sum of the edges connecting each remaining user to the normal users and the second weight sum of the edges connecting each remaining user to the abnormal users comprises:
determining, according to the preset K value, the K edges connecting each remaining user to the normal users in the K-nearest-neighbor graph and the K edges connecting each remaining user to the abnormal users;
calculating, from the weight values of the edges formed by any two users in the K-nearest-neighbor graph, the first weight sum of the K edges connecting each remaining user to the normal users and the second weight sum of the K edges connecting each remaining user to the abnormal users.
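The two weight sums for a single remaining user can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes that the "K edges" to each class are the K heaviest edges to already-labelled users of that class, that edge weights have already been computed, and that labels are stored as 1 (normal), 0 (abnormal) or None (remaining). The function name `weight_sums` and the dictionary layout are hypothetical.

```python
import heapq
from typing import Dict, Hashable, Optional, Tuple


def weight_sums(
    edge_weights: Dict[Hashable, float],
    labels: Dict[Hashable, Optional[int]],
    k: int,
) -> Tuple[float, float]:
    """Return (first_weight_sum, second_weight_sum) for one remaining user.

    edge_weights maps each neighbour of the remaining user to the weight of
    the connecting edge. labels maps users to 1 (normal), 0 (abnormal) or
    None (remaining); this encoding is an assumption of the sketch.
    """
    to_normal = [w for n, w in edge_weights.items() if labels.get(n) == 1]
    to_abnormal = [w for n, w in edge_weights.items() if labels.get(n) == 0]
    # Assumption: keep only the K heaviest edges towards each class.
    first = sum(heapq.nlargest(k, to_normal))
    second = sum(heapq.nlargest(k, to_abnormal))
    return first, second
```

Edges to other remaining users are ignored here because they carry no label yet; whether the original method treats them differently is not stated in the text.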
In the above solution, determining, according to the similarity value between any two users in the user sample, the weight value of the edge formed by those two users in the K-nearest-neighbor graph comprises:
determining the weight value of the edge formed by any two users in the K-nearest-neighbor graph according to the formula
Figure PCTCN2018082163-appb-000003
where s(u, v) is the similarity value between any user u and user v, and a is a preset constant taking a value in the interval (0, 1).
In the above solution, determining the label of each remaining user according to the first weight sum and the second weight sum comprises:
comparing the first weight sum with the second weight sum to determine the larger weight sum;
marking each remaining user with the same label as the users corresponding to the larger weight sum.
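The comparison step can be sketched in a few lines. The label values 1 (normal) and 0 (abnormal) follow the example values given for the first and second labels in the detailed description; the function name `decide_label` and the tie-breaking rule are assumptions, since the text does not say what happens when the two sums are equal.

```python
def decide_label(first_weight_sum: float, second_weight_sum: float) -> int:
    """Return the label of the class with the larger weight sum.

    1 stands for the first (normal) label and 0 for the second (abnormal)
    label. Ties are resolved towards normal, which is an assumption of this
    sketch rather than something the text specifies.
    """
    return 1 if first_weight_sum >= second_weight_sum else 0
```

For instance, a remaining user whose edges to normal users sum to 0.75 and whose edges to abnormal users sum to 0.5 would receive the normal label.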
An embodiment of the present invention further provides an apparatus for identifying users, the apparatus comprising:
a marking unit, configured to mark normal users and abnormal users in a user sample with different labels, where the users in the user sample include normal users, abnormal users and remaining users;
a first determining unit, configured to determine a similarity value between any two users in the user sample according to the number of live rooms in which each user's behavior occurs, the device information used, and the IP information with which each user logs in to the live broadcast platform;
a construction unit, configured to construct a K-nearest-neighbor graph for all users in the user sample by using a K-nearest-neighbor algorithm;
a second determining unit, configured to determine, according to the similarity value between any two users in the user sample, the weight value of the edge formed by those two users in the K-nearest-neighbor graph;
a third determining unit, configured to determine, according to a preset K value, a first weight sum of the edges connecting each remaining user to the normal users in the K-nearest-neighbor graph, and a second weight sum of the edges connecting each remaining user to the abnormal users;
a fourth determining unit, configured to determine the label of each remaining user according to the first weight sum and the second weight sum;
an identification unit, configured to identify each user according to the user's label.
The present invention further provides a computer-readable storage medium on which a computer program is stored, the program implementing the following steps when executed by a processor:
marking normal users and abnormal users in a user sample with different labels, where the user sample includes normal users, abnormal users and remaining users;
determining a similarity value between any two users in the user sample according to the number of live rooms in which each user's behavior occurs, the device information used, and the IP information with which each user logs in to the live broadcast platform;
constructing a K-nearest-neighbor graph for all users in the user sample by using a K-nearest-neighbor algorithm;
determining, according to the similarity value between any two users in the user sample, the weight value of the edge formed by those two users in the K-nearest-neighbor graph;
determining, according to a preset K value and the weight values of the edges formed by any two users in the K-nearest-neighbor graph, a first weight sum of the edges connecting each remaining user to the normal users in the K-nearest-neighbor graph, and a second weight sum of the edges connecting each remaining user to the abnormal users;
determining the label of each remaining user according to the first weight sum and the second weight sum;
identifying each user according to the user's label.
The present invention further provides a computer device for identifying barrage users, comprising:
at least one processor; and
at least one memory communicatively connected to the processor, wherein
the memory stores program instructions executable by the processor, and the processor, by invoking the program instructions, is able to perform any of the methods described above.
Embodiments of the present invention provide a method, an apparatus and a computer device for identifying users. The method includes: marking normal users and abnormal users in a user sample with different labels, where the users in the user sample include normal users, abnormal users and remaining users; determining a similarity value between any two users in the user sample according to the number of live rooms in which each user's behavior occurs, the device information used, and the IP information with which each user logs in to the live broadcast platform; constructing a K-nearest-neighbor graph for all users in the user sample by using a K-nearest-neighbor algorithm; determining, according to the similarity value between any two users in the user sample, the weight value of the edge formed by those two users in the K-nearest-neighbor graph; determining, according to a preset K value and the weight values of the edges in the K-nearest-neighbor graph, a first weight sum of the edges connecting each remaining user to the normal users and a second weight sum of the edges connecting each remaining user to the abnormal users; determining the label of each remaining user according to the first weight sum and the second weight sum; and identifying each user according to the user's label. In this way, normal users and abnormal users in the user sample are first marked with different labels. Because the normal users and abnormal users among the remaining users cannot yet be identified, a K-nearest-neighbor graph is then constructed for all users in the user sample according to the K-nearest-neighbor algorithm; based on the similarity values between any two users in the user sample, the first weight sum of the K edges connecting each remaining user to the normal users and the second weight sum of the K edges connecting each remaining user to the abnormal users are determined; the label of each remaining user is then determined according to the first weight sum and the second weight sum; and finally each user can be identified according to the user's label. Abnormal users on the live broadcast platform can thus be identified accurately, safeguarding order on the platform.
Brief Description of the Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for the purpose of illustrating the preferred embodiments and are not to be construed as limiting the invention. Throughout the drawings, the same reference symbols denote the same parts. In the drawings:
FIG. 1 is a schematic flowchart of a method for identifying users according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of an apparatus for identifying users according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a computer device for identifying users according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the present disclosure, it should be understood that the disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure will be understood more thoroughly and so that its scope can be fully conveyed to those skilled in the art.
The technical solution of the present invention is described in further detail below with reference to the drawings and specific embodiments.
This embodiment provides a method for identifying users, applied to a live broadcast platform. As shown in FIG. 1, the method includes:
S110: mark normal users and abnormal users in the user sample with different labels.
In this step, the users in the user sample include normal users, abnormal users and remaining users. The user sample may be the users who log in to the live broadcast platform within a preset time; in this embodiment, the user sample is the set of users counted each day on the live broadcast platform.
Here, preset identification rules may first be used to identify normal users and abnormal users in the user sample, and these normal users and abnormal users are marked with different labels. However, the identification rules cannot identify all of the normal users and abnormal users in the user sample; the users they identify are only part of the sample, so some remaining users are left over.
Specifically, to identify and mark normal users, the behavior log records and user level of each user in the user sample on the live broadcast platform are obtained; if the user's behavior log records are normal or the user is a high-level user, the user is determined to be a normal user and is marked with a first label. The first label may be a number, a letter or another identifier, which is not limited here. In this embodiment, the first label is a number, for example 1.
For example, when identifying by behavior log records: when a user sends a barrage message or performs another triggering action, a corresponding tracking-point record is written to the behavior log. If the log contains the corresponding tracking-point record and the corresponding action was actually triggered (for example, a barrage message was sent), the user is a normal user.
Here, identification by user level uses level 5 as the reference level: a user whose level is greater than 5 is determined to be a high-level user, and a user whose level is less than 5 is determined to be a low-level user. For example, a user at level 10 is determined to be a high-level user, and a user at level 1 is determined to be a low-level user.
To identify and mark abnormal users, the login account of each user in the user sample on the live broadcast platform and the device ID corresponding to that account are obtained; if the login account corresponds to multiple identical device IDs, the user corresponding to the login account is determined to be an abnormal user and is marked with a second label. The second label may be a number, a letter or another identifier, which is not limited here. In this embodiment, the second label is a number, for example 0.
For example, if multiple users send barrage messages from different devices using the same account, the user corresponding to that account is determined to be an abnormal user.
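The rule-based seeding in S110 can be sketched as below. This is a hedged illustration under several assumptions: the `UserRecord` fields are hypothetical, "behavior log is normal" is reduced to a precomputed boolean, an account seen on more than one device ID is treated as the abnormal case (following the example of one account used from different devices), and the abnormal rule is checked before the normal rule. The text does not state a precedence between the two rules, nor how a user at exactly level 5 is classified; here level 5 is not treated as high-level.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

NORMAL_LABEL = 1    # first label, as in this embodiment
ABNORMAL_LABEL = 0  # second label, as in this embodiment
BASE_LEVEL = 5      # reference level used in this embodiment


@dataclass(frozen=True)
class UserRecord:
    account: str
    level: int
    log_is_normal: bool          # hypothetical: log records match triggered actions
    device_ids: FrozenSet[str]   # device IDs observed for this account


def seed_label(user: UserRecord) -> Optional[int]:
    """Return 1 or 0 when a preset rule applies, or None for a remaining user."""
    # Assumed precedence: the abnormal rule is evaluated first.
    if len(user.device_ids) > 1:
        return ABNORMAL_LABEL
    if user.log_is_normal or user.level > BASE_LEVEL:
        return NORMAL_LABEL
    return None  # left for the graph-based step
```

Users for which `seed_label` returns None correspond to the remaining users whose labels are decided later from the K-nearest-neighbor graph.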
S111,根据每个用户行为发生对应的直播间数量信息、使用的设备信息以及每个用户在所述直播平台中的登陆互联网协议IP信息确定所述用户样本中任意两个用户之间的相似值;S111. Determine, according to the quantity information of the live broadcast corresponding to each user behavior, the device information used, and the login Internet Protocol IP information of each user in the live broadcast platform, determine a similarity value between any two users in the user sample. ;
本步骤中,根据每个用户行为发生对应的直播间数量信息、使用的设备信息以及每个用户在所述直播平台中的登陆互联网协议IP信息确定所述用户样本中任意两个用户之间的相似值。这里,每个用户通过在设备上登录直播平台时,直播平台都会为每个设备生成唯一的设备标识。In this step, the number of live broadcasts corresponding to each user behavior, the device information used, and the login Internet Protocol IP information of each user in the live broadcast platform determine between any two users in the user sample. Similar values. Here, each user generates a unique device identifier for each device by logging in to the live platform on the device.
那么就可以根据公式(1)确定出用户样本中任意两个用户之间的相似值:Then you can determine the similarity between any two users in the user sample according to formula (1):
Figure PCTCN2018082163-appb-000004
Figure PCTCN2018082163-appb-000004
公式(1)中,当需要对发送弹幕信息的用户进行识别时,那么所述R u为用户u在预设时间内发送弹幕的直播间数量、所述R v为用户v在预设时间内发送弹幕的直播间数量;所述I u为用户u用户在所述直播平台中的登陆IP信息、所述I v为用户v用户在所述直播平台中的登陆IP信息;所述D u为用户u发送弹幕时使用的设备信息、所述D v为用户v发送弹幕时使用的设备信息;
Figure PCTCN2018082163-appb-000005
述N为特征指标的数量;所述与发送弹幕相关的特征指标可以包括:预设时间段内发送弹幕的次数、发送弹幕的间隔时间等。其中,所述设备信息是每个 设备的ID,当每个设备访问直播平台时,直播平台会为每个设备生成一个唯一的ID。
In formula (1), when it is necessary to identify the user who sends the bullet information, then the R u is the number of live broadcasts that the user u sends the barrage within the preset time, and the R v is the user v at the preset. number of live transmission time bomb curtain; I u is the user u of the user login information broadcast IP platform, the user v I v is a user login information in the IP broadcast platform; the D u is the device information used when the user u sends the barrage, and the D v is the device information used when the user v sends the barrage;
Figure PCTCN2018082163-appb-000005
The N is the number of the feature indicators; the feature indicators related to the sending of the barrage may include: the number of times the barrage is transmitted within the preset time period, the interval between the transmission of the barrage, and the like. The device information is an ID of each device. When each device accesses the live broadcast platform, the live broadcast platform generates a unique ID for each device.
Here, when other types of users need to be identified, the above parameters can be adapted to those users. For example, other types of users may include users who send gifts.
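Formula (1) is available in this text only as an image placeholder, so its exact form cannot be quoted. Purely as an illustration, the sketch below assumes one plausible shape: a weighted sum of four terms whose weights add up to 1. The set-overlap (Jaccard) terms, the cosine term, the treatment of rooms, IPs and devices as sets, and the field names `rooms`, `ips`, `devices` and `features` are all assumptions made for this example and are not the formula of the embodiment.

```python
from math import sqrt

def jaccard(a, b):
    """Set overlap |a & b| / |a | b|; 0.0 when both sets are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cosine(x, y):
    """Cosine similarity of two equal-length feature vectors; 0.0 for a zero vector."""
    dot = sum(p * q for p, q in zip(x, y))
    norm = sqrt(sum(p * p for p in x)) * sqrt(sum(q * q for q in y))
    return dot / norm if norm else 0.0

def similarity(u, v, w=(0.25, 0.25, 0.25, 0.25)):
    """Assumed stand-in for formula (1): weighted sum of four per-field similarities."""
    return (w[0] * jaccard(u["rooms"], v["rooms"])          # live rooms (assumed to be sets)
            + w[1] * jaccard(u["ips"], v["ips"])            # login IPs
            + w[2] * jaccard(u["devices"], v["devices"])    # device IDs
            + w[3] * cosine(u["features"], v["features"]))  # behavior feature indicators
```

With equal weights, two users with identical records score 1 and two users with nothing in common score 0; any real deployment would substitute the actual formula and weights.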
S112, construct a K-nearest-neighbor graph for all users in the user sample using a K-nearest-neighbor algorithm, and determine the weight value of the edge formed by any two users in the K-nearest-neighbor graph according to the similarity value between those two users;
After the similarity value between any two users in the user sample has been determined, a K-nearest-neighbor graph is constructed for all users in the user sample using the K-nearest-neighbor algorithm. In the graph, each user corresponds to a node, two connected nodes form an edge, and the similarity value between users represents the association between nodes.
The weight value of the edge formed by any two users in the K-nearest-neighbor graph can then be determined from the similarity value between those two users, as shown in formula (2):
Figure PCTCN2018082163-appb-000006
In formula (2), s(u, v) is the similarity value between any user u and user v, and a is a preset constant whose value lies in the interval (0, 1).
Note that the same value of a is used when calculating the edge weight for every pair of users.
S113, according to a preset K value and the weight values of the edges formed by any two users in the K-nearest-neighbor graph, determine, for each remaining user in the graph, a first weight sum of the edges connecting that user to the normal users and a second weight sum of the edges connecting that user to the abnormal users;
After the weight values of the edges formed by any two users in the K-nearest-neighbor graph have been determined, the first weight sum of the edges connecting each remaining user to the normal users and the second weight sum of the edges connecting each remaining user to the abnormal users are determined according to the preset K value and those edge weights.
Specifically, if all nodes in the K-nearest-neighbor graph were connected to one another, the graph would be too complex. A K value is therefore preset, and according to it the K edges connecting each remaining user to the normal users and the K edges connecting each remaining user to the abnormal users are determined;
Then, from the weight values of the edges formed by any two users in the K-nearest-neighbor graph, the first weight sum of the K edges connecting each remaining user to the normal users and the second weight sum of the K edges connecting each remaining user to the abnormal users are calculated.
In this embodiment, K is set to 30; that is, each node is connected only to the 30 nodes with the largest similarity values, forming 30 edges.
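The neighbor selection described above can be sketched as follows. This is a minimal illustration, assuming the pairwise similarity values are already available as a nested dictionary `sim[u][v]`; the function name `build_knn_graph` and the data layout are hypothetical. The edge-weight conversion of formula (2) is not applied here because that formula appears only as an image, so the similarity value itself is stored on each edge.

```python
import heapq

def build_knn_graph(sim, k):
    """Keep, for every node, only the k neighbors with the largest similarity.

    sim: {node: {other_node: similarity}} (hypothetical layout)
    returns: {node: [(neighbor, similarity), ...]} sorted from most to least similar
    """
    graph = {}
    for u, row in sim.items():
        candidates = [(v, s) for v, s in row.items() if v != u]  # no self-loops
        graph[u] = heapq.nlargest(k, candidates, key=lambda item: item[1])
    return graph

sim = {
    "A": {"B": 0.9, "C": 0.2, "D": 0.5},
    "B": {"A": 0.9, "C": 0.1, "D": 0.3},
}
graph = build_knn_graph(sim, k=2)
print(graph["A"])  # the two most similar neighbors of A
```

Using `heapq.nlargest` avoids sorting every row in full, which matters little for K = 3 but helps when K = 30 is taken from a large user sample.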
For example, suppose K is 3 and the remaining nodes include node A, whose most similar nodes are nodes B, C and D. The edge between A and B has weight 0.4, the edge between A and C has weight 0.3, and the edge between A and D has weight 0.1. Nodes B and D have label 1, and node C has label 0. The first weight sum is then 0.5 (0.4 + 0.1), and the second weight sum is 0.3.
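The arithmetic of this example can be checked with a few lines of code. The sketch assumes label 1 marks normal users and label 0 marks abnormal users, as in this embodiment; the function name `weight_sums` and the dictionary layout are illustrative only.

```python
def weight_sums(edges, labels):
    """Return (first_sum, second_sum) for one remaining node.

    edges:  {neighbor: edge weight}
    labels: {neighbor: 1 for normal, 0 for abnormal}
    """
    first = sum(w for n, w in edges.items() if labels.get(n) == 1)
    second = sum(w for n, w in edges.items() if labels.get(n) == 0)
    return first, second

edges_a = {"B": 0.4, "C": 0.3, "D": 0.1}   # edges of node A from the example
labels = {"B": 1, "C": 0, "D": 1}          # labels of A's neighbors
first, second = weight_sums(edges_a, labels)
print(round(first, 2), round(second, 2))
```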
S114, determine the label of each remaining user according to the first weight sum and the second weight sum;
After the first weight sum and the second weight sum have been determined, the label of each remaining user is determined from them.
Specifically, the first weight sum and the second weight sum are compared to find the larger one, and each remaining user is given the same label as the users corresponding to the larger weight sum.
Taking node A from the step above as an example again: the first weight sum is 0.5 and the second weight sum is 0.3, so node A should carry the same label as the normal users; that is, the label of node A is 1.
Apart from the original nodes, the other nodes are labeled iteratively according to the above process until the label of no node changes any more, at which point the labeling of the remaining users is complete. The original nodes are the nodes corresponding to the normal users and the abnormal users in the user sample.
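The iterative labeling can be sketched as a simple label-propagation loop. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: labels are updated in place within a round, a node with a tie or with no labeled neighbors keeps its current state, and `max_rounds` is a safety cap added for the example. The names `propagate_labels`, `graph` and `seeds` are hypothetical.

```python
def propagate_labels(graph, seeds, max_rounds=100):
    """Label remaining nodes until no label changes.

    graph: {node: {neighbor: edge weight}} for the remaining nodes
    seeds: {node: 1 (normal) or 0 (abnormal)} for the original nodes
    """
    labels = dict(seeds)
    for _ in range(max_rounds):
        changed = False
        for node, edges in graph.items():
            if node in seeds:
                continue  # original nodes keep their labels
            first = sum(w for n, w in edges.items() if labels.get(n) == 1)
            second = sum(w for n, w in edges.items() if labels.get(n) == 0)
            if first == second:
                continue  # tie or no labeled neighbors: leave unchanged (assumption)
            new_label = 1 if first > second else 0
            if labels.get(node) != new_label:
                labels[node] = new_label
                changed = True
        if not changed:
            break  # labels are stable, so the marking is complete
    return labels

graph = {
    "A": {"B": 0.4, "C": 0.3, "D": 0.1},
    "E": {"A": 0.6, "C": 0.2},
}
seeds = {"B": 1, "C": 0, "D": 1}
result = propagate_labels(graph, seeds)
print(result["A"], result["E"])
```

Node E has only one originally labeled neighbor (C), so its final label depends on node A being labeled first, which is why more than one pass can be needed.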
S115, identify each user according to the user's label.
Once the remaining users have been labeled, each user can be identified according to its label. In this way every potentially abnormal user can be found, which maintains order on the live-streaming platform.
Embodiment 2
Corresponding to Embodiment 1, this embodiment provides an apparatus for identifying users. As shown in FIG. 2, the apparatus includes a marking unit 21, a first determining unit 22, a construction unit 23, a second determining unit 24, a third determining unit 25, a fourth determining unit 26, and an identification unit 27, wherein:
The marking unit 21 is configured to mark the normal users and the abnormal users in a user sample with different labels. The users in the user sample also include remaining users. The user sample may consist of users who logged in to the live-streaming platform within a preset time; in this embodiment, the user sample is the set of users counted each day on the live-streaming platform.
Here, the marking unit 21 may first use preset identification rules to identify normal users and abnormal users in the user sample and mark them with different labels. These rules, however, cannot identify every normal and abnormal user in the sample: the users they identify are only part of the user sample, so some remaining users are left over.
Specifically, when identifying and marking normal users, the marking unit 21 obtains the behavior log records and the user level of each user in the user sample on the live-streaming platform. If a user's behavior log records are normal or the user is a high-level user, the user is determined to be a normal user and is marked with a first label. The first label may be a number, a letter, or another identifier, which is not limited here. In this embodiment the first label is a number, for example 1.
For example, when identification is based on a user's behavior log records: when a user sends a bullet-screen comment or performs another triggering action, a corresponding tracking record appears in the behavior log. If the log contains the corresponding tracking record and the corresponding action was actually triggered (for example, a bullet-screen comment was sent), the user is a normal user.
Here, identification by user level uses level 5 as the reference level: a user whose level is above 5 is determined to be a high-level user, and a user whose level is below 5 is determined to be a low-level user. For example, a user at level 10 is a high-level user, and a user at level 1 is a low-level user.
When identifying and marking abnormal users, the marking unit 21 obtains the login account of each user in the user sample on the live-streaming platform and the device ID corresponding to that account. If the login account corresponds to multiple identical device IDs, the user corresponding to the login account is determined to be an abnormal user and is marked with a second label. The second label may be a number, a letter, or another identifier, which is not limited here. In this embodiment the second label is a number, for example 0.
For example, if multiple users send bullet-screen comments from different devices using the same account, the user corresponding to that account is determined to be an abnormal user.
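The seed-labeling rules of the marking unit can be sketched as a small function. This is only an illustration under several assumptions: the abnormal rule follows the example in the text (one account used from more than one distinct device), the abnormal rule is checked before the normal rule, and the record fields `device_ids`, `log_ok` and `level` are hypothetical names. Users matching neither rule are returned as `None`, meaning they are remaining users.

```python
NORMAL, ABNORMAL = 1, 0   # first label and second label in this embodiment
BASE_LEVEL = 5            # reference level used to separate high and low levels

def seed_label(user):
    """Return 1 (normal), 0 (abnormal) or None (remaining user).

    user: {"device_ids": [...], "log_ok": bool, "level": int} (hypothetical record)
    """
    # Assumed reading of the abnormal rule: the account was seen on several devices.
    if len(set(user["device_ids"])) > 1:
        return ABNORMAL
    # Normal rule: behavior log is normal, or the user level is above the reference.
    if user["log_ok"] or user["level"] > BASE_LEVEL:
        return NORMAL
    return None  # not covered by the preset rules: a remaining user

print(seed_label({"device_ids": ["d1"], "log_ok": False, "level": 10}))
```

The order in which the two rules are applied is a design choice of this sketch; the text does not say which rule wins when both match.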
The first determining unit 22 then determines the similarity value between any two users in the user sample from the number of live rooms in which each user's behavior occurred, the device information used, and each user's login Internet Protocol (IP) information on the live-streaming platform. Here, whenever a user logs in to the live-streaming platform on a device, the platform generates a unique device identifier for that device.
Specifically, the first determining unit 22 determines the similarity value between any two users in the user sample according to formula (1):
Figure PCTCN2018082163-appb-000007
In formula (1), when users who send bullet-screen comments (barrage) are to be identified, R_u is the number of live rooms in which user u sent bullet-screen comments within a preset time, and R_v is the number of live rooms in which user v sent bullet-screen comments within the preset time; I_u is the login IP information of user u on the live-streaming platform, and I_v is the login IP information of user v on the live-streaming platform; D_u is the device information used when user u sends bullet-screen comments, and D_v is the device information used when user v sends bullet-screen comments; w_i (i = 1, 2, 3, 4) are weight coefficients, and
Figure PCTCN2018082163-appb-000008
x_ui is the i-th feature indicator of user u related to sending bullet-screen comments, and x_vi is the i-th feature indicator of user v related to sending bullet-screen comments; N is the number of feature indicators. The feature indicators related to sending bullet-screen comments may include the number of bullet-screen comments sent within a preset time period, the interval between sent comments, and so on. The device information is the ID of each device: when a device accesses the live-streaming platform, the platform generates a unique ID for it.
Here, when other types of users need to be identified, the above parameters can be adapted to those users. For example, other types of users may include users who send gifts.
After the first determining unit 22 has determined the similarity value between any two users in the user sample, the construction unit 23 constructs a K-nearest-neighbor graph for all users in the user sample using the K-nearest-neighbor algorithm. In the graph, each user corresponds to a node, two connected nodes form an edge, and the similarity value between users represents the association between nodes.
The second determining unit 24 can then determine the weight value of the edge formed by any two users in the K-nearest-neighbor graph from the similarity value between those two users, as shown in formula (2):
Figure PCTCN2018082163-appb-000009
In formula (2), s(u, v) is the similarity value between any user u and user v, and a is a preset constant whose value lies in the interval (0, 1).
Note that the same value of a is used when calculating the edge weight for every pair of users.
After the second determining unit 24 has determined the weight values of the edges formed by any two users in the K-nearest-neighbor graph, the third determining unit 25 determines, according to the preset K value and those edge weights, the first weight sum of the edges connecting each remaining user to the normal users and the second weight sum of the edges connecting each remaining user to the abnormal users.
Specifically, if all nodes in the K-nearest-neighbor graph were connected to one another, the graph would be too complex. A K value is therefore preset, and according to it the K edges connecting each remaining user to the normal users and the K edges connecting each remaining user to the abnormal users are determined;
Then, from the weight values of the edges formed by any two users in the K-nearest-neighbor graph, the first weight sum of the K edges connecting each remaining user to the normal users and the second weight sum of the K edges connecting each remaining user to the abnormal users are calculated.
In this embodiment, K is set to 30; that is, each node is connected only to the 30 nodes with the largest similarity values, forming 30 edges.
For example, suppose K is 3 and the remaining nodes include node A, whose most similar nodes are nodes B, C and D. The edge between A and B has weight 0.4, the edge between A and C has weight 0.3, and the edge between A and D has weight 0.1. Nodes B and D have label 1, and node C has label 0. The first weight sum is then 0.5 (0.4 + 0.1), and the second weight sum is 0.3.
After the first weight sum and the second weight sum have been determined, the fourth determining unit 26 determines the label of each remaining user from them.
Specifically, the first weight sum and the second weight sum are compared to find the larger one, and each remaining user is given the same label as the users corresponding to the larger weight sum.
Taking node A from the example above again: the first weight sum is 0.5 and the second weight sum is 0.3, so node A should carry the same label as the normal users; that is, the label of node A is 1.
Apart from the original nodes, the other nodes are labeled iteratively according to the above process until the label of no node changes any more, at which point the labeling of the remaining users is complete. The original nodes are the nodes corresponding to the normal users and the abnormal users in the user sample.
Once the remaining users have been labeled, the identification unit 27 can identify each user according to its label. In this way every potentially abnormal user can be found, which maintains order on the live-streaming platform.
Embodiment 3
This embodiment further provides a computer device for identifying users. As shown in FIG. 3, the computer device includes components such as a radio frequency (RF) circuit 310, a memory 320, an input unit 330, a display unit 340, an audio circuit 350, a WiFi module 360, a processor 370, and a power supply 380. Those skilled in the art will understand that the structure shown in FIG. 3 does not limit the computer device, which may include more or fewer components than illustrated, combine certain components, or use a different arrangement of components.
The components of the computer device are described in detail below with reference to FIG. 3:
The RF circuit 310 can be used to receive and transmit signals; in particular, it receives downlink information from a base station and passes it to the processor 370 for processing. Typically, the RF circuit 310 includes, but is not limited to, at least one amplifier, a transceiver, a coupler, a low noise amplifier (LNA), a duplexer, and the like.
The memory 320 can be used to store software programs and modules. By running the software programs and modules stored in the memory 320, the processor 370 executes the various functional applications and data processing of the computer device. The memory 320 may mainly include a program storage area and a data storage area: the program storage area may store an operating system, an application program required for at least one function, and the like, and the data storage area may store data created through use of the computer device, and the like. In addition, the memory 320 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another volatile solid-state storage device.
The input unit 330 can be used to receive input numeric or character information and to generate key signal inputs related to user settings and function control of the computer device. Specifically, the input unit 330 may include a touch panel 331 and other input devices 332. The touch panel 331 can collect the user's input operations on it and drive the corresponding connected apparatus according to a preset program. The touch panel 331 collects the output information and then sends it to the processor 370. In addition to the touch panel 331, the input unit 330 may include other input devices 332. Specifically, the other input devices 332 may include, but are not limited to, one or more of a touch panel, function keys (such as volume control keys and power keys), a trackball, a mouse, a joystick, and the like.
The display unit 340 can be used to display information entered by the user or provided to the user, as well as the various menus of the computer device. The display unit 340 may include a display panel 341, which may optionally be configured as a liquid crystal display (LCD), an organic light-emitting diode (OLED) display, or the like. Further, the touch panel 331 may cover the display panel 341. When the touch panel 331 detects a touch operation on or near it, it passes the operation to the processor 370 to determine the type of touch event, and the processor 370 then provides the corresponding visual output on the display panel 341 according to the type of input event. Although in FIG. 3 the touch panel 331 and the display panel 341 are two separate components that implement the input and output functions of the computer device, in some embodiments the touch panel 331 and the display panel 341 may be integrated to implement those input and output functions.
The audio circuit 350, a speaker 351, and a microphone 352 can provide an audio interface between the user and the computer device. The audio circuit 350 can convert received audio data into an electrical signal and transmit it to the speaker 351, which converts it into a sound signal for output;
WiFi is a short-range wireless transmission technology. Through the WiFi module 360, the computer device can help the user send and receive e-mail, browse web pages, access streaming media, and so on, providing the user with wireless broadband Internet access. Although FIG. 3 shows the WiFi module 360, it is not an essential part of the computer device and may be omitted as needed without changing the essence of the invention.
The processor 370 is the control center of the computer device. It connects the parts of the entire computer device through various interfaces and lines, and it performs the various functions of the computer device and processes data by running or executing the software programs and/or modules stored in the memory 320 and by calling the data stored in the memory 320, thereby monitoring the computer device as a whole. Optionally, the processor 370 may include one or more processing units. Preferably, the processor 370 may integrate an application processor, which mainly handles the operating system, the user interface, application programs, and the like.
The computer device also includes a power supply 380 (such as a power adapter) that powers the components. Preferably, the power supply is logically connected to the processor 370 through a power management system.
The method, apparatus, and computer device for identifying users provided by the embodiments of the present invention offer at least the following beneficial effects:
The embodiments of the present invention provide a method, an apparatus, and a computer device for identifying users. The method includes: marking the normal users and the abnormal users in a user sample with different labels, the users in the user sample including normal users, abnormal users, and remaining users; determining the similarity value between any two users in the user sample from the number of live rooms in which each user's behavior occurred, the device information used, and each user's login Internet Protocol (IP) information on the live-streaming platform; constructing a K-nearest-neighbor graph for all users in the user sample using a K-nearest-neighbor algorithm; determining the weight value of the edge formed by any two users in the K-nearest-neighbor graph from the similarity value between those two users; determining, according to a preset K value and those edge weights, a first weight sum of the edges connecting each remaining user to the normal users and a second weight sum of the edges connecting each remaining user to the abnormal users; determining the label of each remaining user from the first weight sum and the second weight sum; and identifying each user according to the user's label. In this way, the normal users and the abnormal users in the user sample are first marked with different labels. Because the normal and abnormal users among the remaining users cannot yet be identified, a K-nearest-neighbor graph is then constructed for all users in the user sample, and, based on the similarity values between any two users, the first weight sum of the K edges connecting each remaining user to the normal users and the second weight sum of the K edges connecting each remaining user to the abnormal users are determined. The label of each remaining user is then determined from the two weight sums, and finally each user can be identified according to its label. Abnormal users on the live-streaming platform can thus be identified accurately, and order on the platform is maintained.
The algorithms and displays provided herein are not inherently related to any particular computer, virtual system, or other device. Various general-purpose systems may also be used with the teachings herein. From the description above, the structure required to construct such a system is apparent. Moreover, the present invention is not directed to any particular programming language. It should be understood that the content of the invention described herein may be implemented in a variety of programming languages, and that the description of a specific language above is given to disclose the best mode of the invention.
Numerous specific details are set forth in the description provided herein. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures, and techniques are not shown in detail so as not to obscure the understanding of this description.
Similarly, it should be understood that, to streamline this disclosure and aid understanding of one or more of the inventive aspects, features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof in the above description of exemplary embodiments. This method of disclosure, however, is not to be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in fewer than all features of a single previously disclosed embodiment. The claims following the detailed description are therefore expressly incorporated into the detailed description, with each claim standing on its own as a separate embodiment of the invention.
Those skilled in the art will understand that the modules of the device in an embodiment can be adaptively changed and arranged in one or more devices different from that embodiment. The modules, units, or components of the embodiments may be combined into one module, unit, or component, and may furthermore be divided into multiple sub-modules, sub-units, or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract, and drawings) and all processes or units of any method or device so disclosed may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) may be replaced by an alternative feature serving the same, equivalent, or similar purpose.
Furthermore, those skilled in the art will appreciate that, although some embodiments described herein include certain features that are included in other embodiments and not others, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the following claims, any of the claimed embodiments can be used in any combination.
The various component embodiments of the invention may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will understand that a microprocessor or a digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of the gateway, proxy server, and system according to embodiments of the invention. The invention can also be implemented as a device or apparatus program (for example, a computer program and a computer program product) for performing part or all of the method described herein. Such a program implementing the invention may be stored on a computer-readable storage medium or may take the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form. When executed by a processor, the program implements the following steps: marking the normal users and the abnormal users in a user sample with different labels, the users in the user sample comprising normal users, abnormal users, and remaining users; determining a similarity value between any two users in the user sample according to the number of live rooms in which each user's behavior occurs, the device information used, and each user's login Internet Protocol (IP) information on the live streaming platform; constructing a K-nearest-neighbor graph for all users in the user sample using the K-nearest-neighbor algorithm; determining, according to the similarity value between any two users in the user sample, the weight value of the edge formed by those two users in the K-nearest-neighbor graph; determining, according to a preset K value and the weight values of the edges formed by any two users in the K-nearest-neighbor graph, a first weight sum of the edges connecting each remaining user to the normal users and a second weight sum of the edges connecting each remaining user to the abnormal users; determining the label of each remaining user according to the first weight sum and the second weight sum; and identifying each user according to the user's label.
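The graph-construction step described above can be sketched as follows. This is a minimal illustration rather than the patented implementation: it assumes the pairwise similarity values s(u, v) are already available as a nested dictionary, and it uses s(u, v) directly as the edge weight, whereas the patent derives the weight from s(u, v) with a formula that is given only as a figure. The function name `build_knn_graph` and the data layout are hypothetical.

```python
import heapq


def build_knn_graph(similarity, k):
    """Keep, for each user, the k most similar other users.

    similarity: dict of dicts, similarity[u][v] = s(u, v).
    Returns {u: {v: weight}}.
    Assumption: weight = s(u, v); the patent's own weight formula
    is only available as a figure and is not reproduced here.
    """
    graph = {}
    for u, row in similarity.items():
        # (similarity, neighbor) pairs, excluding the user itself
        candidates = [(s, v) for v, s in row.items() if v != u]
        # the k pairs with the largest similarity values
        top = heapq.nlargest(k, candidates)
        graph[u] = {v: s for s, v in top}
    return graph


# Illustrative data only: four users with made-up similarity values.
sim = {
    "a": {"b": 0.9, "c": 0.2, "d": 0.1},
    "b": {"a": 0.9, "c": 0.3, "d": 0.4},
    "c": {"a": 0.2, "b": 0.3, "d": 0.8},
    "d": {"a": 0.1, "b": 0.4, "c": 0.8},
}
graph = build_knn_graph(sim, k=2)
```

With K = 2, each user keeps edges only to its two most similar users, so the later weight sums are computed over a small neighborhood instead of the whole user sample.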
It should be noted that the above embodiments illustrate rather than limit the invention, and that those skilled in the art can design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention can be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means can be embodied by one and the same item of hardware. The use of the words first, second, third, and so on does not indicate any order; these words may be interpreted as names.
The above are merely preferred embodiments of the invention and are not intended to limit its scope of protection. Any modification, equivalent replacement, or improvement made within the spirit and principles of the invention shall fall within the scope of protection of the invention.

Claims (10)

  1. A method for identifying users, characterized in that it is applied in a live streaming platform, the method comprising:
    marking the normal users and the abnormal users in a user sample with different labels, the users in the user sample comprising normal users, abnormal users, and remaining users;
    determining a similarity value between any two users in the user sample according to the number of live rooms in which each user's behavior occurs, the device information used, and the Internet Protocol (IP) information with which each user logs in to the live streaming platform;
    constructing a K-nearest-neighbor graph for all users in the user sample using the K-nearest-neighbor algorithm;
    determining, according to the similarity value between any two users in the user sample, the weight value of the edge formed by those two users in the K-nearest-neighbor graph;
    determining, according to a preset K value and the weight values of the edges formed by any two users in the K-nearest-neighbor graph, a first weight sum of the edges connecting each remaining user to the normal users in the K-nearest-neighbor graph, and a second weight sum of the edges connecting each remaining user to the abnormal users;
    determining the label of each remaining user according to the first weight sum and the second weight sum;
    identifying each user according to the user's label.
  2. The method according to claim 1, characterized in that marking the normal users and the abnormal users in the user sample with different labels comprises:
    obtaining the behavior log records and the user level of each user in the user sample on the live streaming platform;
    if the user's behavior log records are normal or the user is a high-level user, determining that the user is a normal user;
    marking the normal user with a first label.
  3. The method according to claim 1, characterized in that marking the normal users and the abnormal users in the user sample with different labels comprises:
    obtaining the login account of each user in the user sample on the live streaming platform and the device identifier (ID) corresponding to the login account;
    if the login account corresponds to multiple identical device IDs, determining that the user corresponding to the login account is an abnormal user;
    marking the abnormal user with a second label.
  4. The method according to claim 1, characterized in that determining the similarity value between any two users in the user sample according to the number of live rooms in which each user's behavior occurs, the device information used, and each user's login IP information on the live streaming platform comprises:
    determining the similarity value s(u,v) between any user u and user v according to the formula
    Figure PCTCN2018082163-appb-100001
    wherein, when users who send bullet-screen comments (barrage) are to be identified, R_u is the number of live rooms in which user u sends bullet-screen comments within a preset time, and R_v is the number of live rooms in which user v sends bullet-screen comments within the preset time; I_u is the login IP information of user u on the live streaming platform, and I_v is the login IP information of user v on the live streaming platform; D_u is the device information used by user u when sending bullet-screen comments, and D_v is the device information used by user v when sending bullet-screen comments; and w_i (i = 1, 2, 3, 4) is
    Figure PCTCN2018082163-appb-100002
    the number of targets.
  5. The method according to claim 1, characterized in that determining, according to the preset K value and the weight values of the edges formed by any two users in the K-nearest-neighbor graph, the first weight sum of the edges connecting each remaining user to the normal users and the second weight sum of the edges connecting each remaining user to the abnormal users comprises:
    determining, according to the preset K value, the K edges connecting each remaining user to the normal users in the K-nearest-neighbor graph and the K edges connecting each remaining user to the abnormal users;
    calculating, according to the weight values of the edges formed by any two users in the K-nearest-neighbor graph, the first weight sum of the K edges connecting each remaining user to the normal users and the second weight sum of the K edges connecting each remaining user to the abnormal users.
  6. The method according to claim 1, characterized in that determining, according to the similarity value between any two users in the user sample, the weight value of the edge formed by those two users in the K-nearest-neighbor graph comprises:
    determining the weight value of the edge formed by any two users in the K-nearest-neighbor graph according to the formula
    Figure PCTCN2018082163-appb-100003
    wherein s(u,v) is the similarity value between any user u and user v, and a is a preset constant taking a value in (0, 1).
  7. The method according to claim 1, characterized in that determining the label of each remaining user according to the first weight sum and the second weight sum comprises:
    comparing the first weight sum with the second weight sum to determine the larger weight sum;
    marking each remaining user with the same label as the users corresponding to the larger weight sum.
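The weight-sum comparison of claims 5 and 7 can be sketched as follows. This is an illustrative reading under stated assumptions, not the applicant's implementation: the graph is assumed to be a dictionary `{user: {neighbor: weight}}` already limited to each user's K neighbors, the label strings "normal" and "abnormal" stand in for the first and second labels, and a remaining user whose two sums are equal (including the case of no labeled neighbors) is left unlabeled, a case the claims do not address. The function name `label_remaining_users` is hypothetical.

```python
def label_remaining_users(graph, seed_labels):
    """Assign each remaining user the label with the larger edge-weight sum.

    graph: {user: {neighbor: weight}} (K-nearest-neighbor edges).
    seed_labels: {user: "normal" | "abnormal"} for already-labeled users.
    Returns a new dict containing the seed labels plus the inferred ones.
    """
    result = dict(seed_labels)
    for user, neighbors in graph.items():
        if user in seed_labels:
            continue  # only remaining (unlabeled) users are inferred
        # first weight sum: edges to normal users
        normal_sum = sum(w for v, w in neighbors.items()
                         if seed_labels.get(v) == "normal")
        # second weight sum: edges to abnormal users
        abnormal_sum = sum(w for v, w in neighbors.items()
                           if seed_labels.get(v) == "abnormal")
        if normal_sum > abnormal_sum:
            result[user] = "normal"
        elif abnormal_sum > normal_sum:
            result[user] = "abnormal"
        # Assumption: on a tie the user stays unlabeled.
    return result


# Illustrative data only.
graph = {
    "x": {"n1": 0.5, "a1": 0.7, "a2": 0.1},
    "y": {"n1": 0.9, "a1": 0.2},
    "z": {"x": 0.4},
}
seeds = {"n1": "normal", "a1": "abnormal", "a2": "abnormal"}
labels = label_remaining_users(graph, seeds)
```

Only the seed labels are consulted when summing, so the result does not depend on the order in which remaining users are visited.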
  8. An apparatus for identifying users, characterized in that the apparatus comprises:
    a marking unit, configured to mark the normal users and the abnormal users in a user sample with different labels, the users in the user sample comprising normal users, abnormal users, and remaining users;
    a first determining unit, configured to determine a similarity value between any two users in the user sample according to the number of live rooms in which each user's behavior occurs, the device information used, and the IP information with which each user logs in to the live streaming platform;
    a construction unit, configured to construct a K-nearest-neighbor graph for all users in the user sample using the K-nearest-neighbor algorithm;
    a second determining unit, configured to determine, according to the similarity value between any two users in the user sample, the weight value of the edge formed by those two users in the K-nearest-neighbor graph;
    a third determining unit, configured to determine, according to a preset K value, a first weight sum of the edges connecting each remaining user to the normal users in the K-nearest-neighbor graph, and a second weight sum of the edges connecting each remaining user to the abnormal users;
    a fourth determining unit, configured to determine the label of each remaining user according to the first weight sum and the second weight sum;
    an identification unit, configured to identify each user according to the user's label.
  9. A computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the following steps:
    marking the normal users and the abnormal users in a user sample with different labels, the users in the user sample comprising normal users, abnormal users, and remaining users;
    determining a similarity value between any two users in the user sample according to the number of live rooms in which each user's behavior occurs, the device information used, and the IP information with which each user logs in to the live streaming platform;
    constructing a K-nearest-neighbor graph for all users in the user sample using the K-nearest-neighbor algorithm;
    determining, according to the similarity value between any two users in the user sample, the weight value of the edge formed by those two users in the K-nearest-neighbor graph;
    determining, according to a preset K value and the weight values of the edges formed by any two users in the K-nearest-neighbor graph, a first weight sum of the edges connecting each remaining user to the normal users in the K-nearest-neighbor graph, and a second weight sum of the edges connecting each remaining user to the abnormal users;
    determining the label of each remaining user according to the first weight sum and the second weight sum;
    identifying each user according to the user's label.
  10. A computer device for identifying bullet-screen users, characterized in that it comprises:
    at least one processor; and
    at least one memory communicatively connected to the processor, wherein
    the memory stores program instructions executable by the processor, and the processor, by invoking the program instructions, is able to perform the method according to any one of claims 1 to 7.
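The seed-labeling rules of claims 2 and 3 can be sketched as follows. Everything concrete here is an assumption made for illustration: the record fields (`id`, `log_ok`, `level`, `device_id`), the level threshold `high_level`, and the label strings are hypothetical, and claim 3 is read as flagging accounts whose device ID is shared with at least one other account, which is only one possible reading of the claim wording. The sketch also assumes the abnormal rule takes priority when both rules apply, which the claims do not specify.

```python
from collections import defaultdict


def assign_seed_labels(users, high_level=30):
    """Label clearly normal and clearly abnormal users; leave the rest out.

    users: list of dicts with hypothetical keys
           'id', 'log_ok' (bool), 'level' (int), 'device_id' (str).
    Returns {user_id: "normal" | "abnormal"}; users missing from the
    result are the "remaining users" to be labeled through the graph.
    """
    # Group account IDs by device ID (assumed reading of claim 3).
    accounts_per_device = defaultdict(set)
    for u in users:
        accounts_per_device[u["device_id"]].add(u["id"])

    labels = {}
    for u in users:
        if len(accounts_per_device[u["device_id"]]) > 1:
            labels[u["id"]] = "abnormal"   # second label
        elif u["log_ok"] or u["level"] >= high_level:
            labels[u["id"]] = "normal"     # first label (claim 2)
    return labels


# Illustrative data only.
sample = [
    {"id": 1, "log_ok": True,  "level": 5,  "device_id": "d1"},
    {"id": 2, "log_ok": False, "level": 40, "device_id": "d2"},
    {"id": 3, "log_ok": False, "level": 1,  "device_id": "d3"},
    {"id": 4, "log_ok": True,  "level": 50, "device_id": "d4"},
    {"id": 5, "log_ok": False, "level": 2,  "device_id": "d4"},
]
seed = assign_seed_labels(sample)
```

In this example, user 3 satisfies neither rule and is therefore a remaining user whose label would be inferred from its neighbors in the K-nearest-neighbor graph.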
PCT/CN2018/082163 2018-01-08 2018-04-08 Method and apparatus for recognizing user, and computer device WO2019134284A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810015912.4 2018-01-08
CN201810015912.4A CN108184148B (en) 2018-01-08 2018-01-08 A kind of method, apparatus and computer equipment of user for identification

Publications (1)

Publication Number Publication Date
WO2019134284A1 true WO2019134284A1 (en) 2019-07-11

Family

ID=62550120

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/082163 WO2019134284A1 (en) 2018-01-08 2018-04-08 Method and apparatus for recognizing user, and computer device

Country Status (2)

Country Link
CN (1) CN108184148B (en)
WO (1) WO2019134284A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113810335A (en) * 2020-06-12 2021-12-17 武汉斗鱼鱼乐网络科技有限公司 Method and system for identifying target IP, storage medium and equipment

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109255371B (en) * 2018-08-23 2021-06-15 武汉斗鱼网络科技有限公司 Method for determining false attention user of live broadcast platform and related equipment
CN111104551B (en) * 2019-11-25 2024-04-26 网易(杭州)网络有限公司 Live broadcast room label determining method and device, storage medium and electronic equipment
CN111882446B (en) * 2020-07-28 2023-05-16 哈尔滨工业大学(威海) Abnormal account detection method based on graph convolution network

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100251291A1 (en) * 2009-03-24 2010-09-30 Pino Jr Angelo J System, Method and Computer Program Product for Processing Video Data
CN106228410A (en) * 2016-07-29 2016-12-14 武汉斗鱼网络科技有限公司 Virtual present task anti-brush system and method in a kind of live platform
CN106294800A (en) * 2016-08-16 2017-01-04 武汉斗鱼网络科技有限公司 Method and system recommended by direct broadcasting room based on weighting k neighbour scoring
CN106960042A (en) * 2017-03-29 2017-07-18 中国科学技术大学苏州研究院 Network direct broadcasting measure of supervision based on barrage semantic analysis
CN107222780A (en) * 2017-06-23 2017-09-29 中国地质大学(武汉) A kind of live platform comprehensive state is perceived and content real-time monitoring method and system
CN107481009A (en) * 2017-08-28 2017-12-15 广州虎牙信息科技有限公司 Identify that live platform supplements the method, apparatus and terminal of user with money extremely

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105915960A (en) * 2016-03-31 2016-08-31 广州华多网络科技有限公司 User type determination method and device

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113810335A (en) * 2020-06-12 2021-12-17 武汉斗鱼鱼乐网络科技有限公司 Method and system for identifying target IP, storage medium and equipment
CN113810335B (en) * 2020-06-12 2023-08-22 武汉斗鱼鱼乐网络科技有限公司 Method and system for identifying target IP, storage medium and equipment

Also Published As

Publication number Publication date
CN108184148B (en) 2019-10-22
CN108184148A (en) 2018-06-19


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18898015

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18898015

Country of ref document: EP

Kind code of ref document: A1