CN113163218A - Method and system for detecting user in live broadcast room, electronic device and storage medium - Google Patents


Info

Publication number
CN113163218A
Authority
CN
China
Prior art keywords
live broadcast
user
class
users
threshold
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110180459.4A
Other languages
Chinese (zh)
Inventor
李益永
孙准
黄秋实
井雪
项伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Bigo Technology Pte Ltd
Original Assignee
Bigo Technology Pte Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Bigo Technology Pte Ltd
Priority to CN202110180459.4A
Publication of CN113163218A

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/21 Server components or server architectures
    • H04N21/218 Source of audio or video content, e.g. local disk arrays
    • H04N21/2187 Live feed
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/442 Monitoring of processes or resources, e.g. detecting the failure of a recording device, monitoring the downstream bandwidth, the number of times a movie has been viewed, the storage space available from the internal hard disk
    • H04N21/44204 Monitoring of content usage, e.g. the number of times a movie has been viewed, copied or the amount which has been watched
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/442 Monitoring of processes or resources, e.g. detecting the failure of a recording device, monitoring the downstream bandwidth, the number of times a movie has been viewed, the storage space available from the internal hard disk
    • H04N21/44213 Monitoring of end-user related data
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/45 Management operations performed by the client for facilitating the reception of or the interaction with the content or administrating data related to the end-user or to the client device itself, e.g. learning user preferences for recommending movies, resolving scheduling conflicts
    • H04N21/4508 Management of client data or end-user data
    • H04N21/4532 Management of client data or end-user data involving end-user characteristics, e.g. viewer profile, preferences

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Signal Processing (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Social Psychology (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The embodiments of the invention provide a method and a system for detecting a user in a live broadcast room, an electronic device, and a storage medium. The method comprises: acquiring a plurality of first feature vectors of each user in a live broadcast room according to a detection rule, wherein the first feature vectors comprise public-screen features related to screened sensitive words; inputting the first feature vectors into a first network model, which outputs a first detection result for each user; and, if the first detection result meets a first detection condition, outputting the identification information of the user corresponding to the first detection result. The embodiments screen the general sensitive words, which improves the accuracy of the sensitive words that remain. Each user is scored using the plurality of first feature vectors and the first network model, and whether the user is a pedophile is judged from the scoring result. This avoids judging from the single interactive behavior of speaking on the public screen, enriches the user features used for the judgment, and improves the accuracy of the judgment.

Description

Method and system for detecting user in live broadcast room, electronic device and storage medium
Technical Field
The invention belongs to the technical field of computers, and particularly relates to a method and a system for detecting a user in a live broadcast room, electronic equipment and a storage medium.
Background
As users operate on a live broadcast platform, they generate various interactive behaviors and association relations with other users on the platform. These interactive behaviors and association relations are very complex; in particular, public-screen speech is closely tied to its context.
A pedophile is an adolescent or adult over the age of 16 who has strong, recurrent sexual urges and fantasies toward prepubescent children and who has acted on these urges or is distressed by them. At present, to detect whether a user in a live broadcast room is a pedophile, it may be determined whether the user's public-screen speech contains a general sensitive word; if it does, the user is determined to be a pedophile. However, general sensitive words are not precise, and the single interactive behavior of sending public-screen speech is hard to link strongly and logically to pedophilia, so the accuracy of the existing scheme for detecting whether a user in a live broadcast room is a pedophile is low.
Disclosure of Invention
In view of this, the present invention provides a method and a system for detecting a user in a live broadcast room, an electronic device, and a storage medium, which solve the problem that the existing scheme for detecting whether a user in a live broadcast room is a pedophile has low accuracy.
In a first aspect of the embodiments of the present invention, a method for detecting a user in a live broadcast room is provided, including: according to a preset detection rule, obtaining a plurality of first feature vectors of each user in a live broadcast room, wherein the first feature vectors comprise public screen features related to screened sensitive words; inputting the first feature vector to a trained first network model, and correspondingly outputting a first detection result of each user; and if the first detection result meets a preset first detection condition, outputting the identification information of the user corresponding to the first detection result.
In a second aspect of the embodiments of the present invention, a system for detecting a user in a live broadcast room is provided, including: the acquisition module is used for acquiring a plurality of first feature vectors of each user in a live broadcast room according to a preset detection rule, wherein the first feature vectors comprise public screen features related to the screened sensitive words; the input module is used for inputting the first feature vector to a trained first network model and correspondingly outputting a first detection result of each user; and the output module is used for outputting the identification information of the user corresponding to the first detection result if the first detection result meets a preset first detection condition.
In a third aspect of the embodiments of the present invention, there is provided an electronic device, including: a processor, a memory, and a computer program stored on the memory and executable on the processor, the processor implementing the method for detecting a user in a live broadcast room as described in the first aspect when executing the program.
In a fourth aspect of the embodiments of the present invention, there is provided a computer-readable storage medium storing instructions which, when executed on a computer, cause the computer to perform the method for detecting a user in a live broadcast room according to the first aspect.
The technical scheme provided by the embodiment of the invention has the following beneficial effects:
A plurality of first feature vectors of each user in the live broadcast room are acquired according to a preset detection rule. The first feature vectors contain public-screen features related to the screened sensitive words, that is, to the sensitive words that remain after a screening operation is performed on the general sensitive words. The first feature vectors are then input to the trained first network model, which outputs a first detection result for each user. If the first detection result meets a preset first detection condition, the identification information of the corresponding user is output. In a pedophile detection scenario, the output identification information may be the identification information of a pedophile. When detecting pedophiles among the users of a live broadcast room, the embodiments of the invention do not simply judge whether the public-screen speech contains general sensitive words. Instead, first feature vectors corresponding to the detection rule are acquired, a first detection result for the user is output using the trained first network model and the first feature vectors, and it is then judged whether the first detection result meets the first detection condition; if it does, the corresponding user can be determined to be a pedophile. The embodiments screen the general sensitive words, which improves the accuracy of the sensitive words that remain.
In addition, each user is scored using the plurality of first feature vectors and the first network model, and the first detection result can be understood as the user's scoring result. Judging whether the user is a pedophile from the scoring result avoids relying on the single interactive behavior of speaking on the public screen, enriches the user features used for the judgment, and thereby improves the accuracy of the judgment.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
Fig. 1 is a schematic flowchart of a method for detecting a user in a live broadcast room according to an embodiment of the present invention;
Fig. 2 is a schematic flowchart of a scheme for detecting pedophiles in a live broadcast room according to an embodiment of the present invention;
Fig. 3 is a block diagram of a system for detecting a user in a live broadcast room according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the invention are shown in the drawings, it should be understood that the invention can be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.
Fig. 1 is a schematic flowchart of a method for detecting a user in a live broadcast room according to an embodiment of the present invention, and as shown in fig. 1, the method for detecting a user in a live broadcast room may specifically include the following steps.
Step 101, obtaining a plurality of first feature vectors of each user in a live broadcast room according to a preset detection rule.
In the embodiment of the present invention, a corresponding detection rule may be set for each kind of user detection. For example, a detection rule may indicate detecting whether a user in a live broadcast room is a minor, or detecting whether a user in a live broadcast room is a pedophile. The content, format, and the like of the detection rule are not particularly limited in the embodiments of the present invention. In practical application, a corresponding detection rule can be set according to the actual detection requirement.
In the embodiment of the present invention, the plurality of first feature vectors obtained for each user in the live broadcast room correspond to the detection rule. For example, if the detection rule is rule a, a first feature vector m1 is obtained, and rule a corresponds to the first feature vector m1. For another example, if the detection rule is rule b, the first feature vectors m1 and m2 are obtained, and rule b corresponds to the first feature vectors m1 and m2.
The first feature vector in the embodiment of the present invention may include a public-screen feature, which represents the public-screen speech of a user in a live broadcast room. The public-screen speech is related to the screened sensitive words, which may be understood as the public-screen speech containing the screened sensitive words.
Step 102, inputting the first feature vector to a trained first network model, and correspondingly outputting a first detection result of each user.
In the embodiment of the present invention, the acquired first feature vector is used as an input item of the first network model, and a first detection result is output from the first network model, where the first detection result is an output item of the first network model. The first detection result may represent a scoring result of detecting the user according to the detection rule.
Step 103, if the first detection result meets a preset first detection condition, outputting the identification information of the user corresponding to the first detection result.
In the embodiment of the invention, it is judged whether the first detection result meets the first detection condition. If it does, the user corresponding to the first detection result is considered to be a user who matches the detection rule, and the identification information of that user is output. If it does not, the user corresponding to the first detection result is considered not to match the detection rule, and the identification information of that user does not need to be output. In practical application, when judging whether the first detection result meets the first detection condition, the scoring result represented by the first detection result is compared with the scoring threshold represented by the first detection condition. If the scoring result is greater than the scoring threshold, the first detection result is considered to meet the first detection condition; if the scoring result is less than or equal to the scoring threshold, the first detection result is considered not to meet the first detection condition.
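Steps 101 to 103 can be illustrated with a minimal sketch. The function names `detect_users` and `weighted_sum`, the two-element feature vectors, and the example weights and threshold are all hypothetical and chosen only for illustration; the weighted sum merely stands in for the trained first network model, whose actual form is not fixed here.

```python
from typing import Callable, Dict, List, Sequence


def weighted_sum(weights: Sequence[float]) -> Callable[[Sequence[float]], float]:
    """Return a stand-in scoring model: a plain weighted sum of the features."""
    def score(features: Sequence[float]) -> float:
        return sum(w * x for w, x in zip(weights, features))
    return score


def detect_users(
    features_by_user: Dict[str, Sequence[float]],
    score_fn: Callable[[Sequence[float]], float],
    score_threshold: float,
) -> List[str]:
    """Score every user and return the IDs whose score exceeds the threshold."""
    flagged = []
    for user_id, features in features_by_user.items():
        score = score_fn(features)       # step 102: first detection result
        if score > score_threshold:      # step 103: strictly greater than
            flagged.append(user_id)      # output the identification information
    return flagged


# Step 101 (assumed already done): one feature vector per user.
features_by_user = {
    "user_a": [0.9, 0.8],
    "user_b": [0.1, 0.2],
    "user_c": [0.5, 0.5],
}
score_fn = weighted_sum([0.5, 0.5])
flagged = detect_users(features_by_user, score_fn, score_threshold=0.5)
print(flagged)
```

Note that a score equal to the threshold does not meet the condition, matching the "greater than" comparison described above.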
In the scheme for detecting a user in a live broadcast room provided by the embodiment of the invention, a plurality of first feature vectors of each user in the live broadcast room are acquired according to a preset detection rule. The first feature vectors contain public-screen features related to the screened sensitive words, that is, to the sensitive words that remain after a screening operation is performed on the general sensitive words. The first feature vectors are then input to the trained first network model, which outputs a first detection result for each user. If the first detection result meets a preset first detection condition, the identification information of the corresponding user is output. In a pedophile detection scenario, the output identification information may be the identification information of a pedophile. When detecting pedophiles among the users of a live broadcast room, the embodiments of the invention do not simply judge whether the public-screen speech contains general sensitive words. Instead, first feature vectors corresponding to the detection rule are acquired, a first detection result for the user is output using the trained first network model and the first feature vectors, and it is then judged whether the first detection result meets the first detection condition; if it does, the corresponding user can be determined to be a pedophile. The embodiments screen the general sensitive words, which improves the accuracy of the sensitive words that remain.
In addition, each user is scored using the plurality of first feature vectors and the first network model, and the first detection result can be understood as the user's scoring result. Judging whether the user is a pedophile from the scoring result avoids relying on the single interactive behavior of speaking on the public screen, enriches the user features used for the judgment, and thereby improves the accuracy of the judgment.
A general sensitive word may appear in the public-screen speech of any user. Taking pedophile detection as an example, a general sensitive word may be used more frequently in the public-screen speech of normal users than in that of pedophiles, so detecting pedophiles from general sensitive words alone is not accurate. In an exemplary embodiment of the present invention, the general sensitive words may be screened. Specifically, a set of general sensitive words is obtained first, and then a sensitive-word screening operation is performed in live broadcast rooms of a preset type for each general sensitive word in the set. The preset types of live broadcast rooms may include minor live broadcast rooms and live broadcast rooms screened as minor-pornography type. The screening of general sensitive words in minor live broadcast rooms is described below as an example.
For each general sensitive word, the following are acquired: its first usage count in a first user set, its second usage count in a second user set, and the number of its occurrences in live broadcast rooms of a preset type (such as minor live broadcast rooms). The general sensitive word is then screened according to the first usage count, the second usage count, the number, and preset thresholds. The first user set may be a set of pedophiles, and the second user set may be a set of users sampled from all users in the minor live broadcast rooms (a third user set). If the third user set contains 11378162 users and 49000 users are sampled from it, the second user set contains 49000 users, and the sampling ratio of the third user set to the second user set is 11378162/49000, or about 232. The preset thresholds may be determined according to actual needs, and may include but are not limited to: a first threshold (e.g., 15%), a second threshold (e.g., 5%), a third threshold (e.g., 10%), and a fourth threshold (e.g., 1%).
When the general sensitive words are screened according to the first using times, the second using times, the quantity and the preset threshold, the general sensitive words can be screened according to the first using times, the second using times, the quantity, the sampling ratio of the second user set to the third user set, the first threshold, the second threshold, the third threshold and the fourth threshold.
In a practical application scenario, the product of the second usage count and the sampling ratio is added to the first usage count to obtain a sum. For example, the first usage count is num1, the number of times pedophile users used the general sensitive word, and the second usage count is num2, the number of times the sampled normal users used it, so the sum is num1 + num2 × 232. The first usage count is divided by the sum to obtain a first ratio, num1/(num1 + num2 × 232). When the first ratio is greater than or equal to the first threshold, that is, num1/(num1 + num2 × 232) ≥ 15%, the general sensitive word is a pedophile-exclusive word that applies to all users. When the first ratio is greater than or equal to the second threshold and smaller than the first threshold, the number of occurrences of the general sensitive word in the minor live broadcast rooms, num3, is used: the first usage count is divided by the sum of the first usage count and this number to obtain a second ratio, num1/(num1 + num3). If the second ratio is smaller than the third threshold, that is, num1/(num1 + num3) < 10%, the general sensitive word is treated as a suspect sensitive word. If the second ratio is greater than or equal to the third threshold, the general sensitive word is reviewed individually, that is, re-evaluated manually. When the first ratio is smaller than the second threshold: if the second ratio is greater than or equal to the fourth threshold and smaller than the third threshold, that is, 1% ≤ num1/(num1 + num3) < 10%, the general sensitive word is treated as a suspect sensitive word; if the second ratio is greater than or equal to the second threshold, the general sensitive word is reviewed individually; and if the second ratio is smaller than the fourth threshold, the general sensitive word is a non-pedophile word.
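The screening rule above can be sketched as a small decision function. This is an illustrative sketch only: the function name `classify_sensitive_word`, the category labels, and the handling of a zero usage count are assumptions, and the default scale and thresholds are the example values given in the text (232, 15%, 5%, 10%, 1%). The branches are evaluated in the order in which the text states them.

```python
def classify_sensitive_word(
    num1: int,            # uses of the word by users in the pedophile set
    num2: int,            # uses of the word by the sampled normal users
    num3: int,            # occurrences of the word in minor live broadcast rooms
    scale: float = 232.0,  # sampling ratio, about 11378162 / 49000
    t1: float = 0.15,
    t2: float = 0.05,
    t3: float = 0.10,
    t4: float = 0.01,
) -> str:
    """Return one of: 'exclusive', 'suspect', 'manual_review', 'non_pedophile'."""
    if num1 == 0:
        # Assumption: a word never used by the first user set is not informative.
        return "non_pedophile"

    first_ratio = num1 / (num1 + num2 * scale)
    second_ratio = num1 / (num1 + num3)

    if first_ratio >= t1:
        return "exclusive"                      # applies to all users
    if first_ratio >= t2:                       # t2 <= first_ratio < t1
        return "suspect" if second_ratio < t3 else "manual_review"
    # first_ratio < t2
    if t4 <= second_ratio < t3:
        return "suspect"
    if second_ratio >= t2:
        return "manual_review"
    return "non_pedophile"                      # second_ratio < t4


print(classify_sensitive_word(100, 1, 50))      # first ratio about 0.30
print(classify_sensitive_word(100, 5, 2000))    # first ratio about 0.079
print(classify_sensitive_word(10, 10, 5000))    # both ratios very small
```

With the example thresholds, every value of the second ratio falls into exactly one category in each branch, so the order of the checks does not change the outcome.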
Similarly, the general sensitive words can be screened in the live broadcast rooms screened as minor-pornography type. This differs from the screening in the minor live broadcast rooms described above only in that the above number is replaced by the number of occurrences of the general sensitive word in the live broadcast rooms screened as minor-pornography type. The other screening steps are as described above and are not repeated here.
If the general sensitive words are not screened, they may be only weakly related to pedophilia and contribute little to detecting whether a user is a pedophile. Pedophile-exclusive words can be used to judge any user: if a user's public-screen speech contains a pedophile-exclusive word, the user is a pedophile. Non-pedophile words are not used for judging whether a user is a pedophile. Suspect sensitive words are used for judging whether a user is a pedophile only in minor live broadcast rooms or in live broadcast rooms screened as minor-pornography type. If a suspect sensitive word appears in the public-screen speech of such a live broadcast room, that public-screen speech can be used as the public-screen feature in the first feature vector.
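How the three word categories might be applied to a single public-screen message can be sketched as follows. Everything named here is hypothetical: the function `evaluate_message`, the room-type labels, the return values, and the placeholder tokens `term_a` and `term_b`. Whitespace tokenization is also an assumption made for brevity; real public-screen text, especially Chinese text, would need a proper word segmenter.

```python
from typing import Set

# Assumed labels for the two preset room types named in the text.
PRESET_ROOM_TYPES = {"minor", "minor_pornography_screened"}


def evaluate_message(
    message: str,
    room_type: str,
    exclusive_words: Set[str],
    suspect_words: Set[str],
) -> str:
    """Decide how one public-screen message is used by the detector."""
    tokens = set(message.lower().split())

    # Exclusive words apply to every user in every room.
    if tokens & exclusive_words:
        return "flag_user"

    # Suspect words only count inside the preset room types, where the
    # message becomes a public-screen feature of the first feature vector.
    if room_type in PRESET_ROOM_TYPES and tokens & suspect_words:
        return "public_screen_feature"

    # Non-pedophile words and everything else are not used.
    return "ignore"


exclusive = {"term_a"}
suspect = {"term_b"}
print(evaluate_message("hello term_a", "general", exclusive, suspect))
print(evaluate_message("term_b here", "minor", exclusive, suspect))
print(evaluate_message("term_b here", "general", exclusive, suspect))
```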
In an exemplary embodiment of the invention, the first feature vector may comprise at least one of: the total number of followed users, the number of followed first-class users, the number of followed second-class users, the number of views of first-class live broadcasts, the total number of live broadcast views, the duration of viewing second-class live broadcasts, the number of first-class public-screen messages sent to first-class live broadcast rooms, the total number of live broadcast room shares, the number of shares of second-class live broadcast rooms, the total number of live broadcast connections, the number of live broadcast connections with first-class users, the total number of gifts sent, the number of gifts sent to first-class users, the number of shares of second-class users, the number of live broadcast connections with second-class users, and the number of gifts sent to second-class users.
In practical applications, the first feature vector may include a total number of concerns x1, a percentage of underage users x2, a percentage of underage pornographic users x3, a percentage of underage pornographic live broadcast times x4, a percentage of live broadcast times x5, a percentage of immature live broadcast time x6, a percentage of underage users live broadcast times x7, a percentage of immature pornographic type screening live broadcast rooms related to a pornographic public screen x8, a percentage of adult pornographic public screen sent between the underage pornographic type screening live broadcast rooms x9, a percentage of immature live broadcast rooms related to a pornographic public screen x10, a percentage of immature live broadcast rooms sent to an underage live broadcast room x11, a total number of live broadcast rooms x12, a percentage of underage live broadcast rooms x13, a percentage of continuous broadcast times x14, a percentage of continuous broadcast times x15 with underage main broadcast times x16, a total number of adult present shows shared to an underage ratio x17, an underage pornographic broadcast times x18, a continuous broadcast times x19, the present was given to the minor pornograph x20 times.
In the training process of the first network model, first training sample feature vectors corresponding to the first feature vectors may be obtained and used to form a training vector y = [y1, y2, ..., y20]. If the user of a training vector y is a pedophile, the user's label is 1; otherwise the label is 0. In practical application, the first network model may be trained by linear fitting to obtain the weights corresponding to the training vector. The weights of the first training features with a strong minor-related signal ([y8, y9, ..., y20]) are denoted [a8, a9, ..., a20], and the weights of the first training features without a strong minor-related signal ([y1, y2, ..., y7]) are denoted [a1, a2, ..., a7].
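As an illustration of training by linear fitting, the sketch below fits one weight per feature by least squares, using plain batch gradient descent. The choice of gradient descent, the absence of a bias term, the function name `fit_linear_weights`, and the tiny two-feature dataset are all assumptions made for this example; the text does not specify how the linear fit is computed.

```python
from typing import List, Sequence


def fit_linear_weights(
    samples: Sequence[Sequence[float]],
    labels: Sequence[float],
    learning_rate: float = 0.1,
    epochs: int = 2000,
) -> List[float]:
    """Least-squares fit of score = sum(a_j * y_j) to the 0/1 labels."""
    n_samples = len(samples)
    n_features = len(samples[0])
    weights = [0.0] * n_features
    for _ in range(epochs):
        gradient = [0.0] * n_features
        for features, label in zip(samples, labels):
            prediction = sum(w * x for w, x in zip(weights, features))
            error = prediction - label
            for j, x in enumerate(features):
                # Gradient of the mean squared error with respect to weight j.
                gradient[j] += 2.0 * error * x / n_samples
        weights = [w - learning_rate * g for w, g in zip(weights, gradient)]
    return weights


# Toy data: label 1 marks a positive training user, label 0 a negative one.
samples = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = [1.0, 1.0, 0.0, 0.0]
weights = fit_linear_weights(samples, labels)


def score(features: Sequence[float]) -> float:
    return sum(w * x for w, x in zip(weights, features))


print(weights, score([1.0, 0.0]), score([0.0, 1.0]))
```

In the setting described above, each sample would hold the 20 training features y1 to y20 and the fitted weights would be a1 to a20.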
If the detection rule changes, the weights of the first feature vector may be adjusted according to the new rule, and the first network model does not need to be retrained. For example, if the judgment on the proportion of minors viewed (the feature weighted by a2) is removed from the detection rule, the weights corresponding to [y8, y9, ..., y20] become [a8, a9, ..., a20] × (a1 + a2 + ... + a20)/(a1 + a3 + ... + a20), and the weights corresponding to [y1, y2, ..., y7] become [a1, 0, ..., a7] × (a1 + a2 + ... + a20)/(a1 + a3 + ... + a20). That is, the weight of the removed feature is set to 0 and the remaining weights are rescaled so that their sum is unchanged.
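The weight adjustment can be written as a short helper. The function name `drop_feature_and_rescale` and the four example weights are hypothetical; the arithmetic follows the rule stated above, in which the dropped weight becomes 0 and every other weight is multiplied by (sum of all weights) / (sum of the remaining weights).

```python
from typing import List, Sequence


def drop_feature_and_rescale(weights: Sequence[float], dropped_index: int) -> List[float]:
    """Zero one weight and rescale the rest so the total weight is preserved."""
    total = sum(weights)
    remaining = total - weights[dropped_index]
    if remaining == 0:
        raise ValueError("cannot rescale: all remaining weights sum to zero")
    scale = total / remaining
    return [
        0.0 if i == dropped_index else w * scale
        for i, w in enumerate(weights)
    ]


# Dropping the second feature (index 1, corresponding to a2 in the text).
old_weights = [1.0, 2.0, 3.0, 4.0]
new_weights = drop_feature_and_rescale(old_weights, dropped_index=1)
print(new_weights)        # [1.25, 0.0, 3.75, 5.0]
print(sum(new_weights))   # 10.0, the same total as before
```

Preserving the total weight keeps scores on a comparable scale, which is presumably why the same scoring threshold can be reused without retraining.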
In an exemplary embodiment of the present invention, after the identification information of the user corresponding to the first detection result is output, a second feature vector of that user may also be obtained and input to the trained second network model, which outputs a second detection result. If the second detection result meets the preset second detection condition, the identification information of the user corresponding to the second detection result is output. That is, after the first network model determines whether the user is a pedophile, the second network model may further confirm that determination.
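The two-stage flow can be sketched as follows. This is an illustration under stated assumptions: each model is represented by a hypothetical callable returning a score, each detection condition is taken to be "score exceeds a threshold", and the second feature vector is fetched lazily through a hypothetical `fetch_second_features` callback so that it is only computed for users flagged by the first stage.

```python
from typing import Callable, Sequence, Tuple

Scorer = Callable[[Sequence[float]], float]


def two_stage_detect(
    first_features: Sequence[float],
    fetch_second_features: Callable[[], Sequence[float]],
    first_model: Scorer,
    second_model: Scorer,
    first_threshold: float,
    second_threshold: float,
) -> Tuple[bool, bool]:
    """Return (flagged_by_first_stage, flagged_by_second_stage)."""
    first_score = first_model(first_features)
    if first_score <= first_threshold:
        # First condition not met: the second model is never consulted.
        return (False, False)
    second_score = second_model(fetch_second_features())
    return (True, second_score > second_threshold)
```

Any manual confirmation between the two stages would sit between the first check and the call to `fetch_second_features`; it is omitted here.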
In an exemplary embodiment of the invention, the second feature vector may comprise at least one of: the number of third-class users followed, the maximum intersection ratio between the live broadcast rooms shared by the user and the live broadcast rooms shared by third-class users, the number of second-class public-screen messages sent, the number of times of watching the second-class live broadcast by the third-class users, the number of times of watching the second-class users by the third-class users, the number of times of watching the second-class live broadcast by the third-class users, the number of times of sending the first-class users by the third-class users, the number of times of sending the second-class live broadcast by the third-class users and the number of the first-class users by the third-class users. In practical applications, the second feature vector may include the number of pedophile users followed t1, the maximum intersection ratio between the live broadcast rooms shared by the user and those shared by pedophile users t2, the number of violating public-screen messages posted by the user t3, the number of times of watching the same underage live broadcast as pedophile users t4, the number of times of watching the same underage anchor as pedophile users t5, the number of times of watching the same underage live broadcast as pedophile users t6, the number of times of watching the same underage anchor as pedophile users t7, the number of times of underage live broadcast by pedophile users t8, and the number of times of watching the same underage anchor as pedophile users t9.
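The "maximum intersection ratio" feature can be sketched as below. The text does not define the ratio, so this sketch assumes intersection-over-union of the two sets of shared room identifiers; the function name `max_intersection_ratio` and its parameters are hypothetical.

```python
from typing import Hashable, Iterable


def max_intersection_ratio(
    user_rooms: Iterable[Hashable],
    flagged_users_rooms: Iterable[Iterable[Hashable]],
) -> float:
    """Largest overlap ratio between the user's shared rooms and those of any flagged user.

    Assumption: ratio = |A intersect B| / |A union B| over room identifiers.
    """
    user_set = set(user_rooms)
    best = 0.0
    for rooms in flagged_users_rooms:
        other = set(rooms)
        union = user_set | other
        if not union:
            continue  # both sets are empty, so there is nothing to compare
        best = max(best, len(user_set & other) / len(union))
    return best
```

If the intended ratio is instead normalised by the size of only one of the two sets, only the denominator changes.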
In the training process of the second network model, second training sample feature vectors corresponding to the second feature vectors may be obtained and used to form a training vector p = [p1, p2, ..., p9]. If the user of a training vector p has the pedophile attribute, the label of that user is 1; otherwise, the label is 0.
Based on the above description of the embodiments of the method for detecting a user in a live broadcast room, a scheme for detecting pedophiles in a live broadcast room is described below. As shown in fig. 2, data of an underage live broadcast room is obtained from an audit table of the live broadcast platform. On the live broadcast platform, each live broadcast room may correspond to an audit table, which may store the room number of the live broadcast room, the user name of the anchor of the live broadcast room, the live broadcast start time, the live broadcast type, and the like. Other information about the anchor, such as gender and whether the anchor is an adult, may also be stored in the audit table. Based on the data obtained from the audit table, it is further audited whether the underage live broadcast room involves underage pornography or age-inappropriate behavior. If it does, the first feature vector of each user in the live broadcast room is input into the first network model, which outputs a first detection result; if the first detection result exceeds a first scoring threshold, the identification information of the user is pushed to the audit background, where an auditor manually confirms the detection result. After the auditor confirms, the second feature vector of the user is acquired and input into the second network model, which outputs a second detection result; if the second detection result exceeds a second scoring threshold, the identification information of the user is pushed to the audit background, where the auditor again manually confirms the detection result.
The embodiment of the invention can screen general sensitive words in a targeted manner to obtain suspicious sensitive words, which can be used for detecting pedophiles in an underage live broadcast room or an underage pornography-type live broadcast room, thereby improving the accuracy of the suspicious sensitive words in the pedophile detection process.
The embodiment of the invention can acquire the first feature vector and the second feature vector of the user. The first feature vector focuses on the user's interactions in underage live broadcast rooms or underage pornography-type live broadcast rooms, such as following an anchor, watching a live broadcast, and sharing a live broadcast room. The second feature vector focuses on the association between the user and known pedophiles, such as following a pedophile or mic-linking (co-streaming) with a pedophile. The embodiment can therefore judge whether a user is a pedophile from the user's interaction features in such live broadcast rooms, from the features of the user's association with pedophiles, or from both in a two-stage detection, thereby improving the accuracy of pedophile detection.
Pedophiles are generally divided into four categories:
1. Users who have a clear awareness and a clear moral evaluation of their pedophilic tendency, keep a low profile, and mainly aim to satisfy that tendency at a mental level. This typically manifests as watching and liking a large number of different underage soft-pornographic live broadcasts.
2. Users who do not actively pursue sexual offenses or disseminate pedophilic content, but whose moral standards regarding pedophilic behavior are insufficient; they actively express their tendency by sharing underage soft-pornographic live broadcasts and posting on the public screen, and seek like-minded users to obtain more content. They do not harass minors, but they share a large number of different underage soft-pornographic live broadcasts and post many public-screen messages asking others to follow each other.
3. Users who collect live broadcast videos containing underage pornography or age-inappropriate behavior and upload them to a forum (post bar), mainly to attract like-minded users for profit or non-profit purposes, but without specific sexual harassment of children. This typically manifests as collecting a large number of such videos of underage girls and uploading them to a forum as albums.
4. Users who post messages containing explicit or veiled pornographic or luring content on the public screen of underage live broadcast rooms, send gifts to the live broadcast room, and lure minors into private chats, mainly aiming to sexually harass, assault, or exploit children in reality. This typically manifests as posting public-screen messages or sending live broadcast room gifts with an evident intent of sexual assault on children, and luring minors into private chats.
The embodiment of the invention can select and acquire the corresponding first feature vectors according to the different categories of pedophiles, further judge which category a user belongs to, and thus refine pedophile detection.
Fig. 3 is a block diagram of a system for detecting a user in a live broadcast room according to an embodiment of the present invention, and as shown in fig. 3, the system may include:
the acquiring module 31 is configured to acquire a plurality of first feature vectors of each user in the live broadcast room according to a preset detection rule, where the first feature vectors include public screen features related to the screened sensitive words;
an input module 32, configured to input the first feature vector to a trained first network model, and correspondingly output a first detection result of each user;
an output module 33, configured to output, if the first detection result meets a preset first detection condition, identification information of the user corresponding to the first detection result.
In an exemplary embodiment of the invention, the system further comprises: the screening module is used for screening sensitive words, and comprises:
the set acquisition module is used for acquiring a general sensitive word set;
and the sensitive word screening module is used for screening the sensitive words in a preset type of live broadcast room aiming at each general sensitive word in the general sensitive word set.
In an exemplary embodiment of the present invention, the sensitive word screening module includes:
the parameter acquisition module is used for acquiring the first use times of the general sensitive words in a first user set, the second use times of the general sensitive words in a second user set and the number of the general sensitive words in the preset type of live broadcast rooms;
and the general sensitive word screening module is used for screening the general sensitive words according to the first using times, the second using times, the number and a preset threshold value.
In an exemplary embodiment of the present invention, the second set of users is sampled from a third set of users, and the preset threshold includes a first threshold, a second threshold, a third threshold and a fourth threshold;
the general sensitive word screening module is configured to screen the general sensitive words according to the first number of times of use, the second number of times of use, the number, a sampling ratio of the second user set to the third user set, the first threshold, the second threshold, the third threshold, and the fourth threshold.
In an exemplary embodiment of the present invention, the general sensitive word screening module is configured to: add the product of the second number of times of use and the sampling ratio to the first number of times of use to obtain a sum value; divide the first number of times of use by the sum value to obtain a first ratio; divide the first number of times of use by the sum of the first number of times of use and the number to obtain a second ratio; when the first ratio is greater than or equal to the second threshold and smaller than the first threshold, take the general sensitive word as a suspicious sensitive word if the second ratio is smaller than the third threshold; and when the first ratio is smaller than the second threshold, take the general sensitive word as a suspicious sensitive word if the second ratio is greater than or equal to the fourth threshold and smaller than the third threshold.
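The four-threshold screening rule can be sketched as a single function. All names below are hypothetical, and two points are assumptions not settled by the text: a first ratio at or above the first threshold is treated as "not suspicious", and zero denominators are guarded by returning False or a zero ratio.

```python
def is_suspicious_word(
    first_uses: float,      # uses of the word in the first user set
    second_uses: float,     # uses of the word in the (sampled) second user set
    count_in_rooms: float,  # "the number" for the preset type of live broadcast rooms
    sampling_ratio: float,  # sampling ratio of the second set to the third set
    t1: float,
    t2: float,
    t3: float,
    t4: float,
) -> bool:
    """Apply the first-ratio / second-ratio threshold rule to one general sensitive word."""
    total = first_uses + second_uses * sampling_ratio
    if total == 0:
        return False  # the word was never used, so there is nothing to screen
    first_ratio = first_uses / total

    denom = first_uses + count_in_rooms
    second_ratio = first_uses / denom if denom else 0.0

    if t2 <= first_ratio < t1:
        return second_ratio < t3
    if first_ratio < t2:
        return t4 <= second_ratio < t3
    # first_ratio >= t1 is not covered by this passage (assumed: not suspicious).
    return False
```

For example, with thresholds (0.8, 0.2, 0.5, 0.1), a word with 30 uses in the first set, 10 in the second set, a sampling ratio of 7 and a count of 70 has a first ratio of 0.3 and a second ratio of 0.3, so it is kept as suspicious.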
In an exemplary embodiment of the invention, the system further comprises: and the training module is used for adjusting the weight of the first feature vector according to the detection rule in the training process of the first network model.
In an exemplary embodiment of the present invention, the obtaining module 31 is further configured to obtain, after the outputting module outputs the identification information of the user corresponding to the first detection result, a second feature vector of the user corresponding to the first detection result;
the input module 32 is further configured to input the second feature vector to a trained second network model, and output a second detection result;
the output module 33 is further configured to output the identification information of the user corresponding to the second detection result if the second detection result meets a preset second detection condition.
In an exemplary embodiment of the invention, the first feature vector comprises at least one of: the total number of users followed, the number of first-class users followed, the number of second-class users followed, the number of times of watching a first-class live broadcast, the total number of times of watching a live broadcast, the length of time of watching a second-class live broadcast, the number of times of sending a first-class public screen to a first-class live broadcast room, the number of times of sending the first-class public screen to the first-class live broadcast room, the total number of times of sharing a live broadcast room, the number of times of sharing the second-class live broadcast room, the total number of times of connecting to the live broadcast, the number of times of connecting to the live broadcast with the first class, the total number of times of delivering gifts, the number of times of delivering gifts to the first class, the number of times of sharing the second class of users, the number of times of connecting to the second class of users, the number of times of connecting to the live broadcast with the second class of users, and the number of gift deliveries to the second class of users.
In an exemplary embodiment of the invention, the second feature vector comprises at least one of: the number of third-class users followed, the maximum intersection ratio between the live broadcast rooms shared by the user and the live broadcast rooms shared by third-class users, the number of second-class public-screen messages sent, the number of times of watching the second-class live broadcast by the third-class users, the number of times of watching the second-class users by the third-class users, the number of times of watching the second-class live broadcast by the third-class users, the number of times of sending the first-class users by the third-class users, the number of times of sending the second-class live broadcast by the third-class users and the number of the first-class users by the third-class users.
The detection system for the user in the live broadcast room provided by the embodiment of the invention has the corresponding functional module for executing the detection method for the user in the live broadcast room, can execute the detection method for the user in the live broadcast room provided by the embodiment of the invention, and can achieve the same beneficial effects.
In another embodiment provided by the present invention, there is also provided an electronic device, which may include a processor, a memory, and a computer program stored on the memory and executable on the processor. When the processor executes the program, each process of the above embodiments of the method for detecting a user in a live broadcast room is implemented and the same technical effect can be achieved; to avoid repetition, details are not repeated here. For example, as shown in fig. 4, the electronic device may specifically include: a processor 401, a storage device 402, a display screen 403 with a touch function, an input device 404, an output device 405, and a communication device 406. The number of processors 401 in the electronic device may be one or more, and one processor 401 is taken as an example in fig. 4. The processor 401, the storage device 402, the display screen 403, the input device 404, the output device 405 and the communication device 406 of the electronic device may be connected by a bus or by other means.
In another embodiment of the present invention, a computer-readable storage medium is further provided, where instructions are stored in the computer-readable storage medium, and when the instructions are run on a computer, the computer is caused to perform the method for detecting a user in a live broadcast room according to any one of the above embodiments.
In a further embodiment of the present invention, there is also provided a computer program product containing instructions which, when run on a computer, cause the computer to perform a method for detecting a user in a live broadcast room as described in any of the above embodiments.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the system embodiment, since it is substantially similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (12)

1. A method for detecting a user in a live broadcast room is characterized by comprising the following steps:
according to a preset detection rule, obtaining a plurality of first feature vectors of each user in a live broadcast room, wherein the first feature vectors comprise public screen features related to screened sensitive words;
inputting the first feature vector to a trained first network model, and correspondingly outputting a first detection result of each user;
and if the first detection result meets a preset first detection condition, outputting the identification information of the user corresponding to the first detection result.
2. The method of claim 1, wherein the step of screening the sensitive words comprises:
acquiring a general sensitive word set;
and screening the sensitive words in a preset type of live broadcast room aiming at each general sensitive word in the general sensitive word set.
3. The method according to claim 2, wherein the performing sensitive word filtering in a preset type of live broadcast room for each general sensitive word in the general sensitive word set comprises:
acquiring the first use times of the general sensitive words in a first user set, the second use times of the general sensitive words in a second user set and the number of the general sensitive words in the preset type of live broadcast rooms for each general sensitive word;
and screening the general sensitive words according to the first using times, the second using times, the number and a preset threshold value.
4. The method according to claim 3, wherein the second set of users is sampled from a third set of users, and the preset threshold comprises a first threshold, a second threshold, a third threshold and a fourth threshold;
the screening the general sensitive words according to the first using times, the second using times, the number and a preset threshold value comprises the following steps:
and screening the general sensitive words according to the first using times, the second using times, the number, the sampling ratio of the second user set to the third user set, the first threshold, the second threshold, the third threshold and the fourth threshold.
5. The method of claim 4, wherein the filtering the generic sensitive word according to the first number of uses, the second number of uses, the number, a sampling ratio of the second set of users to the third set of users, the first threshold, the second threshold, the third threshold, and the fourth threshold comprises:
adding the product of the second using times and the sampling ratio to the first using times to obtain a sum value;
dividing the first using times with the sum value to obtain a first ratio;
when the first ratio is larger than or equal to the second threshold and smaller than the first threshold, dividing the first using times by the sum of the first using times and the quantity to obtain a second ratio, and if the second ratio is smaller than the third threshold, taking the general sensitive word as a suspicious sensitive word;
and when the first ratio is smaller than the second threshold, if the second ratio is larger than or equal to the fourth threshold and smaller than the third threshold, taking the general sensitive word as a suspicious sensitive word.
6. The method of claim 1, wherein the step of training the first network model comprises:
and adjusting the weight of the first feature vector according to the detection rule.
7. The method according to claim 1, wherein after the outputting of the identification information of the user corresponding to the first detection result, the method further comprises:
acquiring a second feature vector of the user corresponding to the first detection result;
inputting the second feature vector to a trained second network model, and outputting a second detection result;
and if the second detection result meets a preset second detection condition, outputting the identification information of the user corresponding to the second detection result.
8. The method of claim 7, wherein the first feature vector comprises at least one of: the total number of users followed, the number of first-class users followed, the number of second-class users followed, the number of times of watching a first-class live broadcast, the total number of times of watching a live broadcast, the length of time of watching a second-class live broadcast, the number of times of sending a first-class public screen to a first-class live broadcast room, the number of times of sending the first-class public screen to the first-class live broadcast room, the total number of times of sharing a live broadcast room, the number of times of sharing the second-class live broadcast room, the total number of times of connecting to the live broadcast, the number of times of connecting to the live broadcast with the first class, the total number of times of delivering gifts, the number of times of delivering gifts to the first class, the number of times of sharing the second class of users, the number of times of connecting to the second class of users, the number of times of connecting to the live broadcast with the second class of users, and the number of gift deliveries to the second class of users.
9. The method of claim 8, wherein the second feature vector comprises at least one of: the number of third-class users followed, the maximum intersection ratio between the live broadcast rooms shared by the user and the live broadcast rooms shared by third-class users, the number of second-class public-screen messages sent, the number of times of watching the second-class live broadcast by the third-class users, the number of times of watching the second-class users by the third-class users, the number of times of watching the second-class live broadcast by the third-class users, the number of times of sending the first-class users by the third-class users, the number of times of sending the second-class live broadcast by the third-class users and the number of the first-class users by the third-class users.
10. A system for detecting a user in a live room, comprising:
the acquisition module is used for acquiring a plurality of first feature vectors of each user in a live broadcast room according to a preset detection rule, wherein the first feature vectors comprise public screen features related to the screened sensitive words;
the input module is used for inputting the first feature vector to a trained first network model and correspondingly outputting a first detection result of each user;
and the output module is used for outputting the identification information of the user corresponding to the first detection result if the first detection result meets a preset first detection condition.
11. An electronic device, comprising:
a processor, a memory, and a computer program stored on the memory and executable on the processor, the processor when executing the program implementing a method of detecting a user in a live broadcast as claimed in any one of claims 1 to 9.
12. A computer-readable storage medium having stored therein instructions which, when run on a computer, cause the computer to perform a method of detecting a user in a live broadcast as claimed in any one of claims 1 to 9.
CN202110180459.4A 2021-02-09 2021-02-09 Method and system for detecting user in live broadcast room, electronic device and storage medium Pending CN113163218A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110180459.4A CN113163218A (en) 2021-02-09 2021-02-09 Method and system for detecting user in live broadcast room, electronic device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110180459.4A CN113163218A (en) 2021-02-09 2021-02-09 Method and system for detecting user in live broadcast room, electronic device and storage medium

Publications (1)

Publication Number Publication Date
CN113163218A true CN113163218A (en) 2021-07-23

Family

ID=76883058

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110180459.4A Pending CN113163218A (en) 2021-02-09 2021-02-09 Method and system for detecting user in live broadcast room, electronic device and storage medium

Country Status (1)

Country Link
CN (1) CN113163218A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210723