CN111385655A - Advertisement bullet screen detection method and device, server and storage medium - Google Patents

Publication number
CN111385655A
Authority
CN
China
Prior art keywords
bullet screen
advertisement
data
real
screen data
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811633994.5A
Other languages
Chinese (zh)
Inventor
刘兵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Douyu Network Technology Co Ltd
Original Assignee
Wuhan Douyu Network Technology Co Ltd
Application filed by Wuhan Douyu Network Technology Co Ltd
Priority to CN201811633994.5A
Publication of CN111385655A

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/45 Management operations performed by the client for facilitating the reception of or the interaction with the content or administrating data related to the end-user or to the client device itself, e.g. learning user preferences for recommending movies, resolving scheduling conflicts
    • H04N21/454 Content or additional data filtering, e.g. blocking advertisements
    • H04N21/4545 Input to filtering algorithms, e.g. filtering a region of the image
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/435 Processing of additional data, e.g. decrypting of additional data, reconstructing software from modules extracted from the transport stream
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47 End-user applications
    • H04N21/488 Data services, e.g. news ticker
    • H04N21/4884 Data services, e.g. news ticker for displaying subtitles
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/81 Monomedia components thereof
    • H04N21/812 Monomedia components thereof involving advertisement data

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses an advertisement bullet screen detection method and device, a server and a storage medium, and belongs to the technical field of bullet screens. The method comprises the following steps: collecting historical bullet screen data, wherein the historical bullet screen data comprises advertisement bullet screens and normal bullet screens; training the historical bullet screen data by using a random forest to obtain an advertisement bullet screen prediction model; and collecting bullet screen data in real time, and detecting, through the advertisement bullet screen prediction model and according to the dimension characteristics in the real-time bullet screen data, whether the real-time bullet screen data is an advertisement bullet screen. The invention can quickly identify advertisement bullet screens that have not appeared before and ensures accurate detection and interception of advertisement bullet screens.

Description

Advertisement bullet screen detection method and device, server and storage medium
Technical Field
The invention relates to the technical field of bullet screens, and in particular to an advertisement bullet screen detection method, an advertisement bullet screen detection device, a server and a storage medium.
Background
Advertisement bullet screens often appear while users watch live broadcasts. They not only degrade the viewing experience but may also carry illegal promotional content. To counter such malicious advertisements, it is necessary to determine from the bullet screen content whether a bullet screen is an advertisement and to intercept it accordingly.
Existing advertisement interception models mainly train a model such as a neural network iteratively on previously labeled advertisement bullet screens. Such models have difficulty recognizing bullet screens that have never appeared before, or that use simple variations such as scrambled word order, visually similar characters or similar-sounding characters, so the interception effect is unsatisfactory.
Disclosure of Invention
In view of this, embodiments of the present invention provide an advertisement bullet screen detection method, device, server and storage medium, for detecting, identifying and intercepting advertisement bullet screens that have not appeared before.
In a first aspect of the embodiments of the present invention, an advertisement bullet screen detection method is provided, including:
collecting historical bullet screen data, wherein the historical bullet screen data comprises an advertisement bullet screen and a normal bullet screen;
training the historical bullet screen data by using a random forest to obtain an advertisement bullet screen prediction model;
and acquiring bullet screen data in real time, and detecting whether the real-time bullet screen data is an advertisement bullet screen or not through the advertisement bullet screen prediction model according to the dimension characteristics in the real-time bullet screen data.
In a second aspect of the embodiments of the present invention, an advertisement bullet screen detection device is provided, including:
a collecting unit, configured to collect historical bullet screen data, wherein the historical bullet screen data comprises advertisement bullet screens and normal bullet screens;
a training unit, configured to train the historical bullet screen data by using a random forest to obtain an advertisement bullet screen prediction model;
a detection unit, configured to acquire bullet screen data in real time and to detect, through the advertisement bullet screen prediction model and according to dimension characteristics in the real-time bullet screen data, whether the real-time bullet screen data is an advertisement bullet screen.
In a third aspect of embodiments of the present invention, there is provided a server comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the method as described above when executing the computer program.
In a fourth aspect of embodiments of the present invention, there is provided a computer readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method as provided in the first aspect of the application.
In a fifth aspect of embodiments of the present invention, there is provided a computer program product comprising a computer program which, when executed by one or more processors, performs the steps of the method as provided in the first aspect of the present application.
In the embodiments of the present invention, historical bullet screen data is collected, a recognition model is obtained by training on the historical bullet screen data with a random forest, and the recognition model is used to detect real-time bullet screens. Because the scheme trains a random forest on samples and judges bullet screen risk from the characteristics of the bullet screen data, it can identify bullet screens that have not appeared before and can quickly intercept, in real time, advertisement bullet screens that are simple variations of known ones. This addresses the slow iteration of traditional recognition models based on bullet screen content and ensures interception accuracy.
Drawings
In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without inventive effort.
Fig. 1 is a schematic flow chart of an advertisement bullet screen detection method according to an embodiment of the present invention;
Fig. 2 is another schematic flow chart of an advertisement bullet screen detection method according to an embodiment of the present invention;
Fig. 3 is a flowchart illustrating a specific implementation of step S103 according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of an advertisement bullet screen detection device according to an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of a server according to an embodiment of the present invention.
Detailed Description
The embodiment of the invention provides an advertisement bullet screen detection method, an advertisement bullet screen detection device, a server and a storage medium, which are used for detecting and identifying malicious advertisement bullet screens.
In order to make the objects, features and advantages of the present invention more obvious and understandable, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the embodiments described below are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example one:
Referring to Fig. 1, a flowchart of an advertisement bullet screen detection method according to an embodiment of the present invention includes the following steps:
S101, collecting historical bullet screen data, wherein the historical bullet screen data comprises an advertisement bullet screen and a normal bullet screen;
The historical bullet screen data are bullet screen data already stored on the system server. They come from different live broadcast rooms, different users and clients on different devices. A certain amount of historical bullet screen data is randomly selected and organized according to the bullet screen content and the basic information of each bullet screen.
An advertisement bullet screen is a bullet screen that has already been identified as an advertisement; when the historical bullet screen data are collected, advertisement bullet screens can be labeled. In the embodiments of the present invention, a malicious advertisement bullet screen is bullet screen content that promotes illegal content or promotes profit-making by illegal means. Malicious advertisements can be sent to live broadcast rooms automatically by a program and take various forms, such as symbols, irregular characters, similar-sounding characters and visually similar characters; traditional recognition models have difficulty recognizing such variant bullet screens.
Preferably, after the historical bullet screen data are collected, they are organized by user label, bullet screen content, device ID, IP address, live broadcast room number, identity record and the like. Organizing and labeling the historical bullet screen data facilitates determining the tree nodes and the number of trees.
S102, training the historical bullet screen data by using a random forest to obtain an advertisement bullet screen prediction model;
The random forest is a classifier comprising a plurality of decision trees; building the decision trees yields the probability that the expected value is greater than zero, by which risk is evaluated and feasibility is judged. Specifically, the collected historical bullet screen data serve as the sample set, and a training set is drawn from the sample set with replacement to train the nodes of each decision tree.
The advertisement bullet screen prediction model is obtained once all decision trees of the random forest have been trained; given input bullet screen data, it outputs a prediction or judgment for the bullet screen.
Specifically, the historical bullet screen data is used as a sample, and classification nodes are selected for each decision tree by calculating the minimum Gini impurity of the nodes.
S103, collecting bullet screen data in real time, and detecting whether the real-time bullet screen data is an advertisement bullet screen or not through the advertisement bullet screen prediction model according to dimension characteristics in the real-time bullet screen data.
The bullet screen data are the data obtained by the server after a user enters text. The real-time bullet screen data are bullet screen data collected in real time and contain basic information about the sender, that is, user information such as the user name, room information and device IP. Preferably, they further include statistical information such as the number of historical bullet screens, the device ID and the IP ratio of the sending end.
The dimension characteristics are based on the basic information of the current bullet screen, such as bullet screen content, user identity, device ID and network IP, and on behavior statistics over historical bullet screen data, such as the number of times the current bullet screen content has appeared, the number of bullet screens sent from the device ID, and the average number of bullet screens per person in the current live broadcast room. Specifically, the preset characteristics are behavior statistics in different dimensions based on the basic information of the current bullet screen and on historical bullet screen information. The historical bullet screen information is the current user's historical bullet screen records, and the behavior statistics in different dimensions are bullet screen statistics over dimensions such as the identity, content, device, network IP, room and client type associated with the bullet screens sent by the user.
Through the advertisement bullet screen prediction model, the bullet screen sent by the user can be predicted in real time, and whether the bullet screen is the advertisement bullet screen or not is judged. Preferably, the probability that each bullet screen is an advertisement bullet screen is calculated through the advertisement bullet screen prediction model, the corresponding interception grade is searched according to the probability that the bullet screen is the advertisement bullet screen, and a corresponding interception strategy is adopted.
Specifically, the possibility that each bullet screen is an advertisement is graded, and interception strategies of different degrees are adopted according to different grades. For example, if the advertisement probability is 70% -90%, the bullet screen is deleted.
In this method, a random forest is trained on samples and bullet screen characteristics are extracted, so that whether a bullet screen is an advertisement can be predicted in real time. Advertisement bullet screens can thus be identified effectively and interception accuracy is ensured.
Example two:
On the basis of Fig. 1, Fig. 2 shows another flowchart of the advertisement bullet screen detection method, which describes the training process of the advertisement recognition model in detail and includes the following steps:
S201, collecting sample data and organizing the characteristics of the sample data;
Every bullet screen sent by a user is recorded on the server. Historical bullet screen data from a recent period, for example the last 3 months, are collected and a certain number of bullet screens are selected at random; the bullet screen samples come from different users and different live broadcast rooms. In the sample data, each bullet screen comprises at least: bullet screen content, IP used, device ID used, user identity, live broadcast room ID and client. Based on this bullet screen information, the bullet screen characteristics are organized and counted.
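The sample collection in S201 can be sketched as follows. This is a minimal illustration only: the record type `BulletRecord`, its field names, and the helper `draw_samples` are hypothetical names chosen here, and the 90-day window merely mirrors the "last 3 months" example in the text.

```python
import random
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class BulletRecord:
    """One stored bullet screen; field names are illustrative assumptions."""
    content: str
    ip: str
    device_id: str
    user_id: str
    room_id: str
    client: str
    sent_at: datetime
    is_ad: bool  # label marked when the historical data is collected


def draw_samples(records, now, days=90, k=1000, seed=0):
    """Randomly pick up to k records sent within the last `days` days."""
    cutoff = now - timedelta(days=days)
    recent = [r for r in records if r.sent_at >= cutoff]
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    return rng.sample(recent, min(k, len(recent)))
```

Passing `now` explicitly, rather than reading the clock inside the function, keeps the time window easy to test.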
S202, training on the samples using a random forest;
By analyzing the historical sample data, the sample feature dimensions can be set, and the nodes used in the decision trees and their number can then be determined.
In the embodiment of the present invention, the decision tree needs to be trained through samples first, and the training samples of each tree may be different. Each tree is equivalent to a learning unit, and sample features are extracted to facilitate classification.
Specifically, a training set, a test set and the feature dimensions are set, together with the given parameters: number of decision trees, depth, number of features, termination conditions and so on. Samples are drawn with replacement, and training starts from the root node and continues until all nodes of all decision trees have been trained.
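The parameter set and the sampling with replacement described above can be sketched as follows. The class `ForestParams`, its default values, and the helper names are assumptions made for illustration; the source names the parameters but gives no values. The out-of-bag indices are returned as well, since samples never drawn for a tree are a natural test set for it.

```python
import random
from dataclasses import dataclass


@dataclass
class ForestParams:
    """Illustrative parameters; the default values are assumptions."""
    n_trees: int = 10      # number of decision trees
    max_depth: int = 8     # maximum depth of each tree
    n_features: int = 3    # features considered at each split
    min_samples: int = 2   # termination: stop splitting below this node size


def bootstrap(n_samples, rng):
    """Return (in-bag indices drawn with replacement, out-of-bag indices)."""
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    chosen = set(in_bag)
    out_of_bag = [i for i in range(n_samples) if i not in chosen]
    return in_bag, out_of_bag


def training_sets(n_samples, params, seed=0):
    """Draw one bootstrap sample per tree, so each tree may see different data."""
    rng = random.Random(seed)
    return [bootstrap(n_samples, rng) for _ in range(params.n_trees)]
```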
Preferably, the classification node is selected by choosing the minimum Gini impurity at each node. The Gini impurity is calculated as:
$$G = 1 - \sum_{i} p_i^2$$
where $p_i$ is the probability of the $i$-th class under the split at the node. The Gini impurity measures the prediction error rate for a data item; the smaller the Gini impurity, the better the classification effect of the model.
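A minimal sketch of selecting a split by minimum Gini impurity is given below. It assumes the standard definition of Gini impurity, one minus the sum of squared class probabilities, and a single numeric feature split by a threshold; the function names `gini` and `best_threshold` are illustrative and do not come from the source.

```python
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum(p_i^2) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_threshold(values, labels):
    """Pick the threshold on one numeric feature with the lowest weighted Gini impurity.

    Returns (threshold, impurity); threshold is None when no split improves
    on the impurity of the unsplit node.
    """
    n = len(values)
    best = (None, gini(labels))
    for t in sorted(set(values))[:-1]:  # splitting at the maximum leaves the right side empty
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (t, score)
    return best
```

For instance, with feature values `[1, 2, 50, 60]` and labels `[0, 0, 1, 1]`, the unsplit node has impurity 0.5 and the threshold 2 separates the classes completely, giving impurity 0.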
S203, training a recognition model;
After training on the samples, all decision trees together form a random forest, and this random forest model is the recognition model. The recognition model is tested and further trained with the sample data or with newly collected samples, in which advertisement bullet screens and normal bullet screens are labeled. The response of the recognition model (the advertisement bullet screen recognition model) is adjusted according to the recognition results to ensure their accuracy.
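How the trees of a forest might be combined and checked against labeled samples can be sketched as follows. The three lambdas are hypothetical stand-ins for trained decision trees, and the feature names and thresholds inside them are invented for illustration. Using the fraction of trees that vote "advertisement" as the probability is one common choice, not something the source specifies.

```python
def ad_probability(trees, features):
    """Fraction of trees voting 'advertisement' (1) for one feature mapping."""
    votes = [tree(features) for tree in trees]
    return sum(votes) / len(votes)


def evaluate(trees, samples, threshold=0.5):
    """Return (precision, recall) on labeled (features, is_ad) samples."""
    tp = fp = fn = 0
    for features, is_ad in samples:
        predicted = ad_probability(trees, features) >= threshold
        if predicted and is_ad:
            tp += 1
        elif predicted and not is_ad:
            fp += 1
        elif not predicted and is_ad:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical stand-ins for trained trees; real trees would be learned from data.
trees = [
    lambda f: int(f["content_repeats"] > 20),
    lambda f: int(f["device_users"] > 5),
    lambda f: int(f["rooms_per_user"] > 3),
]
```

Precision and recall on the labeled test samples give a concrete basis for adjusting the model, for example by changing the decision threshold.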
In the embodiment of the invention, the advertisement bullet screen recognition model is obtained by training a random forest on samples, so advertisement bullet screens can be judged and predicted in real time; compared with a traditional neural network model, the method can identify and judge various bullet screens that have not appeared before.
Example three:
With reference to Fig. 1, Fig. 3 shows a flowchart of a specific implementation of step S103, specifically:
S1031, collecting a real-time bullet screen and calculating bullet screen characteristics;
A real-time bullet screen generally includes the user ID, live broadcast room ID, network IP, device ID, client type, bullet screen content, sending time and so on. Based on these feature dimensions of the bullet screen information, each bullet screen characteristic is calculated, for example: based on the bullet screen content, the number of occurrences, the number of rooms in which it appears and the number of users sending it; based on the current device ID, the number of users, the number of rooms in which it appears and the number of bullet screens sent.
For the real-time bullet screen, behavior statistics in each dimension are computed from the user information and the historical bullet screen data.
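The per-content and per-device counts listed in S1031 can be maintained incrementally, as in the sketch below. The class `FeatureStore`, the message keys and the feature order are assumptions for illustration; the counters live in memory here, whereas a deployed service would likely keep them in a shared store.

```python
from collections import defaultdict


class FeatureStore:
    """In-memory behavior counters keyed by bullet screen content and device ID."""

    def __init__(self):
        self.content_count = defaultdict(int)   # times this content appeared
        self.content_rooms = defaultdict(set)   # rooms where this content appeared
        self.content_users = defaultdict(set)   # users who sent this content
        self.device_users = defaultdict(set)    # users seen on this device
        self.device_rooms = defaultdict(set)    # rooms this device posted in
        self.device_count = defaultdict(int)    # bullet screens sent from this device

    def update(self, msg):
        """Fold one bullet screen (a dict with the keys used below) into the counters."""
        c, d = msg["content"], msg["device_id"]
        self.content_count[c] += 1
        self.content_rooms[c].add(msg["room_id"])
        self.content_users[c].add(msg["user_id"])
        self.device_users[d].add(msg["user_id"])
        self.device_rooms[d].add(msg["room_id"])
        self.device_count[d] += 1

    def features(self, msg):
        """Return the six counts for a message as a feature vector."""
        c, d = msg["content"], msg["device_id"]
        return [
            self.content_count[c], len(self.content_rooms[c]), len(self.content_users[c]),
            len(self.device_users[d]), len(self.device_rooms[d]), self.device_count[d],
        ]
```

Because these features describe sending behavior rather than exact wording, a reworded advertisement sent from the same device or across many rooms can still stand out.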
S1032, model detection and identification.
The recognition model is deployed online, and bullet screen data collected in real time are input into it. Optionally, the probability that an input bullet screen is an advertisement is calculated, and a corresponding interception strategy is set according to that probability.
Preferably, interception measures of different levels are set, for example: no interception pending further analysis, deleting the current bullet screen, prohibiting the current user ID from sending bullet screens, prohibiting the current network IP from sending bullet screens, or prohibiting the device ID from sending bullet screens.
For example, if the calculated probability that the current bullet screen is an advertisement is 60%, the current user ID, device or network IP may be marked and tracked, and the next bullet screen is analyzed further. If the probability is 80%, the bullet screen may be deleted. If it is 98%, the bullet screen is blocked and the current user may be prohibited from speaking in the live broadcast room for a short time. If the user is detected sending advertisements in multiple live broadcast rooms at the same time, the current user or device ID may be prohibited from sending bullet screens. Preferably, when it is currently difficult to determine whether a bullet screen is an advertisement, the user, device or IP can be marked so that it can be locked promptly the next time an advertisement bullet screen appears.
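The graded interception in the example above can be sketched as a simple lookup from probability to action. The thresholds 0.60, 0.80 and 0.98 follow the figures in the example; the action names, the function `choose_action`, and the rule that the multi-room ban applies only from the 0.80 tier upward are assumptions made for illustration.

```python
def choose_action(ad_probability, rooms_with_ads=1):
    """Map a predicted ad probability to an interception action (illustrative tiers)."""
    # Assumption: a sender counts as advertising in a room once the 0.80 tier is reached.
    if rooms_with_ads > 1 and ad_probability >= 0.80:
        return "ban_user_and_device"
    if ad_probability >= 0.98:
        return "delete_and_mute_user"
    if ad_probability >= 0.80:
        return "delete_message"
    if ad_probability >= 0.60:
        return "mark_sender"
    return "allow"
```

Keeping the tiers in one function makes the policy easy to review and to adjust as the model's precision changes.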
Furthermore, according to the detection result of the recognition model, a user with a low probability of sending advertisement bullet screens may be exempted from detection for a short time; for a bullet screen that is currently difficult to judge, the user may be marked; and a high-risk user for whom an advertisement is detected again after a bullet screen has been deleted may be banned.
Preferably, the real-time collection of bullet screens can be optimized according to each detection result of the recognition model. For different users, short-lived marks can speed up advertisement detection and make it easier to lock onto a specific advertisement sender directly.
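One way to realize the short-lived marks is a small expiring table keyed by user ID, device ID or IP, sketched below. The class `SenderMarks`, the mark names "low_risk" and "suspect", and the default lifetime are hypothetical; the source does not specify how long a mark lasts.

```python
class SenderMarks:
    """Short-lived marks per sender key (user ID, device ID or IP)."""

    def __init__(self, ttl_seconds=300):
        self.ttl = ttl_seconds
        self._entries = {}  # key -> (mark, expiry timestamp)

    def mark(self, key, mark, now):
        """Attach a mark that expires ttl seconds after `now`."""
        self._entries[key] = (mark, now + self.ttl)

    def get(self, key, now):
        """Return the current mark, or None if absent or expired."""
        entry = self._entries.get(key)
        if entry is None:
            return None
        mark, expires = entry
        if now >= expires:
            del self._entries[key]  # drop the stale mark
            return None
        return mark


def needs_model_check(marks, user_id, now):
    """Skip the model for senders recently judged low-risk; check everyone else."""
    return marks.get(user_id, now) != "low_risk"
```

Timestamps are passed in explicitly so that the expiry behavior can be exercised without waiting on the clock.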
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present invention.
Example four:
The above mainly describes the advertisement bullet screen detection method; the advertisement bullet screen detection device is described in detail below.
Fig. 4 shows a schematic structural diagram of an advertisement bullet screen detection device provided by an embodiment of the present invention.
The acquisition module 410: configured to collect historical bullet screen data, wherein the historical bullet screen data comprises advertisement bullet screens and normal bullet screens;
The training module 420: configured to train the historical bullet screen data by using a random forest to obtain an advertisement bullet screen prediction model;
Optionally, the training of the historical bullet screen data by using the random forest specifically comprises:
taking the historical bullet screen data as a sample, and selecting classification nodes for each decision tree by calculating the minimum Gini impurity of the nodes.
The detection module 430: configured to acquire bullet screen data in real time and to detect, through the advertisement bullet screen prediction model and according to dimension characteristics in the real-time bullet screen data, whether the real-time bullet screen data is an advertisement bullet screen.
Optionally, the predetermined characteristic is a different-dimension behavior statistical characteristic based on current bullet screen basic information and historical bullet screen information.
Optionally, the detection module 430 includes:
a calculating unit, configured to calculate, through the advertisement bullet screen prediction model and according to the preset characteristics, the probability that the real-time bullet screen corresponding to the preset characteristics is an advertisement bullet screen.
Optionally, the detection module 430 further includes:
an interception unit, configured to look up the interception level corresponding to the probability that the real-time bullet screen is an advertisement bullet screen and to execute the interception strategy corresponding to that interception level.
In the above device, the training module trains on historical bullet screen data to obtain the advertisement bullet screen recognition model, which is then used to detect real-time bullet screens, so that newly appearing advertisement bullet screens can be detected quickly.
Example five:
Fig. 5 is a schematic structural diagram of an advertisement bullet screen detection server according to an embodiment of the present invention. A server is a device that provides computing services, generally a computer with high computing power that serves a plurality of users over a network. As shown in Fig. 5, the server 5 of this embodiment includes: a memory 510, a processor 520 and a system bus 530, the memory 510 storing an executable program 5101. Those skilled in the art will understand that the server structure shown in Fig. 5 does not limit the server, which may include more or fewer components than shown, combine some components, or arrange the components differently.
The following describes each component of the server in detail with reference to fig. 5:
the memory 510 may be used to store software programs and modules, and the processor 520 executes various functional applications of the server and data processing by operating the software programs and modules stored in the memory 510. The memory 510 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the storage data area may store data (such as audio data, a phonebook, etc.) created according to the use of the server, and the like. Further, the memory 510 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid state storage device.
The memory 510 stores an executable program 5101 implementing the advertisement bullet screen detection method. The executable program 5101 may be partitioned into one or more modules/units that are stored in the memory 510 and executed by the processor 520 to detect advertisement bullet screens. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, the instruction segments describing the execution of the computer program 5101 in the server 5. For example, the computer program 5101 can be partitioned into an acquisition module, a training module and a detection module.
The processor 520 is a control center of the server, connects various parts of the entire server apparatus using various interfaces and lines, performs various functions of the server and processes data by operating or executing software programs and/or modules stored in the memory 510 and calling data stored in the memory 510, thereby performing overall monitoring of the server. Alternatively, processor 520 may include one or more processing units; preferably, the processor 520 may integrate an application processor, which mainly handles operating systems, application programs, etc., and a modem processor, which mainly handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into processor 520.
The system bus 530 is used to connect various functional units inside the computer, and can transmit data information, address information, and control information, and can be, for example, a PCI bus, an ISA bus, a VESA bus, etc. The instructions of the processor 520 are transferred to the memory 510 through the bus, the memory 510 feeds data back to the processor 520, and the system bus 530 is responsible for data and instruction interaction between the processor 520 and the memory 510. Of course, other devices, such as network interfaces, display devices, etc., may also be accessed by the system bus 530.
The server further comprises at least a network card, an output device and the like; other components are not described in detail herein.
In this embodiment of the present invention, the executable program executed by the processor 520 included in the server specifically includes:
an advertisement bullet screen detection method comprises the following steps:
collecting historical bullet screen data, wherein the historical bullet screen data comprises an advertisement bullet screen and a normal bullet screen;
training the historical bullet screen data by using a random forest to obtain an advertisement bullet screen prediction model;
and acquiring bullet screen data in real time, and detecting whether the real-time bullet screen data is an advertisement bullet screen or not through the advertisement bullet screen prediction model according to the dimension characteristics in the real-time bullet screen data.
Further, the training of the historical bullet screen data by using the random forest specifically comprises:
and taking the historical bullet screen data as samples, and selecting the classification nodes of the decision trees by calculating the minimum Gini impurity of the nodes in each decision tree.
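The node-selection criterion named above can be illustrated with a minimal sketch. It assumes numeric features and binary labels (1 for an advertisement bullet screen, 0 for a normal one); the function names `gini_impurity` and `best_split` are hypothetical and do not come from the source. A full random forest would additionally draw bootstrap samples and random feature subsets for each tree, which this sketch omits.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k ** 2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(samples, labels):
    """Return (feature_index, threshold, weighted_gini) for the split with
    the lowest weighted Gini impurity, or None if no split separates the data.

    samples: list of equal-length numeric feature rows (assumed layout).
    labels:  list of class labels, one per row.
    """
    best = None
    n = len(samples)
    for f in range(len(samples[0])):
        # Try every observed value of feature f as a candidate threshold.
        for t in sorted({row[f] for row in samples}):
            left = [y for row, y in zip(samples, labels) if row[f] <= t]
            right = [y for row, y in zip(samples, labels) if row[f] > t]
            if not left or not right:
                continue  # a split must put samples on both sides
            score = (len(left) * gini_impurity(left)
                     + len(right) * gini_impurity(right)) / n
            if best is None or score < best[2]:
                best = (f, t, score)
    return best
```

A node whose samples all share one label has impurity 0, so the split chosen this way is the one that leaves the two child nodes as pure as possible.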
Furthermore, the dimension characteristics are statistical characteristics, computed across different dimensions of bullet screen sending behavior, based on the basic information of the real-time bullet screen and on historical bullet screen information.
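As a rough illustration of such statistical characteristics, the sketch below derives a few features from one real-time bullet screen and the sender's history. The record layout (`text` and `ts` keys), the 60-second window, and the three feature names are all assumptions made for the example; the source does not enumerate the concrete dimensions.

```python
def dimension_features(message, history, window_seconds=60):
    """Compute illustrative statistical features for one bullet screen.

    message: dict with "text" (str) and "ts" (seconds), hypothetical layout.
    history: list of earlier messages from the same sender, same layout.
    """
    # Messages the sender posted within the recent time window.
    recent = [h for h in history
              if 0 <= message["ts"] - h["ts"] <= window_seconds]
    # How often the sender has already posted exactly this text.
    same_text = sum(1 for h in history if h["text"] == message["text"])
    return {
        "text_length": len(message["text"]),
        "recent_count": len(recent),
        "repeat_ratio": same_text / len(history) if history else 0.0,
    }
```

Features of this kind would be computed identically for the historical training samples and for each real-time bullet screen, so that the model sees the same input layout in both phases.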
Further, the detecting whether the real-time bullet screen data is an advertisement bullet screen through the advertisement bullet screen prediction model specifically includes:
and according to the dimension characteristics, calculating the probability that the real-time bullet screen data corresponding to the dimension characteristics is an advertisement bullet screen through the advertisement bullet screen prediction model.
Further, the calculating, according to the dimension characteristics and through the advertisement bullet screen prediction model, the probability that the real-time bullet screen data corresponding to the dimension characteristics is an advertisement bullet screen further includes:
and searching the interception level corresponding to the probability according to the probability that the real-time bullet screen is the advertisement bullet screen, and executing the interception strategy corresponding to the interception level.
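The probability and interception-level steps can be sketched as follows. The source gives neither the thresholds nor the strategies, so `LEVEL_BOUNDS` and `STRATEGIES` are hypothetical values chosen only for illustration. Taking the probability as the fraction of trees that vote "advertisement" is one common convention for random forests and is likewise an assumption here.

```python
import bisect

# Hypothetical probability boundaries between interception levels.
LEVEL_BOUNDS = [0.5, 0.7, 0.9]
# Hypothetical strategy for each level (one more entry than boundaries).
STRATEGIES = ["pass", "flag_for_review", "hide_message", "mute_sender"]


def ad_probability(tree_votes):
    """Fraction of decision trees voting 1 (advertisement)."""
    return sum(tree_votes) / len(tree_votes)


def interception_level(probability):
    """Index of the level whose probability range contains the value."""
    return bisect.bisect_right(LEVEL_BOUNDS, probability)


def interception_strategy(probability):
    """Look up the strategy assigned to the probability's level."""
    return STRATEGIES[interception_level(probability)]
```

Keeping the boundaries in a sorted list lets the levels be tuned without touching the lookup logic, which may be convenient if the platform adjusts how aggressively it intercepts.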
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art would appreciate that the modules, elements, and/or method steps of the various embodiments described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative: the division of the units is only a logical division, and other divisions are possible in practice. A plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. An advertisement bullet screen detection method is characterized by comprising the following steps:
collecting historical bullet screen data, wherein the historical bullet screen data comprises an advertisement bullet screen and a normal bullet screen;
training the historical bullet screen data by using a random forest to obtain an advertisement bullet screen prediction model;
and acquiring bullet screen data in real time, and detecting whether the real-time bullet screen data is an advertisement bullet screen or not through the advertisement bullet screen prediction model according to the dimension characteristics in the real-time bullet screen data.
2. The method as claimed in claim 1, wherein the training of the historical bullet screen data using random forests is specifically:
and taking the historical bullet screen data as samples, and selecting the classification nodes of the decision trees by calculating the minimum Gini impurity of the nodes in each decision tree.
3. The method of claim 1, wherein the dimensional characteristics are statistical characteristics of different dimensional information of bullet screen sending behaviors based on real-time bullet screen basic information and historical bullet screen information.
4. The method of claim 1, wherein the detecting whether the real-time bullet screen data is an advertisement bullet screen by the advertisement bullet screen prediction model is specifically:
and according to the dimension characteristics, calculating the probability that the real-time bullet screen data corresponding to the dimension characteristics are the advertisement bullet screen through the advertisement bullet screen prediction model.
5. The method of claim 4, wherein calculating, according to the dimension characteristics, the probability that the real-time bullet screen data corresponding to the dimension characteristics is an advertisement bullet screen through the advertisement bullet screen prediction model further comprises:
and searching the interception level corresponding to the probability according to the probability that the real-time bullet screen data is the advertisement bullet screen, and executing the interception strategy corresponding to the interception level.
6. An advertisement bullet screen detection device, characterized by comprising:
an acquisition module: used for collecting historical bullet screen data, wherein the historical bullet screen data comprises advertisement bullet screens and normal bullet screens;
a training module: used for training the historical bullet screen data by utilizing a random forest to obtain an advertisement bullet screen prediction model;
a detection module: used for acquiring bullet screen data in real time, and detecting whether the real-time bullet screen data is an advertisement bullet screen or not through the advertisement bullet screen prediction model according to dimension characteristics in the real-time bullet screen data.
7. The apparatus of claim 6, wherein the detection module comprises:
and the calculating unit is used for calculating the probability that the real-time bullet screen data corresponding to the dimension characteristics is the advertisement bullet screen through the advertisement bullet screen prediction model according to the dimension characteristics.
8. The apparatus of claim 7, wherein the calculating unit comprises:
and the interception unit is used for searching the interception grade corresponding to the probability according to the probability that the real-time bullet screen data is the advertisement bullet screen, and executing the interception strategy corresponding to the interception grade.
9. A server comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method for detecting an advertisement bullet screen according to any one of claims 1 to 5 when executing the computer program.
10. A computer-readable storage medium, in which a computer program is stored, which, when being executed by a processor, carries out the steps of the method for detecting an advertisement bullet screen according to any one of claims 1 to 5.
CN201811633994.5A 2018-12-29 2018-12-29 Advertisement bullet screen detection method and device, server and storage medium Pending CN111385655A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811633994.5A CN111385655A (en) 2018-12-29 2018-12-29 Advertisement bullet screen detection method and device, server and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811633994.5A CN111385655A (en) 2018-12-29 2018-12-29 Advertisement bullet screen detection method and device, server and storage medium

Publications (1)

Publication Number Publication Date
CN111385655A true CN111385655A (en) 2020-07-07

Family

ID=71220550

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811633994.5A Pending CN111385655A (en) 2018-12-29 2018-12-29 Advertisement bullet screen detection method and device, server and storage medium

Country Status (1)

Country Link
CN (1) CN111385655A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113518262A (en) * 2021-07-09 2021-10-19 珠海云迈网络科技有限公司 Advertisement bullet screen publisher identification method and device, computer equipment and storage medium thereof

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103400391A (en) * 2013-08-09 2013-11-20 北京博思廷科技有限公司 Multiple-target tracking method and device based on improved random forest
CN103530595A (en) * 2012-07-04 2014-01-22 富士通株式会社 Method and apparatus for detecting eye states
CN103679132A (en) * 2013-07-15 2014-03-26 北京工业大学 A sensitive image identification method and a system
KR20160004467A (en) * 2014-07-02 2016-01-13 (주)퓨쳐스트림네트웍스 Mobile advertising engine and 3d deformable object modeling method thereof
CN105435453A (en) * 2015-12-22 2016-03-30 网易(杭州)网络有限公司 Bullet screen information processing method, device and system
CN106203508A (en) * 2016-07-11 2016-12-07 天津大学 A kind of image classification method based on Hadoop platform
CN106228389A (en) * 2016-07-14 2016-12-14 武汉斗鱼网络科技有限公司 Network potential usage mining method and system based on random forests algorithm
CN107181745A (en) * 2017-05-16 2017-09-19 阿里巴巴集团控股有限公司 Malicious messages recognition methods, device, equipment and computer-readable storage medium
CN107291780A (en) * 2016-04-12 2017-10-24 腾讯科技(深圳)有限公司 A kind of user comment information methods of exhibiting and device
CN108090046A (en) * 2017-12-29 2018-05-29 武汉大学 A kind of microblogging rumour recognition methods based on LDA and random forest
CN108537176A (en) * 2018-04-11 2018-09-14 武汉斗鱼网络科技有限公司 Recognition methods, device, terminal and the storage medium of target barrage
CN108550054A (en) * 2018-04-12 2018-09-18 百度在线网络技术(北京)有限公司 A kind of content quality appraisal procedure, device, equipment and medium
CN109086422A (en) * 2018-08-08 2018-12-25 武汉斗鱼网络科技有限公司 A kind of recognition methods, device, server and the storage medium of machine barrage user

Similar Documents

Publication Publication Date Title
US20210182611A1 (en) Training data acquisition method and device, server and storage medium
CN105808988B (en) Method and device for identifying abnormal account
CN103793484B (en) The fraud identifying system based on machine learning in classification information website
CN107579956B (en) User behavior detection method and device
CN104040963B (en) The system and method for carrying out spam detection for the frequency spectrum using character string
CN110336838B (en) Account abnormity detection method, device, terminal and storage medium
CN111866196B (en) Domain name traffic characteristic extraction method, device and equipment and readable storage medium
CN108833139B (en) OSSEC alarm data aggregation method based on category attribute division
CN108269122B (en) Advertisement similarity processing method and device
CN108985048B (en) Simulator identification method and related device
CN108234472A (en) Detection method and device, computer equipment and the readable medium of Challenging black hole attack
CN111339436A (en) Data identification method, device, equipment and readable storage medium
CN110502664A (en) Video tab indexes base establishing method, video tab generation method and device
CN102855245A (en) Image similarity determining method and image similarity determining equipment
CN106301979B (en) Method and system for detecting abnormal channel
CN106998336B (en) Method and device for detecting user in channel
CN105989114A (en) Collection content recommendation method and terminal
CN107885754B (en) Method and device for extracting credit variable from transaction data based on LDA model
CN104731937A (en) User behavior data processing method and device
CN107679883A (en) The method and system of advertisement generation
CN104484651A (en) Dynamic portrait comparing method and system
CN111385655A (en) Advertisement bullet screen detection method and device, server and storage medium
CN111784360B (en) Anti-fraud prediction method and system based on network link backtracking
CN111325255B (en) Specific crowd delineating method and device, electronic equipment and storage medium
CN107025567A (en) A kind of data processing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200707