CN109145291A - Method, apparatus, device and storage medium for barrage keyword screening - Google Patents

Info

Publication number
CN109145291A
Authority
CN
China
Prior art keywords
barrage
document
word
evaluated
keyword
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810829527.3A
Other languages
Chinese (zh)
Inventor
张祥
马逢伯
刘静
仇贲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Huya Information Technology Co Ltd
Original Assignee
Guangzhou Huya Information Technology Co Ltd

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/435 Processing of additional data, e.g. decrypting of additional data, reconstructing software from modules extracted from the transport stream
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47 End-user applications
    • H04N21/478 Supplemental services, e.g. displaying phone caller identification, shopping application
    • H04N21/4788 Supplemental services, e.g. displaying phone caller identification, shopping application communicating with other users, e.g. chatting

Abstract

The invention discloses a method, apparatus, device and storage medium for barrage keyword screening. The method comprises: constructing a barrage document set associated with multiple streamers, and selecting the barrage documents of streamers that meet a first preset condition as documents to be evaluated; determining a key weight for each candidate word according to the first frequency of occurrence of the candidate word in its document to be evaluated and the number of barrage documents in the barrage document set that contain the candidate word; and screening out, according to the key weights, the corresponding keywords from the barrage words of each document to be evaluated. The method removes the need to read large volumes of barrage content manually and to have customer-service staff subjectively pick out high-quality barrage content, and it selects the words that represent the core content of the barrages after excluding conventional words that merely describe what is being shown.

Description

Method, apparatus, device and storage medium for barrage keyword screening
Technical field
Embodiments of the present invention relate to text processing technology, and in particular to a method, apparatus, device and storage medium for barrage keyword screening.
Background
With the development of Internet technology and smart mobile devices, Internet products have brought convenience and entertainment to people's work and daily life. In recent years, live video streaming platforms have emerged one after another, and live streaming gives people a more real-time social experience. Because of the constraints of the live format, the streamer and the users in a live room, and the users among themselves, communicate mainly through barrages.
The quality of barrages directly affects the viewing experience. The method in common use today has a machine pick the words that occur most frequently in the barrages as barrage keywords and uses them to analyse barrage quality. Words chosen this way, however, do not reflect barrage quality well. Taking game live streaming as an example, when a highlight occurs, words such as "666", "awesome" or "show" appear in a large number of barrages. Once such words have become stock descriptions of what is on screen, they no longer reflect barrage quality well.
At present, high-quality barrages are also often picked out by people reading them. Because the volume of barrages is huge, the platform has to assign a large number of customer-service staff to the screening, and the result is also highly subjective.
Summary of the invention
Embodiments of the present invention provide a barrage keyword screening method, apparatus, device and storage medium for accurately screening out, from a large number of barrages, the words that represent their core content.
In a first aspect, an embodiment of the present invention provides a method for barrage keyword screening, comprising:
constructing a barrage document set associated with multiple streamers, and selecting the barrage documents of streamers that meet a first preset condition as documents to be evaluated, wherein the barrage document set consists of multiple barrage documents and each barrage document contains multiple barrage words;
determining a key weight for each candidate word according to the first frequency of occurrence of the candidate word in its document to be evaluated and the number of barrage documents in the barrage document set that contain the candidate word, wherein the ratio of the number of barrage documents in the barrage document set to the number of barrage documents in the set that contain the candidate word is denoted the second frequency of occurrence;
screening out, according to the key weights, the corresponding keywords from the barrage words of each document to be evaluated, wherein the key weight is directly proportional to the first frequency of occurrence and inversely proportional to the second frequency of occurrence.
In a second aspect, an embodiment of the present invention further provides an apparatus for barrage keyword screening, comprising:
a barrage document set construction module, configured to construct a barrage document set associated with multiple streamers and to select the barrage documents of streamers that meet a first preset condition as documents to be evaluated, wherein the barrage document set consists of multiple barrage documents and each barrage document contains multiple barrage words;
a key weight calculation module, configured to determine a key weight for each candidate word according to the first frequency of occurrence of the candidate word in its document to be evaluated and the number of barrage documents in the barrage document set that contain the candidate word, wherein the ratio of the number of barrage documents in the barrage document set to the number of barrage documents in the set that contain the candidate word is denoted the second frequency of occurrence;
a keyword determination module, configured to screen out, according to the key weights, the corresponding keywords from the barrage words of each document to be evaluated, wherein the key weight is directly proportional to the first frequency of occurrence and inversely proportional to the second frequency of occurrence.
In a third aspect, an embodiment of the present invention further provides a device, comprising:
one or more processors;
a memory for storing one or more programs;
wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the method for barrage keyword screening described in any embodiment of the present invention.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method for barrage keyword screening described in any embodiment of the present invention.
In embodiments of the present invention, a barrage document set associated with multiple streamers is constructed, and the barrage documents of streamers that meet a first preset condition are selected as documents to be evaluated; a key weight is determined for each candidate word according to the first frequency of occurrence of the candidate word in its document to be evaluated and the number of barrage documents in the barrage document set that contain the candidate word; and, according to the key weights, the corresponding keywords are screened out from the barrage words of each document to be evaluated. This removes the need to read large volumes of barrage content manually and to have customer-service staff subjectively pick out high-quality barrage content, and it selects the words that represent the core content of the barrages after excluding conventional words that merely describe what is being shown.
Brief description of the drawings
Fig. 1 is a flowchart of a method for barrage keyword screening provided by Embodiment 1 of the present invention;
Fig. 2 is a flowchart of a method for barrage keyword screening provided by Embodiment 2 of the present invention;
Fig. 3 is a structural diagram of an apparatus for barrage keyword screening provided by Embodiment 3 of the present invention;
Fig. 4 is a schematic structural diagram of a device provided by Embodiment 4 of the present invention.
Detailed description
The present invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the invention and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the invention rather than the entire structure.
Embodiment one
Fig. 1 is a flowchart of a method for barrage keyword screening provided by Embodiment 1 of the present invention. This embodiment is applicable to collecting barrage keywords for the live room of a streamer. The method can be executed by an apparatus for barrage keyword screening, which can be implemented in hardware and/or software and can generally be integrated into a server or a terminal device.
Each streamer has a unique identification number. The collected barrage information of one streamer can be stored in a single document (the barrage document), and that document is saved in association with the streamer's identification number, so that different streamers and their documents can be told apart. A barrage is a comment caption that pops up over a video while it is being watched; on a live streaming platform, barrages are one way in which streamers and viewers interact. Each barrage contains at least two pieces of information: its generation time and its displayed content.
Referring to Fig. 1, this embodiment comprises the following steps:
S101: Construct a barrage document set associated with multiple streamers, and select the barrage documents of streamers that meet a first preset condition as documents to be evaluated. The barrage document set consists of multiple barrage documents, and each barrage document contains multiple barrage words.
A barrage document is a document associated with a streamer's number (ID). Each streamer has one corresponding barrage document, and all barrage information from the streamer's live room is stored in it. Barrage information is the totality of the barrages entered by viewers and by the streamer. Word segmentation turns barrage information into barrage words, and each barrage word also carries the time at which the user (streamer or viewer) entered the barrage. The barrage document set consists of multiple barrage documents, each of which contains barrage words. The first preset condition is the condition used to screen streamers. Because live streaming is an industry with a pronounced head effect, high-quality and key barrages can be obtained by screening keywords only in the barrage documents of top streamers. The first preset condition can be set as: the number of new barrages per minute in the streamer's live room reaches a given standard; and/or the streamer's average weekly online time exceeds a given standard; and/or the streamer falls within a preset range, for example the platform's star streamers designated by operations staff in the back end. The documents to be evaluated are the barrage documents of the streamers selected by the first preset condition and form a subset of the barrage document set.
Specifically, the streamer ID is used as the identifier of the barrage document, which distinguishes barrage documents from one another and associates each with its streamer. Barrage information is stored directly in the barrage document; after word segmentation it becomes barrage words, which are the basic unit of the subsequent weight calculation. The barrage documents of all streamers constitute the barrage document set. Streamers are then screened. For example, the ten top-ranked streamers in each of three different game categories on live streaming platforms (not limited to the streamers of one platform, provided the barrage information can be obtained by technical means) are selected as the streamers meeting the first preset condition, and the barrage documents of these 30 streamers are taken as the documents to be evaluated.
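Step S101 can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: the names `Barrage`, `build_document_set` and `select_documents_to_evaluate` are hypothetical, and the message-rate threshold is only one of the possible forms of the first preset condition mentioned above.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Barrage:
    streamer_id: str  # ID of the streamer whose live room received the barrage
    sent_at: float    # generation time, in seconds since the epoch
    text: str         # displayed content


def build_document_set(barrages):
    """Group barrages into one barrage document per streamer ID."""
    documents = defaultdict(list)
    for b in barrages:
        documents[b.streamer_id].append(b)
    return dict(documents)


def select_documents_to_evaluate(documents, min_per_minute, window_minutes):
    """Keep the documents whose average barrage rate over the window reaches
    the threshold (an assumed form of the first preset condition)."""
    return {
        sid: doc
        for sid, doc in documents.items()
        if len(doc) / window_minutes >= min_per_minute
    }
```

Other conditions, such as weekly online time or an operator-maintained list of star streamers, would replace the rate check in the comprehension.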
S102: Determine a key weight for each candidate word according to the first frequency of occurrence of the candidate word in its document to be evaluated and the number of barrage documents in the barrage document set that contain the candidate word. The ratio of the number of barrage documents in the barrage document set to the number of barrage documents in the set that contain the candidate word is denoted the second frequency of occurrence.
A candidate word is a word selected for key-weight calculation; it is one of the barrage words. If the key weights of the barrage words in a document to be evaluated are calculated by traversal, each barrage word in the document is taken in turn as the candidate word and its key weight is calculated. The first frequency of occurrence is the number of occurrences of the candidate word in a document to be evaluated divided by the total number of barrage words in that document. The number of barrage documents containing the candidate word is counted as follows: if the barrage document set consists of 10000 barrage documents and the word "fifty-fifty" appears in 100 of them (regardless of how many times it appears in each), then the number of barrage documents containing the candidate word is 100. The key weight can be calculated by a formula [formula not reproduced in the source text], where f denotes the first frequency of occurrence of a candidate word in its document to be evaluated, T_all denotes the number of barrage documents in the barrage document set, and T_con denotes the number of barrage documents in the set that contain the candidate word.
Specifically, the key weight of each barrage word in a document to be evaluated is determined as follows. First, one barrage word in the document is selected as the candidate word and its first frequency of occurrence is calculated. This requires counting the number of times the candidate word occurs in the document and the total number of barrage words in the document; the first frequency of occurrence is the former divided by the latter. The key weight also depends on the second frequency of occurrence, which is tied to the number of barrage documents in the set that contain the candidate word. However many times the candidate word occurs in one barrage document, that document is counted once. The second frequency of occurrence is the ratio of the number of barrage documents in the barrage document set to the number of barrage documents in the set that contain the candidate word.
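The counting described in S102 can be sketched as below. The functions `first_frequency` and `documents_containing` follow the definitions of f and T_con given in the text. The formula that combines them is not reproduced in this text, so `key_weight` assumes the conventional TF-IDF form f · ln(T_all / T_con); that form, the logarithm base and all function names are assumptions for illustration only.

```python
import math
from collections import Counter


def first_frequency(word, document_words):
    """f: occurrences of `word` divided by the number of barrage words in the document."""
    return Counter(document_words)[word] / len(document_words)


def documents_containing(word, document_set):
    """T_con: number of barrage documents that contain `word` at least once."""
    return sum(1 for words in document_set.values() if word in words)


def key_weight(word, doc_id, document_set):
    """ASSUMED TF-IDF form f * ln(T_all / T_con); the source formula is not available.
    `word` must occur in document `doc_id`, which guarantees T_con >= 1."""
    f = first_frequency(word, document_set[doc_id])
    t_all = len(document_set)
    t_con = documents_containing(word, document_set)
    return f * math.log(t_all / t_con)
```

Under this assumed form, a word that appears in every barrage document receives a weight of zero however often it occurs, which matches the stated aim of excluding words such as "666" that appear everywhere.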
S103: According to the key weights, screen out the corresponding keywords from the barrage words of each document to be evaluated. The key weight is directly proportional to the first frequency of occurrence and inversely proportional to the second frequency of occurrence.
A keyword is a barrage word, determined by screening, that represents important content in the barrages and is distinct from the other words that occur in large numbers.
Specifically, the barrage words in a document to be evaluated can be sorted by key weight from high to low and a preset number of them taken as keywords. Alternatively, the barrage words whose key weight exceeds a preset weight can be taken as keywords. The preset number and the preset weight can be set by the user as required.
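The two selection rules in S103 can be sketched as follows, assuming the key weights of one document to be evaluated are already available as a word-to-weight mapping. The function names and the sample weights are hypothetical.

```python
import heapq


def top_k_keywords(weights, k):
    """Take the k barrage words with the highest key weights."""
    best = heapq.nlargest(k, weights.items(), key=lambda kv: kv[1])
    return [word for word, _ in best]


def keywords_above(weights, threshold):
    """Take every barrage word whose key weight exceeds the preset weight,
    listed from the highest weight to the lowest."""
    chosen = [word for word, value in weights.items() if value > threshold]
    return sorted(chosen, key=weights.get, reverse=True)


weights = {"gank": 0.55, "666": 0.0, "baron": 0.31, "nice": 0.12}
print(top_k_keywords(weights, 2))    # ['gank', 'baron']
print(keywords_above(weights, 0.1))  # ['gank', 'baron', 'nice']
```

The fixed-count rule always returns the same number of keywords per document, while the threshold rule lets the number vary with how distinctive the document's vocabulary is.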
In this embodiment, a barrage document set associated with multiple streamers is constructed, and the barrage documents of streamers that meet a first preset condition are selected as documents to be evaluated; a key weight is determined for each candidate word according to the first frequency of occurrence of the candidate word in its document to be evaluated and the number of barrage documents in the barrage document set that contain the candidate word; and, according to the key weights, the corresponding keywords are screened out from the barrage words of each document to be evaluated. This removes the need to read large volumes of barrage content manually and to have customer-service staff subjectively pick out high-quality barrage content, and it selects the words that represent the core content of the barrages after excluding conventional words that merely describe what is being shown.
Embodiment two
Fig. 2 is a flowchart of a method for barrage keyword screening provided by Embodiment 2 of the present invention. This embodiment refines Embodiment 1 and describes in detail how the barrage keyword screening apparatus screens barrage keywords. Specifically:
Constructing a barrage document set associated with multiple streamers comprises:
establishing the association between a streamer's number and the barrage document that stores the streamer's barrage information;
the barrage document including barrage information and its generation time;
performing word segmentation on the barrage information and storing the resulting barrage words in the barrage document.
Determining the first frequency of occurrence of a candidate word in its document to be evaluated comprises:
determining a barrage word in the document to be evaluated to be the candidate word;
calculating the frequency with which the candidate word occurs in its document to be evaluated as the first frequency of occurrence.
Determining a barrage word in the document to be evaluated to be the candidate word comprises:
using traversal, taking every word in the document to be evaluated in turn as the candidate word; or
using a three-layer Bayesian probability model and taking its output as the candidate words.
The key weight is calculated by a first formula [formula not reproduced in the source text],
where f denotes the first frequency of occurrence of a candidate word in its document to be evaluated, T_all denotes the number of barrage documents in the barrage document set, and T_con denotes the number of barrage documents in the set that contain the candidate word.
The key weight can also be calculated by a second formula [formula not reproduced in the source text],
where f_avg denotes the third frequency, that is, the frequency with which the candidate word occurs among all barrage words of the barrage document set.
Screening out, according to the key weights, the corresponding keywords from the barrage words of each document to be evaluated comprises:
sorting the barrage words in the document to be evaluated by key weight from high to low and taking a preset number of them as keywords; and/or
taking as keywords the barrage words in the document to be evaluated whose key weight exceeds a preset weight.
Referring to Fig. 2, the method provided by this embodiment comprises:
S201: Establish the association between a streamer's number and the barrage document that stores the streamer's barrage information.
The streamer number is the number, distinct from those of all other streamers, that a streamer receives on registering with the live streaming platform. The barrage document can be named after the streamer number.
Specifically, the streamer ID is used as the identifier of the barrage document, which distinguishes barrage documents from one another and associates each with its streamer.
S202: The barrage document includes barrage information and its generation time.
Specifically, while the streamer is live, each piece of barrage information is saved to the barrage document associated with the streamer, together with its generation time and its sender. The generation time is saved because keywords do not need to be screened from all the barrage information produced since the streamer registered; the barrage information therefore has to be filtered by its time information. Optionally, the barrage document can be set to keep only the barrage information generated in the last two months (a preset time). On that basis, each round of keyword screening can further be restricted to the barrage keywords of the last two weeks (within the preset time).
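The time-based filtering in S202 can be sketched as follows. The record layout `(sent_at, sender, text)` and the function name `within_window` are hypothetical; the two-month and two-week windows are the example values from the text, approximated here as 60 and 14 days.

```python
from datetime import datetime, timedelta


def within_window(records, now, days):
    """Keep the (sent_at, sender, text) records generated in the last `days` days."""
    cutoff = now - timedelta(days=days)
    return [r for r in records if r[0] >= cutoff]


now = datetime(2018, 7, 25)
records = [
    (datetime(2018, 7, 20), "viewer_1", "nice"),
    (datetime(2018, 6, 1), "viewer_2", "666"),
    (datetime(2018, 5, 1), "viewer_3", "old"),
]
retained = within_window(records, now, 60)   # what the barrage document keeps
screened = within_window(retained, now, 14)  # what one screening round uses
```

Applying the retention window before the screening window mirrors the two-level restriction described above.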
S203: Perform word segmentation on the barrage information and store the resulting barrage words in the barrage document.
Existing word segmentation algorithms fall into three broad categories: methods based on string matching, methods based on understanding, and methods based on statistics. Depending on whether segmentation is combined with part-of-speech tagging, they can also be divided into pure segmentation methods and integrated methods that combine segmentation with tagging. Common mechanical segmentation methods are: (1) forward maximum matching (left to right); (2) reverse maximum matching (right to left); (3) minimum segmentation (minimising the number of words cut from each sentence); (4) bidirectional maximum matching (two scans, one left to right and one right to left). The present invention does not prescribe which method is used, as long as segmentation is achieved.
Specifically, word segmentation is performed on the barrage information in the barrage document, and the segmented barrage information is stored in the barrage document as barrage words. The generation time and sender of the barrage information are saved in association with the barrage words.
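Since the text leaves the choice of segmentation method open, the sketch below illustrates only the first mechanical method it lists, forward maximum matching. The function name, the tiny dictionary and the `max_len` parameter are hypothetical and chosen for illustration.

```python
def forward_max_match(text, dictionary, max_len=4):
    """Scan left to right, always cutting off the longest prefix found in the
    dictionary; a single character is emitted when nothing longer matches."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                words.append(piece)
                i += size
                break
    return words


dictionary = {"主播", "厉害", "太厉害"}
print(forward_max_match("主播太厉害了", dictionary))  # ['主播', '太厉害', '了']
```

Reverse maximum matching follows the same idea but scans from the right, and bidirectional matching runs both scans and picks between the two results.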
S204: Select the barrage documents of streamers that meet the first preset condition as documents to be evaluated.
Specifically, streamers are screened. For example, the ten top-ranked streamers in each of three different game categories on live streaming platforms (not limited to the streamers of one platform, provided the barrage information can be obtained by technical means) are selected as the streamers meeting the first preset condition, and the barrage documents of these 30 streamers are taken as the documents to be evaluated. The first preset condition can also constrain the timeliness of the barrage keywords, for example by using only the last two weeks of barrage information of these 30 streamers as the basis for keyword screening.
S205: Determine a barrage word in the document to be evaluated to be the candidate word.
Specifically, traversal can be used, taking every word in the document to be evaluated in turn as the candidate word; alternatively, a three-layer Bayesian probability model can be used and its output taken as the candidate words. Bayesian probability is the interpretation of probability given by Bayesian theory, which defines probability as the degree to which a person believes a proposition.
S206: Calculate the frequency with which the candidate word occurs in its document to be evaluated as the first frequency of occurrence.
Specifically, suppose the barrage document of the streamer whose ID is 1111 is determined to be the document to be evaluated and, after word segmentation, it contains a words. The word "AAAA" is determined to be the candidate word and occurs b times in the document to be evaluated. The first frequency of occurrence of the candidate word "AAAA" in the document to be evaluated is then f = b/a.
S207: Determine the key weight of each candidate word according to the number of barrage documents in the barrage document set that contain the candidate word.
The key weight is calculated by a first formula [formula not reproduced in the source text], where f denotes the first frequency of occurrence of a candidate word in its document to be evaluated, T_all denotes the number of barrage documents in the barrage document set, and T_con denotes the number of barrage documents in the set that contain the candidate word.
Specifically, suppose there are 10000 streamers (numbered consecutively from 00001 to 10000), so the barrage document set contains 10000 barrage documents (T_all = 10000). The barrage document of the streamer whose ID is 1111 is determined to be the document to be evaluated; after word segmentation it contains 20000 words. The word "AAAA" is determined to be the candidate word and occurs 300 times in the document to be evaluated, so its first frequency of occurrence there is f = 300/20000 = 0.015. In addition, the barrage word "AAAA" appears in 199 barrage documents (T_con = 199). The key weight of the candidate word "AAAA" in barrage document 1111 can then be calculated [value not reproduced in the source text].
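As a rough illustration of how the numbers in this example might combine: the key-weight formula is not reproduced in this text, so the line below assumes the conventional TF-IDF form with a natural logarithm. Both the form and the logarithm base are assumptions, and the resulting value should not be read as the figure stated in the patent.

```latex
w_{\text{assumed}} = f \cdot \ln\frac{T_{all}}{T_{con}}
                   = \frac{300}{20000} \cdot \ln\frac{10000}{199}
                   = 0.015 \cdot \ln\frac{10000}{199}
                   \approx 0.0588
```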
Optionally, the key weight can also be calculated by a second formula [formula not reproduced in the source text], where f_avg denotes the third frequency, that is, the frequency with which the candidate word occurs among all barrage words of the barrage document set.
Specifically, suppose there are 10000 streamers (numbered consecutively from 00001 to 10000), so the barrage document set contains 10000 barrage documents (T_all = 10000), with an average of 20000 barrage words per document. The barrage document of the streamer whose ID is 1111 is determined to be the document to be evaluated; after word segmentation it contains 20000 words. The word "BBBB" is determined to be the candidate word and occurs 300 times in the document to be evaluated, so its first frequency of occurrence there is f = 300/20000 = 0.015. In addition, the barrage word "BBBB" appears in 199 barrage documents (T_con = 199), on average 200 times in each of them, which determines f_avg [expression not reproduced in the source text]. The key weight of the candidate word "BBBB" in barrage document 1111 can then be calculated [value not reproduced in the source text].
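The third frequency f_avg can be sketched directly from its definition: occurrences of the candidate word across the whole barrage document set, divided by the total number of barrage words in the set. The function name is hypothetical, and how f_avg enters the second key-weight formula is not recoverable from this text, so the sketch stops at f_avg itself.

```python
def third_frequency(word, document_set):
    """f_avg: occurrences of `word` in all barrage documents divided by the
    total number of barrage words in the barrage document set."""
    total_words = sum(len(words) for words in document_set.values())
    occurrences = sum(words.count(word) for words in document_set.values())
    return occurrences / total_words
```

Read this way, the figures in the example above would give 199 × 200 / (10000 × 20000) = 1.99 × 10^-4; that reading of the example is itself an assumption.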
S208, barrage word in document to be evaluated is according to value ranked up from high to low by key weight, takes predetermined number Barrage word as keyword.
Specifically, barrage words all in barrage document are according to value ranked up from high to low according to key weight, it is such as pre- The keyword number for first setting each document to be evaluated is 3 (predetermined number), then selects key power in document to be evaluated Weight values ranking first three the document to be evaluated as described in keyword.
It optionally, is more than the barrage word of default weight as keyword using weight key in document to be evaluated.
Specifically, different modes is taken to obtain the key weight of barrage word in document to be evaluated, need to set difference Key weighted value.According toThis formula calculates key weighted value, then Can set key weighted value is more than the barrage word of 0.03 (default weighted value) as keyword;According toThis formula calculates key weighted value, then can set will be key Weighted value is more than the barrage word of 2 (default weighted values) as keyword.
By constructing a barrage document set relating to multiple main broadcasters, the present invention selects the barrage documents corresponding to the main broadcasters that meet a first preset condition as documents to be evaluated; determines the key weight corresponding to each alternative word according to the first frequency of occurrence of the alternative word in the corresponding document to be evaluated and the number of barrage documents in the barrage document set that include the alternative word; and, according to the key weights, screens out the corresponding keywords from the barrage words of each document to be evaluated. This solves the problem that a large amount of barrage content has to be read manually and that customer service staff select high-quality barrage content subjectively, and makes it possible to select the words that represent the core content of the barrage after commonly displayed conventional words have been excluded.
Embodiment three
Fig. 3 is a structural diagram of a device for barrage keyword screening provided by Embodiment three of the present invention. The device includes a barrage document set construction module 31, a key weight calculation module 32 and a keyword determination module 33. Specifically:
The barrage document set construction module 31 is used to construct a barrage document set relating to multiple main broadcasters and to select the barrage documents corresponding to the main broadcasters that meet a first preset condition as documents to be evaluated; wherein the barrage document set is composed of multiple barrage documents, and each barrage document includes multiple barrage words;
The key weight calculation module 32 is used to determine the key weight corresponding to each alternative word according to the first frequency of occurrence of the alternative word in the corresponding document to be evaluated and the number of barrage documents in the barrage document set that include the alternative word; wherein the ratio between the number of barrage documents in the barrage document set and the number of barrage documents in the barrage document set that include the alternative word is recorded as the second frequency of occurrence;
The keyword determination module 33 is used to screen out, according to the key weights, the corresponding keywords from the barrage words of each document to be evaluated; wherein the key weight is directly proportional to the first frequency of occurrence and inversely proportional to the second frequency of occurrence.
By constructing a barrage document set relating to multiple main broadcasters, the embodiment of the present invention selects the barrage documents corresponding to the main broadcasters that meet the first preset condition as documents to be evaluated; determines the key weight corresponding to each alternative word according to the first frequency of occurrence of the alternative word in the corresponding document to be evaluated and the number of barrage documents in the barrage document set that include the alternative word; and, according to the key weights, screens out the corresponding keywords from the barrage words of each document to be evaluated. This solves the problem that a large amount of barrage content has to be read manually and that customer service staff select high-quality barrage content subjectively, and makes it possible to select the words that represent the core content of the barrage after commonly displayed conventional words have been excluded.
On the basis of the above embodiments, the barrage document set construction module is specifically used for:
establishing an association between each main broadcaster's number and the barrage document that stores that main broadcaster's barrage information;
wherein the barrage document includes the barrage information and the time at which the barrage information was generated;
performing word segmentation on the barrage information and storing the resulting barrage words in the barrage document.
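The construction steps above can be sketched as follows. The record layout, the `BarrageDocument` class and the whitespace-based `segment` function are assumptions made for illustration; a real system would use a proper word segmenter for Chinese text, which the text above does not name.

```python
from dataclasses import dataclass, field

@dataclass
class BarrageDocument:
    """One barrage document, associated with one main broadcaster number."""
    broadcaster_id: str
    messages: list[tuple[float, str]] = field(default_factory=list)  # (generation time, text)
    words: list[str] = field(default_factory=list)                   # segmented barrage words

def segment(text: str) -> list[str]:
    """Placeholder word segmentation (assumption): split on whitespace."""
    return text.split()

def build_document_set(records) -> dict[str, BarrageDocument]:
    """Group (broadcaster_id, timestamp, text) records into one document per broadcaster."""
    docs: dict[str, BarrageDocument] = {}
    for broadcaster_id, timestamp, text in records:
        doc = docs.setdefault(broadcaster_id, BarrageDocument(broadcaster_id))
        doc.messages.append((timestamp, text))   # keep barrage information and its time
        doc.words.extend(segment(text))          # store the segmented barrage words
    return docs

# Hypothetical barrage records.
records = [("00001", 1.0, "BBBB nice"), ("00002", 2.0, "hello"), ("00001", 3.0, "BBBB")]
docs = build_document_set(records)
print(docs["00001"].words)
```

Keying the dictionary by main broadcaster number plays the role of the association between the number and its barrage document.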
On the basis of the above embodiments, the key weight calculation module is specifically used for:
determining a barrage word in the document to be evaluated as the alternative word, which includes: using traversal to select each word in the barrage set to be evaluated in turn as the alternative word; or using a three-layer Bayesian probability model and taking the result of the three-layer Bayesian probability model as the alternative word;
calculating the frequency with which the alternative word occurs in the corresponding document to be evaluated as the first frequency of occurrence.
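For the traversal option, every distinct word of the document to be evaluated is treated in turn as the alternative word. A minimal sketch, assuming the document is already a list of segmented words; the function name `first_frequencies` is hypothetical, and the three-layer Bayesian model option is not covered here.

```python
from collections import Counter

def first_frequencies(words: list[str]) -> dict[str, float]:
    """Map each distinct word to its share of all words in the document."""
    counts = Counter(words)
    total = len(words)
    return {word: count / total for word, count in counts.items()}

# Hypothetical segmented document to be evaluated.
doc_words = ["BBBB", "nice", "BBBB", "hello"]
freqs = first_frequencies(doc_words)
print(freqs)
```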
On the basis of the above embodiments, the method by which the key weight calculation module calculates the key weight includes:
where f denotes the first frequency of occurrence of a given alternative word in the corresponding document to be evaluated, T_all denotes the number of barrage documents in the barrage document set, and T_con denotes the number of barrage documents in the barrage document set that include the alternative word.
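The two document counts defined here can be obtained directly from the barrage document set. The sketch below assumes the set is a dictionary from main broadcaster number to a list of segmented words; the function name `document_counts` and the sample data are hypothetical.

```python
def document_counts(document_set: dict[str, list[str]], word: str) -> tuple[int, int]:
    """Return (T_all, T_con) for the given alternative word."""
    t_all = len(document_set)                                          # all barrage documents
    t_con = sum(1 for words in document_set.values() if word in words) # documents containing the word
    return t_all, t_con

# Hypothetical barrage document set with three main broadcasters.
document_set = {
    "00001": ["BBBB", "nice", "BBBB"],
    "00002": ["hello", "GG"],
    "00003": ["BBBB", "GG"],
}
t_all, t_con = document_counts(document_set, "BBBB")
print(t_all, t_con)
```

Each document is counted at most once for T_con, however many times the word occurs in it.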
On the basis of the above embodiments, the method by which the key weight calculation module calculates the key weight further includes:
where f_avg denotes the third frequency, that is, the frequency with which the alternative word occurs among all barrage words of the barrage document set.
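One reading of this definition is the word's total number of occurrences divided by the total number of barrage words across the whole barrage document set. The sketch below follows that reading; it is an interpretation, and the function name `third_frequency` and the sample data are hypothetical.

```python
def third_frequency(all_docs: dict[str, list[str]], word: str) -> float:
    """Occurrences of the word across all documents, divided by all barrage words in the set."""
    total_words = sum(len(words) for words in all_docs.values())
    hits = sum(words.count(word) for words in all_docs.values())
    return hits / total_words if total_words else 0.0

# Hypothetical barrage document set (7 barrage words in total, 3 of them "BBBB").
all_docs = {
    "00001": ["BBBB", "nice", "BBBB"],
    "00002": ["hello", "GG"],
    "00003": ["BBBB", "GG"],
}
f_avg = third_frequency(all_docs, "BBBB")
print(f_avg)
```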
On the basis of the above embodiments, the keyword determination module is specifically used for:
sorting the barrage words in the document to be evaluated by key weight from high to low and taking a predetermined number of barrage words as keywords; and/or
taking the barrage words in the document to be evaluated whose key weight exceeds a preset weight as keywords.
The device for barrage keyword screening provided by this embodiment can be used to execute the method of barrage keyword screening provided by any of the above embodiments, and has the corresponding functions and beneficial effects.
Embodiment four
Fig. 4 is a structural schematic diagram of equipment provided by Embodiment four of the present invention. As shown in Fig. 4, the equipment includes a processor 40, a memory 41, a communication module 42, an input device 43 and an output device 44. The equipment may have one or more processors 40; one processor 40 is taken as an example in Fig. 4. The processor 40, memory 41, communication module 42, input device 43 and output device 44 in the equipment may be connected by a bus or in other ways; connection by a bus is taken as an example in Fig. 4.
As a computer-readable storage medium, the memory 41 can be used to store software programs, computer-executable programs and modules, such as the modules corresponding to the method of barrage keyword screening in this embodiment (for example, the barrage document set construction module 31, the key weight calculation module 32 and the keyword determination module 33 of the device for barrage keyword screening). By running the software programs, instructions and modules stored in the memory 41, the processor 40 executes the various functional applications and data processing of the equipment, that is, implements the above method of barrage keyword screening.
The memory 41 may mainly include a program storage area and a data storage area, wherein the program storage area can store an operating system and the application programs required by at least one function, and the data storage area can store data created through use of the equipment. In addition, the memory 41 may include high-speed random access memory and may also include non-volatile memory, for example at least one magnetic disk storage device, flash memory device or other non-volatile solid-state storage device. In some examples, the memory 41 may further include memory located remotely from the processor 40, and such remote memory can be connected to the equipment through a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks and combinations thereof.
The communication module 42 is used to establish a connection with a display screen and to exchange data with the display screen. The input device 43 can be used to receive input numeric or character information and to generate key signal inputs related to the user settings and function control of the equipment.
The equipment provided by this embodiment can execute the method of barrage keyword screening provided by any embodiment of the present invention, and has the corresponding functions and beneficial effects.
Embodiment five
Embodiment five of the present invention further provides a storage medium containing computer-executable instructions which, when executed by a computer processor, are used to perform a method of barrage keyword screening, the method comprising:
constructing a barrage document set relating to multiple main broadcasters, and selecting the barrage documents corresponding to the main broadcasters that meet a first preset condition as documents to be evaluated; wherein the barrage document set is composed of multiple barrage documents, and each barrage document includes multiple barrage words;
determining the key weight corresponding to each alternative word according to the first frequency of occurrence of the alternative word in the corresponding document to be evaluated and the number of barrage documents in the barrage document set that include the alternative word; wherein the ratio between the number of barrage documents in the barrage document set and the number of barrage documents in the barrage document set that include the alternative word is recorded as the second frequency of occurrence;
screening out, according to the key weights, the corresponding keywords from the barrage words of each document to be evaluated; wherein the key weight is directly proportional to the first frequency of occurrence and inversely proportional to the second frequency of occurrence.
Of course, in the storage medium containing computer-executable instructions provided by the embodiment of the present invention, the computer-executable instructions are not limited to the method operations described above and can also perform related operations in the method of barrage keyword screening provided by any embodiment of the present invention.
From the above description of the embodiments, those skilled in the art can clearly understand that the present invention can be implemented by software together with the necessary general-purpose hardware, and of course can also be implemented by hardware, but in many cases the former is the better embodiment. Based on this understanding, the part of the technical solution of the present invention that in essence contributes over the prior art can be embodied in the form of a software product. The computer software product can be stored in a computer-readable storage medium, such as a computer floppy disk, read-only memory (ROM), random access memory (RAM), flash memory (FLASH), hard disk or optical disc, and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device or the like) to execute the methods described in the embodiments of the present invention.
It is worth noting that, in the above embodiment of the device for barrage keyword screening, the units and modules included are divided only according to functional logic; the division is not limited to the above, as long as the corresponding functions can be realized. In addition, the specific names of the functional units are only for convenience of distinguishing them from one another and are not intended to limit the protection scope of the present invention.
Note that the above are only preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will understand that the present invention is not limited to the specific embodiments described herein, and that various obvious changes, readjustments and substitutions can be made by those skilled in the art without departing from the protection scope of the present invention. Therefore, although the present invention has been described in further detail through the above embodiments, it is not limited to them; without departing from the inventive concept, it may include further equivalent embodiments, and the scope of the invention is determined by the scope of the appended claims.

Claims (10)

1. A method of barrage keyword screening, characterized by comprising:
constructing a barrage document set relating to multiple main broadcasters, and selecting the barrage documents corresponding to the main broadcasters that meet a first preset condition as documents to be evaluated; wherein the barrage document set is composed of multiple barrage documents, and each barrage document includes multiple barrage words;
determining the key weight corresponding to each alternative word according to the first frequency of occurrence of the alternative word in the corresponding document to be evaluated and the number of barrage documents in the barrage document set that include the alternative word; wherein the ratio between the number of barrage documents in the barrage document set and the number of barrage documents in the barrage document set that include the alternative word is recorded as the second frequency of occurrence;
screening out, according to the key weights, the corresponding keywords from the barrage words of each document to be evaluated; wherein the key weight is directly proportional to the first frequency of occurrence and inversely proportional to the second frequency of occurrence.
2. The method according to claim 1, characterized in that constructing the barrage document set relating to multiple main broadcasters specifically comprises:
establishing an association between each main broadcaster's number and the barrage document that stores that main broadcaster's barrage information;
wherein the barrage document includes the barrage information and the time at which the barrage information was generated;
performing word segmentation on the barrage information and storing the resulting barrage words in the barrage document.
3. The method according to claim 1, characterized in that obtaining the first frequency of occurrence of the alternative word in the corresponding document to be evaluated specifically comprises:
determining a barrage word in the document to be evaluated as the alternative word;
calculating the frequency with which the alternative word occurs in the corresponding document to be evaluated as the first frequency of occurrence.
4. The method according to claim 3, characterized in that determining a barrage word in the document to be evaluated as the alternative word specifically comprises:
using traversal to select each word in the barrage set to be evaluated in turn as the alternative word; or
using a three-layer Bayesian probability model and taking the result of the three-layer Bayesian probability model as the alternative word.
5. The method according to claim 1, characterized in that the method of calculating the key weight comprises:
where f denotes the first frequency of occurrence of a given alternative word in the corresponding document to be evaluated, T_all denotes the number of barrage documents in the barrage document set, and T_con denotes the number of barrage documents in the barrage document set that include the alternative word.
6. The method according to claim 5, characterized in that the method of calculating the key weight further comprises:
where f_avg denotes the third frequency, that is, the frequency with which the alternative word occurs among all barrage words of the barrage document set.
7. The method according to claim 1, characterized in that screening out, according to the key weights, the corresponding keywords from the barrage words of each document to be evaluated comprises:
sorting the barrage words in the document to be evaluated by key weight from high to low and taking a predetermined number of barrage words as keywords; and/or
taking the barrage words in the document to be evaluated whose key weight exceeds a preset weight as keywords.
8. A device for barrage keyword screening, characterized by comprising:
a barrage document set construction module, used to construct a barrage document set relating to multiple main broadcasters and to select the barrage documents corresponding to the main broadcasters that meet a first preset condition as documents to be evaluated; wherein the barrage document set is composed of multiple barrage documents, and each barrage document includes multiple barrage words;
a key weight calculation module, used to determine the key weight corresponding to each alternative word according to the first frequency of occurrence of the alternative word in the corresponding document to be evaluated and the number of barrage documents in the barrage document set that include the alternative word; wherein the ratio between the number of barrage documents in the barrage document set and the number of barrage documents in the barrage document set that include the alternative word is recorded as the second frequency of occurrence;
a keyword determination module, used to screen out, according to the key weights, the corresponding keywords from the barrage words of each document to be evaluated; wherein the key weight is directly proportional to the first frequency of occurrence and inversely proportional to the second frequency of occurrence.
9. Equipment, characterized by comprising:
one or more processors;
a memory, used to store one or more programs;
wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the method of barrage keyword screening as claimed in claim 1.
10. A computer-readable storage medium on which a computer program is stored, characterized in that, when the program is executed by a processor, the method of barrage keyword screening as claimed in claim 1 is implemented.
CN201810829527.3A 2018-07-25 2018-07-25 A kind of method, apparatus, equipment and the storage medium of the screening of barrage keyword Pending CN109145291A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810829527.3A CN109145291A (en) 2018-07-25 2018-07-25 A kind of method, apparatus, equipment and the storage medium of the screening of barrage keyword

Publications (1)

Publication Number Publication Date
CN109145291A true CN109145291A (en) 2019-01-04

Family

ID=64797927

Country Status (1)

Country Link
CN (1) CN109145291A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20190104)