CN109145291A - Method, apparatus, device and storage medium for barrage keyword screening - Google Patents
- Publication number
- CN109145291A, CN201810829527.3A
- Authority
- CN
- China
- Prior art keywords
- barrage
- document
- word
- evaluated
- keyword
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/435—Processing of additional data, e.g. decrypting of additional data, reconstructing software from modules extracted from the transport stream
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/47—End-user applications
- H04N21/478—Supplemental services, e.g. displaying phone caller identification, shopping application
- H04N21/4788—Supplemental services, e.g. displaying phone caller identification, shopping application communicating with other users, e.g. chatting
Landscapes
- Engineering & Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Signal Processing (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a method, apparatus, device and storage medium for barrage keyword screening. The method comprises: constructing a barrage document set related to multiple main broadcasters, and selecting the barrage documents of the main broadcasters that satisfy a first preset condition as documents to be evaluated; determining a key weight for each alternative word according to the first frequency of occurrence of the alternative word in its document to be evaluated and the number of barrage documents in the barrage document set that contain the alternative word; and screening out, according to the key weights, the keywords among the barrage words of each document to be evaluated. The method removes the need to read large volumes of barrage content manually and to rely on customer-service staff's subjective choice of high-quality barrage content: after routine expressive words are excluded, the words that represent the core content of the barrage are selected.
Description
Technical field
Embodiments of the present invention relate to text processing technology, and in particular to a method, apparatus, device and storage medium for barrage keyword screening.
Background
With the development of Internet technology and smart mobile terminals, Internet products have brought convenience and entertainment to people's work and life. In recent years, live-streaming platforms for live video have emerged one after another, and live video gives people a more real-time social experience. Constrained by the live-streaming format, the main broadcaster and the users in the same live room, and the users among themselves, communicate mainly through barrage (comments overlaid on the video).
Barrage quality directly affects the viewing experience. The commonly used method is for a machine to pick the words that occur most frequently in the barrage as barrage keywords and to analyze barrage quality from them. Words chosen this way, however, do not reflect barrage quality well. Taking game live streaming as an example, when a highlight occurs, a large number of barrages contain words such as "666", "severity" or "show". Once such words have become routine expressions, they say little about barrage quality.
High-quality barrage is also often picked out by manual reading. Because the volume of barrage is huge, the platform has to commit a large number of customer-service staff to the screening, and the result is highly subjective.
Summary of the invention
Embodiments of the present invention provide a method, apparatus, device and storage medium for barrage keyword screening, so that words representing the core content of the barrage can be screened out accurately from a large volume of barrage.
In a first aspect, an embodiment of the present invention provides a method for barrage keyword screening, comprising:
constructing a barrage document set related to multiple main broadcasters, and selecting the barrage documents of the main broadcasters that satisfy a first preset condition as documents to be evaluated; wherein the barrage document set is composed of multiple barrage documents, and each barrage document contains multiple barrage words;
determining a key weight for each alternative word according to the first frequency of occurrence of the alternative word in its document to be evaluated and the number of barrage documents in the barrage document set that contain the alternative word; wherein the ratio between the number of barrage documents in the barrage document set and the number of barrage documents in the set that contain the alternative word is denoted the second frequency of occurrence;
screening out, according to the key weights, the keywords among the barrage words of each document to be evaluated; wherein the key weight is directly proportional to the first frequency of occurrence and inversely proportional to the second frequency of occurrence.
In a second aspect, an embodiment of the present invention further provides an apparatus for barrage keyword screening, comprising:
a barrage document set construction module, configured to construct a barrage document set related to multiple main broadcasters and to select the barrage documents of the main broadcasters that satisfy a first preset condition as documents to be evaluated; wherein the barrage document set is composed of multiple barrage documents, and each barrage document contains multiple barrage words;
a key weight calculation module, configured to determine a key weight for each alternative word according to the first frequency of occurrence of the alternative word in its document to be evaluated and the number of barrage documents in the barrage document set that contain the alternative word; wherein the ratio between the number of barrage documents in the barrage document set and the number of barrage documents in the set that contain the alternative word is denoted the second frequency of occurrence;
a keyword determination module, configured to screen out, according to the key weights, the keywords among the barrage words of each document to be evaluated; wherein the key weight is directly proportional to the first frequency of occurrence and inversely proportional to the second frequency of occurrence.
In a third aspect, an embodiment of the present invention further provides a device, comprising:
one or more processors;
a memory for storing one or more programs;
wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the method for barrage keyword screening described in any embodiment of the present invention.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, it implements the method for barrage keyword screening described in any embodiment of the present invention.
The embodiments of the present invention construct a barrage document set related to multiple main broadcasters and select the barrage documents of the main broadcasters that satisfy the first preset condition as documents to be evaluated; determine a key weight for each alternative word according to its first frequency of occurrence in its document to be evaluated and the number of barrage documents in the set that contain it; and screen out, according to the key weights, the keywords among the barrage words of each document to be evaluated. This removes the need to read large volumes of barrage content manually and to rely on customer-service staff's subjective choice of high-quality barrage content. After routine expressive words are excluded, the words that represent the core content of the barrage are selected.
Brief description of the drawings
Fig. 1 is a flow chart of a method for barrage keyword screening provided by Embodiment one of the present invention;
Fig. 2 is a flow chart of a method for barrage keyword screening provided by Embodiment two of the present invention;
Fig. 3 is a structure diagram of an apparatus for barrage keyword screening provided by Embodiment three of the present invention;
Fig. 4 is a structural schematic diagram of a device provided by Embodiment four of the present invention.
Detailed description
The present invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only explain the invention and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the invention rather than the entire structure.
Embodiment one
Fig. 1 is a flow chart of a method for barrage keyword screening provided by Embodiment one of the present invention. This embodiment is applicable to collecting the barrage keywords of a main broadcaster's live room. The method can be executed by an apparatus for barrage keyword screening, which can be implemented in hardware and/or software and is generally integrated in a server or terminal device.
Each main broadcaster has a unique identification number. The barrage information collected for one main broadcaster can be stored in a single document (the barrage document), and that document is saved in association with the main broadcaster's identification number, so that different main broadcasters and their documents can be told apart. A barrage is a comment caption that pops up over a video while it is being watched; on a live-streaming platform it is one way for the main broadcaster and the audience to interact. Each barrage includes at least two pieces of information: its generation time and its display content.
With reference to Fig. 1, this embodiment comprises the following steps:
S101: Construct a barrage document set related to multiple main broadcasters, and select the barrage documents of the main broadcasters that satisfy the first preset condition as documents to be evaluated. The barrage document set is composed of multiple barrage documents, and each barrage document contains multiple barrage words.
A barrage document is a document associated with a main broadcaster number (ID). Each main broadcaster has one corresponding barrage document, and all barrage information from the main broadcaster's live room is stored in it. Barrage information is the total of the barrages entered by the audience and by the main broadcaster. Word segmentation turns barrage information into barrage words, and each barrage word also carries the time at which the user (main broadcaster or viewer) entered the barrage. The first preset condition is the condition used to screen main broadcasters. Because live streaming is an industry with a pronounced head effect, high-quality and key barrages can be obtained by screening keywords in the barrage documents of the head main broadcasters. The first preset condition can be set as: the number of new barrages per minute in the main broadcaster's live room reaches a given standard; and/or the main broadcaster's average weekly online time exceeds a given standard; and/or the main broadcaster belongs to a preset range, for example the platform's star main broadcasters designated by operations staff in the back end. A document to be evaluated is the barrage document of a main broadcaster selected by the first preset condition, and is part of the barrage document set.
Specifically, the main broadcaster ID serves as the identifier of the barrage document, which distinguishes barrage documents from one another and associates each with its main broadcaster. Barrage information is stored directly in the barrage document; after word segmentation it becomes barrage words, which are the basic units of the subsequent weight calculation. The barrage documents of all main broadcasters constitute the barrage document set. Main broadcasters are then screened: for example, on the live-streaming platforms (not limited to one platform, provided that the barrage information can be obtained by technical means), the top ten main broadcasters in each of three different game categories are selected as the main broadcasters that satisfy the first preset condition, and the barrage documents of these 30 main broadcasters are the documents to be evaluated.
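The construction of the barrage document set and the selection of documents to be evaluated can be sketched as follows. This is a minimal illustration, not the patent's implementation: the names `BarrageDocument`, `build_document_set` and `select_documents`, the tuple layout of the input records, and the `segment` and `meets_condition` callables are all assumptions introduced here.

```python
from dataclasses import dataclass, field


@dataclass
class BarrageDocument:
    """One barrage document per main broadcaster (hypothetical structure)."""
    broadcaster_id: str
    words: list = field(default_factory=list)  # (barrage word, input time) pairs


def build_document_set(barrages, segment):
    """Group barrage records by main broadcaster ID.

    barrages: iterable of (broadcaster_id, text, timestamp) tuples (assumed layout).
    segment:  callable that splits a barrage text into barrage words.
    """
    docs = {}
    for broadcaster_id, text, timestamp in barrages:
        doc = docs.setdefault(broadcaster_id, BarrageDocument(broadcaster_id))
        # Each barrage word keeps the time at which its barrage was entered.
        doc.words.extend((word, timestamp) for word in segment(text))
    return docs


def select_documents(docs, meets_condition):
    """Keep the documents whose main broadcaster satisfies the first preset condition."""
    return {bid: doc for bid, doc in docs.items() if meets_condition(bid)}
```

The first preset condition is passed in as a predicate so that any of the criteria named in the text (barrage rate, weekly online time, membership of a preset list) can be plugged in without changing the grouping code.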
S102: Determine a key weight for each alternative word according to the first frequency of occurrence of the alternative word in its document to be evaluated and the number of barrage documents in the barrage document set that contain the alternative word. The ratio between the number of barrage documents in the barrage document set and the number of barrage documents in the set that contain the alternative word is denoted the second frequency of occurrence.
An alternative word is a word selected for key weight calculation, and is one of the barrage words. When the key weights of the barrage words in a document to be evaluated are calculated by traversal, each barrage word in the document is taken in turn as the alternative word. The first frequency of occurrence is the quotient of the number of times the alternative word occurs in a document to be evaluated and the total number of barrage words in that document. The number of barrage documents that contain the alternative word is counted per document: if the barrage document set is composed of 10000 barrage documents and the word "fifty and fifty percent" occurs in 100 of them (regardless of how many times it occurs in each), the number of barrage documents containing the alternative word is 100. The key weight can be calculated by a formula in f, T_all and T_con, where f denotes the first frequency of occurrence of an alternative word in its document to be evaluated, T_all denotes the number of barrage documents in the barrage document set, and T_con denotes the number of barrage documents in the set that contain the alternative word.
Specifically, the key weight of each barrage word in the document to be evaluated is determined. First, one of the barrage words in the document to be evaluated is selected as the alternative word and its first frequency of occurrence is calculated. This requires counting the number of times the alternative word occurs in the document to be evaluated and the total number of barrage words in that document; the first frequency of occurrence is the quotient of the two. The key weight also depends on the second frequency of occurrence, which is related to the number of barrage documents in the barrage document set that contain the alternative word. However many times the alternative word occurs in one barrage document, that document counts once as containing the alternative word. The second frequency of occurrence is the ratio between the number of barrage documents in the barrage document set and the number of barrage documents in the set that contain the alternative word.
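The quantities described in S102 can be computed as in the sketch below. The weight formula itself is not reproduced in this text, so the final step assumes a TF-IDF style form, f multiplied by the natural logarithm of T_all / T_con; that form, the choice of logarithm base, and all function names are assumptions made for illustration. It is consistent with the stated behavior (the weight grows with f and shrinks as more documents contain the word) but may differ from the patent's actual formula.

```python
import math
from collections import Counter


def first_frequency(doc_words):
    """f: occurrences of each word divided by the total number of barrage words."""
    counts = Counter(doc_words)
    total = len(doc_words)
    return {word: count / total for word, count in counts.items()}


def containing_document_counts(document_set):
    """T_con: number of barrage documents containing each word (counted once per document)."""
    t_con = Counter()
    for words in document_set.values():
        t_con.update(set(words))
    return t_con


def key_weights(doc_id, document_set):
    """Key weight of every barrage word in one document to be evaluated."""
    t_all = len(document_set)  # T_all: number of barrage documents in the set
    t_con = containing_document_counts(document_set)
    freqs = first_frequency(document_set[doc_id])
    # ASSUMPTION: TF-IDF style weight f * ln(T_all / T_con); the source formula is not shown.
    return {word: f * math.log(t_all / t_con[word]) for word, f in freqs.items()}
```

Here `document_set` is assumed to map a main broadcaster ID to a flat list of barrage words. A word that appears in every barrage document gets weight zero under this assumed form, which matches the goal of suppressing routine words that occur everywhere.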
S103: According to the key weights, screen out the keywords among the barrage words of each document to be evaluated. The key weight is directly proportional to the first frequency of occurrence and inversely proportional to the second frequency of occurrence.
A keyword is a barrage word, determined by screening, that represents the important content of the barrage and is distinguished from the other barrage words that merely occur in large numbers.
Specifically, the barrage words in a document to be evaluated can be sorted by key weight from high to low, and a preset number of top-ranked barrage words taken as keywords. Alternatively, the barrage words whose key weight exceeds a preset weight can be taken as keywords. The preset number and the preset weight can be set by the user as required.
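Both selection rules in S103, top-ranked words and words above a preset weight, can be sketched in one small function. The function name and parameter names are hypothetical; the weights are assumed to be given as a dictionary from barrage word to key weight.

```python
def select_keywords(weights, top_n=None, min_weight=None):
    """Pick keywords by rank, by threshold, or by both.

    weights:    dict mapping barrage word -> key weight.
    top_n:      preset number of keywords to keep (None disables the rank rule).
    min_weight: preset weight a keyword must exceed (None disables the threshold rule).
    """
    # Sort barrage words by key weight from high to low.
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    if min_weight is not None:
        ranked = [(word, weight) for word, weight in ranked if weight > min_weight]
    if top_n is not None:
        ranked = ranked[:top_n]
    return [word for word, _ in ranked]
```

Passing both arguments applies the "and" combination (top-ranked words that also exceed the threshold); passing one of them applies that rule alone.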
The present invention constructs a barrage document set related to multiple main broadcasters and selects the barrage documents of the main broadcasters that satisfy the first preset condition as documents to be evaluated; determines a key weight for each alternative word according to its first frequency of occurrence in its document to be evaluated and the number of barrage documents in the set that contain it; and screens out, according to the key weights, the keywords among the barrage words of each document to be evaluated. This removes the need to read large volumes of barrage content manually and to rely on customer-service staff's subjective choice of high-quality barrage content. After routine expressive words are excluded, the words that represent the core content of the barrage are selected.
Embodiment two
Fig. 2 is a flow chart of a method for barrage keyword screening provided by Embodiment two of the present invention. This embodiment refines Embodiment one and describes in detail how the barrage keyword screening apparatus carries out the screening. Specifically:
Constructing the barrage document set related to multiple main broadcasters specifically includes:
establishing the association between a main broadcaster number and the barrage document that stores that main broadcaster's barrage information;
the barrage document including barrage information and barrage information generation times;
performing word segmentation on the barrage information and storing the resulting barrage words in the barrage document.
Obtaining the first frequency of occurrence of an alternative word in its document to be evaluated specifically includes:
determining a barrage word in the document to be evaluated as the alternative word;
calculating the frequency with which the alternative word occurs in its document to be evaluated as the first frequency of occurrence.
Determining a barrage word in the document to be evaluated as the alternative word specifically includes:
using traversal, selecting every word in the document to be evaluated in turn as the alternative word; or
using a three-layer Bayesian probability model and taking the output of the model as the alternative words.
The calculation method of the key weight comprises a formula in f, T_all and T_con, where f denotes the first frequency of occurrence of an alternative word in its document to be evaluated, T_all denotes the number of barrage documents in the barrage document set, and T_con denotes the number of barrage documents in the set that contain the alternative word.
The calculation method of the key weight further comprises a formula that also uses f_avg, where f_avg denotes the third frequency, with which the alternative word occurs among all barrage words of the barrage document set.
Screening out, according to the key weights, the keywords among the barrage words of each document to be evaluated includes:
sorting the barrage words in the document to be evaluated by key weight from high to low and taking a preset number of barrage words as keywords; and/or
taking the barrage words in the document to be evaluated whose key weight exceeds a preset weight as keywords.
With reference to Fig. 2, the method provided by this embodiment specifically includes:
S201: Establish the association between a main broadcaster number and the barrage document that stores that main broadcaster's barrage information.
A main broadcaster number is the number assigned to a main broadcaster on registering with the live-streaming platform, which distinguishes that main broadcaster from the others. The barrage document can be created with the main broadcaster number as its name.
Specifically, the main broadcaster ID serves as the identifier of the barrage document, which distinguishes barrage documents from one another and associates each with its main broadcaster.
S202: The barrage document includes barrage information and barrage information generation times.
Specifically, while the main broadcaster is live, each piece of barrage information is saved into the barrage document associated with the main broadcaster, together with its generation time and its sender. The generation time is saved because keyword screening does not need to cover all barrage information since the main broadcaster registered, so the barrage information has to be filtered by its time information. Optionally, the barrage document can be set to keep only the barrage information generated in the last two months (a preset time). On this basis, each keyword screening run can be further restricted, for example to the barrage of the last two weeks (within the preset time).
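The time-based filtering described in S202 can be sketched as below. The record layout `(generated_at, sender, text)` and the function name `recent_barrages` are assumptions for illustration; the two-week default mirrors the example window in the text.

```python
from datetime import datetime, timedelta


def recent_barrages(records, now, window=timedelta(weeks=2)):
    """Keep barrage records generated within `window` before `now`.

    records: iterable of (generated_at, sender, text) tuples (assumed layout),
             where generated_at is a datetime.
    """
    cutoff = now - window
    return [record for record in records if record[0] >= cutoff]
```

The same function covers the retention rule by passing a longer window, for example `timedelta(days=60)` as a rough stand-in for two months.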
S203: Perform word segmentation on the barrage information and store the barrage words in the barrage document.
Existing word segmentation algorithms fall into three categories: methods based on string matching, methods based on understanding, and methods based on statistics. Depending on whether segmentation is combined with part-of-speech tagging, they can also be divided into pure segmentation methods and integrated methods that combine segmentation with tagging. Common mechanical segmentation methods are: (1) forward maximum matching (scanning left to right); (2) reverse maximum matching (scanning right to left); (3) minimum segmentation (minimizing the number of words cut from each sentence); (4) bidirectional maximum matching (two scans, left to right and right to left). The present invention does not prescribe which method is used, provided that it achieves segmentation.
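As an illustration of the first mechanical method listed above, the following is a minimal sketch of forward maximum matching. The text does not prescribe a segmentation method, so this is only one possible choice; the function name, the `max_len` parameter and the dictionary contents are assumptions.

```python
def forward_max_match(text, dictionary, max_len=4):
    """Segment `text` left to right, always taking the longest dictionary match.

    dictionary: set of known words.
    max_len:    longest candidate length tried at each position.
    Characters that start no dictionary word are emitted as single-character words.
    """
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a match is found.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                words.append(piece)
                i += size
                break
    return words
```

Reverse maximum matching follows the same pattern scanning from the right, and bidirectional matching runs both scans and chooses between their results.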
Specifically, word segmentation is performed on the barrage information in the barrage document, and the segmented barrage information is stored in the barrage document in the form of barrage words. The generation time and the sender of the barrage information are saved in association with the barrage words.
S204: Select the barrage documents of the main broadcasters that satisfy the first preset condition as documents to be evaluated.
Specifically, main broadcasters are screened: for example, on the live-streaming platforms (not limited to one platform, provided that the barrage information can be obtained by technical means), the top ten main broadcasters in each of three different game categories are selected as the main broadcasters that satisfy the first preset condition, and the barrage documents of these 30 main broadcasters are the documents to be evaluated. The first preset condition can also constrain the timeliness of the barrage keywords, for example by taking only the last two weeks of barrage information of these 30 main broadcasters as the basis for keyword screening.
S205: Determine a barrage word in the document to be evaluated as the alternative word.
Specifically, traversal can be used, selecting every word in the document to be evaluated in turn as the alternative word; alternatively, a three-layer Bayesian probability model can be used, with the output of the model taken as the alternative words. Bayesian probability is an interpretation of probability provided by Bayesian theory, which defines probability as the degree to which someone believes a proposition.
S206: Calculate the frequency with which the alternative word occurs in its document to be evaluated as the first frequency of occurrence.
Specifically, suppose the barrage document of the main broadcaster whose ID is 1111 is the document to be evaluated and, after word segmentation, contains a words. If the word "AAAA" is the alternative word and occurs b times in the document to be evaluated, the first frequency of occurrence of "AAAA" in the document is f = b / a.
S207: Determine the key weight for each alternative word according to the number of barrage documents in the barrage document set that contain the alternative word.
The key weight is calculated from f, T_all and T_con, where f denotes the first frequency of occurrence of an alternative word in its document to be evaluated, T_all denotes the number of barrage documents in the barrage document set, and T_con denotes the number of barrage documents in the set that contain the alternative word.
Specifically, suppose there are 10000 main broadcasters (numbered 00001 to 10000), so the barrage document set contains 10000 barrage documents (T_all = 10000). The barrage document of the main broadcaster whose ID is 1111 is the document to be evaluated and, after word segmentation, contains 20000 words. The word "AAAA" is the alternative word and occurs 300 times in the document to be evaluated, so its first frequency of occurrence is f = 300 / 20000 = 0.015. Meanwhile, the barrage word "AAAA" occurs in 199 barrage documents (T_con = 199). From these values the key weight of the alternative word "AAAA" in barrage document 1111 can be calculated.
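The arithmetic of this example can be written out as follows. The first line uses only values given in the text. The second line is an assumption: the weight formula is not reproduced here, so a TF-IDF style form with a natural logarithm is used purely for illustration, and the resulting number should not be read as the patent's value.

```latex
f = \frac{300}{20000} = 0.015, \qquad T_{all} = 10000, \qquad T_{con} = 199

% Assumed form, not confirmed by the source:
w_{\text{assumed}} = f \cdot \ln\frac{T_{all}}{T_{con}}
                   = 0.015 \cdot \ln\frac{10000}{199}
                   \approx 0.015 \times 3.917
                   \approx 0.0588
```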
Optionally, the calculation method of the key weight further comprises a formula that also uses f_avg, where f_avg denotes the third frequency, with which the alternative word occurs among all barrage words of the barrage document set.
Specifically, suppose there are 10000 main broadcasters (numbered 00001 to 10000), so the barrage document set contains 10000 barrage documents (T_all = 10000), with 20000 barrage words per barrage document on average. The barrage document of the main broadcaster whose ID is 1111 is the document to be evaluated and, after word segmentation, contains 20000 words. The word "BBBB" is the alternative word and occurs 300 times in the document to be evaluated, so its first frequency of occurrence is f = 300 / 20000 = 0.015. Meanwhile, the barrage word "BBBB" occurs in 199 barrage documents (T_con = 199), on average 200 times in each of them, which determines f_avg. From these values the key weight of the alternative word "BBBB" in barrage document 1111 can be calculated.
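The third frequency for this example can be worked out from the figures in the text, under the assumption that f_avg is the total number of occurrences of the word across the barrage document set divided by the total number of barrage words in the set, and that the stated averages hold exactly:

```latex
f_{avg} = \frac{199 \times 200}{10000 \times 20000}
        = \frac{39800}{2 \times 10^{8}}
        = 1.99 \times 10^{-4}
```

For comparison, the first frequency of "BBBB" in document 1111 is 0.015, so the word is far more frequent in this document than in the set as a whole.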
S208, barrage word in document to be evaluated is according to value ranked up from high to low by key weight, takes predetermined number
Barrage word as keyword.
Specifically, barrage words all in barrage document are according to value ranked up from high to low according to key weight, it is such as pre-
The keyword number for first setting each document to be evaluated is 3 (predetermined number), then selects key power in document to be evaluated
Weight values ranking first three the document to be evaluated as described in keyword.
Optionally, barrage words in the document to be evaluated whose key weight exceeds a preset weight are taken as keywords.
Specifically, when different methods are used to obtain the key weights of the barrage words in the document to be evaluated, different
preset weight values need to be set. If the key weight is calculated according to the first formula,
barrage words whose key weight exceeds 0.03 (the preset weight value) may be taken as keywords; if the key weight is calculated according to the second formula, barrage words whose key
weight exceeds 2 (the preset weight value) may be taken as keywords.
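The threshold variant can be sketched in the same way. The function name `keywords_above` and the sample weights are hypothetical; only the preset weight value 0.03 comes from the text.

```python
from typing import Dict, List


def keywords_above(weights: Dict[str, float], preset_weight: float) -> List[str]:
    """Keep every barrage word whose key weight strictly exceeds the preset weight."""
    return [word for word, weight in weights.items() if weight > preset_weight]


# Made-up key weights; 0.03 is the preset weight value quoted in the text.
sample_weights = {"AAAA": 0.059, "BBBB": 0.041, "CCCC": 0.012}
print(keywords_above(sample_weights, preset_weight=0.03))
```

Unlike the top-N rule, the number of keywords returned here varies from document to document, which is why the preset weight has to be matched to the formula in use.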
The present invention constructs a barrage document set relating to multiple main broadcasters and selects the barrage documents corresponding to
main broadcasters that meet a first preset condition as documents to be evaluated; determines the key weight corresponding to each alternative word
according to the first frequency of occurrence of the alternative word in the corresponding document to be evaluated and
the number of barrage documents in the barrage document set that include the alternative word; and, according to the key weights,
filters out the corresponding keywords from the barrage words of each document to be evaluated. This solves
the problem that large amounts of barrage content must be read manually and that customer service staff select high-quality barrage content subjectively,
and makes it possible to select the words that represent the core barrage content after conventional words have been excluded.
Embodiment three
Fig. 3 is a structural diagram of a device for barrage keyword screening provided by Embodiment three of the present invention. The device includes
a barrage document set constructing module 31, a key weight calculation module 32, and a keyword determining module 33. Specifically:
The barrage document set constructing module 31 is configured to construct a barrage document set relating to multiple main broadcasters and to select
the barrage documents corresponding to main broadcasters that meet a first preset condition as documents to be evaluated; the barrage document set consists of
multiple barrage documents, and each barrage document includes multiple barrage words;
The key weight calculation module 32 is configured to determine the key weight corresponding to each alternative word according to
the first frequency of occurrence of the alternative word in the corresponding document to be evaluated and the number of barrage documents in the barrage document set
that include the alternative word; the ratio between the number of barrage documents in the barrage document set and the number of barrage documents
in the set that include the alternative word is denoted as the second frequency of occurrence;
The keyword determining module 33 is configured to filter out, according to the key weights, the corresponding keywords from
the barrage words of each document to be evaluated; the key weight is directly proportional to the first frequency of occurrence and
inversely proportional to the second frequency of occurrence.
The embodiment of the present invention constructs a barrage document set relating to multiple main broadcasters and selects the barrage documents
corresponding to main broadcasters that meet a first preset condition as documents to be evaluated; determines the key weight corresponding to each
alternative word according to the first frequency of occurrence of the alternative word in the corresponding document to be evaluated and
the number of barrage documents in the barrage document set that include the alternative word; and, according to the key weights,
filters out the corresponding keywords from the barrage words of each document to be evaluated.
This solves the problem that large amounts of barrage content must be read manually and that customer service staff select high-quality barrage content subjectively,
and makes it possible to select the words that represent the core barrage content after conventional words have been excluded.
On the basis of the above embodiments, the barrage document set constructing module is specifically configured to:
establish an association between each main broadcaster number and the barrage document that stores that main broadcaster's barrage information;
the barrage document includes the barrage information and the generation time of the barrage information;
perform word segmentation on the barrage information, and store the resulting barrage words in the barrage document.
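The construction steps listed above can be sketched as follows. This is an illustration only: the record layout, the names `segment` and `build_document_set`, and the whitespace tokenizer are assumptions. Real Chinese barrage text would need a proper word segmenter in place of `str.split`.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, List, Tuple


def segment(text: str) -> List[str]:
    """Stand-in tokenizer (assumption): splits on whitespace only."""
    return text.split()


def build_document_set(records: Iterable[Tuple[str, str, datetime]]) -> Dict[str, dict]:
    """Map each main broadcaster number to one barrage document holding the raw
    barrage information with its generation time, plus the segmented barrage words."""
    docs = defaultdict(lambda: {"messages": [], "words": []})
    for broadcaster_id, text, created_at in records:
        doc = docs[broadcaster_id]
        doc["messages"].append((created_at, text))
        doc["words"].extend(segment(text))
    return dict(docs)


# Hypothetical barrage records: (main broadcaster number, text, generation time).
records = [
    ("00001", "hello world", datetime(2018, 7, 25, 12, 0)),
    ("00001", "hello again", datetime(2018, 7, 25, 12, 1)),
    ("01111", "nice play", datetime(2018, 7, 25, 12, 2)),
]
document_set = build_document_set(records)
print(document_set["00001"]["words"])
```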
On the basis of the above embodiments, the key weight calculation module is specifically configured to:
determine a barrage word in the document to be evaluated as an alternative word, including: using traversal, selecting all words in the
barrage document to be evaluated one by one as alternative words; or using a three-layer Bayesian probability model and taking the output of
the three-layer Bayesian probability model as alternative words;
calculate the frequency with which the alternative word occurs in the corresponding document to be evaluated as the first frequency of occurrence.
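The traversal option can be sketched as below: every distinct word of the document to be evaluated is treated as an alternative word, and its first frequency of occurrence is its count divided by the total number of words. The function name `first_frequencies` is hypothetical, and the three-layer Bayesian model option is not covered here.

```python
from collections import Counter
from typing import Dict, List


def first_frequencies(words: List[str]) -> Dict[str, float]:
    """Traverse all words of one document and return each word's share of the total."""
    counts = Counter(words)
    total = len(words)
    return {word: count / total for word, count in counts.items()}


# Hypothetical segmented document to be evaluated.
document_words = ["AAAA", "BBBB", "AAAA", "CCCC"]
print(first_frequencies(document_words))
```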
On the basis of the above embodiments, the calculation method of the key weight used by the key weight calculation module
includes:
where $f$ denotes the first frequency of occurrence of a given alternative word in the corresponding document to be evaluated, $T_{all}$ denotes the number of
barrage documents in the barrage document set, and $T_{con}$ denotes the number of barrage documents in the barrage document set that include the alternative word.
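The two document counts defined above can be obtained with a single pass over the barrage document set. The sketch assumes the set is held as a mapping from main broadcaster number to that document's list of barrage words; the function name `document_counts` and the sample data are hypothetical.

```python
from typing import Dict, List, Tuple


def document_counts(document_set: Dict[str, List[str]], word: str) -> Tuple[int, int]:
    """Return (T_all, T_con): the number of barrage documents in the set and
    the number of those documents that contain the given alternative word."""
    t_all = len(document_set)
    t_con = sum(1 for words in document_set.values() if word in words)
    return t_all, t_con


# Hypothetical set of three barrage documents keyed by main broadcaster number.
sample_set = {
    "00001": ["AAAA", "BBBB"],
    "00002": ["CCCC"],
    "01111": ["AAAA", "AAAA", "DDDD"],
}
print(document_counts(sample_set, "AAAA"))
```

A document counts once towards T_con however many times the word occurs in it.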
On the basis of the above embodiments, the calculation method of the key weight used by the key weight calculation module
further includes:
where $f_{avg}$ denotes the third frequency, with which the alternative word occurs among all barrage words of the barrage document set.
On the basis of the above embodiments, the keyword determining module is specifically configured to:
sort the barrage words in the document to be evaluated by key weight from high to low, and take a predetermined number of
barrage words as keywords; and/or
take barrage words in the document to be evaluated whose key weight exceeds a preset weight as keywords.
The device for barrage keyword screening provided by this embodiment can be used to perform the method for barrage keyword screening
provided by any of the above embodiments, and has the corresponding functions and beneficial effects.
Embodiment four
Fig. 4 is a structural schematic diagram of an equipment provided by Embodiment four of the present invention. As shown in Fig. 4, the equipment includes
a processor 40, a memory 41, a communication module 42, an input unit 43, and an output device 44. The equipment may have
one or more processors 40; one processor 40 is taken as an example in Fig. 4. The processor 40, memory 41, communication module 42,
input unit 43, and output device 44 in the equipment may be connected by a bus or in other ways; connection by a bus is taken as an example in Fig. 4.
As a computer-readable storage medium, the memory 41 can be used to store software programs, computer-executable programs,
and modules, such as the modules corresponding to the method for barrage keyword screening in this embodiment (for example,
the barrage document set constructing module 31, key weight calculation module 32, and keyword determining module 33 of the device for barrage keyword screening).
By running the software programs, instructions, and modules stored in the memory 41, the processor 40 executes the various functional applications
and data processing of the equipment, that is, implements the above method for barrage keyword screening.
The memory 41 may mainly include a program storage area and a data storage area. The program storage area can store an operating system
and the application programs required by at least one function; the data storage area can store data created through use of the equipment, and so on.
In addition, the memory 41 may include high-speed random access memory and may also include non-volatile memory, for example at least one
magnetic disk storage device, a flash memory device, or another non-volatile solid-state memory device. In some instances, the memory 41 may further
include memory located remotely from the processor 40, and such remote memory can be connected to the equipment through a network.
Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The communication module 42 is used to establish a connection with a display screen and to exchange data with the display screen. The input unit 43 can be
used to receive input numeric or character information and to generate key signal inputs related to the user settings and function control
of the equipment.
The equipment provided by this embodiment can perform the method for barrage keyword screening provided by any embodiment of the present invention,
and has the corresponding functions and beneficial effects.
Embodiment five
Embodiment five of the present invention also provides a storage medium containing computer-executable instructions which,
when executed by a computer processor, are used to perform a method for barrage keyword screening, the method comprising:
constructing a barrage document set relating to multiple main broadcasters, and selecting the barrage documents corresponding to main broadcasters that meet
a first preset condition as documents to be evaluated; wherein the barrage document set consists of multiple barrage documents, and each barrage document includes
multiple barrage words;
determining the key weight corresponding to each alternative word according to the first frequency of occurrence of the alternative word in the corresponding
document to be evaluated and the number of barrage documents in the barrage document set that include the alternative word; wherein the ratio between
the number of barrage documents in the barrage document set and the number of barrage documents in the set that include the alternative word is denoted as the second frequency of occurrence;
filtering out, according to the key weights, the corresponding keywords from the barrage words of each document to be evaluated;
wherein the key weight is directly proportional to the first frequency of occurrence and inversely proportional to the second frequency of occurrence.
Of course, in the storage medium containing computer-executable instructions provided by the embodiment of the present invention, the computer-executable
instructions are not limited to the method operations described above, and can also perform related operations in the method for barrage keyword screening
provided by any embodiment of the present invention.
From the above description of the embodiments, it will be clear to those skilled in the art that the present invention
can be implemented by software together with the necessary general-purpose hardware, and of course can also be implemented by hardware alone, but in many cases the former
is the better implementation. Based on this understanding, the part of the technical solution of the present invention that is essential, or that contributes to the prior art,
can be embodied in the form of a software product. The computer software product can be stored in a computer-readable storage medium,
such as a computer floppy disk, read-only memory (ROM), random access memory (RAM),
flash memory (FLASH), hard disk, or optical disc, and includes a number of instructions that cause a computer device
(which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments of the present invention.
It is worth noting that, in the above embodiment of the device for barrage keyword screening, the units and modules included
are divided only according to functional logic, but the division is not limited to the above as long as the corresponding functions can be realized;
in addition, the specific names of the functional units are only for ease of distinguishing them from one another and are not intended to limit the protection scope
of the present invention.
Note that the above are only preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will understand that
the present invention is not limited to the specific embodiments described here, and that various obvious changes,
readjustments, and substitutions can be made by those skilled in the art without departing from the protection scope of the present invention. Therefore, although the present invention
has been described in further detail through the above embodiments, it is not limited to them; it may include
other equivalent embodiments without departing from the inventive concept, and its scope is determined by the scope of the appended claims.
Claims (10)
1. A method for barrage keyword screening, characterized by comprising:
constructing a barrage document set relating to multiple main broadcasters, and selecting the barrage documents corresponding to main broadcasters that meet a first preset condition
as documents to be evaluated; wherein the barrage document set consists of multiple barrage documents, and each barrage document includes multiple barrage words;
determining the key weight corresponding to each alternative word according to the first frequency of occurrence of the alternative word in the corresponding document to be evaluated and
the number of barrage documents in the barrage document set that include the alternative word; wherein the ratio between the number of barrage documents in the barrage document set
and the number of barrage documents in the set that include the alternative word is denoted as the second frequency of occurrence;
filtering out, according to the key weights, the corresponding keywords from the barrage words of each document to be evaluated; wherein
the key weight is directly proportional to the first frequency of occurrence and inversely proportional to the second frequency of occurrence.
2. The method according to claim 1, characterized in that constructing the barrage document set relating to multiple main broadcasters
specifically includes:
establishing an association between each main broadcaster number and the barrage document that stores that main broadcaster's barrage information;
the barrage document includes the barrage information and the generation time of the barrage information;
performing word segmentation on the barrage information, and storing the resulting barrage words in the barrage document.
3. The method according to claim 1, characterized in that the step based on the first frequency of occurrence of the alternative word in the corresponding
document to be evaluated specifically includes:
determining a barrage word in the document to be evaluated as an alternative word;
calculating the frequency with which the alternative word occurs in the corresponding document to be evaluated as the first frequency of occurrence.
4. The method according to claim 3, characterized in that determining a barrage word in the document to be evaluated as an alternative word
specifically includes:
using traversal, selecting all words in the barrage document to be evaluated one by one as alternative words; or
using a three-layer Bayesian probability model, and taking the output of the three-layer Bayesian probability model as alternative words.
5. The method according to claim 1, characterized in that the calculation method of the key weight includes:
where $f$ denotes the first frequency of occurrence of a given alternative word in the corresponding document to be evaluated, $T_{all}$ denotes the number of barrage documents
in the barrage document set, and $T_{con}$ denotes the number of barrage documents in the barrage document set that include the alternative word.
6. The method according to claim 5, characterized in that the calculation method of the key weight further includes:
where $f_{avg}$ denotes the third frequency, with which the alternative word occurs among all barrage words of the barrage document set.
7. The method according to claim 1, characterized in that filtering out, according to the key weights, the corresponding keywords from
the barrage words of each document to be evaluated comprises:
sorting the barrage words in the document to be evaluated by key weight from high to low, and taking a predetermined number of barrage words
as keywords; and/or
taking barrage words in the document to be evaluated whose key weight exceeds a preset weight as keywords.
8. A device for barrage keyword screening, characterized by comprising:
a barrage document set constructing module, configured to construct a barrage document set relating to multiple main broadcasters and to select the barrage documents
corresponding to main broadcasters that meet a first preset condition as documents to be evaluated; wherein the barrage document set consists of multiple barrage documents,
and each barrage document includes multiple barrage words;
a key weight calculation module, configured to determine the key weight corresponding to each alternative word according to the first frequency of occurrence of the alternative word
in the corresponding document to be evaluated and the number of barrage documents in the barrage document set that include the alternative word;
wherein the ratio between the number of barrage documents in the barrage document set and the number of barrage documents in the set that include the alternative word
is denoted as the second frequency of occurrence;
a keyword determining module, configured to filter out, according to the key weights, the corresponding keywords from the barrage words of each document to be evaluated;
wherein the key weight is directly proportional to the first frequency of occurrence and inversely proportional to
the second frequency of occurrence.
9. An equipment, characterized by comprising:
one or more processors;
a memory for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to
implement the method for barrage keyword screening as claimed in claim 1.
10. A computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor,
implements the method for barrage keyword screening as claimed in claim 1.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810829527.3A CN109145291A (en) | 2018-07-25 | 2018-07-25 | A kind of method, apparatus, equipment and the storage medium of the screening of barrage keyword |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109145291A (en) | 2019-01-04 |
Family
ID=64797927
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810829527.3A Pending CN109145291A (en) | 2018-07-25 | 2018-07-25 | A kind of method, apparatus, equipment and the storage medium of the screening of barrage keyword |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109145291A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | RJ01 | Rejection of invention patent application after publication | Application publication date: 20190104 |