CN109547863A - Label marking method, device, server and storage medium - Google Patents

Label marking method, device, server and storage medium

Info

Publication number
CN109547863A
Authority
CN
China
Prior art keywords
classification
live streaming
candidate word
feature words
word
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811229982.6A
Other languages
Chinese (zh)
Other versions
CN109547863B (en)
Inventor
徐乐乐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qingdao Qingzi Engineering Consulting Co ltd
Original Assignee
Wuhan Douyu Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Douyu Network Technology Co Ltd
Priority to CN201811229982.6A
Publication of CN109547863A
Application granted
Publication of CN109547863B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47 End-user applications
    • H04N21/488 Data services, e.g. news ticker
    • H04N21/4884 Data services, e.g. news ticker for displaying subtitles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83 Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/835 Generation of protective data, e.g. certificates
    • H04N21/8352 Generation of protective data, e.g. certificates involving content or source identification data, e.g. Unique Material Identifier [UMID]

Abstract

Embodiments of the invention disclose a label marking method, device, server and storage medium. The method comprises: extracting text information of multiple types from the live rooms in a live category; extracting feature words from the text information; screening candidate words from the feature words according to the relevance between the feature words and the live category; screening target words from the candidate words according to the type corresponding to the text information; and setting the target words as the label information of the live category. Representative words are first screened by relevance and important words are then screened precisely by type, which ensures the accuracy of the words used as label information. Through the label information of a live category, users can identify from multiple dimensions the content of the live rooms the category contains, enter a suitable live category and quickly find a live room they like.

Description

Label marking method, device, server and storage medium
Technical field
Embodiments of the present invention relate to natural language processing technology, and in particular to a label marking method, device, server and storage medium.
Background technique
With the rapid development of network technology, the number of live rooms has grown quickly, for example live rooms that stream games, talent performances, and so on.
To manage live rooms more easily, a live streaming platform usually divides them into different live categories. Users typically judge the rough content of the live rooms from the name of a live category, then enter the corresponding live category to look for a live room they like to watch.
However, the boundaries between some live categories are not clear. If a live room shares identical or similar elements with several live categories, it may be assigned to any of them.
For example, entertainment-related live rooms may be divided into live categories such as "looks", "outdoors" and "food"; a live room in which a female celebrity streams an outdoor barbecue could be placed in any of these three live categories.
Users therefore have to enter several live categories to find a live room they like and traverse a large number of live rooms, which makes the operation cumbersome and inefficient.
Summary of the invention
Embodiments of the present invention provide a label marking method, device, server and storage medium, to solve the problem that judging the rough content of live rooms from the name of a live category, and then choosing a live category in which to look for a live room, makes the operation cumbersome and inefficient.
In a first aspect, an embodiment of the present invention provides a label marking method, comprising:
extracting text information of multiple types from the live rooms in a live category;
extracting feature words from the text information;
screening candidate words from the feature words according to the relevance between the feature words and the live category;
screening target words from the candidate words according to the type corresponding to the text information;
setting the target words as the label information of the live category.
Optionally, the types of the text information include at least one of the following:
the title of the live room, the bullet comments of the live room, and the subcategory name;
wherein the subcategory name is the name of the live subcategory to which the live room belongs under the live category.
Optionally, screening candidate words from the feature words according to the relevance between the feature words and the live category comprises:
calculating the expected value of each feature word based on the distribution difference between the feature word and the live category, wherein the expected value is positively correlated with the distribution difference;
screening candidate words from the feature words according to the expected values.
Optionally, the expected value of a feature word is calculated by a formula (not reproduced in this text) in which:
N is the number of pieces of text information, A is the number of times feature word w appears in live category v, B is the number of times feature word w appears outside live category v, C is the number of times words other than feature word w appear in live category v, and D is the number of times words other than feature word w appear outside live category v.
Optionally, screening candidate words from the feature words according to the expected values comprises:
selecting candidate values from the expected values, wherein the candidate values are the m largest expected values;
setting the feature words corresponding to the candidate values as candidate words.
Optionally, screening target words from the candidate words according to the type corresponding to the text information comprises:
calculating the classification score of each candidate word in each type;
calculating the comprehensive score of the candidate word by combining the classification scores;
screening target words from the candidate words according to the comprehensive scores.
Optionally, calculating the classification score of the candidate word in a type comprises:
counting the total number of times the candidate word appears in the type;
calculating the classification score of the candidate word in the type from the total count, wherein the total count is positively correlated with the classification score.
Optionally, calculating the comprehensive score of the candidate word by combining the classification scores comprises:
assigning a weight to each classification score according to its type, to obtain weighted scores;
calculating the sum of the weighted scores as the comprehensive score of the candidate word.
Optionally, the comprehensive score R(w) of candidate word w is calculated by the following formula:
R(w) = λ1 * log(tf_text(w) + 1) + λ2 * log(tf_t(w) + 1) + λ3 * log(tf_zone(w) + 1)
where tf_text(w) is the total number of times candidate word w appears in the titles of the live rooms, the bullet comments of the live rooms and the subcategory names, tf_t(w) is the total number of times candidate word w appears in the titles of the live rooms, tf_zone(w) is the total number of times candidate word w appears in the subcategory names, and λ1, λ2 and λ3 are weights.
Optionally, screening target words from the candidate words according to the comprehensive scores comprises:
selecting target scores from the comprehensive scores, wherein the target scores are the n largest comprehensive scores;
setting the candidate words corresponding to the target scores as target words.
In a second aspect, an embodiment of the present invention further provides a label marking device, comprising:
a text information extraction module, configured to extract text information of multiple types from the live rooms in a live category;
a feature word extraction module, configured to extract feature words from the text information;
a candidate word screening module, configured to screen candidate words from the feature words according to the relevance between the feature words and the live category;
a target word screening module, configured to screen target words from the candidate words according to the type corresponding to the text information;
a label information setting module, configured to set the target words as the label information of the live category.
Optionally, the types of the text information include at least one of the following:
the title of the live room, the bullet comments of the live room, and the subcategory name;
wherein the subcategory name is the name of the live subcategory to which the live room belongs under the live category.
Optionally, the candidate word screening module comprises:
an expected value calculation submodule, configured to calculate the expected value of each feature word based on the distribution difference between the feature word and the live category, wherein the expected value is positively correlated with the distribution difference;
an expected value screening submodule, configured to screen candidate words from the feature words according to the expected values.
Optionally, the expected value of a feature word is calculated by a formula (not reproduced in this text) in which:
N is the number of pieces of text information, A is the number of times feature word w appears in live category v, B is the number of times feature word w appears outside live category v, C is the number of times words other than feature word w appear in live category v, and D is the number of times words other than feature word w appear outside live category v.
Optionally, the expected value screening submodule comprises:
a candidate value selection unit, configured to select candidate values from the expected values, wherein the candidate values are the m largest expected values;
a candidate value setting unit, configured to set the feature words corresponding to the candidate values as candidate words.
Optionally, the target word screening module comprises:
a classification score calculation submodule, configured to calculate the classification score of each candidate word in each type;
a comprehensive score calculation submodule, configured to calculate the comprehensive score of the candidate word by combining the classification scores;
a comprehensive score screening submodule, configured to screen target words from the candidate words according to the comprehensive scores.
Optionally, the classification score calculation submodule comprises:
a total count statistics unit, configured to count the total number of times the candidate word appears in the type;
a total count calculation unit, configured to calculate the classification score of the candidate word in the type from the total count, wherein the total count is positively correlated with the classification score.
Optionally, the comprehensive score calculation submodule comprises:
a weight configuration unit, configured to assign a weight to each classification score according to its type, to obtain weighted scores;
a summation unit, configured to calculate the sum of the weighted scores as the comprehensive score of the candidate word.
Optionally, the comprehensive score R(w) of candidate word w is calculated by the following formula:
R(w) = λ1 * log(tf_text(w) + 1) + λ2 * log(tf_t(w) + 1) + λ3 * log(tf_zone(w) + 1)
where tf_text(w) is the total number of times candidate word w appears in the titles of the live rooms, the bullet comments of the live rooms and the subcategory names, tf_t(w) is the total number of times candidate word w appears in the titles of the live rooms, tf_zone(w) is the total number of times candidate word w appears in the subcategory names, and λ1, λ2 and λ3 are weights.
Optionally, the comprehensive score screening submodule comprises:
a target score selection unit, configured to select target scores from the comprehensive scores, wherein the target scores are the n largest comprehensive scores;
a target score setting unit, configured to set the candidate words corresponding to the target scores as target words.
In a third aspect, an embodiment of the present invention further provides a server, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the label marking method provided by the embodiment of the first aspect of the present invention.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the label marking method provided by the embodiment of the first aspect of the present invention.
In the embodiments of the present invention, text information of multiple types is extracted from the live rooms in a live category, feature words are extracted from the text information, candidate words are screened from the feature words according to the relevance between the feature words and the live category, target words are screened from the candidate words according to the type corresponding to the text information, and the target words are set as the label information of the live category. Representative words are first screened by relevance and important words are then screened precisely by type, which ensures the accuracy of the words used as label information. Through the label information of a live category, users can identify from multiple dimensions the content of the live rooms the category contains, enter a suitable live category and quickly find a live room they like. This reduces the number of live rooms traversed and improves the simplicity and efficiency of the operation.
Brief description of the drawings
Fig. 1 is a flowchart of a label marking method provided by Embodiment one of the present invention;
Fig. 2 is a flowchart of a label marking method provided by Embodiment two of the present invention;
Fig. 3 is a schematic structural diagram of a label marking device provided by Embodiment three of the present invention;
Fig. 4 is a schematic structural diagram of a server provided by Embodiment four of the present invention.
Detailed description of the embodiments
The present invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present invention, not to limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the present invention rather than the entire structure.
Embodiment one
Fig. 1 is a flowchart of a label marking method provided by Embodiment one of the present invention. This embodiment is applicable to scenarios in which labels are marked on live categories, so that users can make a suitable selection through the multi-dimensional labels and quickly locate a live room they like. The method may be executed by a server and specifically includes the following steps:
S110: Extract text information of multiple types from the live rooms in a live category.
In the embodiments of the present invention, a live streaming platform hosts a large number of live rooms, and these live rooms can be divided into different live categories according to their content.
For example, for live rooms that stream games (including online competitive games, single-player games, mobile games, etc.), the game name can be used directly as the live category.
As another example, live rooms that stream entertainment can be divided into live categories such as "looks", "celebrity entertainment", "anime", "food" and "music".
As another example, live rooms that stream science, technology and education content can be divided into live categories such as "digital technology", "popular science", "automobiles" and "documentaries".
For the live rooms assigned to a live category, the related text information can be collected and stored in a text collection as the corpus.
In one example, the types of text information include at least one of the following:
the title of the live room, the bullet comments of the live room, and the subcategory name.
Here, the subcategory name is the name of the live subcategory to which the live room belongs under the live category.
On some live streaming platforms, the live category is a second-level category; in that case the live subcategory is a third-level category, and its name is also called the third-level title.
For example, a live streaming platform uses "entertainment" as a first-level category. Under "entertainment" there are second-level categories such as "looks", "celebrity entertainment", "anime", "food" and "music", and under "looks" there are third-level categories such as "singing", "dancing" and "talk show". If the user clicks "singing", the live rooms belonging to "singing" are filtered out of the live rooms in "looks".
Of course, the above types of text information are only examples. When implementing the embodiments of the present invention, other types of text information can be set according to the actual situation, for example the streamer's profile, the streamer's label information, or the announcement of the live room; the embodiments of the present invention are not limited in this respect. Those skilled in the art may also use other types of text information according to actual needs.
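The corpus described in S110 can be pictured as text grouped by live category and by text type. The following is a minimal sketch under stated assumptions: the names `TextType`, `TextInfo` and `build_corpus`, the field layout and the sample data are all hypothetical and do not come from the patent.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Iterable, List, Tuple


class TextType(Enum):
    """The three text types named in the example (hypothetical identifiers)."""
    TITLE = "title"
    BULLET_COMMENT = "bullet_comment"
    SUBCATEGORY_NAME = "subcategory_name"


@dataclass(frozen=True)
class TextInfo:
    """One piece of text collected from a live room (hypothetical layout)."""
    category: str        # live category the room belongs to
    room_id: str         # identifier of the live room
    text_type: TextType  # where the text came from
    text: str            # the raw text


def build_corpus(records: Iterable[TextInfo]) -> Dict[Tuple[str, TextType], List[str]]:
    """Group raw texts by (live category, text type)."""
    corpus: Dict[Tuple[str, TextType], List[str]] = defaultdict(list)
    for record in records:
        corpus[(record.category, record.text_type)].append(record.text)
    return dict(corpus)
```

Keeping the text type next to each piece of text matters later, because the target word screening weights a word by the type of text in which it appears.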
S120: Extract feature words from the text information.
In a specific implementation, the text information can be preprocessed to extract meaningful feature words from it.
In one embodiment, word segmentation may be performed on the text information using a tool such as jieba ("stutter"), yielding text tokens.
Stop words are then filtered out of the text tokens using a preset stop word list, and the tokens remaining after filtering are taken as feature words.
The stop word list records stop words collected in advance, such as "ah", "and", and so on.
Of course, in addition to word segmentation and stop word filtering, those skilled in the art may apply other preprocessing according to actual needs in order to extract feature words; the embodiments of the present invention are not limited in this respect.
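The preprocessing in S120 can be sketched as a tokenizer followed by a stop word filter. This is an illustrative sketch only: `extract_feature_words` and the tiny `STOP_WORDS` set are hypothetical, and the default tokenizer simply splits on whitespace so that the example runs without third-party packages.

```python
from typing import Callable, FrozenSet, Iterable, List

# Hypothetical stop word list; a real one would be far larger.
STOP_WORDS: FrozenSet[str] = frozenset({"ah", "and", "the", "a"})


def extract_feature_words(
    texts: Iterable[str],
    tokenize: Callable[[str], List[str]] = str.split,
    stop_words: FrozenSet[str] = STOP_WORDS,
) -> List[List[str]]:
    """Tokenize each text, then drop stop words and empty tokens.

    `tokenize` is pluggable: whitespace splitting is only a stand-in
    for a real word segmenter.
    """
    feature_words = []
    for text in texts:
        tokens = [token.strip() for token in tokenize(text)]
        feature_words.append([t for t in tokens if t and t not in stop_words])
    return feature_words
```

For Chinese text, a segmenter such as `jieba.lcut` could be passed as `tokenize`, assuming the jieba package is installed.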
S130: Screen candidate words from the feature words according to the relevance between the feature words and the live category.
In the embodiments of the present invention, the relevance between at least two feature words and at least two live categories may be analyzed, so as to select the feature words closely associated with a live category as the representative candidate words of that live category.
S140: Screen target words from the candidate words according to the type corresponding to the text information.
In a specific implementation, the position where a candidate word appears (i.e., the type of the corresponding text information) reflects the importance of the candidate word to some extent. Therefore, for the candidate words of a given live category, the more important candidate words can be selected as target words with reference to the position where each candidate word appears.
S150: Set the target words as the label information of the live category.
For the candidate words of a given live category, the target words selected from them may be set as the label information of that live category.
Thereafter, an application such as a browser or a standalone live streaming client can request the live streaming platform to load live categories and display each live category with its label information; the user identifies the content of the live rooms in the live category from the label information and can quickly locate a live room they like.
In the embodiments of the present invention, text information of multiple types is extracted from the live rooms in a live category, feature words are extracted from the text information, candidate words are screened from the feature words according to the relevance between the feature words and the live category, target words are screened from the candidate words according to the type corresponding to the text information, and the target words are set as the label information of the live category. Representative words are first screened by relevance and important words are then screened precisely by type, which ensures the accuracy of the words used as label information. Through the label information of a live category, users can identify from multiple dimensions the content of the live rooms the category contains, enter a suitable live category and quickly find a live room they like. This reduces the number of live rooms traversed and improves the simplicity and efficiency of the operation.
Embodiment two
Fig. 2 is a flowchart of a label marking method provided by Embodiment two of the present invention. This embodiment builds on the foregoing embodiment and further adds the processing operations of screening candidate words and screening target words. The method may be executed by a server and specifically includes the following steps:
S210: Extract text information of multiple types from the live rooms in a live category.
S220: Extract feature words from the text information.
S230: Calculate the expected value of each feature word based on the distribution difference between the feature word and the live category.
Here, the expected value is positively correlated with the distribution difference.
In the embodiments of the present invention, the distribution difference between a feature word and a live category may be computed and used to calculate the expected value of the feature word, which expresses the degree of deviation between the observed value and the expected value for the feature word.
The larger the distribution difference between the feature word and the live category, the larger the expected value, i.e., the less the sampling of the feature word conforms to the actual distribution in the live category.
The smaller the distribution difference between the feature word and the live category, the smaller the expected value, i.e., the more the sampling of the feature word conforms to the actual distribution in the live category.
In one example, suppose the types of text information include the title of the live room, the bullet comments of the live room and the subcategory name.
In this example, the expected value of a feature word may be calculated by a formula (not reproduced in this text) in which:
N is the number of pieces of text information, A is the number of times feature word w appears in live category v, B is the number of times feature word w appears outside live category v, C is the number of times words other than feature word w appear in live category v, and D is the number of times words other than feature word w appear outside live category v.
More specifically, live category v is a particular live category, such as "food" or "automobiles", and "outside live category v" refers to the live categories other than v; feature word w is a particular feature word, such as "stir-fried shredded potato" or "front-wheel drive", and "words other than feature word w" refers to the feature words other than w.
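The formula itself did not survive in this text. Given the symbols N, A, B, C and D and the description of a deviation between observed and expected values, one plausible reading is the chi-square statistic used for feature selection, χ²(w, v) = N(AD − BC)² / ((A + C)(B + D)(A + B)(C + D)). The sketch below assumes that reading, which the source does not confirm. It also assumes that the counts are taken per piece of text (a text either contains w or not), so that N = A + B + C + D; the function names and sample data are hypothetical.

```python
from typing import Iterable, Optional, Set, Tuple


def contingency(
    docs: Iterable[Tuple[str, Set[str]]], word: str, category: str
) -> Tuple[int, int, int, int]:
    """Count A, B, C, D over (category, feature-word set) pairs.

    Assumption: each text counts once, depending on whether it contains
    `word` and whether it belongs to `category`.
    """
    a = b = c = d = 0
    for doc_category, words in docs:
        has_word = word in words
        in_category = doc_category == category
        if has_word and in_category:
            a += 1
        elif has_word:
            b += 1
        elif in_category:
            c += 1
        else:
            d += 1
    return a, b, c, d


def chi_square(a: int, b: int, c: int, d: int, n: Optional[int] = None) -> float:
    """Chi-square statistic for a 2x2 table (assumed form of the formula)."""
    if n is None:
        n = a + b + c + d
    denominator = (a + c) * (b + d) * (a + b) * (c + d)
    if denominator == 0:
        return 0.0  # degenerate table: no evidence of association
    return n * (a * d - b * c) ** 2 / denominator
```

Under this reading, a word spread evenly across categories scores 0, while a word concentrated in one category scores high, which matches the statement that the value grows with the distribution difference.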
S240: Screen candidate words from the feature words according to the expected values.
With the embodiment of the present invention, a screening mode for candidate words may be preset; once the expected values of the feature words have been calculated, several feature words can be selected as candidate words according to that screening mode.
In general, the number of candidate words is smaller than the number of feature words.
In one embodiment, the expected values may be compared or sorted, and candidate values selected from them, where the candidate values are the m largest expected values, m is a positive integer (for example 50), and m is smaller than the number of feature words.
For example, the expected values may be sorted so that a larger expected value ranks higher and a smaller expected value ranks lower, and the top m expected values are extracted as candidate values.
The feature words corresponding to the candidate values are set as candidate words, so that m candidate words are screened out.
Of course, the above screening mode is only an example. When implementing the embodiments of the present invention, other screening modes can be set according to the actual situation. For example, candidates can be selected by threshold, i.e., a feature word whose expected value exceeds the threshold is set as a candidate word: if the number of labels is large (greater than some threshold), the threshold may be set to a lower value, such as 0.5; if the number of labels is small (less than some threshold), the threshold may be set to a higher value, such as 1. The embodiments of the present invention are not limited in this respect, and those skilled in the art may also use other screening modes for candidate words according to actual needs.
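Both screening modes described above, taking the m largest expected values or keeping every word above a threshold, fit in a few lines. This is a sketch with hypothetical function names and sample values; it assumes the expected values are already available as a mapping from feature word to value.

```python
import heapq
from typing import Dict, List


def top_m_candidates(expected_values: Dict[str, float], m: int) -> List[str]:
    """Return the m feature words with the largest expected values."""
    return heapq.nlargest(m, expected_values, key=expected_values.get)


def threshold_candidates(expected_values: Dict[str, float], threshold: float) -> List[str]:
    """Return every feature word whose expected value exceeds the threshold."""
    return [word for word, value in expected_values.items() if value > threshold]
```

`heapq.nlargest` avoids sorting the whole vocabulary when m is much smaller than the number of feature words; a plain `sorted(...)[:m]` would give the same result.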
S250: Calculate the classification score of each candidate word in each type.
For a given candidate word, its degree of importance within a type can be calculated and used as the classification score.
In a specific implementation, a more important type may be selected from the types and the classification score of the candidate word in that type calculated separately; alternatively, at least two types may be combined and the classification score of the candidate word in the combined type calculated jointly. The embodiments of the present invention are not limited in this respect.
In one embodiment, the classification score of a candidate word in a type can be calculated from its word frequency.
More specifically, the total number of times the candidate word appears in the type can be counted, and the classification score of the candidate word in the type calculated from that total count.
Here, the total count is positively correlated with the classification score: the higher the total count, the more important the candidate word and the higher the classification score; the lower the total count, the less important the candidate word and the lower the classification score.
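The counting step can be sketched with a frequency counter per text type. The function names are hypothetical; the score uses log(tf + 1), the form that appears in the comprehensive score formula of this document, and the natural logarithm is an assumption because the source does not state the base.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List


def count_by_type(tokens_by_type: Dict[str, Iterable[List[str]]]) -> Dict[str, Counter]:
    """Count how often each word appears in each text type."""
    counts: Dict[str, Counter] = {}
    for text_type, token_lists in tokens_by_type.items():
        counter: Counter = Counter()
        for tokens in token_lists:
            counter.update(tokens)
        counts[text_type] = counter
    return counts


def classification_score(total_count: int) -> float:
    """Map a total count to a score that grows with the count.

    log(count + 1) is 0 for an absent word and dampens very frequent words.
    """
    return math.log(total_count + 1)
```

Any monotonically increasing function of the count would satisfy the positive correlation the text requires; the logarithm is one choice that keeps very frequent words from dominating.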
S260 calculates the comprehensive grading value of the candidate word in conjunction with the classification score value.
For a given candidate word, if classification score values have been calculated for different types, these values can be combined to calculate the comprehensive grading value of the candidate word, which reflects its overall importance.
In one embodiment, a weight can be configured for each classification score value according to its type to obtain a weight-adjusted score value, and the sum of the weight-adjusted score values is calculated as the comprehensive grading value of the candidate word.
In general, the weights sum to 1, and the more important a type (including a combined type) is, the larger the weight configured for it.
In one example, the comprehensive grading value R (w) of candidate word w can be calculated by following formula:
R (w)=λ1*log(tf_text(w)+1)+λ2*log(tf_t(w)+1)+λ3*log(tf_zone(w)+1)
Wherein, tf_text(w) is the total number of times candidate word w occurs in the titles of the direct broadcasting rooms, the barrages of the direct broadcasting rooms and the subclassification titles, so log(tf_text(w)+1) is the classification score value of candidate word w over these three types combined; tf_t(w) is the total number of times candidate word w occurs in the titles of the direct broadcasting rooms, so log(tf_t(w)+1) is the classification score value of candidate word w in the titles of the direct broadcasting rooms; tf_zone(w) is the total number of times candidate word w occurs in the subclassification titles, so log(tf_zone(w)+1) is the classification score value of candidate word w in the subclassification titles; and λ1, λ2, λ3 are weights.
In this example, because barrages may contain a large number of meaningless words, the classification score value of the candidate word in the barrages is not calculated separately, so as to avoid emphasizing such meaningless words.
In addition, if the titles of the direct broadcasting rooms and the subclassification titles are considered more important, higher weights can be set for λ2 and λ3.
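The weighted sum R(w) can be written as a small function. This is a sketch under assumptions: the logarithm base is not stated in the text, so base 10 is assumed here, and the default weights (0.2, 0.5, 0.3) are simply one setting that sums to 1 and favors the title and subclassification terms.

```python
import math


def comprehensive_score(tf_text, tf_t, tf_zone, weights=(0.2, 0.5, 0.3)):
    """R(w) = l1*log(tf_text+1) + l2*log(tf_t+1) + l3*log(tf_zone+1), base 10 assumed."""
    l1, l2, l3 = weights
    return (l1 * math.log10(tf_text + 1)
            + l2 * math.log10(tf_t + 1)
            + l3 * math.log10(tf_zone + 1))


# A word seen 120 times overall, 10 times in titles, twice in subclassification titles.
r = comprehensive_score(120, 10, 2)
```

Because each term is a logarithm of a count, a very frequent barrage word cannot dominate the score merely through raw frequency.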
S270 screens target word from the candidate word according to the comprehensive grading value.
With the embodiment of the present invention, a manner of screening target words can be preset; once the comprehensive grading values of the candidate words have been calculated, several candidate words can be selected as target words according to that screening manner.
In general, the quantity of target words is less than the quantity of candidate words.
In one embodiment, the comprehensive grading values can be compared or sorted, and target score values selected from them, wherein the target score values are the n largest comprehensive grading values, n is a positive integer, and n is less than the quantity of candidate words.
For example, the comprehensive grading values can be sorted in descending order, i.e. the larger the comprehensive grading value, the higher its rank, and conversely the smaller the comprehensive grading value, the lower its rank; the top n comprehensive grading values are then extracted as target score values.
The candidate words corresponding to the target score values are set as target words, thereby screening out n target words.
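Selecting the n candidate words with the largest comprehensive grading values can be sketched with the standard-library `heapq.nlargest`. The function name `select_top_n` and the dictionary layout are assumptions for illustration.

```python
import heapq


def select_top_n(scores, n):
    """Return the n words with the largest scores, highest first."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    return heapq.nlargest(n, scores, key=scores.get)


# Hypothetical comprehensive grading values for three candidate words.
scores = {"word a": 1.0, "word b": 3.0, "word c": 2.0}
targets = select_top_n(scores, 2)
```

An explicit sort followed by slicing gives the same result; `nlargest` merely avoids sorting the whole collection when n is small.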
Of course, the above manner of screening target words is only an example. When implementing the embodiments of the present invention, other manners of screening target words may be set according to the actual situation. For example, target words may be selected according to a threshold, i.e. candidate words whose desired value exceeds the threshold may be set as target words. If the quantity of labels is large (i.e. greater than some threshold), the threshold may be set to a lower value, such as 0.5; if the quantity of labels is small (i.e. less than some threshold), the threshold may be set to a higher value, such as 1. The embodiments of the present invention are not limited thereto. In addition, those skilled in the art may adopt other manners of screening target words according to actual needs, which the embodiments of the present invention likewise do not limit.
S280 sets the target word to the label information of the live streaming classification.
In the embodiments of the present invention, the desired value of each Feature Word is calculated based on the distributional difference between the Feature Word and the live streaming classification, and candidate words are preliminarily screened on that basis, which reflects how representative a candidate word is of the live streaming classification. In addition, target words are accurately screened by the comprehensive grading value calculated across the different types, which reflects how important a candidate word is to the live streaming classification.
To help those skilled in the art better understand the embodiments of the present invention, the method of marking label information for a live streaming classification is illustrated below through a specific example.
The live streaming platform has two live streaming classifications, namely "game" and "face value". Text information is extracted from the titles, barrages and live subclassification titles of the direct broadcasting rooms in "game" and "face value".
Part of the text information extracted from the direct broadcasting rooms in "game" is as follows:
1, impart knowledge to students great master: upper list overlord does not quarrel
2, the top of a valley --- wild Qu Wangzhe
3, main broadcaster is exactly a set pattern great master!
4, fiery shadow is robbed
5, black rose: HappyTime has begun
Wherein, items 1, 2 and 4 are titles of direct broadcasting rooms, item 3 is a barrage, and item 4 is the title of a live subclassification.
Part of the text information extracted from the direct broadcasting rooms in "face value" is as follows:
1, I likes this beautiful elder sister
2, light fragrant flower language, temperature are gentle as before
3, people's Western style of singing sweet tea
4, pure and fresh beauty is in sing and dance
5, main broadcaster has taken key away
Wherein, item 2 is the title of a direct broadcasting room, items 1, 4 and 5 are barrages, and item 3 is the title of a live subclassification.
After word segmentation is performed on the text information in "game" and stop words are filtered out, the following Feature Words are obtained:
1, the upper single overlord of teaching great master does not quarrel
2, the top of a valley open country Qu Wangzhe
3, main broadcaster's set pattern great master
4, fiery shadow robs the operation of limit mind
5, black rose HappyTime starts
After word segmentation is performed on the text information in "face value" and stop words are filtered out, the following Feature Words are obtained:
1, I likes beautiful elder sister
2, light fragrant flower language temperature is gentle as before
3, the small elder sister of people's Western style of singing sweet tea
4, pure and fresh beauty's singing and dancing
5, main broadcaster takes key away
The desired value of each Feature Word in "game" and "face value" is calculated by the desired value formula, and the 4 Feature Words with the highest values are selected as candidate words.
The candidate words in "game" and their desired values are as follows:
Upper list overlord 1.12
Wild area king 1.02
Set pattern great master 0.98
Fiery shadow robs 0.87
The candidate words in "face value" and their desired values are as follows:
Beautiful elder sister 1.65
Temperature gentle as before 1.23
People's Western style of singing sweet tea 1.02
Pure and fresh beauty 0.98
The comprehensive grading value of each candidate word in "game" and "face value" is calculated by R(w)=λ1*log(tf_text(w)+1)+λ2*log(tf_t(w)+1)+λ3*log(tf_zone(w)+1) (with λ1=0.2, λ2=0.5, λ3=0.3), and the 2 candidate words with the highest values are selected as target words.
Taking "upper list overlord" as an example, suppose:
tf_text(upper list overlord)=120
tf_t(upper list overlord)=10
tf_zone(upper list overlord)=2
Then R(upper list overlord)=0.2*log(120+1)+0.5*log(10+1)+0.3*log(2+1)=1.07
Similarly, the following are calculated: R(wild area king)=0.575, R(set pattern great master)=0.621, R(fiery shadow robs)=1.53, R(beautiful elder sister)=2.13, R(temperature gentle as before)=1.226, R(people's Western style of singing sweet tea)=0.776, R(pure and fresh beauty)=1.13.
Therefore, "fiery shadow robs" and "upper list overlord" are set as the label information of "game", and "beautiful elder sister" and "temperature gentle as before" are set as the label information of "face value".
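The worked example can be checked with a short script. It is a sketch that assumes base-10 logarithms, which the text does not state; under that assumption the expression for "upper list overlord" evaluates to roughly 1.08, close to the 1.07 given above. The remaining scores are copied from the example rather than recomputed, since their word counts are not given, and the word names follow the candidate lists.

```python
import math

WEIGHTS = (0.2, 0.5, 0.3)  # lambda1, lambda2, lambda3 from the example


def r(tf_text, tf_t, tf_zone, weights=WEIGHTS):
    """Comprehensive grading value with base-10 logarithms (assumed base)."""
    l1, l2, l3 = weights
    return (l1 * math.log10(tf_text + 1)
            + l2 * math.log10(tf_t + 1)
            + l3 * math.log10(tf_zone + 1))


upper_list_overlord = r(120, 10, 2)  # roughly 1.08 under the base-10 assumption

# Comprehensive grading values as stated in the example.
game = {"upper list overlord": 1.07, "wild area king": 0.575,
        "set pattern great master": 0.621, "fiery shadow robs": 1.53}
face_value = {"beautiful elder sister": 2.13, "temperature gentle as before": 1.226,
              "people's Western style of singing sweet tea": 0.776,
              "pure and fresh beauty": 1.13}


def top2(scores):
    """Return the two words with the highest scores, highest first."""
    return sorted(scores, key=scores.get, reverse=True)[:2]


labels = {"game": top2(game), "face value": top2(face_value)}
```

The small difference between 1.07 and 1.08 does not change the ranking, so the selected labels match those stated in the example.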
Embodiment three
Fig. 3 is a structural schematic diagram of a label marking device provided by Embodiment Three of the present invention, which may specifically include the following modules:
Text information extraction module 310, for extracting the text information of multiple types from the direct broadcasting room in live streaming classification;
Feature Words extraction module 320, for extracting Feature Words from the text information;
Candidate word screening module 330, for according to the relevance between the Feature Words and the live streaming classification from described Candidate word is screened in Feature Words;
Target word screening module 340, for screening mesh from the candidate word according to the corresponding type of the text information Mark word;
Label information setup module 350, for setting the target word to the label information of the live streaming classification.
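The way the five modules hand data to one another can be sketched as a small pipeline. This is only an architectural illustration: the class name, the field names and the callable signatures are hypothetical, and each module is represented by a plain function supplied by the caller.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

TextByType = Dict[str, List[str]]  # e.g. {"title": [...], "barrage": [...]}


@dataclass
class LabelMarkingDevice:
    extract_text: Callable[[str], TextByType]                     # module 310
    extract_feature_words: Callable[[TextByType], List[str]]      # module 320
    screen_candidates: Callable[[List[str], str], List[str]]      # module 330
    screen_targets: Callable[[List[str], TextByType], List[str]]  # module 340

    def mark(self, category: str) -> Dict[str, List[str]]:
        """Run the modules in order and attach the target words to the category."""
        texts = self.extract_text(category)
        features = self.extract_feature_words(texts)
        candidates = self.screen_candidates(features, category)
        targets = self.screen_targets(candidates, texts)
        return {category: targets}  # module 350: set the label information


# Stub modules, for illustration only.
device = LabelMarkingDevice(
    extract_text=lambda category: {"title": ["a b"], "barrage": ["b c"]},
    extract_feature_words=lambda texts: ["a", "b", "c"],
    screen_candidates=lambda features, category: features[:2],
    screen_targets=lambda candidates, texts: candidates[:1],
)
labels = device.mark("game")
```

Keeping each module behind a callable makes it possible to swap one screening manner for another without touching the rest of the pipeline.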
In one example of an embodiment of the present invention, the type of the text information includes following at least one:
The title of the direct broadcasting room, the barrage of the direct broadcasting room, subclassification title;
Wherein, the subclassification title is the name of the live subclassification, under the live streaming classification, to which the direct broadcasting room belongs.
In one embodiment of the invention, the candidate word screening module 330 includes:
Desired value computational submodule, for calculating institute based on the distributional difference between the Feature Words and the live streaming classification State the desired value of Feature Words, wherein the desired value and the distributional difference are positively correlated;
Desired value screens submodule, for screening candidate word from the Feature Words according to the desired value.
In one example of an embodiment of the present invention, the desired value of the Feature Words is calculated by following formula
Wherein, N is the quantity of the text information, A is the number of occurrences of Feature Word w in live streaming classification v, B is the number of occurrences of Feature Word w outside live streaming classification v, C is the number of occurrences of words other than Feature Word w in live streaming classification v, and D is the number of occurrences of words other than Feature Word w outside live streaming classification v.
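The formula itself does not appear in this text. The quantities N, A, B, C and D have the same shape as the 2x2 contingency table used by the chi-square statistic that is common in feature selection, so the sketch below assumes that statistic. This is an assumption for illustration and may differ from the formula actually used; the function name and the optional `n` parameter are hypothetical.

```python
def chi_square(a, b, c, d, n=None):
    """Chi-square statistic of a 2x2 contingency table (assumed form).

    a: occurrences of word w inside category v
    b: occurrences of word w outside category v
    c: occurrences of other words inside category v
    d: occurrences of other words outside category v
    n: total count; defaults to a + b + c + d
    """
    if n is None:
        n = a + b + c + d
    denominator = (a + c) * (b + d) * (a + b) * (c + d)
    if denominator == 0:
        return 0.0  # degenerate table: no distributional difference can be measured
    return n * (a * d - b * c) ** 2 / denominator


strongly_associated = chi_square(10, 0, 0, 10)
evenly_spread = chi_square(5, 5, 5, 5)
```

A word concentrated in one category scores high, while a word spread evenly across categories scores 0, which is consistent with the statement that the desired value is positively correlated with the distributional difference.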
In one embodiment of the invention, the desired value screening submodule includes:
Candidate value selecting unit, for selecting candidate values from the desired values, wherein the candidate values are the m largest desired values;
Candidate value setting unit, for setting the Feature Words corresponding to the candidate values as candidate words.
In one embodiment of the invention, the target word screening module 340 includes:
Classification score value computational submodule, for calculating classification score value of the candidate word in the type;
Comprehensive grading value computational submodule, for calculating the comprehensive score of the candidate word in conjunction with the classification score value Value;
Comprehensive grading value screens submodule, for screening target word from the candidate word according to the comprehensive grading value.
In one embodiment of the invention, the classification score value computational submodule includes:
Total degree statistic unit, the total degree occurred in the type for counting the candidate word;
Total degree computing unit scores for calculating classification of the candidate word in the type according to the total degree Value, wherein the total degree and the classification score value are positively correlated.
In one embodiment of the invention, the comprehensive grading value computational submodule includes:
Weight configuration unit, for configuring a weight for the classification score value according to the type to obtain a weight-adjusted score value;
Summation unit, for calculating the sum of the weight-adjusted score values as the comprehensive grading value of the candidate word.
In one example of an embodiment of the present invention, the comprehensive grading value R of the candidate word w is calculated by following formula (w):
R (w)=λ1*log(tf_text(w)+1)+λ2*log(tf_t(w)+1)+λ3*log(tf_zone(w)+1)
Wherein, tf_text (w) is candidate word w in the title of the direct broadcasting room, the barrage of the direct broadcasting room and subclassification name The total degree occurred in title, tf_t (w) are the total degree that candidate word w occurs in the title of the direct broadcasting room, tf_zone (w) For the total degree that candidate word w occurs in subclassification title, λ1、λ2、λ3For weight.
In one embodiment of the invention, the comprehensive grading value screening submodule includes:
Target score value selecting unit, for selecting target score values from the comprehensive grading values, wherein the target score values are the n largest comprehensive grading values;
Target score value setting unit, for setting target word for the corresponding candidate word of the target score value.
The label marking device provided by the embodiment of the present invention can execute the label marking method provided by any embodiment of the present invention, and has the functional modules and beneficial effects corresponding to the executed method.
Embodiment four
Fig. 4 is a structural schematic diagram of a server provided by Embodiment Four of the present invention. As shown in Fig. 4, the server includes a processor 40, a memory 41, an input unit 42 and an output device 43. The number of processors 40 in the server may be one or more; Fig. 4 takes one processor 40 as an example. The processor 40, memory 41, input unit 42 and output device 43 in the server may be connected by a bus or in other ways; Fig. 4 takes connection by a bus as an example.
The memory 41, as a computer readable storage medium, can be used to store software programs, computer executable programs and modules, such as the program instructions/modules corresponding to the label marking method in the embodiments of the present invention (for example, the text information extraction module 310, Feature Words extraction module 320, candidate word screening module 330, target word screening module 340 and label information setup module 350). The processor 40 executes the various functional applications and data processing of the server by running the software programs, instructions and modules stored in the memory 41, that is, it implements the above label marking method.
Memory 41 can mainly include storing program area and storage data area, wherein storing program area can store operation system Application program needed for system, at least one function;Storage data area, which can be stored, uses created data etc. according to terminal.This Outside, memory 41 may include high-speed random access memory, can also include nonvolatile memory, for example, at least a magnetic Disk storage device, flush memory device or other non-volatile solid state memory parts.In some instances, memory 41 can be further Including the memory remotely located relative to processor 40, these remote memories can pass through network connection to server.On The example for stating network includes but is not limited to internet, intranet, local area network, mobile radio communication and combinations thereof.
The input unit 42 can be used to receive input numeric or character information and to generate key signal inputs related to the user settings and function control of the server. The output device 43 may include a display device such as a display screen.
Embodiment five
Embodiment Five of the present invention further provides a storage medium containing computer executable instructions which, when executed by a computer processor, are used to perform a label marking method, the method comprising:
The text information of multiple types is extracted from the direct broadcasting room in live streaming classification;
Feature Words are extracted from the text information;
Candidate word is screened from the Feature Words according to the relevance between the Feature Words and the live streaming classification;
Target word is screened from the candidate word according to the corresponding type of the text information;
Set the target word to the label information of the live streaming classification.
Of course, in the storage medium containing computer executable instructions provided by the embodiment of the present invention, the computer executable instructions are not limited to the method operations described above, and can also perform relevant operations in the label marking method provided by any embodiment of the present invention.
From the above description of the embodiments, those skilled in the art can clearly understand that the present invention can be implemented by software plus the necessary general-purpose hardware, and of course can also be implemented by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention in essence, or the part of it that contributes to the prior art, can be embodied in the form of a software product. The computer software product can be stored in a computer readable storage medium, such as a computer floppy disk, read-only memory (Read-Only Memory, ROM), random access memory (Random Access Memory, RAM), flash memory (FLASH), hard disk or optical disc, and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments of the present invention.
It is worth noting that, in the above embodiment of the label marking device, the units and modules included are divided only according to functional logic, but are not limited to the above division, as long as the corresponding functions can be realized. In addition, the specific names of the functional units are only for ease of distinguishing them from one another and are not intended to limit the protection scope of the present invention.
Note that the above are only preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will understand that the present invention is not limited to the specific embodiments described herein, and that various obvious changes, readjustments and substitutions can be made without departing from the protection scope of the present invention. Therefore, although the present invention has been described in further detail through the above embodiments, it is not limited to them; it may include more equivalent embodiments without departing from the inventive concept, and its scope is determined by the scope of the appended claims.

Claims (10)

1. a kind of labeling method of label characterized by comprising
The text information of multiple types is extracted from the direct broadcasting room in live streaming classification;
Feature Words are extracted from the text information;
Candidate word is screened from the Feature Words according to the relevance between the Feature Words and the live streaming classification;
Target word is screened from the candidate word according to the corresponding type of the text information;
Set the target word to the label information of the live streaming classification.
2. the method according to claim 1, wherein described according between the Feature Words and the live streaming classification Relevance screen candidate word from the Feature Words, comprising:
The desired value of the Feature Words is calculated based on the distributional difference between the Feature Words and the live streaming classification, wherein institute It states desired value and the distributional difference is positively correlated;
Candidate word is screened from the Feature Words according to the desired value.
3. according to the method described in claim 2, it is characterized in that, calculating the desired value of the Feature Words by following formula
Wherein, N is the quantity of the text information, A is the number of occurrences of Feature Word w in live streaming classification v, B is the number of occurrences of Feature Word w outside live streaming classification v, C is the number of occurrences of words other than Feature Word w in live streaming classification v, and D is the number of occurrences of words other than Feature Word w outside live streaming classification v.
4. method according to claim 1-3, which is characterized in that described according to the corresponding class of the text information Type screens target word from the candidate word, comprising:
Calculate classification score value of the candidate word in the type;
The comprehensive grading value of the candidate word is calculated in conjunction with the classification score value;
Target word is screened from the candidate word according to the comprehensive grading value.
5. according to the method described in claim 4, it is characterized in that, the classification for calculating the candidate word in the type Score value, comprising:
Count the total degree that the candidate word occurs in the type;
Classification score value of the candidate word in the type is calculated according to the total degree, wherein the total degree and institute Classification score value is stated to be positively correlated.
6. according to the method described in claim 4, it is characterized in that, classification score value described in the combination calculates the candidate word Comprehensive grading value, comprising:
Weight is configured to the classification score value according to the type, obtains and adjusts power score value;
Calculate the sum of described tune power score value, the comprehensive grading value as the candidate word.
7. according to the method described in claim 4, it is characterized in that, the type of the text information includes following at least one:
The title of the direct broadcasting room, the barrage of the direct broadcasting room, subclassification title;
Wherein, the title for the live streaming subclassification that the entitled direct broadcasting room of the subclassification belongs under the live streaming classification;
The comprehensive grading value R (w) of the candidate word w is calculated by following formula:
R (w)=λ1*log(tf_text(w)+1)+λ2*log(tf_t(w)+1)+λ3*log(tf_zone(w)+1)
Wherein, tf_text (w) is candidate word w in the title of the direct broadcasting room, the barrage and subclassification title of the direct broadcasting room The total degree of appearance, tf_t (w) are the total degree that candidate word w occurs in the title of the direct broadcasting room, and tf_zone (w) is to wait The total degree for selecting word w to occur in subclassification title, λ1、λ2、λ3For weight.
8. a kind of labelling apparatus of label characterized by comprising
Text information extraction module, for extracting the text information of multiple types from the direct broadcasting room in live streaming classification;
Feature Words extraction module, for extracting Feature Words from the text information;
Candidate word screening module, for according to the Feature Words and it is described live streaming classification between relevance from the Feature Words Screen candidate word;
Target word screening module, for screening target word from the candidate word according to the corresponding type of the text information;
Label information setup module, for setting the target word to the label information of the live streaming classification.
9. a kind of server including memory, processor and stores the computer that can be run on a memory and on a processor Program, which is characterized in that the processor realizes the mark of the label as described in any in claim 1-7 when executing described program Note method.
10. a kind of computer readable storage medium, is stored thereon with computer program, which is characterized in that the program is by processor The labeling method of the label as described in any in claim 1-7 is realized when execution.
CN201811229982.6A 2018-10-22 2018-10-22 Label marking method, label marking device, server and storage medium Active CN109547863B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811229982.6A CN109547863B (en) 2018-10-22 2018-10-22 Label marking method, label marking device, server and storage medium

Publications (2)

Publication Number Publication Date
CN109547863A true CN109547863A (en) 2019-03-29
CN109547863B CN109547863B (en) 2021-06-15

Family

ID=65844520

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811229982.6A Active CN109547863B (en) 2018-10-22 2018-10-22 Label marking method, label marking device, server and storage medium

Country Status (1)

Country Link
CN (1) CN109547863B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106096031A (en) * 2016-06-27 2016-11-09 武汉斗鱼网络科技有限公司 The video sequencing method of a kind of tape label and device
CN107770614A (en) * 2016-08-18 2018-03-06 中国电信股份有限公司 The label producing method and device of content of multimedia
CN108256044A (en) * 2018-01-12 2018-07-06 武汉斗鱼网络科技有限公司 Direct broadcasting room recommends method, apparatus and electronic equipment
CN108271076A (en) * 2017-01-03 2018-07-10 武汉斗鱼网络科技有限公司 A kind of method and device for recommending direct broadcasting room
CN108280059A (en) * 2018-01-09 2018-07-13 武汉斗鱼网络科技有限公司 Direct broadcasting room content tab extracting method, storage medium, electronic equipment and system
JP2018112853A (en) * 2017-01-11 2018-07-19 日本放送協会 Topic classification apparatus and program therefor

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110198490A (en) * 2019-05-23 2019-09-03 北京奇艺世纪科技有限公司 Live video subject classification method, apparatus and electronic equipment
CN110198490B (en) * 2019-05-23 2021-07-30 北京奇艺世纪科技有限公司 Live video theme classification method and device and electronic equipment
CN112995690A (en) * 2021-02-26 2021-06-18 广州虎牙科技有限公司 Live content item identification method and device, electronic equipment and readable storage medium
WO2022247906A1 (en) * 2021-05-28 2022-12-01 北京沃东天骏信息技术有限公司 Live-streaming processing method, live-streaming platform, and apparatus, system, medium and device
CN114780668A (en) * 2022-04-22 2022-07-22 盐城金堤科技有限公司 Method and device for generating service label, computer storage medium and electronic terminal
CN114780668B (en) * 2022-04-22 2024-04-09 盐城天眼察微科技有限公司 Service label generation method and device, computer storage medium and electronic terminal

Also Published As

Publication number Publication date
CN109547863B (en) 2021-06-15

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20231226

Address after: Room 1401-1408, 10th to 13th and 14th floors, Building 2, China Shipbuilding Heavy Industry Technology Building, No. 176 Haier Road, Laoshan District, Qingdao City, Shandong Province, 266035

Patentee after: Qingdao Qingzi Engineering Consulting Co.,Ltd.

Address before: 11 / F, building B1, software industry phase 4.1, No.1, Software Park East Road, Donghu Development Zone, Wuhan City, Hubei Province 430070

Patentee before: WUHAN DOUYU NETWORK TECHNOLOGY Co.,Ltd.
