CN105893478A - Tag extraction method and equipment - Google Patents
- Publication number
- CN105893478A (CN201610186950.7A)
- Authority
- CN
- China
- Prior art keywords
- vocabulary
- label
- weighted value
- ugc
- candidate word
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/958—Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Artificial Intelligence (AREA)
- Machine Translation (AREA)
Abstract
An embodiment of the invention discloses a tag extraction method and equipment. The method comprises: obtaining user-generated content (UGC) associated with content provided by a content publisher; performing word segmentation on the UGC; using the words obtained by segmentation as optional tags; calculating a weight value for each word in the optional tags; selecting words from the optional tags as candidate words in descending order of weight value; and using the candidate words as second tags. Because tags are extracted from the UGC, users do not need to input tags specially; the UGC can come from many users; words are differentiated on the basis of their weight values; and tag extraction is completed automatically. Tag content is therefore enriched, and the tags become more diverse and accurate.
Description
Technical field
The present invention relates to the field of communication technology, and in particular to a tag extraction method and equipment.
Background
In this application, a label refers to a user's evaluation label for a content publisher.
Take video websites as an example. At present, almost all video labels on the major video websites are applied by the video publisher or by a website editor, so they inevitably carry subjectivity and one-sidedness. In other words, tag extraction is implemented by receiving labels from the video publisher or the website editor.
To remedy the shortcomings of labeling by video publishers and website editors, it is preferable to let users take part in labeling, which compensates for labels being applied by the author alone. In this scheme, tag extraction is implemented by receiving labels from users.
However, user participation in labeling is low, so user labels for a content provider, or for the content it provides, are scarce and sometimes difficult to obtain at all.
Summary of the invention
Embodiments of the present invention provide a tag extraction method and equipment for extracting optional labels provided by users, enriching label content, and making labels more diverse and accurate.
In one aspect, an embodiment of the present invention provides a tag extraction method, including:
obtaining user-generated content (UGC) associated with content provided by a content publisher, performing word segmentation on the UGC, and using the words obtained by segmentation as optional labels;
calculating a weight value for each word in the optional labels, and selecting words from the optional labels as candidate words in descending order of weight value;
using the candidate words as second labels.
In a possible implementation, before the weight value of each word in the optional labels is calculated, the method further includes:
selecting the nouns and/or noun phrases in the optional labels, and removing semantically duplicate words and invalid words, to obtain the remaining words;
the calculating of a weight value for each word in the optional labels, and the selecting of words from the optional labels as candidate words in descending order of weight value, includes:
performing weight calculation on the remaining words to obtain a first weight value for each of the remaining words, and selecting words from the remaining words as candidate words in descending order of first weight value.
In a possible implementation, before words are selected from the remaining words as candidate words in descending order of first weight value, the method further includes:
obtaining a first label provided by the content publisher; and calculating the degree of association between the candidate words and the first label to obtain a second weight value;
after words are selected from the remaining words as candidate words in descending order of first weight value, the method further includes:
selecting words from the candidate words as second labels in descending order of second weight value; or calculating a comprehensive weight from the first weight value and the second weight value, and selecting words from the candidate words as second labels in descending order of comprehensive weight.
In a possible implementation, performing weight calculation on the remaining words to obtain the weight value of each of the remaining words includes:
counting the number of occurrences of each remaining word in the UGC, and determining the weight value that corresponds to that number of occurrences.
In a possible implementation, performing word segmentation on the UGC includes:
obtaining a sentence of the UGC, performing forward longest matching from left to right and reverse longest matching from right to left on the sentence, and taking the result with the smaller number of segments as the segmentation result; when the numbers of segments are equal, the result of reverse longest matching is taken as the segmentation result.
In a second aspect, an embodiment of the present invention further provides tag extraction equipment, including:
a content acquiring unit, configured to obtain user-generated content (UGC) associated with content provided by a content publisher;
a word acquiring unit, configured to perform word segmentation on the UGC and use the words obtained by segmentation as optional labels;
a weight calculation unit, configured to calculate a weight value for each word in the optional labels;
a word selection unit, configured to select words from the optional labels as candidate words in descending order of weight value;
a label determination unit, configured to use the candidate words as second labels.
In a possible implementation, the tag extraction equipment further includes:
a word screening unit, specifically configured to select the nouns and/or noun phrases in the optional labels, and to remove semantically duplicate words and invalid words, to obtain the remaining words;
the weight calculation unit is specifically configured to perform weight calculation on the remaining words to obtain a first weight value for each of the remaining words;
the word selection unit is specifically configured to select words from the remaining words as candidate words in descending order of first weight value.
In a possible implementation, the tag extraction equipment further includes:
a label acquiring unit, configured to obtain a first label provided by the content publisher;
the weight calculation unit is further configured to calculate the degree of association between the candidate words and the first label to obtain a second weight value; or to calculate the degree of association between the candidate words and the first label to obtain a second weight value and then calculate a comprehensive weight from the first weight value and the second weight value;
the label determination unit is specifically configured to select words from the candidate words as second labels in descending order of second weight value, or to select words from the candidate words as second labels in descending order of comprehensive weight.
In a possible implementation, the weight calculation unit is specifically configured to count the number of occurrences of each remaining word in the UGC, and to determine the weight value that corresponds to that number of occurrences.
In a possible implementation, the word acquiring unit is specifically configured to obtain a sentence of the UGC, perform forward longest matching from left to right and reverse longest matching from right to left on the sentence, and take the result with the smaller number of segments as the segmentation result; when the numbers of segments are equal, the result of reverse longest matching is taken as the segmentation result.
As can be seen from the above technical solutions, the embodiments of the present invention have the following advantages: labels are obtained by extraction from user-generated content (UGC), so users do not need to input labels specially; the UGC may come from many users; words are differentiated on the basis of weight values; and tag extraction is completed automatically. Label content can therefore be enriched, and labels become more diverse and accurate.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the drawings needed for describing the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a method according to an embodiment of the present invention;
Fig. 2 is a schematic flowchart of a method according to an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of equipment according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of equipment according to an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of equipment according to an embodiment of the present invention;
Fig. 6 is a schematic structural diagram of a server according to an embodiment of the present invention;
Fig. 7 is a schematic structural diagram of equipment according to an embodiment of the present invention.
Detailed description of the invention
To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the drawings. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the protection scope of the present invention.
An embodiment of the present invention provides a tag extraction method which, as shown in Fig. 1, includes:
101: Obtain user-generated content (UGC) associated with content provided by a content publisher, perform word segmentation on the UGC, and use the words obtained by segmentation as optional labels;
In the field of communication technology, a content publisher is a publisher of network resources, for example the publisher of a live video stream. A user is a consumer of network resources, and UGC is the opinion a user posts about a video. UGC is usually text, such as bullet comments or ordinary comments. In principle audio can also be handled, but it requires speech recognition and therefore more data processing.
Word segmentation is performed on the text of the UGC. Any established segmentation algorithm may be used; the embodiment of the present invention does not uniquely limit the choice.
102: Calculate a weight value for each word in the optional labels, and select words from the optional labels as candidate words in descending order of weight value;
Segmenting the text of the UGC yields a set of optional words. Any of them could serve as a label, but segmentation usually produces a great many optional labels, and only some of them need to be selected. Weight values are therefore used to differentiate the words. The weight value of a word may be determined from empirical values or from statistics; the embodiment of the present invention does not uniquely limit this.
103: Use the candidate words as second labels.
In the embodiments of the present invention, "first" and "second" only distinguish the types of label and should not be understood as implying any other technical limitation. The first label is a label provided by the content publisher; the second label is a label obtained by tag extraction according to the embodiment of the present invention.
In the embodiment of the present invention, labels are obtained by extraction from user-generated content (UGC), so users do not need to input labels specially; the UGC may come from many users; words are differentiated on the basis of weight values; and tag extraction is completed automatically. Label content can therefore be enriched, and labels become more diverse and accurate.
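Steps 101 to 103 can be sketched as follows. This is only a minimal illustration under stated assumptions, not the implementation of the embodiment: the function name `extract_second_labels`, the parameters `segment`, `weigh`, and `top_k`, and the whitespace-split English sample data are all hypothetical. The weighting function is passed in because the text leaves the weighting method open.

```python
from collections import Counter

def extract_second_labels(ugc_texts, segment, weigh, top_k=5):
    """Segment the UGC, weight each word, and keep the highest-weighted words."""
    # Step 101: segment every piece of UGC; the resulting words are the optional labels.
    optional_labels = [word for text in ugc_texts for word in segment(text)]
    # Step 102: compute a weight per word, then rank the words from high to low weight.
    weights = weigh(optional_labels)
    candidates = sorted(weights, key=weights.get, reverse=True)[:top_k]
    # Step 103: the candidate words are used as the second labels.
    return candidates

# Hypothetical sample data; str.split stands in for a real segmenter,
# and Counter (occurrence counting) stands in for the weighting method.
ugc = ["funny cat video", "cat jumps", "funny cat"]
labels = extract_second_labels(ugc, segment=str.split, weigh=Counter, top_k=2)
print(labels)
```

Any callable that maps a list of words to a word-to-weight mapping could replace `Counter` here, for instance a lookup into a table of empirical values.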
Further, UGC comes from a wide range of sources. For example, many people may be posting bullet comments at the same time, which produces a large number of words. Some of these words may repeat one another in meaning, and some may be invalid words that cannot serve as labels. The embodiment of the present invention can remove such words to further improve the accuracy of tag extraction, as follows. Before the weight value of each word in the optional labels is calculated, the method further includes:
selecting the nouns and/or noun phrases in the optional labels, and removing semantically duplicate words and invalid words, to obtain the remaining words;
the calculating of a weight value for each word in the optional labels, and the selecting of words from the optional labels as candidate words in descending order of weight value, includes:
performing weight calculation on the remaining words to obtain a first weight value for each of the remaining words, and selecting words from the remaining words as candidate words in descending order of first weight value.
In general, nouns can serve as labels, whereas verbs, measure words, and the like are usually unsuitable. Nouns and noun phrases can therefore be extracted first, and semantically duplicate words then removed. Removing semantic duplicates prevents words of similar meaning from all being used as labels, which would produce duplicate labels. In the embodiments of the present invention, "first" and "second" are also used to distinguish two different weight values: the first weight value is the weight value of a remaining word, and the second weight value is the weight value of a candidate word. They should not be understood as having any other technical meaning. Invalid words may be sensitive words that are legally prohibited from appearing in labels, or words that are meaningless in themselves; they can be removed by means of a lexicon. In addition, a lexicon of valid words may be built for nouns and noun phrases, in which case the nouns and noun phrases selected here must be words in that lexicon.
Further, tag extraction from UGC alone is weakly directed, so the extracted labels may drift away from the subject. To reduce this deviation, the embodiment of the present invention further provides the following solution. Before words are selected from the remaining words as candidate words in descending order of first weight value, the method further includes:
obtaining a first label provided by the content publisher; and calculating the degree of association between the candidate words and the first label to obtain a second weight value;
after words are selected from the remaining words as candidate words in descending order of first weight value, the method further includes:
selecting words from the candidate words as second labels in descending order of second weight value; or calculating a comprehensive weight from the first weight value and the second weight value, and selecting words from the candidate words as second labels in descending order of comprehensive weight.
In this embodiment, the first label given by the content provider serves as the direction against which the weights of the candidate words are assessed. The extracted labels therefore stay close to the content publisher's thinking, enrich the content publisher's labels, and reflect other users' evaluation of the network resource. The second labels may be selected with reference to the second weight alone. Using the comprehensive weight instead balances the result of automatic UGC-based extraction against the label direction given by the content provider, which reduces one-sided labels.
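The two selection options above can be sketched as follows. The text does not say how the comprehensive weight is computed, so the linear blend controlled by `alpha` is purely an assumption for illustration, as are the function name `select_second_labels`, the sample weights, and the premise that both weights have already been scaled to the range 0 to 1.

```python
def select_second_labels(candidates, first_weight, second_weight, top_k=3, alpha=None):
    """Rank candidates by second weight alone, or by a blend of both weights.

    alpha=None ranks by the second weight only. Otherwise the comprehensive
    weight is assumed to be alpha * first + (1 - alpha) * second.
    """
    def score(word):
        second = second_weight.get(word, 0.0)
        if alpha is None:
            return second
        return alpha * first_weight.get(word, 0.0) + (1 - alpha) * second

    # sorted() is stable, so ties keep the original candidate order.
    return sorted(candidates, key=score, reverse=True)[:top_k]

# Hypothetical weights, both assumed to be pre-scaled to [0, 1].
candidates = ["cat", "funny", "music"]
first = {"cat": 1.0, "funny": 0.6, "music": 0.2}    # e.g. from occurrence counts
second = {"cat": 0.3, "funny": 0.1, "music": 0.9}   # association with the first label

by_second = select_second_labels(candidates, first, second, top_k=2)
blended = select_second_labels(candidates, first, second, top_k=2, alpha=0.5)
print(by_second, blended)
```

With these sample values, ranking by the second weight alone favours the word closest to the publisher's label, while the blend lets a word that users mention often move ahead of it.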
Optionally, performing weight calculation on the remaining words to obtain the weight value of each of the remaining words includes:
counting the number of occurrences of each remaining word in the UGC, and determining the weight value that corresponds to that number of occurrences.
Weight values can be calculated in many ways. The embodiment of the present invention determines the weight value from the number of occurrences as the statistic. This is relatively simple and reflects the evaluation of the network resource by most users, which meets the requirement for labels: the extracted labels reflect users' evaluations.
Optionally, performing word segmentation on the UGC includes:
obtaining a sentence of the UGC, performing forward longest matching from left to right and reverse longest matching from right to left on the sentence, and taking the result with the smaller number of segments as the segmentation result; when the numbers of segments are equal, the result of reverse longest matching is taken as the segmentation result.
In this embodiment, the computation required for segmentation grows with the amount of UGC. For the massive volumes of UGC that may occur, this step can use distributed computation to improve speed. This embodiment chooses an algorithm that requires relatively little computation and is well suited to tag extraction, in order to improve segmentation efficiency and obtain reasonably accurate segmentation results.
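The bidirectional matching rule described above can be sketched as follows. It is a minimal dictionary-based illustration, not the segmenter of the embodiment: the function names and the tiny lexicon are hypothetical, and a real system would load a full lexicon. A single character is accepted as a fallback when no lexicon word matches.

```python
def forward_longest_match(sentence, lexicon, max_len):
    """Scan left to right, always taking the longest lexicon word available."""
    words, i = [], 0
    while i < len(sentence):
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            piece = sentence[i:i + size]
            if size == 1 or piece in lexicon:  # single character is the fallback
                words.append(piece)
                i += size
                break
    return words

def reverse_longest_match(sentence, lexicon, max_len):
    """Scan right to left, always taking the longest lexicon word available."""
    words, j = [], len(sentence)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = sentence[j - size:j]
            if size == 1 or piece in lexicon:
                words.append(piece)
                j -= size
                break
    return words[::-1]  # restore reading order

def segment(sentence, lexicon):
    """Take the result with fewer segments; on a tie, take the reverse result."""
    max_len = max(map(len, lexicon), default=1)
    forward = forward_longest_match(sentence, lexicon, max_len)
    reverse = reverse_longest_match(sentence, lexicon, max_len)
    return forward if len(forward) < len(reverse) else reverse

# Hypothetical lexicon. Both directions give three segments here,
# so the reverse result is returned.
lexicon = {"研究", "研究生", "生命", "起源"}
result = segment("研究生命起源", lexicon)
print(result)
```

In this sample the forward pass greedily takes "研究生" and leaves "命" stranded, while the reverse pass finds "研究 / 生命 / 起源"; the tie-break towards the reverse result is what selects the more natural reading.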
The embodiment of the present invention mainly performs data mining on video-related UGC, such as comments and bullet comments, to obtain valuable labels for use as video labels. On the product side, this compensates for the subjectivity and one-sidedness of labels applied by the video publisher alone. It is also imperceptible to users and has no entry threshold, which neatly sidesteps the problem of how to guide and motivate users to label.
Fig. 2 shows the main framework of the embodiment of the present invention, which includes:
201: Obtain the publisher labels;
This step obtains the labels applied by the publisher.
202: Mine weighted user labels from the UGC;
User labels are labels mined from the UGC provided by viewers of the network resource.
203: Weight by the similarity between publisher labels and user labels;
204: Screen and denoise, then output the result.
Screening and denoising filters the labels obtained.
Each of the above steps is described in detail in the following embodiment.
One, publisher labels:
First, the video author (or a website editor) labels the video.
Note that the scheme of the embodiment of the present invention does not label from user UGC alone. It introduces user labels on top of the video author's labels, making the labels more diverse and accurate. Having the author label in advance, and treating those labels as a candidate factor for the final labels, gives the labels mined afterwards a certain thematic tendency. This better safeguards label quality and keeps the labels in line with the video website's own planning.
Two, mine weighted user labels from the UGC.
1, Segment the UGC:
Word segmentation is performed on UGC such as the video's comments and bullet comments. For Chinese word segmentation, an open-source segmentation library such as Stanford, IKAnalyzer, or Word can provide the segmentation function. As the segmentation algorithm, consider bidirectional maximum matching with the minimum segment count, that is, a combination of forward longest matching from left to right and reverse longest matching from right to left over the sentence. Take whichever result has fewer segments, and take the reverse segmentation when the counts are equal. Note that Chinese word segmentation is usually computationally intensive, so distributed computation, for example on a Spark cluster, may be needed to improve speed.
2, Extract keywords:
Keyword extraction can be divided into the following three steps:
A) Part-of-speech tagging and selection:
Since labels are mostly nouns, only the nouns or noun phrases may be extracted as candidate words. Current segmentation tools come with part-of-speech tagging, so they can be used directly to extract all the nouns.
B) Invalid-word filtering:
Invalid-word filtering removes from the candidate set words that are of little use for labeling a video, such as unhealthy words and sensitive words. It can be implemented by checking against an invalid-word list. The list is built manually and is continually extended in the "screening and denoising" step of each label-generation run.
C) Semantic deduplication:
Since several labels with the same meaning add little, semantic deduplication may be applied to the candidate words obtained after segmentation. It can be implemented by merging near-synonyms. There is no need to design a near-synonym recognition algorithm; an existing Chinese near-synonym lexicon is sufficient. Words that are near-synonyms of one another are grouped into a class, and every word in a class is then replaced with the word that occurs most often in that class.
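Steps A) to C) can be sketched as follows. This is a minimal illustration under several assumptions: the input is already segmented and part-of-speech tagged, the tag names `"n"` and `"np"` are hypothetical (real taggers use their own tag sets), and the synonym lexicon is given as a hypothetical mapping from word to class identifier.

```python
from collections import Counter

NOUN_TAGS = {"n", "np"}  # hypothetical tag names for nouns and noun phrases

def extract_keywords(tagged_words, invalid_words, synonym_class):
    """Apply noun selection, invalid-word filtering, and semantic deduplication."""
    # A) Keep only the nouns and noun phrases.
    nouns = [word for word, tag in tagged_words if tag in NOUN_TAGS]
    # B) Drop every word that appears in the invalid-word list.
    valid = [word for word in nouns if word not in invalid_words]
    # C) Within each synonym class, find the most frequent member ...
    counts = Counter(valid)
    representative = {}
    for word, count in counts.items():
        cls = synonym_class.get(word, word)  # an unlisted word forms its own class
        best = representative.get(cls)
        if best is None or count > counts[best]:
            representative[cls] = word
    # ... and replace every member of the class with that word.
    return [representative[synonym_class.get(word, word)] for word in valid]

# Hypothetical sample data.
tagged = [("movie", "n"), ("film", "n"), ("film", "n"),
          ("watch", "v"), ("spam", "n"), ("actor", "n")]
invalid = {"spam"}
synonyms = {"movie": "class:film", "film": "class:film"}

keywords = extract_keywords(tagged, invalid, synonyms)
print(keywords)
```

Replacing synonyms rather than deleting them keeps their occurrences in the keyword list, so a later count-based weight reflects the whole synonym class.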
3, Weight calculation:
After keyword extraction, weights are calculated for the keywords, and the keywords with the highest weight values are finally extracted as candidate words. There are many conventional algorithms for keyword weighting, such as tf-idf. Because labeling a video is not quite the same as classifying documents, and one video can carry labels of several kinds, the weight here can be calculated from tf (term frequency) alone. That is, a keyword's weight is determined by counting its occurrences. For example, if word A appears 10 times and word B appears 2 times in the keyword list, the weight coefficient of word A is 10 and that of word B is 2. This weight is denoted x.
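The worked example above can be reproduced in a few lines. The helper names `term_frequency_weights` and `top_keywords` are hypothetical, and the cut-off `n` is an assumption, since the text does not say how many of the top-weighted keywords are kept.

```python
from collections import Counter

def term_frequency_weights(keywords):
    """Return the weight x of each keyword: its number of occurrences."""
    return dict(Counter(keywords))

def top_keywords(weights, n):
    """Return the n keywords with the highest weight, highest first."""
    return sorted(weights, key=weights.get, reverse=True)[:n]

# The keyword list from the example: word A occurs 10 times, word B twice.
keyword_list = ["A"] * 10 + ["B"] * 2
x = term_frequency_weights(keyword_list)
print(x)                    # {'A': 10, 'B': 2}
print(top_keywords(x, 1))   # ['A']
```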
3.1: Weighting by the similarity between publisher labels and user labels:
The steps above yield the author labels and the user labels. If some author labels and user labels are identical or similar, the author's judgment and the users' judgment agree on that point. Such a label can be considered more accurate than the others and should be given a higher weight. Taking the dictionary definitions of the author labels and of the user labels as the corpus, a similarity calculation yields a similarity weight y for each label. The final weight w is then obtained by combining y with the word-frequency weight x of the user label in some proportion.
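One way the similarity weight y might be computed is sketched below. The text only says that dictionary definitions serve as the corpus for a similarity calculation, so the choice of bag-of-words cosine similarity is an assumption, as are the function names, the English sample definitions, and the rule that a user label takes its highest similarity to any author label. Whitespace splitting is used only because the sample is English; Chinese definitions would need to be segmented first.

```python
import math
from collections import Counter

def cosine_similarity(text_a, text_b):
    """Cosine similarity between the bag-of-words vectors of two texts."""
    a, b = Counter(text_a.split()), Counter(text_b.split())
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def similarity_weights(user_labels, author_labels, definitions):
    """Return y for each user label: its best match against the author labels."""
    y = {}
    for user_label in user_labels:
        # Fall back to the label text itself when no definition is available.
        user_def = definitions.get(user_label, user_label)
        y[user_label] = max(
            (cosine_similarity(user_def, definitions.get(author, author))
             for author in author_labels),
            default=0.0,
        )
    return y

# Hypothetical dictionary definitions.
definitions = {
    "comedy": "a funny film",
    "sitcom": "a funny television show",
    "football": "a ball game",
}
y = similarity_weights(["sitcom", "football"], ["comedy"], definitions)
print(y)
```

In this sample "sitcom" scores higher than "football" because its definition shares more words with the definition of the author label "comedy". How y and x are then combined into w is left open by the text.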
3.2: Screening and denoising:
The processing above yields author labels that carry a similarity-gain weight and user labels that carry a weight. To further ensure label quality and keep the labels in line with the website's planning, the final labels can then be selected manually from these candidate labels. During screening, semantic deduplication may be applied between author labels and user labels as appropriate, and invalid labels may be added to the "invalid-word list". The screening principle is to prefer content-provider labels with a higher similarity gain and user labels with a higher weight, because these labels are the most accurate. Labels that offer a distinctive, fresh perspective yet remain representative can also be selected, so that the labels are more diverse.
Beneficial effects of the technical solution of the embodiment of the present invention:
On top of traditional labeling by the video author (publisher or editor), the present invention adds user labels as an element, making labels more accurate and diverse. Measuring by weight quantifies label accuracy, and adding user labels expands the label set so that it is richer and more diverse.
An embodiment of the present invention further provides tag extraction equipment which, as shown in Fig. 3, includes:
a content acquiring unit 301, configured to obtain user-generated content (UGC) associated with content provided by a content publisher;
a word acquiring unit 302, configured to perform word segmentation on the UGC and use the words obtained by segmentation as optional labels;
a weight calculation unit 303, configured to calculate a weight value for each word in the optional labels;
a word selection unit 304, configured to select words from the optional labels as candidate words in descending order of weight value;
a label determination unit 305, configured to use the candidate words as second labels.
In the field of communication technology, a content publisher is a publisher of network resources, for example the publisher of a live video stream. A user is a consumer of network resources, and UGC is the opinion a user posts about a video. UGC is usually text, such as bullet comments or ordinary comments. In principle audio can also be handled, but it requires speech recognition and therefore more data processing.
Word segmentation is performed on the text of the UGC. Any established segmentation algorithm may be used; the embodiment of the present invention does not uniquely limit the choice.
Segmenting the text of the UGC yields a set of optional words. Any of them could serve as a label, but segmentation usually produces a great many optional labels, and only some of them need to be selected. Weight values are therefore used to differentiate the words. The weight value of a word may be determined from empirical values or from statistics; the embodiment of the present invention does not uniquely limit this.
In the embodiments of the present invention, "first" and "second" only distinguish the types of label and should not be understood as implying any other technical limitation. The first label is a label provided by the content publisher; the second label is a label obtained by tag extraction according to the embodiment of the present invention.
In the embodiment of the present invention, labels are obtained by extraction from user-generated content (UGC), so users do not need to input labels specially; the UGC may come from many users; words are differentiated on the basis of weight values; and tag extraction is completed automatically. Label content can therefore be enriched, and labels become more diverse and accurate.
Further, due to UGC wide material sources, such as: barrage function may have a lot of people sending out
Barrage, this is it would appear that more word, and these words there may be what the meaning of one's words repeated, it is also possible to occurs
Some cannot function as the invalid word of label, and the embodiment of the present invention can be removed these words and improve label further
The accuracy extracted, specific as follows: as shown in Figure 4, above-mentioned tag extraction equipment also includes:
A vocabulary screening unit 401, specifically configured to select the words in the above optional labels that are nouns and/or noun phrases, and to remove words with repeated meanings and invalid words, to obtain the remaining vocabulary;
The above weight calculation unit 303, specifically configured to perform weight calculation on the above remaining vocabulary to obtain a first weight value for each word in the remaining vocabulary;
The above vocabulary selection unit 304, specifically configured to select words from the above remaining vocabulary as candidate words in descending order of the above first weight value.
In general, nouns can be used as labels, whereas verbs, measure words and the like are generally unsuitable. Nouns and noun phrases can therefore be extracted first, and words with repeated meanings then removed. Removing repeated meanings prevents words of similar meaning from both being used as labels, which would produce duplicate labels. In the embodiments of the present invention, "first" and "second" are also used to distinguish two different weight values: the first weight value is the weight value of a word in the remaining vocabulary, and the second weight value is the weight value of a candidate word; they should not be understood as having any other technical meaning. Invalid words may be sensitive words that are legally prohibited from appearing in labels, or words that are meaningless in themselves; they can be removed by means of a lexicon. In addition, a lexicon of valid words may be established for nouns and noun phrases, in which case the nouns and noun phrases selected should be words in that valid lexicon.
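The screening step described above can be sketched as follows, under stated assumptions: the part-of-speech tags `"n"` and `"np"`, the synonym map used to detect repeated meanings, and the invalid-word lexicon are all hypothetical inputs introduced for this sketch. The embodiment does not specify how parts of speech or repeated meanings are determined.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical part-of-speech tags for nouns and noun phrases.
NOUN_TAGS = {"n", "np"}


def screen_vocabulary(
    tagged_words: Iterable[Tuple[str, str]],
    synonyms: Dict[str, str],
    invalid_words: Set[str],
) -> List[str]:
    """Keep nouns and noun phrases, then drop repeated meanings and invalid words."""
    remaining: List[str] = []
    seen: Set[str] = set()
    for word, pos in tagged_words:
        if pos not in NOUN_TAGS:
            continue  # verbs, measure words and so on are not kept
        # Map words of similar meaning onto one representative word.
        canonical = synonyms.get(word, word)
        if word in invalid_words or canonical in invalid_words:
            continue  # removed by means of the invalid-word lexicon
        if canonical in seen:
            continue  # a word with this meaning is already kept
        seen.add(canonical)
        remaining.append(canonical)
    return remaining


tagged = [("soccer", "n"), ("run", "v"), ("football", "n"),
          ("spam", "n"), ("goal", "n"), ("goal", "n")]
remaining_vocabulary = screen_vocabulary(tagged, {"soccer": "football"}, {"spam"})
```

A valid-word lexicon, where one is used, could be applied in the same loop by skipping any word that is not a member of it.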
Further, tag extraction from UGC has little sense of direction, so the extracted labels may deviate from the intended direction. To reduce this deviation, the embodiment of the present invention also provides the following solution. As shown in Figure 5, the above tag extraction equipment further includes:
A label acquiring unit 501, configured to obtain the first label provided by the above content issuer;
The above weight calculation unit 303, further configured to calculate the degree of association between the above candidate words and the above first label to obtain a second weight value; or, to calculate the degree of association between the above candidate words and the above first label to obtain a second weight value, and then calculate a comprehensive weight of the above first weight value and the second weight value;
The above tag determination unit 305, specifically configured to select words from the above candidate words as second labels in descending order of the second weight value, or to select words from the above candidate words as second labels in descending order of the above comprehensive weight.
In this embodiment, the first label given by the content issuer serves as the direction against which the weight of each candidate word is assessed, so that the extracted labels stay close to the content issuer's intent, enrich the content issuer's labels, and reflect other users' evaluation of the network resource. In this embodiment, the second label may be selected with reference to the second weight value only. Using the comprehensive weight instead takes the label direction provided by the content issuer into account while balancing it against the result of automated tag extraction from the UGC, which reduces the likelihood of one-sided labels.
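The two options above can be sketched as follows. The embodiment does not say how the degree of association or the comprehensive weight is computed, so everything specific here is an assumption made for illustration: the character-set Jaccard similarity in `association`, the linear blend controlled by `alpha`, and the normalisation of the first weight value by its maximum.

```python
from typing import Dict, List


def association(word: str, label: str) -> float:
    """Character-set Jaccard similarity, a stand-in for the degree of association."""
    a, b = set(word), set(label)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def second_weights(candidates: List[str], first_labels: List[str]) -> Dict[str, float]:
    """Second weight value: best association of each candidate with any first label."""
    return {
        word: max((association(word, label) for label in first_labels), default=0.0)
        for word in candidates
    }


def comprehensive_weights(
    first: Dict[str, float], second: Dict[str, float], alpha: float = 0.5
) -> Dict[str, float]:
    """Blend the normalised first weight value with the second weight value."""
    top = max(first.values(), default=0.0) or 1.0  # avoid division by zero
    return {w: alpha * first[w] / top + (1 - alpha) * second.get(w, 0.0) for w in first}


def select_second_labels(weights: Dict[str, float], top_k: int) -> List[str]:
    """Pick words in descending order of the given weight."""
    ranked = sorted(weights.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:top_k]]
```

With `alpha` close to 1 the result follows the UGC statistics; with `alpha` close to 0 it follows the content issuer's first labels. A semantic similarity model could replace `association` without changing the rest of the sketch.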
Optionally, the above weight calculation unit 303 is specifically configured to count the number of occurrences of each word of the above remaining vocabulary in the above UGC, and to determine a weight value corresponding to each word's number of occurrences in the above UGC.
The weight value may be calculated in many ways. In the embodiment of the present invention, the number of occurrences is used as the statistic that determines the weight value. This is relatively simple and reflects how most users evaluate the network resource, which meets the requirements for labels and allows the extracted labels to reflect users' evaluations.
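Counting occurrences can be sketched with the standard library as follows. Using the raw count directly as the weight value is an assumption of this sketch; the embodiment only requires a weight value that corresponds to the number of occurrences. The function name `first_weights` and the sample data are likewise hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set


def first_weights(segmented_ugc: Iterable[List[str]], remaining: Set[str]) -> Dict[str, int]:
    """Count how often each word of the remaining vocabulary occurs in the UGC."""
    counts: Counter = Counter()
    for tokens in segmented_ugc:
        # Only words that survived screening contribute to the statistics.
        counts.update(token for token in tokens if token in remaining)
    # Words that never occur receive a weight of zero.
    return {word: counts[word] for word in remaining}


ugc_tokens = [["goal", "haha"], ["goal", "referee"]]
weights = first_weights(ugc_tokens, {"goal", "referee", "offside"})
```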
Optionally, the above vocabulary acquisition unit 302 is specifically configured to obtain the sentences of the above UGC, perform forward longest matching from left to right and reverse longest matching from right to left on each sentence, and take the result with fewer segments as the word segmentation result; when the numbers of segments are the same, the result of the above reverse longest matching is taken as the word segmentation result.
In this embodiment, the amount of computation required for word segmentation grows with the quantity of UGC. For the massive volumes of UGC that may occur, this step can use distributed computing to increase speed. This embodiment selects an algorithm whose computational cost is relatively small and which is well suited to tag extraction, in order to improve segmentation efficiency and obtain relatively accurate segmentation results.
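The bidirectional longest matching described above can be sketched as follows. The lexicon-driven matching, the fallback to single characters for text not found in the lexicon, and the function names are assumptions of this sketch; the selection rule (fewer segments wins, reverse matching wins a tie) follows the text.

```python
from typing import List, Set


def forward_longest_match(sentence: str, lexicon: Set[str], max_len: int) -> List[str]:
    """Scan left to right, always taking the longest lexicon word available."""
    tokens, i = [], 0
    while i < len(sentence):
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            piece = sentence[i:i + size]
            if size == 1 or piece in lexicon:  # single characters always match
                tokens.append(piece)
                i += size
                break
    return tokens


def reverse_longest_match(sentence: str, lexicon: Set[str], max_len: int) -> List[str]:
    """Scan right to left, always taking the longest lexicon word available."""
    tokens, j = [], len(sentence)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = sentence[j - size:j]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                j -= size
                break
    tokens.reverse()  # restore reading order
    return tokens


def segment(sentence: str, lexicon: Set[str]) -> List[str]:
    """Take the result with fewer segments; on a tie, take the reverse result."""
    max_len = max((len(word) for word in lexicon), default=1)
    forward = forward_longest_match(sentence, lexicon, max_len)
    reverse = reverse_longest_match(sentence, lexicon, max_len)
    return forward if len(forward) < len(reverse) else reverse


lexicon = {"研究", "研究生", "生命", "的", "起源"}
result = segment("研究生命的起源", lexicon)
```

In this example both directions produce four segments, so the reverse result is used, which avoids the forward result's mistaken grouping 研究生 followed by the stray character 命.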
An embodiment of the present invention further provides tag extraction equipment. Figure 6 is a schematic structural diagram of a server provided by an embodiment of the present invention. The server 600 may vary considerably with configuration or performance, and may include one or more central processing units (CPUs) 622 (for example, one or more processors), a memory 632, and one or more storage media 630 (for example, one or more mass storage devices) storing application programs 642 or data 644. The memory 632 and the storage medium 630 may provide transient or persistent storage. A program stored in the storage medium 630 may include one or more modules (not marked in the figure), each of which may include a series of instruction operations on the server. Further, the central processing unit 622 may be configured to communicate with the storage medium 630 and to execute, on the server 600, the series of instruction operations in the storage medium 630.
The server 600 may also include one or more power supplies 626, one or more wired or wireless network interfaces 650, one or more input/output interfaces 658, and/or one or more operating systems 641, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™ and so on.
The method steps in the above embodiments may be based on the server structure shown in Figure 6.
An embodiment of the present invention further provides another tag extraction equipment which, as shown in Figure 7, includes: a receiving device 701, a transmitting device 702, a processor 703 and a storage device 704;
The processor 703 is configured to obtain the user-generated content (UGC) associated with the content provided by a content issuer, segment the above UGC into words, and take the words obtained by segmentation as optional labels; calculate the weight value of each word in the above optional labels, and select words from the above optional labels as candidate words in descending order of their weight values; and take the above candidate words as second labels.
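The flow performed by the processor can be sketched end to end as follows. The function `extract_second_labels`, the pluggable `segment` callable and the use of occurrence counts as weight values are assumptions made for this sketch; whitespace splitting stands in for a real word segmentation algorithm.

```python
from collections import Counter
from typing import Callable, Iterable, List


def extract_second_labels(
    ugc_texts: Iterable[str],
    segment: Callable[[str], List[str]],
    top_k: int = 3,
) -> List[str]:
    """Segment the UGC, weight the optional labels, and keep the top candidates."""
    counts: Counter = Counter()
    for text in ugc_texts:
        counts.update(segment(text))  # words obtained by segmentation
    # Descending weight value; ties are broken alphabetically for determinism.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:top_k]]


comments = ["great goal", "goal again", "great save goal"]
second_labels = extract_second_labels(comments, str.split, top_k=2)
```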
In the field of communication technology, a content issuer is the publisher of a network resource, for example the publisher of a live video stream. A user is a consumer of the network resource, and user-generated content (UGC) is the opinion that a user posts about the video; it is usually text, such as bullet-screen comments or ordinary comments. In theory audio could also be handled, provided that speech recognition is performed first, although the amount of data processing would be greater.
The text of the user-generated content (UGC) is segmented into words. Any of the established word segmentation algorithms may be used; the embodiments of the present invention do not impose a specific one.
Segmenting the text of the UGC yields a number of words, the optional labels, any of which could serve as a label. Because segmentation may produce a very large number of optional labels, only a portion of them should be selected as labels. The words can therefore be differentiated by weight values. The weight value of a word may be determined from empirical values or from statistics; the embodiments of the present invention do not impose a specific method.
In the embodiments of the present invention, "first" and "second" serve only to distinguish types of label and should not be understood as implying any other technical limitation. The first label is the label provided by the content issuer; the second label is the label obtained through tag extraction according to the embodiments of the present invention.
The embodiment of the present invention extracts labels from the user-generated content (UGC), so users do not have to enter labels specifically. The UGC may come from a large number of users, the words are differentiated by weight value, and tag extraction is completed automatically. This enriches the label content and makes the labels more diverse and accurate.
Further, UGC comes from many sources. For example, with a bullet-screen (danmaku) function, many people may be posting bullet comments, which produces a large number of words. Some of these words may repeat the same meaning, and some may be invalid words that cannot serve as labels. The embodiment of the present invention can remove these words to further improve the accuracy of tag extraction, as follows: the above processor 703 is further configured to, before calculating the weight value of each word in the above optional labels, select the words in the above optional labels that are nouns and/or noun phrases, and remove words with repeated meanings and invalid words, to obtain the remaining vocabulary;
The above calculating of the weight value of each word in the above optional labels, and selecting of words from the above optional labels as candidate words in descending order of their weight values, includes:
Performing weight calculation on the above remaining vocabulary to obtain a first weight value for each word in the remaining vocabulary, and selecting words from the above remaining vocabulary as candidate words in descending order of the above first weight value.
In general, nouns can be used as labels, whereas verbs, measure words and the like are generally unsuitable. Nouns and noun phrases can therefore be extracted first, and words with repeated meanings then removed. Removing repeated meanings prevents words of similar meaning from both being used as labels, which would produce duplicate labels. In the embodiments of the present invention, "first" and "second" are also used to distinguish two different weight values: the first weight value is the weight value of a word in the remaining vocabulary, and the second weight value is the weight value of a candidate word; they should not be understood as having any other technical meaning. Invalid words may be sensitive words that are legally prohibited from appearing in labels, or words that are meaningless in themselves; they can be removed by means of a lexicon. In addition, a lexicon of valid words may be established for nouns and noun phrases, in which case the nouns and noun phrases selected should be words in that valid lexicon.
Further, tag extraction from UGC has little sense of direction, so the extracted labels may deviate from the intended direction. To reduce this deviation, the embodiment of the present invention also provides the following solution: the above processor 703 is further configured to, before selecting words from the above remaining vocabulary as candidate words in descending order of the above first weight value, obtain the first label provided by the above content issuer, and calculate the degree of association between the above candidate words and the above first label to obtain a second weight value;
After selecting words from the above remaining vocabulary as candidate words in descending order of the above first weight value, the processor 703 selects words from the above candidate words as second labels in descending order of the second weight value; or, calculates a comprehensive weight of the above first weight value and the second weight value, and selects words from the above candidate words as second labels in descending order of the above comprehensive weight.
In this embodiment, the first label given by the content issuer serves as the direction against which the weight of each candidate word is assessed, so that the extracted labels stay close to the content issuer's intent, enrich the content issuer's labels, and reflect other users' evaluation of the network resource. In this embodiment, the second label may be selected with reference to the second weight value only. Using the comprehensive weight instead takes the label direction provided by the content issuer into account while balancing it against the result of automated tag extraction from the UGC, which reduces the likelihood of one-sided labels.
Optionally, the performing, by the above processor 703, of weight calculation on the above remaining vocabulary to obtain the weight value of each word in the remaining vocabulary includes: counting the number of occurrences of each word of the above remaining vocabulary in the above UGC, and determining a weight value corresponding to each word's number of occurrences in the above UGC.
The weight value may be calculated in many ways. In the embodiment of the present invention, the number of occurrences is used as the statistic that determines the weight value. This is relatively simple and reflects how most users evaluate the network resource, which meets the requirements for labels and allows the extracted labels to reflect users' evaluations.
Optionally, the segmenting of the above UGC by the above processor 703 includes:
Obtaining the sentences of the above UGC, performing forward longest matching from left to right and reverse longest matching from right to left on each sentence, and taking the result with fewer segments as the word segmentation result; when the numbers of segments are the same, the result of the above reverse longest matching is taken as the word segmentation result.
In this embodiment, the amount of computation required for word segmentation grows with the quantity of UGC. For the massive volumes of UGC that may occur, this step can use distributed computing to increase speed. This embodiment selects an algorithm whose computational cost is relatively small and which is well suited to tag extraction, in order to improve segmentation efficiency and obtain relatively accurate segmentation results.
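The idea of spreading segmentation over several workers can be sketched on a single machine as follows. A thread pool stands in for a distributed cluster here, and the names `count_shard` and `count_words_sharded`, the round-robin sharding and the whitespace segmenter are all assumptions of this sketch rather than details of the embodiment.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def count_shard(shard: Sequence[str], segment: Callable[[str], List[str]]) -> Counter:
    """Segment every sentence of one shard and count the resulting words."""
    counts: Counter = Counter()
    for sentence in shard:
        counts.update(segment(sentence))
    return counts


def count_words_sharded(
    sentences: Sequence[str],
    segment: Callable[[str], List[str]],
    shards: int = 4,
) -> Counter:
    """Split the UGC into shards, process them concurrently, and merge the counts."""
    chunks = [sentences[i::shards] for i in range(shards)]  # round-robin split
    total: Counter = Counter()
    with ThreadPoolExecutor(max_workers=shards) as pool:
        for partial in pool.map(lambda chunk: count_shard(chunk, segment), chunks):
            total.update(partial)  # merging adds the per-shard counts
    return total


merged = count_words_sharded(["a b", "b c", "c a", "a"], str.split, shards=2)
```

Because each shard is processed independently and the merge is a simple sum, the same split-and-merge structure carries over to a real distributed framework.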
It should be noted that, in the above equipment embodiments, the units are divided only according to functional logic; the division is not limited to the above, provided that the corresponding functions can be realized. In addition, the specific names of the functional units serve only to distinguish them from one another and do not limit the protection scope of the present invention.
In addition, a person of ordinary skill in the art will understand that all or some of the steps of the above method embodiments can be carried out by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, which may be a read-only memory, a magnetic disk, an optical disc, or the like.
The above are only preferred specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any change or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the embodiments of the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be determined by the scope of the claims.
Claims (10)
1. A tag extraction method, characterized by comprising:
Obtaining user-generated content (UGC) associated with content provided by a content issuer, segmenting the UGC into words, and taking the words obtained by segmentation as optional labels;
Calculating a weight value for each word in the optional labels, and selecting words from the optional labels as candidate words in descending order of the weight values of the words in the optional labels;
Taking the candidate words as second labels.
2. The method according to claim 1, characterized in that, before the calculating of the weight value of each word in the optional labels, the method further comprises:
Selecting the words in the optional labels that are nouns and/or noun phrases, and removing words with repeated meanings and invalid words, to obtain a remaining vocabulary;
The calculating of the weight value of each word in the optional labels, and the selecting of words from the optional labels as candidate words in descending order of the weight values of the words in the optional labels, comprises:
Performing weight calculation on the remaining vocabulary to obtain a first weight value for each word in the remaining vocabulary, and selecting words from the remaining vocabulary as candidate words in descending order of the first weight value.
3. The method according to claim 2, characterized in that, before the selecting of words from the remaining vocabulary as candidate words in descending order of the first weight value, the method further comprises:
Obtaining a first label provided by the content issuer; calculating the degree of association between the candidate words and the first label to obtain a second weight value;
After the selecting of words from the remaining vocabulary as candidate words in descending order of the first weight value, the method further comprises:
Selecting words from the candidate words as second labels in descending order of the second weight value; or, calculating a comprehensive weight of the first weight value and the second weight value, and selecting words from the candidate words as second labels in descending order of the comprehensive weight.
4. The method according to claim 2 or 3, characterized in that the performing of weight calculation on the remaining vocabulary to obtain the weight value of each word in the remaining vocabulary comprises:
Counting the number of occurrences of each word of the remaining vocabulary in the UGC, and determining a weight value corresponding to each word's number of occurrences in the UGC.
5. The method according to any one of claims 1 to 3, characterized in that the segmenting of the UGC comprises:
Obtaining the sentences of the UGC, performing forward longest matching from left to right and reverse longest matching from right to left on each sentence, and taking the result with fewer segments as the word segmentation result; when the numbers of segments are the same, taking the result of the reverse longest matching as the word segmentation result.
6. Tag extraction equipment, characterized by comprising:
A content acquiring unit, configured to obtain user-generated content (UGC) associated with content provided by a content issuer;
A vocabulary acquisition unit, configured to segment the UGC into words and take the words obtained by segmentation as optional labels;
A weight calculation unit, configured to calculate a weight value for each word in the optional labels;
A vocabulary selection unit, configured to select words from the optional labels as candidate words in descending order of the weight values of the words in the optional labels;
A tag determination unit, configured to take the candidate words as second labels.
7. The tag extraction equipment according to claim 6, characterized in that the tag extraction equipment further comprises:
A vocabulary screening unit, specifically configured to select the words in the optional labels that are nouns and/or noun phrases, and to remove words with repeated meanings and invalid words, to obtain a remaining vocabulary;
The weight calculation unit, specifically configured to perform weight calculation on the remaining vocabulary to obtain a first weight value for each word in the remaining vocabulary;
The vocabulary selection unit, specifically configured to select words from the remaining vocabulary as candidate words in descending order of the first weight value.
8. The tag extraction equipment according to claim 7, characterized in that the tag extraction equipment further comprises:
A label acquiring unit, configured to obtain a first label provided by the content issuer;
The weight calculation unit, further configured to calculate the degree of association between the candidate words and the first label to obtain a second weight value; or, to calculate the degree of association between the candidate words and the first label to obtain a second weight value, and then calculate a comprehensive weight of the first weight value and the second weight value;
The tag determination unit, specifically configured to select words from the candidate words as second labels in descending order of the second weight value, or to select words from the candidate words as second labels in descending order of the comprehensive weight.
9. The tag extraction equipment according to claim 7 or 8, characterized in that
The weight calculation unit is specifically configured to count the number of occurrences of each word of the remaining vocabulary in the UGC, and to determine a weight value corresponding to each word's number of occurrences in the UGC.
10. The tag extraction equipment according to any one of claims 7 to 8, characterized in that
The vocabulary acquisition unit is specifically configured to obtain the sentences of the UGC, perform forward longest matching from left to right and reverse longest matching from right to left on each sentence, and take the result with fewer segments as the word segmentation result; when the numbers of segments are the same, the result of the reverse longest matching is taken as the word segmentation result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610186950.7A CN105893478B (en) | 2016-03-29 | 2016-03-29 | A kind of tag extraction method and apparatus |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610186950.7A CN105893478B (en) | 2016-03-29 | 2016-03-29 | A kind of tag extraction method and apparatus |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105893478A true CN105893478A (en) | 2016-08-24 |
CN105893478B CN105893478B (en) | 2019-10-29 |
Family
ID=57013945
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610186950.7A Active CN105893478B (en) | 2016-03-29 | 2016-03-29 | A kind of tag extraction method and apparatus |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105893478B (en) |
Cited By (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106951494A (en) * | 2017-03-14 | 2017-07-14 | 腾讯科技(深圳)有限公司 | A kind of information recommendation method and device |
CN106960033A (en) * | 2017-03-22 | 2017-07-18 | 广州优视网络科技有限公司 | A kind of method and apparatus that label is marked to information flow |
CN106980667A (en) * | 2017-03-22 | 2017-07-25 | 广州优视网络科技有限公司 | A kind of method and apparatus that label is marked to article |
CN107169011A (en) * | 2017-03-31 | 2017-09-15 | 百度在线网络技术(北京)有限公司 | The original recognition methods of webpage based on artificial intelligence, device and storage medium |
CN107484038A (en) * | 2017-08-22 | 2017-12-15 | 北京奇艺世纪科技有限公司 | A kind of generation method of video subject, device and electronic equipment |
CN107566917A (en) * | 2017-09-15 | 2018-01-09 | 维沃移动通信有限公司 | A kind of video marker method and video playback apparatus |
CN107977375A (en) * | 2016-10-25 | 2018-05-01 | 央视国际网络无锡有限公司 | A kind of video tab generation method and device |
CN108228665A (en) * | 2016-12-22 | 2018-06-29 | 阿里巴巴集团控股有限公司 | Determine object tag, the method and device for establishing tab indexes, object search |
CN108280059A (en) * | 2018-01-09 | 2018-07-13 | 武汉斗鱼网络科技有限公司 | Direct broadcasting room content tab extracting method, storage medium, electronic equipment and system |
CN108376164A (en) * | 2018-02-24 | 2018-08-07 | 武汉斗鱼网络科技有限公司 | A kind of methods of exhibiting and device of potentiality main broadcaster |
CN108664585A (en) * | 2018-05-07 | 2018-10-16 | 多盟睿达科技(中国)有限公司 | Word method is selected in a kind of advertisement based on big data |
CN109145291A (en) * | 2018-07-25 | 2019-01-04 | 广州虎牙信息科技有限公司 | A kind of method, apparatus, equipment and the storage medium of the screening of barrage keyword |
CN109213841A (en) * | 2017-06-29 | 2019-01-15 | 武汉斗鱼网络科技有限公司 | Theme sample extraction method, storage medium, electronic equipment and system is broadcast live |
CN109522275A (en) * | 2018-11-27 | 2019-03-26 | 掌阅科技股份有限公司 | Label method for digging, electronic equipment and the storage medium of content are produced based on user |
CN109558502A (en) * | 2018-12-18 | 2019-04-02 | 福州大学 | A kind of urban safety data retrieval method of knowledge based map |
CN110245343A (en) * | 2018-03-07 | 2019-09-17 | 优酷网络技术(北京)有限公司 | Barrage analysis method and device |
CN111444687A (en) * | 2020-03-20 | 2020-07-24 | 北京达佳互联信息技术有限公司 | Label generation method and device, server and storage medium |
CN111625620A (en) * | 2019-02-28 | 2020-09-04 | 北京京东尚科信息技术有限公司 | Information processing method and device |
CN112100443A (en) * | 2020-08-03 | 2020-12-18 | 咪咕文化科技有限公司 | Video tag obtaining method and device, electronic equipment and storage medium |
CN112597409A (en) * | 2021-03-04 | 2021-04-02 | 蚂蚁智信(杭州)信息技术有限公司 | Label display method and device |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050135596A1 (en) * | 2000-12-26 | 2005-06-23 | Aspect Communications Corporation | Method and system for providing personalized service over different contact channels |
US20100010982A1 (en) * | 2008-07-09 | 2010-01-14 | Broder Andrei Z | Web content characterization based on semantic folksonomies associated with user generated content |
CN102289523A (en) * | 2011-09-20 | 2011-12-21 | 北京金和软件股份有限公司 | Method for intelligently extracting text labels |
CN103164471A (en) * | 2011-12-15 | 2013-06-19 | 盛乐信息技术(上海)有限公司 | Recommendation method and system of video text labels |
CN104978332A (en) * | 2014-04-04 | 2015-10-14 | 腾讯科技(深圳)有限公司 | UGC label data generating method, UGC label data generating device, relevant method and relevant device |
-
2016
- 2016-03-29 CN CN201610186950.7A patent/CN105893478B/en active Active
Cited By (29)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107977375A (en) * | 2016-10-25 | 2018-05-01 | 央视国际网络无锡有限公司 | A kind of video tab generation method and device |
CN108228665A (en) * | 2016-12-22 | 2018-06-29 | 阿里巴巴集团控股有限公司 | Determine object tag, the method and device for establishing tab indexes, object search |
CN106951494A (en) * | 2017-03-14 | 2017-07-14 | 腾讯科技(深圳)有限公司 | A kind of information recommendation method and device |
CN106951494B (en) * | 2017-03-14 | 2022-01-04 | 腾讯科技(深圳)有限公司 | Information recommendation method and device |
CN106960033A (en) * | 2017-03-22 | 2017-07-18 | 广州优视网络科技有限公司 | A kind of method and apparatus that label is marked to information flow |
CN106980667A (en) * | 2017-03-22 | 2017-07-25 | 广州优视网络科技有限公司 | A kind of method and apparatus that label is marked to article |
WO2018171295A1 (en) * | 2017-03-22 | 2018-09-27 | 广州优视网络科技有限公司 | Method and apparatus for tagging article, terminal, and computer readable storage medium |
CN106960033B (en) * | 2017-03-22 | 2021-09-14 | 阿里巴巴(中国)有限公司 | Method and device for labeling information stream |
CN107169011B (en) * | 2017-03-31 | 2021-06-11 | 百度在线网络技术(北京)有限公司 | Webpage originality identification method and device based on artificial intelligence and storage medium |
CN107169011A (en) * | 2017-03-31 | 2017-09-15 | 百度在线网络技术(北京)有限公司 | The original recognition methods of webpage based on artificial intelligence, device and storage medium |
CN109213841B (en) * | 2017-06-29 | 2021-01-01 | 武汉斗鱼网络科技有限公司 | Live broadcast theme sample extraction method, storage medium, electronic device and system |
CN109213841A (en) * | 2017-06-29 | 2019-01-15 | 武汉斗鱼网络科技有限公司 | Theme sample extraction method, storage medium, electronic equipment and system is broadcast live |
CN107484038A (en) * | 2017-08-22 | 2017-12-15 | 北京奇艺世纪科技有限公司 | A kind of generation method of video subject, device and electronic equipment |
CN107566917A (en) * | 2017-09-15 | 2018-01-09 | 维沃移动通信有限公司 | A kind of video marker method and video playback apparatus |
CN108280059A (en) * | 2018-01-09 | 2018-07-13 | 武汉斗鱼网络科技有限公司 | Direct broadcasting room content tab extracting method, storage medium, electronic equipment and system |
WO2019136841A1 (en) * | 2018-01-09 | 2019-07-18 | 武汉斗鱼网络科技有限公司 | Method for extracting content tag of live stream rooms, storage medium, electronic device, and system |
CN108376164B (en) * | 2018-02-24 | 2021-01-01 | 武汉斗鱼网络科技有限公司 | Display method and device of potential anchor |
CN108376164A (en) * | 2018-02-24 | 2018-08-07 | 武汉斗鱼网络科技有限公司 | A kind of methods of exhibiting and device of potentiality main broadcaster |
CN110245343A (en) * | 2018-03-07 | 2019-09-17 | 优酷网络技术(北京)有限公司 | Barrage analysis method and device |
CN108664585A (en) * | 2018-05-07 | 2018-10-16 | 多盟睿达科技(中国)有限公司 | Advertisement word selection method based on big data |
CN109145291A (en) * | 2018-07-25 | 2019-01-04 | 广州虎牙信息科技有限公司 | Method, apparatus, device and storage medium for screening barrage keywords |
CN109522275A (en) * | 2018-11-27 | 2019-03-26 | 掌阅科技股份有限公司 | Tag mining method based on user-generated content, electronic device and storage medium |
CN109558502A (en) * | 2018-12-18 | 2019-04-02 | 福州大学 | Urban safety data retrieval method based on knowledge graph |
CN109558502B (en) * | 2018-12-18 | 2021-11-30 | 福州大学 | Urban safety data retrieval method based on knowledge graph |
CN111625620A (en) * | 2019-02-28 | 2020-09-04 | 北京京东尚科信息技术有限公司 | Information processing method and device |
CN111444687A (en) * | 2020-03-20 | 2020-07-24 | 北京达佳互联信息技术有限公司 | Label generation method and device, server and storage medium |
CN112100443A (en) * | 2020-08-03 | 2020-12-18 | 咪咕文化科技有限公司 | Video tag obtaining method and device, electronic equipment and storage medium |
CN112100443B (en) * | 2020-08-03 | 2024-06-04 | 咪咕文化科技有限公司 | Video tag acquisition method and device, electronic equipment and storage medium |
CN112597409A (en) * | 2021-03-04 | 2021-04-02 | 蚂蚁智信(杭州)信息技术有限公司 | Label display method and device |
Also Published As
Publication number | Publication date |
---|---|
CN105893478B (en) | 2019-10-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105893478A (en) | Tag extraction method and equipment | |
CN107301170B (en) | Method and device for segmenting sentences based on artificial intelligence | |
CN112270196B (en) | Entity relationship identification method and device and electronic equipment | |
CN110162593A (en) | Search result processing method, similarity model training method and device | |
CN110427463A (en) | Search statement response method, device and server and storage medium | |
CN104881458B (en) | Web page topic annotation method and device | |
TWI554896B (en) | Information Classification Method and Information Classification System Based on Product Identification | |
CN110442718A (en) | Sentence processing method, device and server and storage medium | |
CN106997341B (en) | Innovation scheme matching method, device, server and system | |
CN110619051B (en) | Question sentence classification method, device, electronic equipment and storage medium | |
CN105979376A (en) | Recommendation method and device | |
CN114254158B (en) | Video generation method and device, and neural network training method and device | |
CN112699645B (en) | Corpus labeling method, apparatus and device | |
CN111309916B (en) | Digest extracting method and apparatus, storage medium, and electronic apparatus | |
CN110287314A (en) | Long text credibility evaluation method and system based on Unsupervised clustering | |
CN113032673A (en) | Resource acquisition method and device, computer equipment and storage medium | |
CN113779381A (en) | Resource recommendation method and device, electronic equipment and storage medium | |
CN112613321A (en) | Method and system for extracting entity attribute information in text | |
CN108875743A (en) | Text recognition method and device | |
CN115099239B (en) | Resource identification method, device, equipment and storage medium | |
CN112231554A (en) | Search recommendation word generation method and device, storage medium and computer equipment | |
CN116956896A (en) | Text analysis method, system, electronic equipment and medium based on artificial intelligence | |
CN115438141B (en) | Information retrieval method based on knowledge graph model | |
CN107423307A (en) | Distribution method and device for internet information resources | |
CN110209765A (en) | Method and apparatus for searching keywords by semantics |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right | Effective date of registration: 20210111; Address after: 511442 3108, 79 Wanbo 2nd Road, Nancun Town, Panyu District, Guangzhou City, Guangdong Province; Patentee after: GUANGZHOU CUBESILI INFORMATION TECHNOLOGY Co.,Ltd.; Address before: 511449 29 / F, building B-1, Wanda Plaza, Wanbo business district, Nancun Town, Panyu District, Guangzhou City, Guangdong Province; Patentee before: GUANGZHOU HUADUO NETWORK TECHNOLOGY Co.,Ltd. |