CN105893478A - Tag extraction method and equipment - Google Patents

Info

Publication number
CN105893478A
Authority
CN
China
Prior art keywords
vocabulary
label
weighted value
ugc
candidate word
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610186950.7A
Other languages
Chinese (zh)
Other versions
CN105893478B (en)
Inventor
许志鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Cubesili Information Technology Co Ltd
Original Assignee
Guangzhou Huaduo Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Huaduo Network Technology Co Ltd
Priority to CN201610186950.7A
Publication of CN105893478A
Application granted
Publication of CN105893478B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Artificial Intelligence (AREA)
  • Machine Translation (AREA)

Abstract

An embodiment of the invention discloses a tag extraction method and device. The method includes: obtaining user generated content (UGC) associated with content provided by a content issuer; performing word segmentation on the UGC; using the words obtained by the segmentation as optional tags; calculating a weight value for each word in the optional tags; selecting words from the optional tags as candidate words in descending order of weight value; and using the candidate words as a second tag. Because tags are obtained by extraction from UGC, users do not need to input tags specially. The UGC can come from many users, words are differentiated on the basis of weight values, and tag extraction is completed automatically, so tag content is enriched and tags become more diverse and accurate.

Description

Tag extraction method and device
Technical field
The present invention relates to the field of communication technology, and in particular to a tag extraction method and device.
Background art
In this application, a tag refers to a user's evaluation tag for a content publisher.
Taking video websites as an example, the tags on videos at the major video websites are at present almost all applied by the video publisher or a website editor, and so inevitably carry subjectivity and one-sidedness. In other words, tag extraction is implemented by receiving tags from the video publisher or the website editor.
To remedy the shortcomings of tagging by video publishers and website editors, it is preferable to let users take part in tagging, which improves on tagging by the author alone. In that scheme, tag extraction is implemented by receiving tags from users.
However, user participation in tagging is low, so there are few user tags for a content provider or for the content it provides, and such tags may even be hard to obtain at all.
Summary of the invention
Embodiments of the present invention provide a tag extraction method and device for extracting optional tags provided by users, enriching tag content, and making tags more diverse and accurate.
In one aspect, an embodiment of the present invention provides a tag extraction method, including:
obtaining user generated content (UGC) associated with content provided by a content issuer, performing word segmentation on the UGC, and using the words obtained by the segmentation as optional tags;
calculating a weight value for each word in the optional tags, and selecting words from the optional tags as candidate words in descending order of weight value;
using the candidate words as the second tag.
In a possible implementation, before the weight value of each word in the optional tags is calculated, the method further includes:
selecting the words in the optional tags that are nouns and/or noun phrases, and removing semantically duplicate words and invalid words to obtain the remaining words;
The calculating of a weight value for each word in the optional tags, and the selecting of words from the optional tags as candidate words in descending order of weight value, include:
performing weight calculation on the remaining words to obtain a first weight value for each of the remaining words, and selecting words from the remaining words as candidate words in descending order of the first weight value.
In a possible implementation, before words are selected from the remaining words as candidate words in descending order of the first weight value, the method further includes:
obtaining a first tag provided by the content issuer; and calculating the degree of association between the candidate words and the first tag to obtain a second weight value;
After words are selected from the remaining words as candidate words in descending order of the first weight value, the method further includes:
selecting words from the candidate words as the second tag in descending order of the second weight value; or calculating a comprehensive weight from the first weight value and the second weight value, and selecting words from the candidate words as the second tag in descending order of the comprehensive weight.
In a possible implementation, performing weight calculation on the remaining words to obtain the weight value of each of the remaining words includes:
counting the number of occurrences in the UGC of each of the remaining words, and determining the weight value corresponding to each word's number of occurrences in the UGC.
In a possible implementation, performing word segmentation on the UGC includes:
obtaining a sentence of the UGC, applying to the sentence both left-to-right longest matching and right-to-left reverse longest matching, taking the result with fewer segments as the segmentation result, and taking the reverse longest matching result as the segmentation result when the two results have the same number of segments.
In a second aspect, an embodiment of the present invention further provides a tag extraction device, including:
a content acquisition unit, configured to obtain user generated content (UGC) associated with content provided by a content issuer;
a vocabulary acquisition unit, configured to perform word segmentation on the UGC and use the words obtained by the segmentation as optional tags;
a weight calculation unit, configured to calculate a weight value for each word in the optional tags;
a word selection unit, configured to select words from the optional tags as candidate words in descending order of weight value;
a tag determination unit, configured to use the candidate words as the second tag.
In a possible implementation, the tag extraction device further includes:
a vocabulary screening unit, specifically configured to select the words in the optional tags that are nouns and/or noun phrases, and to remove semantically duplicate words and invalid words to obtain the remaining words;
the weight calculation unit is specifically configured to perform weight calculation on the remaining words to obtain a first weight value for each of the remaining words;
the word selection unit is specifically configured to select words from the remaining words as candidate words in descending order of the first weight value.
In a possible implementation, the tag extraction device further includes:
a tag acquisition unit, configured to obtain a first tag provided by the content issuer;
the weight calculation unit is further configured to calculate the degree of association between the candidate words and the first tag to obtain a second weight value; or to calculate the degree of association between the candidate words and the first tag to obtain a second weight value and then calculate a comprehensive weight from the first weight value and the second weight value;
the tag determination unit is specifically configured to select words from the candidate words as the second tag in descending order of the second weight value, or to select words from the candidate words as the second tag in descending order of the comprehensive weight.
In a possible implementation, the weight calculation unit is specifically configured to count the number of occurrences in the UGC of each of the remaining words, and to determine the weight value corresponding to each word's number of occurrences in the UGC.
In a possible implementation, the vocabulary acquisition unit is specifically configured to obtain a sentence of the UGC, apply to the sentence both left-to-right longest matching and right-to-left reverse longest matching, take the result with fewer segments as the segmentation result, and take the reverse longest matching result as the segmentation result when the two results have the same number of segments.
As can be seen from the above technical solutions, the embodiments of the present invention have the following advantages: tags are obtained by extraction from user generated content (UGC), so users do not need to input tags specially; the UGC may come from a large number of users; words are differentiated on the basis of weight values; and tag extraction is completed automatically. Tag content can therefore be enriched, and tags become more diverse and accurate.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings needed for describing the embodiments are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a method according to an embodiment of the present invention;
Fig. 2 is a schematic flowchart of a method according to an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a device according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a device according to an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of a device according to an embodiment of the present invention;
Fig. 6 is a schematic structural diagram of a server according to an embodiment of the present invention;
Fig. 7 is a schematic structural diagram of a device according to an embodiment of the present invention.
Detailed description of the embodiments
To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings. Evidently, the described embodiments are only some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
An embodiment of the present invention provides a tag extraction method which, as shown in Fig. 1, includes:
101: Obtain user generated content (UGC) associated with content provided by a content issuer, perform word segmentation on the UGC, and use the words obtained by the segmentation as optional tags;
In the field of communication technology, a content issuer is a publisher of network resources, for example the publisher of a live video stream. A user is a consumer of network resources, and UGC is the opinion that users express about a video, usually as text, for example bullet comments or ordinary comments. In theory audio could also be handled by first applying speech recognition, although the data processing load would be larger.
Word segmentation is performed on the text of the UGC. The specific segmentation algorithm may be any of the already mature segmentation algorithms; the embodiment of the present invention places no exclusive limitation on this.
102: Calculate a weight value for each word in the optional tags, and select words from the optional tags as candidate words in descending order of weight value;
After word segmentation of the UGC text, a number of optional words are obtained. Any of them could serve as a tag, but segmentation can yield a great many optional tags, and only some of them should be selected. The words can therefore be differentiated by weight value. How the weight value of a word is determined is not exclusively limited by the embodiment of the present invention: it may be set from empirical values or derived from statistical counts.
103: Use the candidate words as the second tag.
In the embodiments of the present invention, "first" and "second" serve only to distinguish types of tags and should not be understood as implying any other technical limitation. The first tag is the tag provided by the content issuer; the second tag is the tag obtained by tag extraction according to the embodiment of the present invention.
The embodiment of the present invention obtains tags by extraction from UGC, so users do not need to input tags specially. The UGC may come from a large number of users, words are differentiated on the basis of weight values, and tag extraction is completed automatically. Tag content can therefore be enriched, and tags become more diverse and accurate.
Further, UGC comes from many sources. For example, many people may be posting bullet comments at once, which produces a large number of words. Some of these words may be semantic duplicates, and some may be invalid words that cannot serve as tags. The embodiment of the present invention can remove such words to further improve the accuracy of tag extraction, as follows: before the weight value of each word in the optional tags is calculated, the method further includes:
selecting the words in the optional tags that are nouns and/or noun phrases, and removing semantically duplicate words and invalid words to obtain the remaining words;
The calculating of a weight value for each word in the optional tags, and the selecting of words from the optional tags as candidate words in descending order of weight value, include:
performing weight calculation on the remaining words to obtain a first weight value for each of the remaining words, and selecting words from the remaining words as candidate words in descending order of the first weight value.
Generally, nouns can be used as tags, whereas verbs, measure words, and the like are usually unsuitable, so nouns and noun phrases can be extracted first and semantically duplicate words then removed. Removing semantic duplicates reduces the chance that words of similar meaning are both used as tags, which would produce duplicate tags. In the embodiments of the present invention, "first" and "second" are also used to distinguish two different weight values: the first weight value is the weight value of a remaining word, and the second weight value is the weight value of a candidate word. They should not be understood as having any other technical meaning. Invalid words may be sensitive words that are legally prohibited from appearing in tags, or words that are meaningless in themselves; they can be removed by means of a lexicon. In addition, a lexicon of valid words can be set up for nouns and noun phrases, in which case the nouns and noun phrases selected must be words found in that lexicon.
Further, tag extraction from UGC has little sense of direction, so the direction of the extracted tags may deviate. To reduce this deviation, the embodiment of the present invention also provides the following solution: before words are selected from the remaining words as candidate words in descending order of the first weight value, the method further includes:
obtaining a first tag provided by the content issuer; and calculating the degree of association between the candidate words and the first tag to obtain a second weight value;
After words are selected from the remaining words as candidate words in descending order of the first weight value, the method further includes:
selecting words from the candidate words as the second tag in descending order of the second weight value; or calculating a comprehensive weight from the first weight value and the second weight value, and selecting words from the candidate words as the second tag in descending order of the comprehensive weight.
In this embodiment, the first tag given by the content provider serves as the direction against which the weight of each candidate word is assessed, so that the extracted tags stay close in direction to the thinking of the content issuer, enrich the content issuer's tags, and also reflect other users' evaluation of the network resource. The second tag may be selected with reference to the second weight alone. Using the comprehensive weight instead balances the tag direction given by the content provider against the result of automatic UGC-based tag extraction, which reduces the incidence of one-sided tags.
Optionally, performing weight calculation on the remaining words to obtain the weight value of each of the remaining words includes:
counting the number of occurrences in the UGC of each of the remaining words, and determining the weight value corresponding to each word's number of occurrences in the UGC.
There are many ways to calculate a weight value. The embodiment of the present invention uses the number of occurrences as the statistic that determines the weight value. This is relatively simple, reflects how the majority of users evaluate the network resource, and meets the requirements for tags, so that the extracted tags reflect users' evaluations.
Optionally, performing word segmentation on the UGC includes:
obtaining a sentence of the UGC, applying to the sentence both left-to-right longest matching and right-to-left reverse longest matching, taking the result with fewer segments as the segmentation result, and taking the reverse longest matching result as the segmentation result when the two results have the same number of segments.
In this embodiment, the computation required for segmentation grows with the amount of UGC. For the massive volumes of UGC that may arise, this step can use distributed computation to increase speed. This embodiment selects an algorithm that needs relatively little computation and is well suited to tag extraction, so as to improve segmentation efficiency and obtain a relatively accurate segmentation result.
The embodiment of the present invention mainly performs data mining on the UGC related to a video, such as comments and bullet comments, to obtain valuable tags that serve as video tags. On the product side, this compensates for the subjectivity and one-sidedness of tags applied unilaterally by the video publisher. On the user side, it is imperceptible and has no threshold, which neatly sidesteps the problem of how to motivate users to tag.
As shown in Fig. 2, the main framework of the embodiment of the present invention includes:
201: Obtain publisher tags;
This step obtains the tags applied by the publisher.
202: Mine weighted user tags from the UGC content;
User tags are tags mined from the UGC provided by viewers of the network resource.
203: Weight by the similarity between publisher tags and user tags;
204: Screen and denoise, then output the result.
Screening and denoising means screening the tags obtained.
The above steps are each described in detail in the subsequent embodiments.
I. Publisher tags:
First, the video author (or a website editor) tags the video.
It should be noted that the solution of the embodiment of the present invention can not only generate tags from users' UGC content, but can also introduce user tags on top of the tags applied by the video author, making the tags more diverse and accurate. Having the author tag the video in advance, and using those tags as a candidate factor for the final tags, gives the subsequently mined tags a certain thematic tendency. This better guarantees tag quality and keeps the tags in line with the video website's own planning.
II. Mining weighted user tags from UGC content.
1. Word segmentation of UGC content:
Word segmentation is performed on UGC content such as video comments and bullet comments. For Chinese word segmentation, an open-source segmentation library such as Stanford, IKAnalyzer, or Word can be used to provide the segmentation function. As the segmentation algorithm, the minimum-count variant of bidirectional maximum matching may be considered: combine left-to-right longest matching with right-to-left reverse longest matching over the sentence, take the result with the fewest segments, and take the reverse result when the segment counts are equal. Note that Chinese word segmentation is often computationally intensive, so distributed computation, for example on a Spark cluster, may be needed to increase speed.
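The bidirectional matching rule described above can be sketched as follows. This is a minimal illustration rather than the implementation used by any of the libraries named; the function names, the in-memory `vocab` set, and the fallback of emitting a single character when no dictionary word matches are all assumptions made for the example.

```python
def forward_max_match(sentence, vocab, max_len):
    """Left-to-right longest matching against a dictionary (vocab)."""
    tokens, i = [], 0
    while i < len(sentence):
        # Try the longest window first; fall back to one character.
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            piece = sentence[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


def backward_max_match(sentence, vocab, max_len):
    """Right-to-left (reverse) longest matching against a dictionary."""
    tokens, j = [], len(sentence)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = sentence[j - size:j]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                j -= size
                break
    tokens.reverse()  # tokens were collected from the end of the sentence
    return tokens


def segment(sentence, vocab):
    """Keep the result with fewer segments; on a tie, keep the reverse result."""
    max_len = max((len(word) for word in vocab), default=1)
    fwd = forward_max_match(sentence, vocab, max_len)
    bwd = backward_max_match(sentence, vocab, max_len)
    return fwd if len(fwd) < len(bwd) else bwd


# Hypothetical dictionary; both directions give 3 segments, so the reverse result wins.
print(segment("研究生命起源", {"研究", "研究生", "生命", "起源"}))
# -> ['研究', '生命', '起源']
```

Both passes are independent per sentence, which is what makes the step straightforward to distribute across a cluster when the UGC volume is large.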
2. Keyword extraction:
Keyword extraction can be divided into the following three steps:
a) Part-of-speech tagging and selection:
Since tags are mostly nouns, only the nouns or noun phrases need be extracted as candidate words. Current segmentation tools come with part-of-speech tagging, so they can be used directly to extract all the nouns.
b) Invalid-word filtering:
Invalid-word filtering removes from the set of candidate words those that have little value as video tags, for example unhealthy words and sensitive words. It can be implemented by checking against an invalid-word list. The list can be built manually and continuously supplemented in the "screening and denoising" step of each tag-generation run.
c) Semantic deduplication:
Because tags with the same meaning are redundant, the candidate words obtained after segmentation can be semantically deduplicated. This can be done by grouping near-synonyms; there is no need to be concerned with near-synonym recognition algorithms, since an existing Chinese near-synonym lexicon suffices. Words that are near-synonyms are marked as belonging to one class, and every word in a class is then replaced with the word in that class that occurs most often.
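Steps a) to c) can be sketched together as one small function. The sketch assumes that the segmentation tool returns `(word, pos)` pairs whose noun tags begin with `"n"`, and that the near-synonym lexicon is available as a plain mapping from word to class identifier; the names `extract_keywords`, `invalid_words`, and `synonym_class` are hypothetical.

```python
from collections import Counter


def extract_keywords(tagged_tokens, invalid_words, synonym_class):
    """Return the keyword occurrences left after steps a), b) and c).

    tagged_tokens: iterable of (word, pos) pairs from a segmentation tool.
    invalid_words: set of words that must not become tags.
    synonym_class: dict mapping a word to its near-synonym class id.
    """
    # a) keep nouns and noun phrases (assumed: noun tags start with "n")
    nouns = [word for word, pos in tagged_tokens if pos.startswith("n")]
    # b) drop words on the invalid-word list
    valid = [word for word in nouns if word not in invalid_words]
    # c) within each near-synonym class, find the most frequent member
    counts = Counter(valid)
    representative = {}
    for word, count in counts.items():
        cls = synonym_class.get(word, word)  # unlisted words form their own class
        if cls not in representative or count > counts[representative[cls]]:
            representative[cls] = word
    # replace every occurrence with the representative of its class
    return [representative[synonym_class.get(word, word)] for word in valid]


tagged = [("comedy", "n"), ("watch", "v"), ("humor", "n"), ("comedy", "n"), ("ad", "n")]
print(extract_keywords(tagged, {"ad"}, {"comedy": "funny", "humor": "funny"}))
# -> ['comedy', 'comedy', 'comedy']
```

Returning occurrences rather than a set keeps the counts intact, so the merged word inherits the occurrences of its near-synonyms when frequencies are computed afterwards.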
3. Weight calculation:
After keyword extraction, a weight is calculated for each keyword, and finally the keywords with the highest weight values are extracted as candidate words. There are many common algorithms for keyword weighting, such as tf-idf. Tagging a video is not quite the same as classifying data, because one video can carry tags of several kinds, so here the weight can be calculated from tf alone, that is, term frequency: the weight of a keyword is determined by counting its occurrences. For example, if word A appears 10 times in the keyword list and word B appears 2 times, the weight coefficient of word A is 10 and that of word B is 2. This weight is denoted x.
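The term-frequency weight x and the selection of the top-weighted keywords can be sketched in a few lines. The function names and the cut-off parameter `k` are assumptions for the example; the text does not fix how many keywords are kept.

```python
from collections import Counter


def term_frequency_weights(keywords):
    """Weight x of each keyword = number of times it occurs in the keyword list."""
    return Counter(keywords)


def top_candidates(weights, k):
    """Keywords in descending order of weight (ties broken alphabetically), first k."""
    ranked = sorted(weights.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:k]]


keywords = ["A"] * 10 + ["B"] * 2
weights = term_frequency_weights(keywords)
print(weights["A"], weights["B"])   # -> 10 2
print(top_candidates(weights, 1))   # -> ['A']
```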
3.1: Weighting by the similarity between publisher tags and user tags:
The steps above yield author tags and user tags. If some author tags and user tags are identical or similar, the author's judgment and the users' judgment agree on that point, so such a tag can be considered more accurate than the others and should be given a higher weight. Using the dictionary definitions of the author tags and of the user tags as the corpus, a similarity calculation gives a similarity weight y for each tag. This is then combined in a certain proportion with the term-frequency weight x of the user tag to obtain the final weight w.
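The text does not name a similarity measure or the proportion used to combine x and y. The sketch below fills both gaps with assumptions chosen only for illustration: Jaccard overlap between the tokenized dictionary definitions stands in for the similarity calculation, and a multiplicative gain `w = x * (1 + gain * y)` stands in for the conversion. All function names and the `gain` parameter are hypothetical.

```python
def jaccard(tokens_a, tokens_b):
    """Jaccard overlap of two token collections, in [0, 1]."""
    a, b = set(tokens_a), set(tokens_b)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def similarity_weight(user_gloss, author_glosses):
    """y: best overlap between a user tag's definition and any author tag's definition."""
    return max((jaccard(user_gloss, gloss) for gloss in author_glosses), default=0.0)


def final_weight(x, y, gain=1.0):
    """w: term-frequency weight x boosted by similarity y (assumed combination rule)."""
    return x * (1.0 + gain * y)


# Definitions are assumed to be tokenized already.
author_glosses = [["funny", "show"], ["sad", "story"]]
y = similarity_weight(["funny", "show"], author_glosses)
print(y)                    # -> 1.0
print(final_weight(10, y))  # -> 20.0
print(final_weight(2, 0.0)) # -> 2.0
```

With this rule a user tag that has no counterpart among the author tags keeps its term-frequency weight unchanged, while one that matches an author tag is promoted; any other monotone combination would serve the same purpose.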
3.2: Screening and denoising:
The processing above yields author tags carrying a similarity gain weight and user tags carrying a weight. To further guarantee tag quality and keep tags in line with the website's planning, the final tags can then be chosen manually from these candidate tags. During screening, suitable semantic deduplication can be applied between author tags and user tags, and invalid tags can be considered for addition to the "invalid-word list". The principle of tag screening is to prefer content-provider tags with a higher similarity gain and user tags with a higher weight, because these tags are the most accurate. Tags with a unique or fresh perspective that are still representative may also be selected, to make the tags more diverse.
Beneficial effects of the technical solution of the embodiment of the present invention:
Building on the traditional practice of tagging by the video author (publisher or editor), the present invention adds user tags as an element to make tags more accurate and diverse. Measuring by weight quantifies the accuracy of the tags, and adding user tags extends them so that they are richer and more varied.
An embodiment of the present invention further provides a tag extraction device which, as shown in Fig. 3, includes:
a content acquisition unit 301, configured to obtain user generated content (UGC) associated with content provided by a content issuer;
a vocabulary acquisition unit 302, configured to perform word segmentation on the UGC and use the words obtained by the segmentation as optional tags;
a weight calculation unit 303, configured to calculate a weight value for each word in the optional tags;
a word selection unit 304, configured to select words from the optional tags as candidate words in descending order of weight value;
a tag determination unit 305, configured to use the candidate words as the second tag.
In the field of communication technology, a content issuer is a publisher of network resources, for example the publisher of a live video stream. A user is a consumer of network resources, and UGC is the opinion that users express about a video, usually as text, for example bullet comments or ordinary comments. In theory audio could also be handled by first applying speech recognition, although the data processing load would be larger.
Word segmentation is performed on the text of the UGC. The specific segmentation algorithm may be any of the already mature segmentation algorithms; the embodiment of the present invention places no exclusive limitation on this.
After word segmentation of the UGC text, a number of optional words are obtained. Any of them could serve as a tag, but segmentation can yield a great many optional tags, and only some of them should be selected. The words can therefore be differentiated by weight value. How the weight value of a word is determined is not exclusively limited by the embodiment of the present invention: it may be set from empirical values or derived from statistical counts.
In the embodiments of the present invention, "first" and "second" serve only to distinguish types of tags and should not be understood as implying any other technical limitation. The first tag is the tag provided by the content issuer; the second tag is the tag obtained by tag extraction according to the embodiment of the present invention.
The embodiment of the present invention obtains tags by extraction from UGC, so users do not need to input tags specially. The UGC may come from a large number of users, words are differentiated on the basis of weight values, and tag extraction is completed automatically. Tag content can therefore be enriched, and tags become more diverse and accurate.
Further, because UGC comes from a wide range of sources (for example, many people may be sending bullet-screen comments at the same time), a large number of words will appear. Some of these words may repeat the same meaning, and some may be invalid words that cannot serve as labels. The embodiments of the present invention can remove these words to further improve the accuracy of label extraction, specifically as follows. As shown in Fig. 4, the tag extraction device further includes:
A word screening unit 401, specifically configured to select the words in the optional labels that are nouns and/or noun phrases, and to remove words with repeated meaning and invalid words to obtain the remaining words;
The weight calculation unit 303 is specifically configured to perform weight calculation on the remaining words to obtain a first weight value of each of the remaining words;
The word selection unit 304 is specifically configured to select words from the remaining words as candidate words in descending order of the first weight value.
In general, nouns can be used as labels, whereas verbs, measure words and the like are usually unsuitable as labels. Nouns and noun phrases can therefore be extracted first, and words with repeated meaning then removed. Removing words with repeated meaning reduces the cases in which several words of similar meaning are all used as labels, which would produce duplicate labels. In the embodiments of the present invention, "first" and "second" are also used to distinguish two different weight values: the first weight value is the weight value of a remaining word, and the second weight value is the weight value of a candidate word. They should not be construed as having any other technical meaning. Invalid words may be sensitive words that are legally prohibited from appearing in labels, or words that are meaningless in themselves; they can be removed by means of a lexicon. In addition, a lexicon of valid words may be established for nouns and noun phrases, in which case the nouns and noun phrases selected here must be words contained in that lexicon.
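The screening step described above can be sketched as follows. This is only an illustrative sketch under stated assumptions: the part-of-speech tags (`"n"`, `"np"`), the function name `screen_words`, and the representation of repeated meanings as synonym groups are hypothetical choices that do not come from the patent, which leaves the tagger and the lexicons unspecified.

```python
# Hypothetical part-of-speech tags for nouns and noun phrases;
# a real tagger defines its own tag set.
NOUN_TAGS = {"n", "np"}


def screen_words(tagged_words, invalid_words, synonym_groups):
    """Keep nouns and noun phrases, drop invalid words, merge repeated meanings.

    tagged_words   -- list of (word, pos_tag) pairs produced by segmentation
    invalid_words  -- set of words that must not become labels (assumed lexicon)
    synonym_groups -- list of lists; words in one list share a meaning (assumed)
    """
    # Map every synonym to one representative word (the first in its group).
    canonical = {}
    for group in synonym_groups:
        for word in group:
            canonical[word] = group[0]

    remaining = []
    seen = set()
    for word, tag in tagged_words:
        if tag not in NOUN_TAGS:
            continue  # verbs, measure words, etc. are not used as labels
        if word in invalid_words:
            continue  # removed by lexicon lookup
        representative = canonical.get(word, word)
        if representative in seen:
            continue  # same meaning already kept once
        seen.add(representative)
        remaining.append(representative)
    return remaining


if __name__ == "__main__":
    tagged = [("football", "n"), ("watch", "v"), ("soccer", "n"),
              ("lol", "n"), ("world cup", "np"), ("football", "n")]
    print(screen_words(tagged, {"lol"}, [["football", "soccer"]]))
```

Keeping one representative per synonym group is just one way to realise "removing words with repeated meaning"; a similarity threshold over word embeddings would be another.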
Further, label extraction from UGC is only weakly directed, and the direction of the extracted labels may deviate. To reduce this deviation, the embodiments of the present invention further provide the following solution. As shown in Fig. 5, the tag extraction device further includes:
A label acquisition unit 501, configured to obtain the first label provided by the content publisher;
The weight calculation unit 303 is further configured to calculate the degree of association between the candidate words and the first label to obtain a second weight value; or to calculate the degree of association between the candidate words and the first label to obtain a second weight value, and then calculate a comprehensive weight of the first weight value and the second weight value;
The label determination unit 305 is specifically configured to select words from the candidate words as second labels in descending order of the second weight value, or to select words from the candidate words as second labels in descending order of the comprehensive weight.
In this embodiment, the first label given by the content publisher serves as the direction against which the weights of the candidate words are assessed, so that the extracted labels stay close to the content publisher's intent, enrich the content publisher's labels, and reflect other users' evaluation of the network resource. In this embodiment, the second labels may be determined with reference to the second weight only; using the comprehensive weight instead balances the results of automatic UGC-based label extraction against the label direction provided by the content publisher, reducing the occurrence of one-sided labels.
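The two selection modes described above (second weight only, or a comprehensive weight) can be illustrated with the sketch below. The patent does not define how the degree of association or the comprehensive weight is computed, so everything specific here is an assumption made for illustration: the association measure is a simple character-overlap (Jaccard) score, the comprehensive weight is a linear blend controlled by a hypothetical parameter `alpha`, and the function names are invented.

```python
def relatedness(candidate, first_labels):
    """Second weight (assumed measure): best character-level Jaccard overlap
    between the candidate word and any label given by the content publisher."""
    best = 0.0
    cand_chars = set(candidate)
    for label in first_labels:
        label_chars = set(label)
        union = cand_chars | label_chars
        if union:
            best = max(best, len(cand_chars & label_chars) / len(union))
    return best


def rank_by_combined_weight(first_weights, first_labels, alpha=0.5, top_k=3):
    """Rank candidate words by alpha * normalised first weight
    + (1 - alpha) * second weight. alpha = 0 uses the second weight only."""
    max_first = max(first_weights.values(), default=0) or 1
    scored = []
    for word, w1 in first_weights.items():
        w2 = relatedness(word, first_labels)
        combined = alpha * (w1 / max_first) + (1 - alpha) * w2
        scored.append((combined, word))
    # Highest combined weight first; ties broken alphabetically for determinism.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [word for _, word in scored[:top_k]]


if __name__ == "__main__":
    weights = {"goal": 10, "pizza": 8, "goalkeeper": 4}
    print(rank_by_combined_weight(weights, ["goal"], alpha=0.5, top_k=2))
    print(rank_by_combined_weight(weights, ["goal"], alpha=0.0, top_k=2))
```

With `alpha=0.0` a frequent but unrelated word such as "pizza" drops out in favour of "goalkeeper", while a blended weight keeps it; this is the balancing effect the paragraph describes.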
Optionally, the weight calculation unit 303 is specifically configured to count the number of occurrences in the UGC of each of the remaining words, and to determine a weight value corresponding to the number of occurrences of each word in the UGC.
Weight values can be calculated in many ways. In the embodiments of the present invention, the weight value is determined from the occurrence count as a statistical result, which is relatively simple and reflects how most users evaluate the network resource. This meets the requirements of labels and allows the extracted labels to reflect users' evaluations.
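A minimal sketch of occurrence-count weighting followed by top-ranked selection is given below. It assumes the UGC has already been segmented into lists of tokens and takes the raw count itself as the weight value; the patent only requires a weight "corresponding to" the count, so any monotonic mapping would fit equally well. The names `count_weights`, `select_candidates` and `top_k` are hypothetical.

```python
from collections import Counter


def count_weights(segmented_ugc, remaining_words):
    """First weight of each remaining word = its number of occurrences in the UGC.

    segmented_ugc   -- list of messages, each a list of tokens
    remaining_words -- words that survived screening
    """
    counts = Counter(token for message in segmented_ugc for token in message)
    return {word: counts[word] for word in remaining_words}


def select_candidates(weights, top_k):
    """Pick the top_k words in descending order of weight (ties: alphabetical)."""
    ranked = sorted(weights.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:top_k]]


if __name__ == "__main__":
    ugc = [["goal", "great", "goal"], ["keeper", "goal"], ["keeper", "referee"]]
    weights = count_weights(ugc, ["goal", "keeper", "referee"])
    print(weights)
    print(select_candidates(weights, top_k=2))
```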
Optionally, the vocabulary acquisition unit 302 is specifically configured to obtain a sentence of the UGC, perform forward longest matching from left to right and reverse longest matching from right to left on the sentence, and take the result with fewer segmented words as the segmentation result; when the numbers of segmented words are equal, the result of the reverse longest matching is taken as the segmentation result.
In this embodiment, the computational load of word segmentation grows with the amount of UGC. For the massive amounts of UGC that may occur, this step can use distributed computing to increase speed. This embodiment selects an algorithm that has a relatively small computational load and is well suited to label extraction, so as to improve segmentation efficiency and obtain relatively accurate segmentation results.
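The bidirectional longest-matching rule described above can be sketched as follows. The dictionary contents, the function names, and the fallback of emitting a single character when no dictionary word matches are assumptions made for illustration; the patent does not specify them.

```python
def forward_max_match(sentence, dictionary, max_len):
    """Left-to-right longest matching against the dictionary."""
    tokens, i = [], 0
    while i < len(sentence):
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            piece = sentence[i:i + size]
            # A single character is always accepted as a fallback (assumption).
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


def backward_max_match(sentence, dictionary, max_len):
    """Right-to-left longest matching against the dictionary."""
    tokens, j = [], len(sentence)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = sentence[j - size:j]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                j -= size
                break
    tokens.reverse()  # tokens were collected from the end of the sentence
    return tokens


def segment(sentence, dictionary):
    """Take the result with fewer tokens; on a tie use the backward result."""
    max_len = max((len(word) for word in dictionary), default=1)
    forward = forward_max_match(sentence, dictionary, max_len)
    backward = backward_max_match(sentence, dictionary, max_len)
    return forward if len(forward) < len(backward) else backward


if __name__ == "__main__":
    words = {"研究", "研究生", "生命", "的", "起源"}
    print(segment("研究生命的起源", words))
```

In this example both directions produce four tokens, so the tie rule selects the backward result, which avoids the forward pass's mis-split of "研究生" followed by an isolated "命".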
An embodiment of the present invention further provides a tag extraction device. Fig. 6 is a schematic structural diagram of a server provided by an embodiment of the present invention. The server 600 may vary considerably depending on configuration or performance, and may include one or more central processing units (CPUs) 622 (for example, one or more processors), memory 632, and one or more storage media 630 (for example, one or more mass storage devices) storing application programs 642 or data 644. The memory 632 and the storage medium 630 may provide transient or persistent storage. A program stored in the storage medium 630 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations on the server. Further, the central processing unit 622 may be configured to communicate with the storage medium 630 and to execute, on the server 600, the series of instruction operations in the storage medium 630.
The server 600 may also include one or more power supplies 626, one or more wired or wireless network interfaces 650, one or more input/output interfaces 658, and/or one or more operating systems 641, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and so on.
The method steps in the above embodiments may be based on the server structure shown in Fig. 6.
An embodiment of the present invention further provides another tag extraction device which, as shown in Fig. 7, includes: a receiving device 701, a transmitting device 702, a processor 703, and a storage device 704;
The processor 703 is configured to obtain user-generated content (UGC) associated with content provided by a content publisher, perform word segmentation on the UGC, and use the words obtained by segmentation as optional labels; calculate the weight value of each word in the optional labels, and select words from the optional labels as candidate words in descending order of the weight value of each word in the optional labels; and use the candidate words as second labels.
In the field of communications, a content publisher is the publisher of a network resource, for example the publisher of a live video stream. A user is a consumer of the network resource, and user-generated content (UGC) is the opinion a user posts about the video, usually as text such as bullet-screen comments or ordinary comments. In theory audio could also be handled if speech recognition is applied, although the amount of data to process would be larger.
Word segmentation is performed on the text of the UGC. The specific segmentation algorithm may be any mature existing segmentation algorithm; the embodiments of the present invention place no unique limitation on it.
After word segmentation of the UGC text, a number of optional words are obtained. Any of these words could be used as a label, but segmentation may yield a large number of optional labels, so only a portion of them needs to be selected as labels. The words can therefore be distinguished by weight value. How the weight value of a word is determined may be based on empirical values or on statistical counts; the embodiments of the present invention place no unique limitation on this.
In the embodiments of the present invention, "first" and "second" are used only to distinguish types of label and should not be understood as implying any other technical limitation. The first label is the label provided by the content publisher, and the second label is the label obtained by label extraction according to the embodiments of the present invention.
The embodiments of the present invention obtain labels by extraction from UGC, so users do not need to enter labels specifically. The UGC may come from many users, words are distinguished by weight value, and label extraction is completed automatically. Label content can therefore be enriched, and labels become more diverse and more accurate.
Further, because UGC comes from a wide range of sources (for example, many people may be sending bullet-screen comments at the same time), a large number of words will appear. Some of these words may repeat the same meaning, and some may be invalid words that cannot serve as labels. The embodiments of the present invention can remove these words to further improve the accuracy of label extraction, specifically as follows. The processor 703 is further configured to, before calculating the weight value of each word in the optional labels, select the words in the optional labels that are nouns and/or noun phrases, and remove words with repeated meaning and invalid words to obtain the remaining words;
Calculating the weight value of each word in the optional labels, and selecting words from the optional labels as candidate words in descending order of the weight value of each word in the optional labels, includes:
Performing weight calculation on the remaining words to obtain a first weight value of each of the remaining words, and selecting words from the remaining words as candidate words in descending order of the first weight value.
In general, nouns can be used as labels, whereas verbs, measure words and the like are usually unsuitable as labels. Nouns and noun phrases can therefore be extracted first, and words with repeated meaning then removed. Removing words with repeated meaning reduces the cases in which several words of similar meaning are all used as labels, which would produce duplicate labels. In the embodiments of the present invention, "first" and "second" are also used to distinguish two different weight values: the first weight value is the weight value of a remaining word, and the second weight value is the weight value of a candidate word. They should not be construed as having any other technical meaning. Invalid words may be sensitive words that are legally prohibited from appearing in labels, or words that are meaningless in themselves; they can be removed by means of a lexicon. In addition, a lexicon of valid words may be established for nouns and noun phrases, in which case the nouns and noun phrases selected here must be words contained in that lexicon.
Further, label extraction from UGC is only weakly directed, and the direction of the extracted labels may deviate. To reduce this deviation, the embodiments of the present invention further provide the following solution. The processor 703 is further configured to, before selecting words from the remaining words as candidate words in descending order of the first weight value, obtain the first label provided by the content publisher, and calculate the degree of association between the candidate words and the first label to obtain a second weight value;
After selecting words from the remaining words as candidate words in descending order of the first weight value, the processor selects words from the candidate words as second labels in descending order of the second weight value; or calculates a comprehensive weight of the first weight value and the second weight value, and selects words from the candidate words as second labels in descending order of the comprehensive weight.
In this embodiment, the first label given by the content publisher serves as the direction against which the weights of the candidate words are assessed, so that the extracted labels stay close to the content publisher's intent, enrich the content publisher's labels, and reflect other users' evaluation of the network resource. In this embodiment, the second labels may be determined with reference to the second weight only; using the comprehensive weight instead balances the results of automatic UGC-based label extraction against the label direction provided by the content publisher, reducing the occurrence of one-sided labels.
Optionally, the processor 703 being configured to perform weight calculation on the remaining words to obtain the weight value of each of the remaining words includes: counting the number of occurrences in the UGC of each of the remaining words, and determining a weight value corresponding to the number of occurrences of each word in the UGC.
Weight values can be calculated in many ways. In the embodiments of the present invention, the weight value is determined from the occurrence count as a statistical result, which is relatively simple and reflects how most users evaluate the network resource. This meets the requirements of labels and allows the extracted labels to reflect users' evaluations.
Optionally, the processor 703 being configured to perform word segmentation on the UGC includes:
Obtaining a sentence of the UGC, performing forward longest matching from left to right and reverse longest matching from right to left on the sentence, and taking the result with fewer segmented words as the segmentation result; when the numbers of segmented words are equal, the result of the reverse longest matching is taken as the segmentation result.
In this embodiment, the computational load of word segmentation grows with the amount of UGC. For the massive amounts of UGC that may occur, this step can use distributed computing to increase speed. This embodiment selects an algorithm that has a relatively small computational load and is well suited to label extraction, so as to improve segmentation efficiency and obtain relatively accurate segmentation results.
It should be noted that, in the above device embodiments, the units are divided only according to functional logic; the division is not limited to the one described, as long as the corresponding functions can be realized. In addition, the specific names of the functional units are used only to distinguish them from one another and do not limit the protection scope of the present invention.
In addition, a person of ordinary skill in the art will understand that all or part of the steps in the above method embodiments may be completed by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, and the storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like.
The above are only preferred specific embodiments of the present invention, but the protection scope of the present invention is not limited to them. Any change or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the embodiments of the present invention shall fall within the protection scope of the present invention. The protection scope of the present invention shall therefore be determined by the scope of the claims.

Claims (10)

1. A tag extraction method, characterized by comprising:
obtaining user-generated content (UGC) associated with content provided by a content publisher, performing word segmentation on the UGC, and using the words obtained by segmentation as optional labels;
calculating the weight value of each word in the optional labels, and selecting words from the optional labels as candidate words in descending order of the weight value of each word in the optional labels;
using the candidate words as second labels.
2. The method according to claim 1, characterized in that, before calculating the weight value of each word in the optional labels, the method further comprises:
selecting the words in the optional labels that are nouns and/or noun phrases, and removing words with repeated meaning and invalid words to obtain the remaining words;
wherein calculating the weight value of each word in the optional labels, and selecting words from the optional labels as candidate words in descending order of the weight value of each word in the optional labels, comprises:
performing weight calculation on the remaining words to obtain a first weight value of each of the remaining words, and selecting words from the remaining words as candidate words in descending order of the first weight value.
3. The method according to claim 2, characterized in that, before selecting words from the remaining words as candidate words in descending order of the first weight value, the method further comprises:
obtaining the first label provided by the content publisher; and calculating the degree of association between the candidate words and the first label to obtain a second weight value;
and, after selecting words from the remaining words as candidate words in descending order of the first weight value, the method further comprises:
selecting words from the candidate words as second labels in descending order of the second weight value; or calculating a comprehensive weight of the first weight value and the second weight value, and selecting words from the candidate words as second labels in descending order of the comprehensive weight.
4. The method according to claim 2 or 3, characterized in that performing weight calculation on the remaining words to obtain the weight value of each of the remaining words comprises:
counting the number of occurrences in the UGC of each of the remaining words, and determining a weight value corresponding to the number of occurrences of each word in the UGC.
5. The method according to any one of claims 1 to 3, characterized in that performing word segmentation on the UGC comprises:
obtaining a sentence of the UGC, performing forward longest matching from left to right and reverse longest matching from right to left on the sentence, and taking the result with fewer segmented words as the segmentation result; when the numbers of segmented words are equal, the result of the reverse longest matching is taken as the segmentation result.
6. A tag extraction device, characterized by comprising:
a content acquisition unit, configured to obtain user-generated content (UGC) associated with content provided by a content publisher;
a vocabulary acquisition unit, configured to perform word segmentation on the UGC and use the words obtained by segmentation as optional labels;
a weight calculation unit, configured to calculate the weight value of each word in the optional labels;
a word selection unit, configured to select words from the optional labels as candidate words in descending order of the weight value of each word in the optional labels;
a label determination unit, configured to use the candidate words as second labels.
7. The tag extraction device according to claim 6, characterized in that the tag extraction device further comprises:
a word screening unit, specifically configured to select the words in the optional labels that are nouns and/or noun phrases, and to remove words with repeated meaning and invalid words to obtain the remaining words;
wherein the weight calculation unit is specifically configured to perform weight calculation on the remaining words to obtain a first weight value of each of the remaining words;
and the word selection unit is specifically configured to select words from the remaining words as candidate words in descending order of the first weight value.
8. The tag extraction device according to claim 7, characterized in that the tag extraction device further comprises:
a label acquisition unit, configured to obtain the first label provided by the content publisher;
wherein the weight calculation unit is further configured to calculate the degree of association between the candidate words and the first label to obtain a second weight value; or to calculate the degree of association between the candidate words and the first label to obtain a second weight value, and then calculate a comprehensive weight of the first weight value and the second weight value;
and the label determination unit is specifically configured to select words from the candidate words as second labels in descending order of the second weight value, or to select words from the candidate words as second labels in descending order of the comprehensive weight.
9. The tag extraction device according to claim 7 or 8, characterized in that
the weight calculation unit is specifically configured to count the number of occurrences in the UGC of each of the remaining words, and to determine a weight value corresponding to the number of occurrences of each word in the UGC.
10. The tag extraction device according to any one of claims 7 to 8, characterized in that
the vocabulary acquisition unit is specifically configured to obtain a sentence of the UGC, perform forward longest matching from left to right and reverse longest matching from right to left on the sentence, and take the result with fewer segmented words as the segmentation result; when the numbers of segmented words are equal, the result of the reverse longest matching is taken as the segmentation result.
CN201610186950.7A 2016-03-29 2016-03-29 A kind of tag extraction method and apparatus Active CN105893478B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610186950.7A CN105893478B (en) 2016-03-29 2016-03-29 A kind of tag extraction method and apparatus

Publications (2)

Publication Number Publication Date
CN105893478A true CN105893478A (en) 2016-08-24
CN105893478B CN105893478B (en) 2019-10-29

Family

ID=57013945

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610186950.7A Active CN105893478B (en) 2016-03-29 2016-03-29 A kind of tag extraction method and apparatus

Country Status (1)

Country Link
CN (1) CN105893478B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050135596A1 (en) * 2000-12-26 2005-06-23 Aspect Communications Corporation Method and system for providing personalized service over different contact channels
US20100010982A1 (en) * 2008-07-09 2010-01-14 Broder Andrei Z Web content characterization based on semantic folksonomies associated with user generated content
CN102289523A (en) * 2011-09-20 2011-12-21 北京金和软件股份有限公司 Method for intelligently extracting text labels
CN103164471A (en) * 2011-12-15 2013-06-19 盛乐信息技术(上海)有限公司 Recommendation method and system of video text labels
CN104978332A (en) * 2014-04-04 2015-10-14 腾讯科技(深圳)有限公司 UGC label data generating method, UGC label data generating device, relevant method and relevant device

Cited By (29)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107977375A (en) * 2016-10-25 2018-05-01 央视国际网络无锡有限公司 A kind of video tab generation method and device
CN108228665A (en) * 2016-12-22 2018-06-29 阿里巴巴集团控股有限公司 Determine object tag, the method and device for establishing tab indexes, object search
CN106951494A (en) * 2017-03-14 2017-07-14 腾讯科技(深圳)有限公司 A kind of information recommendation method and device
CN106951494B (en) * 2017-03-14 2022-01-04 腾讯科技(深圳)有限公司 Information recommendation method and device
CN106960033A (en) * 2017-03-22 2017-07-18 广州优视网络科技有限公司 A kind of method and apparatus that label is marked to information flow
CN106980667A (en) * 2017-03-22 2017-07-25 广州优视网络科技有限公司 A kind of method and apparatus that label is marked to article
WO2018171295A1 (en) * 2017-03-22 2018-09-27 广州优视网络科技有限公司 Method and apparatus for tagging article, terminal, and computer readable storage medium
CN106960033B (en) * 2017-03-22 2021-09-14 阿里巴巴(中国)有限公司 Method and device for labeling information stream
CN107169011B (en) * 2017-03-31 2021-06-11 百度在线网络技术(北京)有限公司 Webpage originality identification method and device based on artificial intelligence and storage medium
CN107169011A (en) * 2017-03-31 2017-09-15 百度在线网络技术(北京)有限公司 The original recognition methods of webpage based on artificial intelligence, device and storage medium
CN109213841B (en) * 2017-06-29 2021-01-01 武汉斗鱼网络科技有限公司 Live broadcast theme sample extraction method, storage medium, electronic device and system
CN109213841A (en) * 2017-06-29 2019-01-15 武汉斗鱼网络科技有限公司 Theme sample extraction method, storage medium, electronic equipment and system is broadcast live
CN107484038A (en) * 2017-08-22 2017-12-15 北京奇艺世纪科技有限公司 A kind of generation method of video subject, device and electronic equipment
CN107566917A (en) * 2017-09-15 2018-01-09 维沃移动通信有限公司 A kind of video marker method and video playback apparatus
CN108280059A (en) * 2018-01-09 2018-07-13 武汉斗鱼网络科技有限公司 Direct broadcasting room content tab extracting method, storage medium, electronic equipment and system
WO2019136841A1 (en) * 2018-01-09 2019-07-18 武汉斗鱼网络科技有限公司 Method for extracting content tag of live stream rooms, storage medium, electronic device, and system
CN108376164B (en) * 2018-02-24 2021-01-01 武汉斗鱼网络科技有限公司 Display method and device of potential anchor
CN108376164A (en) * 2018-02-24 2018-08-07 武汉斗鱼网络科技有限公司 Display method and device of potential anchor
CN110245343A (en) * 2018-03-07 2019-09-17 优酷网络技术(北京)有限公司 Bullet-screen comment analysis method and device
CN108664585A (en) * 2018-05-07 2018-10-16 多盟睿达科技(中国)有限公司 Big-data-based advertisement keyword selection method
CN109145291A (en) * 2018-07-25 2019-01-04 广州虎牙信息科技有限公司 Method, apparatus, device and storage medium for screening bullet-screen comment keywords
CN109522275A (en) * 2018-11-27 2019-03-26 掌阅科技股份有限公司 Tag mining method based on user-generated content, electronic device and storage medium
CN109558502A (en) * 2018-12-18 2019-04-02 福州大学 Urban safety data retrieval method based on knowledge graph
CN109558502B (en) * 2018-12-18 2021-11-30 福州大学 Urban safety data retrieval method based on knowledge graph
CN111625620A (en) * 2019-02-28 2020-09-04 北京京东尚科信息技术有限公司 Information processing method and device
CN111444687A (en) * 2020-03-20 2020-07-24 北京达佳互联信息技术有限公司 Label generation method and device, server and storage medium
CN112100443A (en) * 2020-08-03 2020-12-18 咪咕文化科技有限公司 Video tag obtaining method and device, electronic equipment and storage medium
CN112100443B (en) * 2020-08-03 2024-06-04 咪咕文化科技有限公司 Video tag acquisition method and device, electronic equipment and storage medium
CN112597409A (en) * 2021-03-04 2021-04-02 蚂蚁智信(杭州)信息技术有限公司 Label display method and device

Also Published As

Publication number Publication date
CN105893478B (en) 2019-10-29

Similar Documents

Publication Publication Date Title
CN105893478A (en) Tag extraction method and equipment
CN107301170B (en) Method and device for segmenting sentences based on artificial intelligence
CN112270196B (en) Entity relationship identification method and device and electronic equipment
CN110162593A (en) Search result processing method, similarity model training method and device
CN110427463A (en) Search statement response method, device and server and storage medium
CN104881458B (en) Method and device for labeling web page topics
TWI554896B (en) Information Classification Method and Information Classification System Based on Product Identification
CN110442718A (en) Sentence processing method, device and server and storage medium
CN106997341B (en) Innovation scheme matching method, device, server and system
CN110619051B (en) Question sentence classification method, device, electronic equipment and storage medium
CN105979376A (en) Recommendation method and device
CN114254158B (en) Video generation method and device, and neural network training method and device
CN112699645B (en) Corpus labeling method, apparatus and device
CN111309916B (en) Digest extracting method and apparatus, storage medium, and electronic apparatus
CN110287314A (en) Long text credibility evaluation method and system based on unsupervised clustering
CN113032673A (en) Resource acquisition method and device, computer equipment and storage medium
CN113779381A (en) Resource recommendation method and device, electronic equipment and storage medium
CN112613321A (en) Method and system for extracting entity attribute information in text
CN108875743A (en) Text recognition method and device
CN115099239B (en) Resource identification method, device, equipment and storage medium
CN112231554A (en) Search recommendation word generation method and device, storage medium and computer equipment
CN116956896A (en) Text analysis method, system, electronic equipment and medium based on artificial intelligence
CN115438141B (en) Information retrieval method based on knowledge graph model
CN107423307A (en) Method and device for distributing internet information resources
CN110209765A (en) Method and apparatus for searching keywords by semantics

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20210111

Address after: 511442 3108, 79 Wanbo 2nd Road, Nancun Town, Panyu District, Guangzhou City, Guangdong Province

Patentee after: GUANGZHOU CUBESILI INFORMATION TECHNOLOGY Co.,Ltd.

Address before: 511449 29 / F, building B-1, Wanda Plaza, Wanbo business district, Nancun Town, Panyu District, Guangzhou City, Guangdong Province

Patentee before: GUANGZHOU HUADUO NETWORK TECHNOLOGY Co.,Ltd.