CN108595660A - Method, apparatus, storage medium, and device for generating label information for a multimedia resource
- Publication number: CN108595660A
- Application number: CN201810400431.5A
- Authority: CN (China)
- Prior art keywords
- vocabulary
- multimedia resource
- information
- label information
- vocabularies
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
Abstract
The invention discloses a method, apparatus, storage medium, and device for generating label information for a multimedia resource, and belongs to the field of Internet technology. The method includes: obtaining comment information for a target multimedia resource and performing word segmentation on the comment information; obtaining a word vector for each of the at least one word obtained by segmentation; clustering the word vectors of the at least one word to obtain multiple word categories, where different word categories have different topic information; extracting keywords of the target multimedia resource from the at least one word obtained by segmentation; and generating label information for the target multimedia resource based on the keywords and the topic information of the multiple word categories. The invention fully automates the generation of label information, so it does not consume large amounts of labor and time and is more intelligent. The generated label information is also more accurate, which improves the precision of subsequent multimedia resource recommendation.
Description
Technical field
The present invention relates to the field of Internet technology, and in particular to a method, apparatus, storage medium, and device for generating label information for a multimedia resource.
Background
With the rapid development of Internet technology, major websites are now working on how to recommend multimedia resources to users efficiently and accurately in order to improve the user experience. The multimedia resources referred to here can include films, TV series, novels, articles, and so on. Before multimedia resources can be recommended, corresponding label information usually has to be generated for them first; the recommendation is then made using that label information. Label information identifies a multimedia resource so that users can filter multimedia resources by genre, core theme, and the like.
As described above, label information is essential to multimedia resource recommendation, so how to generate label information for multimedia resources has become a focus for those skilled in the art. In the related art, label information for multimedia resources is generated entirely by hand. Taking a film as an example and referring to Fig. 1A, if the film is "The Shawshank Redemption", a staff member might manually add label information such as "plot" and "crime" to it.
In the course of implementing the present invention, the related art was found to have at least the following problems:
Because label information is generated manually and the number of multimedia resources is massive, this way of generating label information consumes a large amount of labor and time and is not intelligent enough. In addition, manually generated label information tends to be inaccurate, which greatly reduces the precision of subsequent multimedia resource recommendation based on that label information.
Summary of the invention
Embodiments of the present invention provide a method, apparatus, storage medium, and device for generating label information for a multimedia resource. They address the problem in the related art that label information generation is not intelligent enough and is inaccurate, which in turn greatly reduces recommendation precision when multimedia resources are recommended. The technical solution is as follows:
In one aspect, a method for generating label information for a multimedia resource is provided, the method including:
obtaining comment information for a target multimedia resource, and performing word segmentation on the comment information;
obtaining a word vector for each of at least one word obtained by the segmentation;
clustering the word vectors of the at least one word to obtain multiple word categories, where different word categories have different topic information;
extracting keywords of the target multimedia resource from the at least one word obtained by the segmentation; and
generating label information for the target multimedia resource based on the keywords and the topic information of the multiple word categories.
In another aspect, an apparatus for generating label information for a multimedia resource is provided, the apparatus including:
a first acquisition module, configured to obtain comment information for a target multimedia resource and perform word segmentation on the comment information;
a second acquisition module, configured to obtain a word vector for each of at least one word obtained by the segmentation;
a clustering module, configured to cluster the word vectors of the at least one word to obtain multiple word categories, where different word categories have different topic information;
an extraction module, configured to extract keywords of the target multimedia resource from the at least one word obtained by the segmentation; and
a generation module, configured to generate label information for the target multimedia resource based on the keywords and the topic information of the multiple word categories.
In another aspect, a storage medium is provided. At least one instruction is stored in the storage medium, and the at least one instruction is loaded and executed by a processor to implement the above method for generating label information for a multimedia resource.
In another aspect, a device for generating label information is provided. The device includes a processor and a memory; at least one instruction is stored in the memory, and the at least one instruction is loaded and executed by the processor to implement the above method for generating label information for a multimedia resource.
The technical solution provided by the embodiments of the present invention has the following beneficial effects:
Label information is generated for multimedia resources fully automatically. Because no manual effort is needed to add label information, large amounts of labor and time are not consumed, and the process is more intelligent. In addition, based on the comment information of a multimedia resource, the embodiments of the present invention obtain the topic information of multiple word categories for that multimedia resource and multiple keywords used to comment on it, and use these to generate label information for the multimedia resource. This makes the generated label information more accurate and improves the precision of subsequent multimedia resource recommendation.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the drawings needed for describing the embodiments are briefly introduced below. Clearly, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1A is a schematic diagram of an interface displaying label information, as referred to in the background;
Fig. 1B is an architecture diagram of the implementation environment involved in a method for generating label information for a multimedia resource provided by an embodiment of the present invention;
Fig. 2 is an overall processing flowchart of a method for generating label information for a multimedia resource provided by an embodiment of the present invention;
Fig. 3 is a flowchart of a method for generating label information for a multimedia resource provided by an embodiment of the present invention;
Fig. 4 is a schematic diagram of calculating the weight values of label information provided by an embodiment of the present invention;
Fig. 5 is a flowchart of a method for generating label information for a multimedia resource provided by an embodiment of the present invention;
Fig. 6 is a schematic diagram of an interface displaying label information provided by an embodiment of the present invention;
Fig. 7 is a schematic structural diagram of an apparatus for generating label information for a multimedia resource provided by an embodiment of the present invention;
Fig. 8 is a schematic structural diagram of a device for generating label information provided by an embodiment of the present invention.
Detailed description
To make the objectives, technical solutions, and advantages of the present invention clearer, the embodiments of the present invention are described in further detail below with reference to the drawings.
Before the embodiments of the present invention are explained in detail, some terms involved in the embodiments are explained first.
Multimedia resource: its forms include, but are not limited to, text, video, audio, and images. It can cover films, TV series, novels, articles, audio clips, variety show videos, and so on, which is not specifically limited in the embodiments of the present invention.
A multimedia resource can be presented to the user through the visual user interface of an electronic device. The electronic device can be any device with a display screen, such as a smartphone, tablet computer, television, laptop, or desktop computer.
Label information: identifies a multimedia resource so that users can filter multimedia resources by genre, core theme, and the like.
Taking films as an example, the label information of a film can include: plot, action, romance, drama, adventure, war, crime, thriller, suspense, horror, science fiction, musical, history, family, martial arts, ethics, documentary, biography, and so on.
As noted above, major websites are now working on how to recommend multimedia resources to users efficiently and accurately, and the main prerequisite for such recommendation is that massive numbers of multimedia resources are classified precisely using label information. However, because the related art adds label information to multimedia resources manually, it usually leads to problems such as the following:
1) Because label information is added manually, it is difficult to control the standards by which different staff members define label information and the granularity of their classification. In addition, because the number of multimedia resources is massive, manually adding label information to multimedia resources consumes a large amount of labor and time and is not intelligent.
2) Manually added label information is generally inaccurate, so multimedia resource recommendation based on such label information tends to perform poorly.
3) The scope defined by manually added label information is generally too broad. Taking films as an example, a great many films fall under plot and crime, so when related films are recommended based on label information such as "plot, crime", the recommended films are not precise enough.
To solve the above problems, the embodiments of the present invention propose a big-data-based method for automatically adding label information to multimedia resources, and also recommend similar multimedia resources based on the generated label information. Here, big data refers to the comment information with which a massive number of users comment on multimedia resources.
Fig. 1B is an architecture diagram of the implementation environment involved in a method for generating label information for a multimedia resource provided by an embodiment of the present invention.
Referring to Fig. 1B, the implementation environment includes a terminal 101 and a server 102. The terminal 101 is used to display the label information of a multimedia resource and to display multimedia resources similar to that multimedia resource. Types of terminal 101 include, but are not limited to, smartphones, tablet computers, televisions, laptops, and desktop computers, which is not specifically limited in the embodiments of the present invention. The server 102 is used to automatically add label information to a multimedia resource and to determine, based on the added label information, other multimedia resources similar to that multimedia resource.
In another embodiment, the embodiments of the present invention apply machine learning methods to big data from the Internet to add weighted label information to a multimedia resource, and the added label information reflects the genre or core theme of the multimedia resource well.
For example, for the film "The Shawshank Redemption", in addition to the two labels plot and crime, the embodiments of the present invention can add label information such as [citizens' rights, 0.325], [prison, 0.212], [freedom, 0.23], [conviction, 0.14], and [life and death, 0.093]. Within each pair of square brackets, the first item is the label and the second item is the weight corresponding to that label.
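The weighted labels in this example can be held in a very simple data structure. The following is a minimal sketch only: the variable and function names are hypothetical and are not taken from the patent, and the values are the example weights quoted above.

```python
# Hypothetical in-memory representation of weighted label information.
# The (label, weight) pairs are the example values quoted in the text.
weighted_labels = [
    ("citizens' rights", 0.325),
    ("prison", 0.212),
    ("freedom", 0.23),
    ("conviction", 0.14),
    ("life and death", 0.093),
]


def top_labels(labels, n):
    """Return the n labels with the largest weights, highest weight first."""
    return sorted(labels, key=lambda pair: pair[1], reverse=True)[:n]


# In this particular example the weights add up to 1 (within rounding).
total_weight = sum(weight for _, weight in weighted_labels)
strongest = top_labels(weighted_labels, 2)
```

Sorting by weight makes it easy to show only the most representative labels on a terminal, although the text does not say how many labels are displayed.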
In summary, the embodiments of the present invention achieve the following:
1) Label information is added fully automatically, without manual effort, which is more intelligent.
2) Big data from the Internet is used to add label information, so the added labels are more accurate.
3) The added label information carries weights, and better recommendation results can be obtained when multimedia resources are recommended based on weighted label information.
In short, based on big data from the Internet, the embodiments of the present invention can use machine learning methods to automatically add weighted label information to multimedia resources, and can further recommend similar multimedia resources based on that weighted label information.
In another embodiment, from a product perspective, the embodiments of the present invention are reflected mainly in two aspects: one is the display of label information, and the other is the application of label information, that is, recommending similar multimedia resources using the added label information. For example, if the label information of "The Shawshank Redemption" is "citizens' rights, prison, freedom, conviction, life and death", then with the recommendation approach provided by the embodiments of the present invention, films such as "Once Upon a Time in America", "A Perfect World", "The Godfather Part III", and "Trainspotting" can be accurately recommended to the user, rather than a range of loosely related films that merely fall under plot and crime.
It should be noted that the personalized recommendation approach proposed by the embodiments of the present invention is widely applicable to newly released multimedia resources. A newly released multimedia resource may not yet have enough viewers, so related multimedia resources cannot be recommended from user behavior data; recommendation can instead be based on content similarity between multimedia resources. Of course, the personalized recommendation approach proposed by the embodiments of the present invention can also be applied in other scenarios, which is not specifically limited in the embodiments of the present invention.
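Content-based recommendation from weighted labels can be illustrated with a small sketch. The text here does not specify how similarity between two label sets is measured, so the weighted-overlap score below is purely an assumption for illustration, and the catalog titles and weights are hypothetical placeholders rather than real data.

```python
def label_overlap(labels_a, labels_b):
    """Weighted overlap between two {label: weight} dictionaries.

    Assumed measure: for every label the two resources share, add the
    smaller of the two weights.
    """
    shared = set(labels_a) & set(labels_b)
    return sum(min(labels_a[label], labels_b[label]) for label in shared)


def recommend(target_labels, catalog, top_n=3):
    """Rank catalog entries by overlap with the target, best first.

    Entries that share no label with the target are left out.
    """
    scored = [
        (title, label_overlap(target_labels, labels))
        for title, labels in catalog.items()
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return [title for title, score in scored[:top_n] if score > 0]


# Hypothetical data: a target resource and three candidates.
target = {"prison": 0.5, "freedom": 0.5}
catalog = {
    "film_a": {"prison": 0.6, "freedom": 0.4},
    "film_b": {"romance": 1.0},
    "film_c": {"freedom": 1.0},
}
recommended = recommend(target, catalog)
```

Because the score depends only on label content, it can be computed for a newly released resource as soon as its labels exist, without any viewing history.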
In another embodiment, the overall processing flow of the embodiments of the present invention is first described briefly.
Referring to Fig. 2, the processing flow of the embodiments of the present invention is as follows:
a. Data acquisition
This step collects big data from the Internet. Taking a film as the multimedia resource, the collected big data is the film review information with which a massive number of users evaluate the film.
b. Data processing
This step processes the collected data, for example by filtering out low-quality comment information and performing word segmentation on the comment information.
c. Word vector training
This step trains word vectors for the at least one word obtained by segmentation; the training result represents each word as a vector of a uniform dimension.
d. Word vector clustering and topic extraction
This step clusters the word vectors obtained in step c and labels each resulting word category with topic information.
e. Keyword extraction for the multimedia resource
This step extracts some of the words from the comment information of the multimedia resource in a certain way and uses the extracted words as the keywords used to comment on the multimedia resource. The specific extraction method is described below.
f. Automatically adding label information to the multimedia resource
This step automatically adds label information to the multimedia resource based on the results of step d and step e.
g. Recommending similar multimedia resources based on the label information generated for the multimedia resource.
Each of the steps described above is explained in detail in the specific embodiments below.
Fig. 3 is a flowchart of a method for generating label information for a multimedia resource provided by an embodiment of the present invention. Referring to Fig. 3, the method flow provided by the embodiment of the present invention includes:
301. The server obtains comment information for the target multimedia resource.
In the embodiments of the present invention, the multimedia resource to which label information is to be added is referred to as the target multimedia resource. Comment information usually goes by different names depending on the type of multimedia resource: if the multimedia resource is a film, the comment information is also called film review information, and if the multimedia resource is a TV series, it can also be called series review information. One item of comment information generally refers to one comment on one multimedia resource posted by one user.
The comment information of the target multimedia resource can be obtained from data sources that hold large amounts of comment data, such as forums, websites, and communities, which is not specifically limited in the embodiments of the present invention. The open-source crawler software scrapy can be used to obtain the comment information.
Taking a film as the target multimedia resource, suppose community A specializes in accumulating film review information for films, where these reviews describe different users' understanding of the same film from different angles. For the target multimedia resource, the embodiments of the present invention can then use the open-source crawler software scrapy to crawl the film review information posted about it by a massive number of users in community A.
It should be noted that some data sources record not only the comment information itself but also users' evaluations of each item of comment information, such as each user's rating of each film review, or each user's vote on whether a film review is useful. When crawling comment information, the embodiments of the present invention can also crawl these evaluations so that they can be used in the subsequent step of processing the crawled data.
302. The server performs word segmentation on the obtained comment information.
In the embodiments of the present invention, if the crawled data includes evaluations of the comment information, the embodiments of the present invention also support filtering out low-quality comment information according to these evaluations in order to clean the data.
Specifically, comment information whose score is no greater than a preset score, or whose number of useless votes exceeds its number of useful votes, can be filtered out, because such reviews are of low quality and may adversely affect the subsequent generation of label information. With a full score of 5 points, the preset score can be 1 or 2 points, which is not specifically limited in the embodiments of the present invention.
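The filtering rule described above can be sketched as follows. This is an illustrative sketch rather than the patent's implementation: the `Comment` record and its field names are hypothetical, and the default threshold of 2 points is one of the two example values given in the text.

```python
from dataclasses import dataclass


@dataclass
class Comment:
    """Hypothetical record for one crawled comment and its evaluations."""
    text: str
    score: float        # rating given to the comment, on an assumed 5-point scale
    useful_votes: int   # users who voted the comment useful
    useless_votes: int  # users who voted the comment not useful


def is_low_quality(comment, preset_score=2.0):
    """True if the score is no greater than the preset score, or if
    useless votes outnumber useful votes."""
    return (comment.score <= preset_score
            or comment.useless_votes > comment.useful_votes)


def filter_comments(comments, preset_score=2.0):
    """Keep only the comments that are not judged low quality."""
    return [c for c in comments if not is_low_quality(c, preset_score)]


comments = [
    Comment("a detailed and thoughtful review", 4.5, 10, 1),
    Comment("low-rated review", 1.0, 0, 5),
    Comment("mostly voted useless", 3.0, 1, 4),
    Comment("exactly at the threshold", 2.0, 3, 0),
]
kept = filter_comments(comments)
```

Note that a comment scoring exactly the preset score is removed, matching the "no greater than" wording.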
Word segmentation of the comment information is part of the data processing in step b above. After filtering the crawled comment information, the server can segment the filtered comment information using the open-source jieba word segmentation tool.
The first point to note is that the open-source jieba word segmentation tool mainly supports three segmentation modes. The first is accurate mode, which tries to cut the sentence as precisely as possible and is mainly suited to text analysis. The second is full mode, which scans out every sequence in the sentence that can form a word; it is fast but cannot resolve ambiguity. The last is search engine mode, which builds on accurate mode and cuts long words again to improve recall. The embodiments of the present invention can perform word segmentation on the comment information based on the last mode.
The second point to note is that, because the embodiments of the present invention need descriptive words that can describe and summarize a multimedia resource, usually only words with a target part of speech are retained after the comment information is segmented. The target parts of speech include, but are not limited to, nouns, adjectives, and verbs.
As an example, suppose a passage of the comment information reads: "I finally finished watching the whole film at midnight and decided to go and buy the book. The film has no violent or bloody scenes, although its tone is gloomy throughout; after all, the main setting is Shawshank, a prison. Andy, who is innocent, is convicted of murdering his wife and her lover, given two life sentences, and locked up in Shawshank, a rotten, filthy prison full of male inmates. Andy's temperament sets him apart from the other prisoners. What he went through in his first years there is only touched on briefly, but the harm he suffered is easy to imagine. He reminds me of Sirius Black, who knew he was innocent; that conviction was not a happy thought, so the Dementors could not drain it away, and it kept him sane until he finally escaped from Azkaban. I think that if you hold a conviction in life, a light in your heart that never goes out, you will stay unyielding and things will turn out bright. There is a scene at dusk in which the prisoners, after a day of labor, drink beer on the open ground, accompanied by Red's line: 'I think he just wanted to feel free again, if only for a moment.' It is beautiful: for just that moment they feel they have escaped their bonds and are enjoying freedom." The word segmentation result of the embodiments of the present invention for this passage is then:
"finally / whole / watch / finish / buy / whole / violent / bloody / plot / tone / always / gloomy / scene / Shawshank / prison / Andy / innocent / imprisoned / murder / wife / lover / charge / sentenced / life imprisonment / locked up / Shawshank / is / all / male prisoners / rotten / dirty / prison / temperament / prisoners / different / Andy / received / what / treatment / briefly / is / he / suffered / harm / imagine / remind / me / think / is / he / innocent / conviction / not / happy / not / Dementor / drain / sane / finally / escape / Azkaban / I / think / life / have / a / conviction / light / never / go out / unyielding / bright / have / a / scene / labor / prisoners / open ground / beer / Red / say / I / think / only want / relive / freedom / just so / escape / bondage / enjoy / freedom".
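Retaining only words with a target part of speech can be sketched as a filter over (word, tag) pairs. The sketch assumes the pairs come from some part-of-speech tagger and that noun, adjective, and verb tags begin with `n`, `a`, and `v` respectively, as in the tag sets used by some Chinese taggers; the sample tokens and tags below are hypothetical.

```python
# Assumed tag prefixes for the target parts of speech:
# "n" = noun, "a" = adjective, "v" = verb.
TARGET_POS_PREFIXES = ("n", "a", "v")


def keep_target_pos(tagged_words, prefixes=TARGET_POS_PREFIXES):
    """Keep the words whose part-of-speech tag starts with a target prefix.

    tagged_words is an iterable of (word, tag) pairs.
    """
    return [word for word, tag in tagged_words if tag.startswith(prefixes)]


# Hypothetical tagger output for a few tokens of a review.
tagged = [
    ("finally", "d"),   # adverb, dropped
    ("prison", "n"),    # noun, kept
    ("gloomy", "a"),    # adjective, kept
    ("escape", "v"),    # verb, kept
    ("of", "u"),        # particle, dropped
    ("Andy", "nr"),     # person name, kept because the tag starts with "n"
]
descriptive_words = keep_target_pos(tagged)
```

Matching on tag prefixes keeps sub-categories such as proper nouns; whether those should be retained is a design choice the text leaves open.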
303. The server obtains the word vectors of the at least one word obtained by segmentation.
Because the words obtained by segmentation carry no explicit association with one another, the embodiments of the present invention obtain the similarity between two words by calculating the similarity between their word vectors. In other words, the embodiments of the present invention convert the problem of judging whether two words are semantically similar into the problem of calculating the similarity between two word vectors.
In the embodiments of the present invention, the server uses the open-source word2vec (word to vector) tool to train word vectors for the at least one word obtained by segmentation, producing at least one word vector. The word2vec tool converts words into vectors and ensures that the relative similarity between vectors is related to semantic similarity.
In other words, word2vec is an efficient algorithmic model that represents words as real-valued vectors. Drawing on ideas from deep learning, it reduces, through training, the processing of text content to vector operations in a K-dimensional vector space, and similarity in the vector space can be used to represent semantic similarity of the text.
In the embodiments of the present invention, the result of word vector training is that each word is represented as a K-dimensional vector. The value of K can be 400, which is not specifically limited in the embodiments of the present invention.
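Comparing two words by comparing their vectors can be sketched as below. The text does not name the similarity measure, so cosine similarity is used here as one common choice and is an assumption; the three-dimensional vectors are made-up toy values standing in for trained K-dimensional word vectors.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors of equal dimension."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # similarity is undefined for a zero vector; treat as 0
    return dot / (norm_u * norm_v)


# Toy vectors (hypothetical values, not trained embeddings).
vectors = {
    "rescue": [0.9, 0.1, 0.0],
    "save": [0.8, 0.2, 0.1],
    "beer": [0.0, 0.1, 0.9],
}
close = cosine_similarity(vectors["rescue"], vectors["save"])
far = cosine_similarity(vectors["rescue"], vectors["beer"])
```

With vectors like these, the two semantically related words score much higher than the unrelated pair, which is the property the clustering step relies on.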
In another embodiment, the training parameters of the word2vec tool can be as shown in Table 1:
Table 1
304. The server clusters the word vectors of the at least one word to obtain multiple word categories, where different word categories have different topic information.
In the embodiments of the present invention, after multiple word vectors are obtained in step 303, words with similar word vectors also need to be gathered into a set by clustering. The reason is that different users use different words when commenting on a multimedia resource, but the meanings those words express may be semantically close. This step therefore gathers semantically similar words together. Optionally, each word category obtained by clustering can then be manually labeled with a topic, so that each word category corresponds to one topic word.
The topic word can be the word that occurs most frequently in a word category, or a word that summarizes the words in the category, which is not specifically limited in the embodiments of the present invention.
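The first option, choosing the most frequent word in a category as its topic word, can be sketched as follows. The function name and sample data are hypothetical, and counting occurrences over the segmented comment tokens is an assumption about where the frequencies come from.

```python
from collections import Counter


def pick_topic_word(category_words, corpus_tokens):
    """Return the category word that occurs most often in the corpus.

    category_words: set of words belonging to one word category.
    corpus_tokens: list of all segmented tokens.
    Returns None if no category word occurs in the corpus.
    """
    counts = Counter(t for t in corpus_tokens if t in category_words)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Hypothetical category and token stream.
category = {"rescue", "escape", "flee"}
tokens = ["escape", "rescue", "beer", "escape", "flee", "prison"]
topic_word = pick_topic_word(category, tokens)
```

The second option in the text, a word that summarizes the category, generally needs human judgment and is not covered by this sketch.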
The embodiments of the present invention use the K-means algorithm to cluster the multiple word vectors obtained in step 303. The clustering parameters can be as shown in Table 2:
Table 2
Parameter | n_clusters=200, max_iter=300, n_init=10 |
Description | Cluster into 200 clusters; at most 300 iterations; centroid seeds selected 10 times |
Here, a centroid seed is a centroid point initialized before clustering; when clustering into 200 clusters, 200 centroid points are initialized. As Table 2 shows, the embodiments of the present invention cluster the words obtained by segmentation into 200 clusters, that is, into 200 word categories.
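To make the clustering step concrete, the following is a minimal pure-Python sketch of K-means (Lloyd's algorithm). It is not the implementation used in the patent: the argument names `n_clusters` and `max_iter` simply mirror Table 2, the `seed` argument is added for reproducibility, and the repeated seeding controlled by `n_init` is omitted for brevity. The toy data uses 2 clusters of 2-dimensional points rather than 200 clusters of word vectors.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Component-wise mean of a non-empty list of points."""
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]


def kmeans(points, n_clusters, max_iter=300, seed=0):
    """Cluster points; return (cluster index per point, centroids)."""
    rng = random.Random(seed)
    # Centroid seeds: n_clusters distinct points chosen at random.
    centroids = [list(p) for p in rng.sample(points, n_clusters)]
    assignments = None
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        new_assignments = [
            min(range(n_clusters),
                key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # converged: no point changed cluster
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(n_clusters):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = mean_point(members)
    return assignments, centroids


# Toy data: two well-separated groups of 2-dimensional points.
points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
assignments, centroids = kmeans(points, n_clusters=2)
```

In practice a library implementation would normally be preferred; the sketch only shows what the parameters in Table 2 control.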
This step is illustrated below using Table 3 as an example. Table 3 shows 6 word categories, each of which contains multiple semantically similar words and has its own topic information; the topic information differs between word categories. For example, the word category with cluster ID 1 and the word category with cluster ID 2 have different topic information: one is "saving the nation from extinction" and the other is "spy warfare".
In addition, the topic information expresses the core idea and gist of a word category. Taking the word category with cluster ID 1 as an example, its topic information is "saving the nation from extinction"; correspondingly, the words in the category are related to that topic, including, for example, "rescue", "brave dangers", "take back", "flee", "recover", "run away", "escape", and "save".
Table 3
305. From the at least one word obtained by segmentation, the server extracts the keywords used to comment on the target multimedia resource.
For the step, the embodiment of the present invention uses TF-IDF (Term Frequency-Inverse Document
Frequency, term frequency-inverse document frequency) technology closed at least one vocabulary for commenting on destination multimedia resource
Key word retrieval.
In a specific implementation, the embodiment of the present invention first integrates the at least one vocabulary into one document. TF measures how frequently a vocabulary occurs, that is, the ratio of the number of times the vocabulary occurs in the document to the total number of words in the document. IDF is the inverse document frequency, which characterizes the importance of a vocabulary.
Taking TF as the first probability score, the first probability score of each vocabulary in the at least one vocabulary is calculated as follows:
First, the number of occurrences of the vocabulary in the at least one vocabulary is obtained. Then, the first probability score of the vocabulary is obtained from that number of occurrences and the number of vocabularies in the at least one vocabulary.
In other words, TF = number of occurrences of the vocabulary / total number of vocabularies.
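The TF calculation can be sketched in a few lines of Python. The function name `term_frequency` and the sample words are illustrative assumptions, not part of the embodiment.

```python
from collections import Counter


def term_frequency(words):
    """Map each distinct word to (its occurrences) / (total number of words)."""
    counts = Counter(words)
    total = len(words)
    return {word: count / total for word, count in counts.items()}


# Hypothetical segmented comment words for one multimedia resource.
words = ["eavesdropping", "monitoring", "eavesdropping", "history"]
tf = term_frequency(words)
print(tf["eavesdropping"])  # 2 occurrences out of 4 words -> 0.5
```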
Taking IDF as the second probability score, the second probability score of each vocabulary in the at least one vocabulary is calculated as follows:
For each vocabulary in the at least one vocabulary, the server first determines, among all documents stored in the database, the at least one document that includes the vocabulary. Then, the second probability score is obtained from the number of those documents and the total number of documents stored in the database.
In other words, IDF = log(total number of documents / (number of documents including the vocabulary + 1)).
It should be noted that the embodiment of the present invention integrates the comment information of each stored multimedia resource into one document for storage. That is, one document corresponds to one multimedia resource.
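A minimal sketch of the IDF calculation follows, with one document per multimedia resource. The function name `inverse_document_frequency`, the sample documents, and the choice of the natural logarithm are assumptions; the source does not state the base of the logarithm.

```python
import math


def inverse_document_frequency(word, documents):
    """IDF = log(total documents / (documents containing the word + 1))."""
    containing = sum(1 for doc in documents if word in doc)
    return math.log(len(documents) / (containing + 1))


# Hypothetical database: one document (modelled as a set of words) per multimedia resource.
documents = [
    {"eavesdropping", "monitoring", "history"},
    {"prison", "freedom", "history"},
    {"rescue", "escape", "history"},
    {"comedy", "family"},
]
idf_rare = inverse_document_frequency("eavesdropping", documents)  # log(4 / 2)
idf_common = inverse_document_frequency("history", documents)      # log(4 / 4) = 0.0
print(idf_rare, idf_common)
```

With the "+ 1" in the denominator, a word that appears in every document receives a slightly negative score, so very common words rank lowest.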
In summary, the total probability score of a vocabulary is obtained from its first probability score and its second probability score, that is, TF-IDF = TF * IDF.
In another embodiment, after the total probability score of each vocabulary in the at least one vocabulary is obtained, the embodiment of the present invention may sort the total probability scores in descending order and then take the vocabularies ranked within the first preset number of positions as the key vocabularies of the target multimedia resource.
The preset number may be 10 or 20; the embodiment of the present invention does not specifically limit it.
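Combining the two scores and keeping the top-ranked words can be sketched as below. The function name `top_key_words`, the parameter `preset_number`, and the sample scores are hypothetical and chosen only to make the ranking easy to follow.

```python
def top_key_words(tf, idf, preset_number=10):
    """Rank words by TF * IDF in descending order and keep the first `preset_number`."""
    totals = {word: tf[word] * idf[word] for word in tf}
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ranked[:preset_number]


# Hypothetical first (TF) and second (IDF) probability scores.
tf = {"eavesdropping": 0.5, "monitoring": 0.25, "history": 0.25}
idf = {"eavesdropping": 2.0, "monitoring": 1.0, "history": 0.5}
key_words = top_key_words(tf, idf, preset_number=2)
print(key_words)  # [('eavesdropping', 1.0), ('monitoring', 0.25)]
```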
306. The server generates label information for the target multimedia resource based on the key vocabularies of the target multimedia resource and the subject information of the multiple classified vocabularies.
After the subject information of the multiple classified vocabularies is obtained in step 304 and the key vocabularies of the target multimedia resource are obtained in step 305, this step maps the key vocabularies onto the subject information: each key vocabulary is used to look up its corresponding subject information, and the subject information found is used as the label information of the target multimedia resource.
That is, when generating label information for the target multimedia resource based on the key vocabularies and the subject information of the multiple classified vocabularies, the embodiment of the present invention proceeds as follows: first, the subject information corresponding to the key vocabularies of the target multimedia resource is determined among the subject information of the multiple classified vocabularies; then, that subject information is used as the label information of the target multimedia resource.
The determination of the subject information corresponding to a key vocabulary among the subject information of the multiple classified vocabularies can be further divided into the following steps:
a. For any key vocabulary, search the multiple classified vocabularies to check whether any of them includes the key vocabulary;
b. If a classified vocabulary includes the key vocabulary, determine the subject information of that classified vocabulary as the subject information corresponding to the key vocabulary.
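Steps a and b amount to a membership lookup, sketched below. The function name `subjects_for_key_words` and the sample clusters are hypothetical; skipping a key word that belongs to no cluster is also an assumption, since the source does not describe that case.

```python
def subjects_for_key_words(key_words, classified_vocabularies):
    """Map each key word to the subject of the classified vocabulary that contains it."""
    subject_of = {}
    for key_word in key_words:
        # Step a: search every classified vocabulary for the key word.
        for subject, members in classified_vocabularies.items():
            if key_word in members:
                # Step b: the subject of that classified vocabulary is the match.
                subject_of[key_word] = subject
                break
    return subject_of


# Hypothetical clusters, keyed by their subject information.
classified_vocabularies = {
    "eavesdropping": {"eavesdropping", "monitoring", "secret police"},
    "history": {"history", "era", "dynasty"},
}
key_words = ["eavesdropping", "secret police", "popcorn"]
subject_of = subjects_for_key_words(key_words, classified_vocabularies)
labels = set(subject_of.values())
print(subject_of, labels)
```

Because several key words can fall into the same cluster, the set of labels is usually smaller than the list of key words.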
In another embodiment, the embodiment of the present invention may also set a weight for each item of generated label information. The weight is derived as follows: for each item of generated label information, the total probability score of the key vocabulary corresponding to the label information is used as the weight value of the label information. Specifically, if at least two key vocabularies correspond to the label information, the sum of the total probability scores of all key vocabularies corresponding to the label information is used as the weight value of the label information.
The generation of the label information and the setting of its weight are illustrated below with reference to Fig. 4.
Taking the film "eavesdropping storm" as an example, the 4 key vocabularies of the film, "eavesdropping, monitoring, secret police, monitoring", all correspond to the subject information "eavesdropping". One item of label information of the film is therefore "eavesdropping", and its weight is 0.149+0.131+0.129+0.052=0.461, where 0.149 is the total probability score of the key vocabulary "eavesdropping", 0.131 is that of the first "monitoring", 0.129 is that of "secret police", and 0.052 is that of the second "monitoring".
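The weight calculation in this example can be reproduced with a short sketch. The function name `label_weights` and the data layout are assumptions; the scores are the four values quoted in the example.

```python
from collections import defaultdict


def label_weights(scored_key_words, subject_of):
    """Sum the total scores of all key words that map to the same label."""
    weights = defaultdict(float)
    for key_word, score in scored_key_words:
        subject = subject_of.get(key_word)
        if subject is not None:  # key words without a subject contribute nothing
            weights[subject] += score
    return dict(weights)


# Scores from the worked example; the translated text lists "monitoring" twice,
# so the pairs are kept in a list rather than a dict.
scored_key_words = [
    ("eavesdropping", 0.149),
    ("monitoring", 0.131),
    ("secret police", 0.129),
    ("monitoring", 0.052),
]
subject_of = {
    "eavesdropping": "eavesdropping",
    "monitoring": "eavesdropping",
    "secret police": "eavesdropping",
}
weights = label_weights(scored_key_words, subject_of)
print(round(weights["eavesdropping"], 3))  # 0.461
```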
It should be noted that, by repeating steps 301 to 306, the server can automatically add label information to every multimedia resource stored in the database. Once the label information of the multimedia resources has been extracted, one effective application is the recommendation of similar multimedia resources.
In another embodiment, referring to Fig. 5, the multimedia resource recommendation method provided by the embodiment of the present invention includes the following steps:
501. The server obtains the first vector information of the target multimedia resource.
502. The server obtains the second vector information of the other multimedia resources.
The other multimedia resources are the resources stored in the database other than the target multimedia resource.
In the embodiment of the present invention, to calculate the similarity between one multimedia resource and the other multimedia resources in the database, every multimedia resource must first be vectorized. The vectorization process includes:
(1) For any multimedia resource, word vector training is performed on every item of label information of the multimedia resource to obtain the word vector of every item of label information.
For any label information W, the word vector of W can be expressed as [W1v1, W1v2 ... W1v400]. That is, each word vector can be represented by a 1*400 matrix.
(2) For each item of label information, the product of the word vector of the label information and the weight value of the label information is obtained, and the sum of these products over all items of label information is used as the vector information of the multimedia resource.
Suppose the label information of a film is "eavesdropping, secret service, performance, human nature, life, politics and law, artist, freedom, and history". Then the vector of the film = word vector of "eavesdropping" * weight + word vector of "secret service" * weight + ... + word vector of "history" * weight.
If each word vector is represented by a 1*400 matrix, the vector of the film is likewise of size 1*400.
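The weighted sum in step (2) can be sketched as follows. The function name `resource_vector` is hypothetical, and the toy word vectors have 3 dimensions instead of 400 so that the arithmetic can be checked by hand.

```python
def resource_vector(label_vectors, label_weights):
    """Sum of (label word vector * label weight) over all labels of one resource."""
    dimension = len(next(iter(label_vectors.values())))
    vector = [0.0] * dimension
    for label, word_vector in label_vectors.items():
        weight = label_weights[label]
        for i, component in enumerate(word_vector):
            vector[i] += component * weight
    return vector


# Hypothetical 3-dimensional word vectors and label weights.
label_vectors = {"eavesdropping": [1.0, 0.0, 2.0], "history": [0.0, 4.0, 2.0]}
label_weights = {"eavesdropping": 0.5, "history": 0.25}
film_vector = resource_vector(label_vectors, label_weights)
print(film_vector)  # [0.5, 1.0, 1.5]
```

The result has the same dimension as each word vector, which is why a film described by 1*400 word vectors is itself a 1*400 vector.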
503. The server calculates the similarity between the target multimedia resource and the other multimedia resources based on the first vector information of the target multimedia resource and the second vector information of the other multimedia resources.
Taking the target multimedia resource as film A and any other multimedia resource as film B, the embodiment of the present invention may use the cosine similarity algorithm to calculate the similarity between film A and film B:
$$\text{similarity}(A, B) = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2}\,\sqrt{\sum_{i=1}^{n} B_i^2}}$$
where A_i and B_i are the i-th components of the vector information of film A and film B, i and n are positive integers, and n is the dimension of the vector information of the two films; for example, n is 400.
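The cosine similarity named in step 503 can be sketched as below. The function name `cosine_similarity` and the 2-dimensional sample vectors are illustrative; returning 0.0 for a zero vector is an assumption made to avoid division by zero.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: a zero vector is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


# Hypothetical vectors for film A and film B (the source uses n = 400).
film_a = [3.0, 4.0]
film_b = [4.0, 3.0]
similarity = cosine_similarity(film_a, film_b)
print(similarity)  # 24 / (5 * 5) = 0.96
```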
504. The server selects, among the other multimedia resources, the specified multimedia resources whose similarity is greater than a preset threshold.
Suppose the target multimedia resource is "The Shawshank Redemption"; the similarity between this film and every other film stored in the database is then calculated. The preset threshold may be 0.8, 0.9, or the like; the embodiment of the present invention does not specifically limit it. Continuing with "The Shawshank Redemption", the similarities between this film and the other films can be sorted by value, as shown in Table 4.
Table 4
Film title | Similarity |
Once Upon a Time in America | 0.816 |
The perfect world | 0.811 |
Godfather 3 | 0.805 |
Trainspotting | 0.802 |
It collides | 0.802 |
You shut up at bifurcation! | 0.742 |
Aerial prison | 0.724 |
21 grams | 0.723 |
11 arhats | 0.723 |
This killer is not too cold | 0.720 |
505. The server recommends the specified multimedia resources as resources similar to the target multimedia resource.
Assuming the preset threshold is 0.8, the films "Once Upon a Time in America", "The perfect world", "Godfather 3", "Trainspotting", and "It collides" are recommended as films similar to "The Shawshank Redemption".
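Steps 504 and 505 reduce to a threshold filter followed by a sort, sketched here with the similarity values from Table 4. The function name `recommend` and the parameter `threshold` are hypothetical; the film titles are copied as they appear in the table.

```python
def recommend(similarities, threshold=0.8):
    """Return (title, similarity) pairs above `threshold`, most similar first."""
    chosen = [(title, score) for title, score in similarities.items() if score > threshold]
    return sorted(chosen, key=lambda pair: pair[1], reverse=True)


# Similarities to the target film, copied from Table 4.
table_4 = {
    "Once Upon a Time in America": 0.816,
    "The perfect world": 0.811,
    "Godfather 3": 0.805,
    "Trainspotting": 0.802,
    "It collides": 0.802,
    "You shut up at bifurcation!": 0.742,
    "Aerial prison": 0.724,
    "21 grams": 0.723,
    "11 arhats": 0.723,
    "This killer is not too cold": 0.720,
}
recommended = recommend(table_4, threshold=0.8)
print([title for title, _ in recommended])
```

With a threshold of 0.8 only the first five rows of the table pass the filter; raising the threshold to 0.9 would return an empty list for this data.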
506. When displaying the label information of the target multimedia resource, the terminal also displays the resources similar to the target multimedia resource.
Continuing with "The Shawshank Redemption" as the target multimedia resource, the terminal may display the label information of the film and its similar resources in the manner shown in Fig. 6; the embodiment of the present invention does not specifically limit this.
In conclusion method provided in an embodiment of the present invention has the advantages that:
1) full automation, is realized when adding label for multimedia resource, due to being not necessarily to put into manpower into row label
The addition of information, so without consuming a large amount of manpower and time, it is intelligent preferable.
2) big data on internet, has been crawled, the comment information of multimedia resource has been obtained with this, and also complete
Second-rate comment information has been filtered out in the comment information in portion, and is based further on filtered comment information to generate mark
Information is signed, so the label information generated is more accurate, and then subsequently carries out multimedia resource in the label information based on generation
When recommendation, recommendation effect is more preferably.
3), the label information generated carries weight, is recommending similar multimedia resource based on the label information with weight
When, it is ensured that good recommendation effect.
Fig. 7 is a schematic structural diagram of a label information generating device for multimedia resources provided by an embodiment of the present invention. Referring to Fig. 7, the device includes:
a first acquisition module 701, configured to obtain the comment information of a target multimedia resource and perform word segmentation on the comment information;
a second acquisition module 702, configured to obtain the word vector of at least one vocabulary obtained after word segmentation;
a clustering module 703, configured to cluster the word vectors of the at least one vocabulary to obtain multiple classified vocabularies, where different classified vocabularies have different subject information;
an extraction module 704, configured to extract the key vocabularies of the target multimedia resource from the at least one vocabulary obtained after word segmentation;
a generation module 705, configured to generate label information for the target multimedia resource based on the key vocabularies and the subject information of the multiple classified vocabularies.
The device provided by the embodiment of the present invention generates label information for multimedia resources fully automatically. Because no manpower is needed to add label information, no large amount of manpower and time is consumed, and the degree of intelligence is higher. Moreover, based on the comment information of a multimedia resource, the embodiment of the present invention obtains the subject information of multiple classified vocabularies and multiple key vocabularies used to comment on the multimedia resource, and generates label information for the multimedia resource from them. This makes the generated label information more accurate and improves the precision of subsequent multimedia resource recommendation.
In another embodiment, the extraction module is further configured to: for each vocabulary in the at least one vocabulary, obtain the first probability score and the second probability score of the vocabulary, where the first probability score characterizes the frequency of occurrence of the vocabulary and the second probability score characterizes the importance of the vocabulary; obtain the total probability score of the vocabulary based on the first probability score and the second probability score; and, in descending order, take the vocabularies whose total probability scores rank within the first preset number of positions as the key vocabularies.
In another embodiment, the extraction module is further configured to: integrate the at least one vocabulary into one document; for each vocabulary in the at least one vocabulary, determine, among all documents stored in the database, the at least one document that includes the vocabulary; and obtain the second probability score of the vocabulary based on the number of the at least one document and the number of all documents stored in the database.
In another embodiment, the generation module is further configured to: determine the subject information corresponding to the key vocabularies among the subject information of the multiple classified vocabularies; and use the subject information corresponding to the key vocabularies as the label information of the target multimedia resource.
In another embodiment, each classified vocabulary includes at least one semantically similar vocabulary; the generation module is further configured to: for any key vocabulary, search the multiple classified vocabularies to check whether any of them includes the key vocabulary; and, if a classified vocabulary includes the key vocabulary, determine the subject information of that classified vocabulary as the subject information corresponding to the key vocabulary.
In another embodiment, the device further includes:
a setting module, configured to, for each item of label information generated for the target multimedia resource, use the total probability score of the key vocabulary corresponding to the label information as the weight value of the label information.
In another embodiment, the setting module is further configured to, if at least two key vocabularies correspond to the label information, use the sum of the total probability scores of all key vocabularies corresponding to the label information as the weight value of the label information.
In another embodiment, the device further includes:
a recommendation module, configured to: obtain the first vector information of the target multimedia resource; obtain the second vector information of other multimedia resources, the other multimedia resources being the resources stored in the database other than the target multimedia resource; obtain the similarity between the target multimedia resource and the other multimedia resources based on the first vector information and the second vector information; and recommend resources similar to the target multimedia resource according to the obtained similarity.
In another embodiment, the recommendation module is further configured to: for any multimedia resource, obtain the word vector of every item of label information of the multimedia resource; and obtain the vector information of the multimedia resource based on the word vector and the weight value of every item of label information.
Any combination of the above optional technical solutions may be adopted to form an optional embodiment of the present disclosure, and these are not described one by one here.
It should be noted that, when the label information generating device for multimedia resources provided by the above embodiment generates label information, the division into the above functional modules is only an example. In practical applications, the above functions may be assigned to different functional modules as needed, that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the label information generating device for multimedia resources provided by the above embodiment and the embodiments of the label information generation method for multimedia resources belong to the same concept; for the specific implementation process, refer to the method embodiments, which is not repeated here.
Fig. 8 is a schematic structural diagram of equipment for generating label information provided by an embodiment of the present invention. The equipment 800 may vary considerably with configuration or performance, and may include one or more processors (central processing units, CPU) 801 and one or more memories 802, where the memory 802 stores at least one instruction, and the at least one instruction is loaded and executed by the processor 801 to implement the label information generation method for multimedia resources provided by each of the above method embodiments. The equipment may of course also have components such as a wired or wireless network interface, a keyboard, and an input/output interface for input and output, and may further include other components for implementing equipment functions, which are not described here.
In an exemplary embodiment, a computer-readable storage medium is also provided, for example a memory including instructions, where the instructions can be executed by a processor in a terminal to complete the label information generation method for multimedia resources in the above embodiments. For example, the computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
A person of ordinary skill in the art will understand that all or part of the steps of the above embodiments may be implemented by hardware, or by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, and the storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like.
The above are only preferred embodiments of the present invention and are not intended to limit the present invention. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.
Claims (12)
1. A label information generation method for multimedia resources, characterized in that the method includes:
obtaining the comment information of a target multimedia resource, and performing word segmentation on the comment information;
obtaining the word vector of at least one vocabulary obtained after word segmentation;
clustering the word vectors of the at least one vocabulary to obtain multiple classified vocabularies, where different classified vocabularies have different subject information;
extracting the key vocabularies of the target multimedia resource from the at least one vocabulary obtained after word segmentation;
generating label information for the target multimedia resource based on the key vocabularies and the subject information of the multiple classified vocabularies.
2. The method according to claim 1, characterized in that extracting the key vocabularies of the target multimedia resource from the at least one vocabulary obtained after word segmentation includes:
for each vocabulary in the at least one vocabulary, obtaining the first probability score and the second probability score of the vocabulary, where the first probability score characterizes the frequency of occurrence of the vocabulary and the second probability score characterizes the importance of the vocabulary;
obtaining the total probability score of the vocabulary based on the first probability score and the second probability score;
in descending order, taking the vocabularies whose total probability scores rank within the first preset number of positions as the key vocabularies.
3. The method according to claim 2, characterized in that the process of obtaining the second probability score includes:
integrating the at least one vocabulary into one document;
for each vocabulary in the at least one vocabulary, determining, among all documents stored in the database, the at least one document that includes the vocabulary;
obtaining the second probability score of the vocabulary based on the number of the at least one document and the number of all documents stored in the database.
4. The method according to claim 1, characterized in that generating label information for the target multimedia resource based on the key vocabularies and the subject information of the multiple classified vocabularies includes:
determining the subject information corresponding to the key vocabularies among the subject information of the multiple classified vocabularies;
using the subject information corresponding to the key vocabularies as the label information of the target multimedia resource.
5. The method according to claim 4, characterized in that each classified vocabulary includes at least one semantically similar vocabulary;
determining the subject information corresponding to the key vocabularies among the subject information of the multiple classified vocabularies includes:
for any key vocabulary, searching the multiple classified vocabularies to check whether any of them includes the key vocabulary;
if a classified vocabulary includes the key vocabulary, determining the subject information of that classified vocabulary as the subject information corresponding to the key vocabulary.
6. The method according to claim 1, characterized in that the method further includes:
for each item of label information generated for the target multimedia resource, using the total probability score of the key vocabulary corresponding to the label information as the weight value of the label information.
7. The method according to claim 6, characterized in that using the total probability score of the key vocabulary corresponding to the label information as the weight value of the label information includes:
if at least two key vocabularies correspond to the label information, using the sum of the total probability scores of all key vocabularies corresponding to the label information as the weight value of the label information.
8. The method according to any one of claims 1 to 7, characterized in that the method further includes:
obtaining the first vector information of the target multimedia resource;
obtaining the second vector information of other multimedia resources, the other multimedia resources being the resources stored in the database other than the target multimedia resource;
obtaining the similarity between the target multimedia resource and the other multimedia resources based on the first vector information and the second vector information;
recommending resources similar to the target multimedia resource according to the obtained similarity.
9. The method according to claim 8, characterized in that the process of obtaining the vector information of any multimedia resource includes:
for any multimedia resource, obtaining the word vector of every item of label information of the multimedia resource;
obtaining the vector information of the multimedia resource based on the word vector and the weight value of every item of label information.
10. A label information generating device for multimedia resources, characterized in that the device includes:
a first acquisition module, configured to obtain the comment information of a target multimedia resource and perform word segmentation on the comment information;
a second acquisition module, configured to obtain the word vector of at least one vocabulary obtained after word segmentation;
a clustering module, configured to cluster the word vectors of the at least one vocabulary to obtain multiple classified vocabularies, where different classified vocabularies have different subject information;
an extraction module, configured to extract the key vocabularies of the target multimedia resource from the at least one vocabulary obtained after word segmentation;
a generation module, configured to generate label information for the target multimedia resource based on the key vocabularies and the subject information of the multiple classified vocabularies.
11. A storage medium, characterized in that at least one instruction is stored in the storage medium, and the at least one instruction is loaded and executed by a processor to implement the label information generation method for multimedia resources according to any one of claims 1 to 9.
12. Equipment for generating label information, characterized in that the equipment includes a processor and a memory, at least one instruction is stored in the memory, and the at least one instruction is loaded and executed by the processor to implement the label information generation method for multimedia resources according to any one of claims 1 to 9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810400431.5A CN108595660A (en) | 2018-04-28 | 2018-04-28 | Label information generation method, device, storage medium and the equipment of multimedia resource |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810400431.5A CN108595660A (en) | 2018-04-28 | 2018-04-28 | Label information generation method, device, storage medium and the equipment of multimedia resource |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108595660A true CN108595660A (en) | 2018-09-28 |
Family
ID=63619153
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810400431.5A Pending CN108595660A (en) | 2018-04-28 | 2018-04-28 | Label information generation method, device, storage medium and the equipment of multimedia resource |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108595660A (en) |
Cited By (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109670080A (en) * | 2018-12-21 | 2019-04-23 | 深圳创维数字技术有限公司 | A kind of determination method, apparatus, equipment and the storage medium of video display label |
CN110188356A (en) * | 2019-05-30 | 2019-08-30 | 腾讯音乐娱乐科技(深圳)有限公司 | Information processing method and device |
CN110597977A (en) * | 2019-09-16 | 2019-12-20 | 腾讯科技(深圳)有限公司 | Data processing method, data processing device, computer equipment and storage medium |
CN110598011A (en) * | 2019-09-27 | 2019-12-20 | 腾讯科技(深圳)有限公司 | Data processing method, data processing device, computer equipment and readable storage medium |
CN110798719A (en) * | 2019-09-27 | 2020-02-14 | 深圳市轱辘汽车维修技术有限公司 | Charging method, device and server for video on demand |
CN111125387A (en) * | 2019-12-12 | 2020-05-08 | 科大讯飞股份有限公司 | Multimedia list generation and naming method and device, electronic equipment and storage medium |
CN111177569A (en) * | 2020-01-07 | 2020-05-19 | 腾讯科技(深圳)有限公司 | Recommendation processing method, device and equipment based on artificial intelligence |
CN111191011A (en) * | 2020-04-17 | 2020-05-22 | 郑州工程技术学院 | Search matching method, device and equipment for text label and storage medium |
CN111325030A (en) * | 2020-03-31 | 2020-06-23 | 卓尔智联(武汉)研究院有限公司 | Text label construction method and device, computer equipment and storage medium |
CN111400516A (en) * | 2020-03-16 | 2020-07-10 | 北京奇艺世纪科技有限公司 | Label determination method, electronic device and storage medium |
CN111625716A (en) * | 2020-05-12 | 2020-09-04 | 聚好看科技股份有限公司 | Media asset recommendation method, server and display device |
CN111625620A (en) * | 2019-02-28 | 2020-09-04 | 北京京东尚科信息技术有限公司 | Information processing method and device |
CN111738009A (en) * | 2019-03-19 | 2020-10-02 | 百度在线网络技术(北京)有限公司 | Method and device for generating entity word label, computer equipment and readable storage medium |
CN111783468A (en) * | 2020-06-28 | 2020-10-16 | 百度在线网络技术(北京)有限公司 | Text processing method, device, equipment and medium |
CN111813944A (en) * | 2020-09-09 | 2020-10-23 | 北京神州泰岳智能数据技术有限公司 | Live comment analysis method and device, electronic equipment and storage medium |
CN112000817A (en) * | 2020-08-21 | 2020-11-27 | 北京达佳互联信息技术有限公司 | Multimedia resource processing method and device, electronic equipment and storage medium |
Citations (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103186662A (en) * | 2012-12-28 | 2013-07-03 | 中联竞成(北京)科技有限公司 | System and method for extracting dynamic public sentiment keywords |
CN103778207A (en) * | 2014-01-15 | 2014-05-07 | 杭州电子科技大学 | LDA-based news comment topic digging method |
CN104778209A (en) * | 2015-03-13 | 2015-07-15 | 国家计算机网络与信息安全管理中心 | Opinion mining method for ten-million-scale news comments |
CN104978332A (en) * | 2014-04-04 | 2015-10-14 | 腾讯科技(深圳)有限公司 | UGC label data generating method, UGC label data generating device, relevant method and relevant device |
CN105279208A (en) * | 2014-07-25 | 2016-01-27 | 北京龙源创新信息技术有限公司 | Data marking method and management system |
CN106294830A (en) * | 2016-08-17 | 2017-01-04 | 合智能科技(深圳)有限公司 | The recommendation method and device of multimedia resource |
CN106446135A (en) * | 2016-09-19 | 2017-02-22 | 北京搜狐新动力信息技术有限公司 | Method and device for generating multi-media data label |
CN106528894A (en) * | 2016-12-28 | 2017-03-22 | 北京小米移动软件有限公司 | Method and device for setting label information |
CN107122352A (en) * | 2017-05-18 | 2017-09-01 | 成都四方伟业软件股份有限公司 | Keyword extraction method based on K-means and word2vec |
CN107169049A (en) * | 2017-04-25 | 2017-09-15 | 腾讯科技(深圳)有限公司 | Label information generation method and device for applications |
CN107220295A (en) * | 2017-04-27 | 2017-09-29 | 银江股份有限公司 | Method for people's dispute mediation case retrieval and mediation strategy recommendation |
CN107515934A (en) * | 2017-08-29 | 2017-12-26 | 四川长虹电器股份有限公司 | Big-data-based method for optimizing semantic personalized film labels |
CN107633007A (en) * | 2017-08-09 | 2018-01-26 | 五邑大学 | Product review data labeling system and method based on hierarchical AP clustering |
- 2018-04-28: CN application CN201810400431.5A, published as CN108595660A (status: active, pending)
Non-Patent Citations (1)
Title |
---|
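Peng Yun (彭云), Wan Hongxin (万红新): "Research on the extraction of product features and sentiment words based on a semantically constrained topic model" (基于语义约束主题模型的商品特征和情感词提取研究), Beijing Institute of Technology Press (北京理工大学出版社), pages: 6 * |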
Cited By (24)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109670080A (en) * | 2018-12-21 | 2019-04-23 | 深圳创维数字技术有限公司 | Method, apparatus, device and storage medium for determining film and television labels |
CN111625620A (en) * | 2019-02-28 | 2020-09-04 | 北京京东尚科信息技术有限公司 | Information processing method and device |
CN111738009B (en) * | 2019-03-19 | 2023-10-20 | 百度在线网络技术(北京)有限公司 | Entity word label generation method, entity word label generation device, computer equipment and readable storage medium |
CN111738009A (en) * | 2019-03-19 | 2020-10-02 | 百度在线网络技术(北京)有限公司 | Method and device for generating entity word label, computer equipment and readable storage medium |
CN110188356A (en) * | 2019-05-30 | 2019-08-30 | 腾讯音乐娱乐科技(深圳)有限公司 | Information processing method and device |
CN110188356B (en) * | 2019-05-30 | 2023-05-19 | 腾讯音乐娱乐科技(深圳)有限公司 | Information processing method and device |
CN110597977A (en) * | 2019-09-16 | 2019-12-20 | 腾讯科技(深圳)有限公司 | Data processing method, data processing device, computer equipment and storage medium |
CN110597977B (en) * | 2019-09-16 | 2022-01-11 | 腾讯科技(深圳)有限公司 | Data processing method, data processing device, computer equipment and storage medium |
CN110598011A (en) * | 2019-09-27 | 2019-12-20 | 腾讯科技(深圳)有限公司 | Data processing method, data processing device, computer equipment and readable storage medium |
CN110798719A (en) * | 2019-09-27 | 2020-02-14 | 深圳市轱辘汽车维修技术有限公司 | Charging method, device and server for video on demand |
CN111125387A (en) * | 2019-12-12 | 2020-05-08 | 科大讯飞股份有限公司 | Multimedia list generation and naming method and device, electronic equipment and storage medium |
CN111177569A (en) * | 2020-01-07 | 2020-05-19 | 腾讯科技(深圳)有限公司 | Recommendation processing method, device and equipment based on artificial intelligence |
CN111400516A (en) * | 2020-03-16 | 2020-07-10 | 北京奇艺世纪科技有限公司 | Label determination method, electronic device and storage medium |
CN111400516B (en) * | 2020-03-16 | 2024-04-16 | 北京奇艺世纪科技有限公司 | Label determining method, electronic device and storage medium |
CN111325030A (en) * | 2020-03-31 | 2020-06-23 | 卓尔智联(武汉)研究院有限公司 | Text label construction method and device, computer equipment and storage medium |
CN111191011A (en) * | 2020-04-17 | 2020-05-22 | 郑州工程技术学院 | Search matching method, device and equipment for text label and storage medium |
CN111191011B (en) * | 2020-04-17 | 2024-02-23 | 郑州工程技术学院 | Text label searching and matching method, device, equipment and storage medium |
CN111625716A (en) * | 2020-05-12 | 2020-09-04 | 聚好看科技股份有限公司 | Media asset recommendation method, server and display device |
CN111625716B (en) * | 2020-05-12 | 2023-10-31 | 聚好看科技股份有限公司 | Media asset recommendation method, server and display device |
CN111783468A (en) * | 2020-06-28 | 2020-10-16 | 百度在线网络技术(北京)有限公司 | Text processing method, device, equipment and medium |
CN111783468B (en) * | 2020-06-28 | 2023-08-15 | 百度在线网络技术(北京)有限公司 | Text processing method, device, equipment and medium |
CN112000817A (en) * | 2020-08-21 | 2020-11-27 | 北京达佳互联信息技术有限公司 | Multimedia resource processing method and device, electronic equipment and storage medium |
CN112000817B (en) * | 2020-08-21 | 2023-12-29 | 北京达佳互联信息技术有限公司 | Multimedia resource processing method and device, electronic equipment and storage medium |
CN111813944A (en) * | 2020-09-09 | 2020-10-23 | 北京神州泰岳智能数据技术有限公司 | Live comment analysis method and device, electronic equipment and storage medium |
Similar Documents
Publication | Title |
---|---|
CN108595660A (en) | Label information generation method, device, storage medium and the equipment of multimedia resource |
Zhao et al. | An image-text consistency driven multimodal sentiment analysis approach for social media |
CN107436922B (en) | Text label generation method and device |
Rule et al. | Lexical shifts, substantive changes, and continuity in State of the Union discourse, 1790–2014 |
US9201880B2 (en) | Processing a content item with regard to an event and a location |
CN111507097B (en) | Title text processing method and device, electronic equipment and storage medium |
JP2017508214A (en) | Provide search recommendations |
CN101346718A (en) | Method for providing user of chosen content item |
CN110134792B (en) | Text recognition method and device, electronic equipment and storage medium |
CN103886081A (en) | Information sending method and system |
CN103377258A (en) | Method and device for classification display of microblog information |
CN109299277A (en) | Public opinion analysis method, server and computer-readable storage medium |
US20190087414A1 (en) | Linguistic analysis of differences in portrayal of movie characters |
US11797590B2 (en) | Generating structured data for rich experiences from unstructured data streams |
CN105869058B (en) | Method for extracting user portraits with a multilayer latent variable model |
Kim et al. | Finding core topics: Topic extraction with clustering on tweet |
CN105512300B (en) | Information filtering method and system |
CN111813993A (en) | Video content expanding method and device, terminal equipment and storage medium |
Qu et al. | A novel approach based on multi-view content analysis and semi-supervised enrichment for movie recommendation |
Penta et al. | What is this cluster about? Explaining textual clusters by extracting relevant keywords |
CN110019556A (en) | Topic news acquisition method, apparatus and device |
JP2016081265A (en) | Picture selection device, picture selection method, picture selection program, characteristic-amount generation device, characteristic-amount generation method and characteristic-amount generation program |
CN115168568B (en) | Data content identification method, device and storage medium |
Paz-Trillo et al. | An information retrieval application using ontologies |
Akasaki et al. | Early discovery of emerging entities in microblogs |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |