CN110147499A - Labeling method, recommendation method and recording medium - Google Patents
- Publication number: CN110147499A
- Application number: CN201910423246.2A
- Authority: CN (China)
- Prior art keywords: content, label, vector, classification, user
- Legal status: Granted (status as listed by Google Patents; not a legal conclusion)
Classifications (all under G: Physics > G06: Computing; calculating or counting > G06F: Electric digital data processing)
- G06F16/00 Information retrieval; database structures therefor; file system structures therefor > G06F16/90 Details of database functions independent of the retrieved data types > G06F16/95 Retrieval from the web > G06F16/953 Querying, e.g. by the use of web search engines > G06F16/9535 Search customisation based on user profiles and personalisation
- G06F18/00 Pattern recognition > G06F18/20 Analysing > G06F18/24 Classification techniques
- G06F40/00 Handling natural language data > G06F40/20 Natural language analysis > G06F40/205 Parsing > G06F40/216 Parsing using statistical methods
- G06F40/00 Handling natural language data > G06F40/20 Natural language analysis > G06F40/279 Recognition of textual entities > G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
Abstract
This disclosure relates to a labeling method, a recommendation method and a recording medium. According to one embodiment of the present disclosure, the labeling method includes: selecting a plurality of keywords from the text portion of a target content; determining, using a language model, the keyword vector corresponding to each keyword; determining a content vector of the target content as a weighted sum of the keyword vectors; determining the similarity of the target content with respect to each label in a label set based on the content vector and the label vector of that label; and determining the content category labels of the target content based on these similarities. Each label in the label set is a category-indicating word that indicates a candidate category of the target content, and each label vector is the vector determined by the language model from the corresponding category-indicating word. The scheme of the present disclosure helps achieve at least one of the following effects: assigning labels to content accurately, assigning labels to content in real time, and recommending matching content to users.
Description
Technical field
The present disclosure relates generally to information processing and, more particularly, to a labeling method, a recommendation method, and a computer-readable recording medium storing a program for executing the foregoing methods.
Background
In recent years, with the rapid growth of the Internet, publishing content online and recommending it to users have become increasingly common. How to recommend content to users effectively is an important research topic.
Summary of the invention
A brief summary of the present disclosure is given below to provide a basic understanding of some of its aspects. This summary is not an exhaustive overview of the present disclosure. It is not intended to identify key or essential parts of the disclosure, nor to limit its scope. Its sole purpose is to present some concepts in simplified form as a prelude to the more detailed description given later.
Content published online spans many categories, for example sports and fitness, household services, flower gifts and wedding photography. The categories of content that interest users are equally diverse. For example, a user may be interested only in sports and fitness during a certain period, or may be interested in some fields and not in others. Given these facts, to recommend content to users effectively and to raise the click-through rate of recommended content, content is labeled so that it is classified; content carrying a given label can then be recommended to users who are interested in that type of content. It is therefore desirable to assign labels to content accurately and efficiently.
According to one aspect of the present disclosure, a labeling method is provided, including: selecting a plurality of keywords from the text portion of a target content; determining, using a language model, the keyword vector corresponding to each keyword; determining the content vector of the target content by a weighted sum of the keyword vectors; determining the similarity of the target content with respect to each label based on the content vector and the label vector of each label in a label set; and determining the content category labels of the target content based on the similarities. Each label in the label set is a category-indicating word that indicates a candidate category of the target content, and each label vector is a vector determined by the language model from the corresponding category-indicating word.
According to another aspect of the present disclosure, a recommendation method is provided, including: determining a candidate content set for recommendation to a user based on the content category label set of each of a plurality of contents and the user's set of categories of interest; selecting recommended content for the user from the candidate content set; and generating an instruction to display a representation of the recommended content to the user, the representation being selectable by the user. The plurality of contents includes the target content, and at least one content category label in the content category label set of the target content is determined by the foregoing labeling method.
According to a further aspect of the present disclosure, a computer-readable recording medium storing a program is provided, the program causing a computer to execute the foregoing labeling method.
According to yet another aspect of the present disclosure, a computer-readable recording medium storing a program is provided, the program causing a computer to execute the foregoing recommendation method.
The labeling method, recommendation method and recording media of the present disclosure help achieve at least one of the following effects: assigning labels to content efficiently, accurately and in real time; recommending matching content to users; increasing the click-through rate of content; and easily handling the cold start of new content.
Brief description of the drawings
The above and other objects, features and advantages of the present disclosure will be more readily understood with reference to the following description of its embodiments in conjunction with the accompanying drawings. The drawings are intended only to illustrate the principles of the present disclosure; the sizes and relative positions of units are not necessarily drawn to scale. In the drawings:
Fig. 1 shows an exemplary flowchart of a labeling method according to one embodiment of the present disclosure;
Fig. 2 shows an exemplary flowchart of a method for selecting a plurality of keywords according to one embodiment of the present disclosure;
Fig. 3 shows an exemplary flowchart of a recommendation method according to one embodiment of the present disclosure;
Fig. 4 shows an exemplary block diagram of a labeling device according to one embodiment of the present disclosure; and
Fig. 5 shows an exemplary block diagram of a recommendation device according to one embodiment of the present disclosure.
Detailed description of embodiments
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings. For clarity and conciseness, not all features of an actual implementation are described in this specification. It should be understood, however, that in developing any such actual implementation, many implementation-specific decisions may be made to achieve the developer's specific goals, and these decisions may vary from one implementation to another.
It should also be noted that, to avoid obscuring the present disclosure with unnecessary detail, the drawings show only the device structures closely related to the scheme of the present disclosure, and other details of little relevance to it are omitted.
It should be understood that the following description with reference to the drawings does not limit the present disclosure to the described embodiments. Where feasible, embodiments may be combined with each other, features may be substituted or borrowed between different embodiments, and one or more features may be omitted from an embodiment.
According to one aspect, the present disclosure relates to a labeling method for determining the labels of content. The labeling method of the present disclosure is described below by way of example with reference to Fig. 1.
Fig. 1 shows an exemplary flowchart of a labeling method 100 according to one embodiment of the present disclosure. There may be many contents to which labels are to be assigned, and the labeling method 100 can label these contents one by one or in parallel (a label of a content is also called a content category label). Here, one of the contents is selected as the target content $CO$ to illustrate the labeling method 100.
At step 101, a plurality of keywords is selected from the text portion of the target content. Each keyword is denoted $KW_j$, where the index $j$ takes a value from 1 to a maximum $j_{max}$, and $j_{max}$ is the number of keywords selected for the target content $CO$. Using several keywords helps characterize accurately and comprehensively the fields or categories that $CO$ relates to, and therefore helps assign content category labels to it accurately and comprehensively. The target content $CO$ includes a text portion and may be multimedia content, an advertisement, an article, product information or an image. The number of keywords may be 2, 3, 4, 5, 6, 7, 8, 9, 10 or more. For example, a suitable $j_{max}$ can be chosen according to the length of the text portion of $CO$, or, further, according to the minimum text length of the text portion of $CO$. The text portion may include text contained in images or audio in the target content: text in images can be obtained, for example, by optical character recognition, and text in audio by speech recognition. The text portion may include a title part and a body part.
At step 102, the keyword vector $VK_j$ of each keyword $KW_j$ is determined using a language model $ML$. The language model $ML$ maps an input word to a vector.
At step 103, the content vector $VC$, a vector that characterizes the target content $CO$, is determined as a weighted sum of the keyword vectors $VK_j$.
At step 104, the similarity $SI_i$ of the target content $CO$ with respect to each label $L_i$ in the label set $\{L_i\}$ is determined based on the content vector $VC$ and the label vector $VL_i$ of label $L_i$. The index $i$ runs from 1 to $i_{max}$, where $i_{max}$ is the number of labels in $\{L_i\}$, that is, the number of candidate categories of $CO$. Each label $L_i$ in $\{L_i\}$ is a category-indicating word $WI_i$ that indicates a candidate category of $CO$, and each label vector $VL_i$ is the vector determined by the language model $ML$ from the corresponding category-indicating word $WI_i$. The similarity $SI_i$ may be the cosine of the angle between $VC$ and $VL_i$, that is, the ratio of the dot product of the two vectors to the product of their norms:
$$SI_i = \frac{VC \cdot VL_i}{\lVert VC \rVert \, \lVert VL_i \rVert}$$
It should be understood that in the present disclosure, unless otherwise stated, the notation $\{e_i\}$ denotes the set of elements $e_1, \ldots, e_{max}$ (i.e., $i = 1, \ldots, max$) rather than a set containing only the single element $e_i$; that is, $\{e_i\}$ denotes a set of one or more elements.
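As a minimal sketch of step 104, the cosine similarity $SI_i$ and its evaluation against every label vector can be written in plain Python. The function names `cosine_similarity` and `label_similarities` are hypothetical, and returning 0.0 for a zero vector is an assumed convention that the text does not specify.

```python
import math
from typing import Dict, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return dot(a, b) / (|a| * |b|), the cosine of the angle between a and b."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Assumed convention: a zero vector is treated as unrelated to every label.
        return 0.0
    return dot / (norm_a * norm_b)


def label_similarities(
    content_vec: Sequence[float], label_vecs: Dict[str, Sequence[float]]
) -> Dict[str, float]:
    """Compute SI_i for every label L_i, given its label vector VL_i."""
    return {label: cosine_similarity(content_vec, vec) for label, vec in label_vecs.items()}
```

Because the cosine ignores vector length, scaling $VC$ (for example, by changing the keyword weights uniformly) does not change the ranking of labels.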
At step 105, the content category labels $LC_k$ of the target content $CO$ are determined based on the similarities $SI_i$, where the index $k$ runs from 1 to $k_{max}$ and $k_{max}$ is the number of content category labels of $CO$ with respect to $\{L_i\}$. For example, when $SI_i$ is greater than or equal to a predetermined similarity threshold $Th$, the label $L_i$ corresponding to $SI_i$ is assigned to $CO$ as one of its content category labels. Alternatively, the $i_{max}$ similarities can be sorted in descending order and the labels corresponding to the top $k_{max}$ similarities assigned to $CO$ as its content category labels. The similarity $SI_i$ indicates how strongly the target content relates to the corresponding category. Each similarity $SI_i$ can therefore be recorded so that, at recommendation time, content whose category matches the user's interests and whose category relevance is higher can be selected as recommended content. It will be understood that $CO$ may also be assigned other labels by other useful labeling methods; those labels, together with the $k_{max}$ labels determined by the labeling method of the present disclosure, may form the content category label set $\{LC_m\}$ of $CO$, where $m$ is an index and the set contains no duplicate elements. Alternatively, the content category label set of $CO$ may consist entirely of labels determined by the labeling method of the present disclosure, that is, $\{LC_m\} = \{LC_k\}$.
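The two label-assignment rules of step 105 (threshold $Th$ and top $k_{max}$) can be sketched as one hypothetical helper, `assign_labels`. Applying both rules when both arguments are given (threshold first, then the cap) is an assumption for illustration; the text presents them as alternatives.

```python
from typing import Dict, List, Optional


def assign_labels(
    similarities: Dict[str, float],
    threshold: Optional[float] = None,
    k_max: Optional[int] = None,
) -> List[str]:
    """Pick content category labels LC_k from the similarities SI_i.

    threshold: keep labels with SI_i >= Th.
    k_max: keep at most the k_max most similar labels.
    """
    # Sort labels by similarity, highest first.
    ranked = sorted(similarities.items(), key=lambda item: item[1], reverse=True)
    if threshold is not None:
        ranked = [(label, s) for label, s in ranked if s >= threshold]
    if k_max is not None:
        ranked = ranked[:k_max]
    return [label for label, _ in ranked]
```

Keeping the full `similarities` dictionary alongside the chosen labels corresponds to recording each $SI_i$ for later use during recommendation.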
To make labeling more timely, new content can be obtained online as target content through a Kafka queue, so that newly appearing content is labeled promptly and recommended to users based on those labels, which readily addresses the cold-start problem for new content. In one variant, the labeling method 100 further includes obtaining new content online through a Kafka queue as the target content $CO$.
In one embodiment, the labeling method 100 further includes obtaining the text portion by processing the target content.
In the present disclosure, step 101 of the labeling method 100 can be implemented in various ways. Fig. 2 shows one exemplary method of implementing step 101.
Fig. 2 shows an exemplary flowchart of a method 210 for selecting a plurality of keywords according to one embodiment of the present disclosure.
At step 211, the text portion of the target content is segmented into words to obtain a plurality of candidate keywords $KW_{j'}$, where the index $j'$ takes a value from 1 to a maximum $j'_{max}$ and $j'_{max}$ is the number of candidate keywords. If $j'_{max} < j_{max}$, a special routine may be executed for the target content; for example, the target content may be marked with a predetermined content category label and/or passed on for subsequent manual processing. Further, if the text contains one or more stop words, step 211 also removes them, so that the candidate keywords $KW_{j'}$ contain no stop words.
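A sketch of step 211, assuming the word segmenter is passed in as a function (for Simplified Chinese text a segmenter such as jieba's `lcut` could be plugged in; for space-separated text `str.split` suffices). `candidate_keywords` is a hypothetical name, and returning `None` to signal the $j'_{max} < j_{max}$ special routine is an assumed convention. The token list keeps repetitions because step 212 needs them for term frequencies; $j'_{max}$ is taken as the number of distinct candidates.

```python
from typing import Callable, Iterable, List, Optional, Set


def candidate_keywords(
    text: str,
    tokenize: Callable[[str], Iterable[str]],
    stopwords: Set[str],
    j_max: int,
) -> Optional[List[str]]:
    """Segment the text, drop stop words and blanks, and return the remaining tokens.

    Returns None when fewer than j_max distinct candidates remain, so the caller
    can route the content to a special routine (e.g. a default label or manual review).
    """
    tokens = [t for t in tokenize(text) if t.strip() and t not in stopwords]
    if len(set(tokens)) < j_max:  # j'_max < j_max
        return None
    return tokens
```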
At step 212, the term frequency $TF_{j'}$ of each candidate keyword $KW_{j'}$ within the text portion is determined, giving $j'_{max}$ term-frequency values.
At step 213, the inverse document frequency $IDF_{j'}$ of each candidate keyword $KW_{j'}$ with respect to a predetermined corpus $CP$ is determined, giving $j'_{max}$ inverse-document-frequency values. The predetermined corpus $CP$ contains a sufficient number of documents, which may be screened so as to support accurate labeling of content. For example, if the text portion of the target content is in Simplified Chinese, the documents in $CP$ may be Simplified Chinese-encoded documents. Preferably, all documents in $CP$ have the same encoding format.
At step 214, a predetermined number of candidate keywords is selected as the keywords $KW_j$ based on the product $TF_{j'} \cdot IDF_{j'}$ of each candidate keyword's term frequency and inverse document frequency. For example, the products are sorted in descending order to obtain a sequence $S$, and the candidate keywords corresponding to the first $j_{max}$ products in $S$ are selected as the keywords used in subsequent steps.
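Steps 212 to 214 can be sketched as a TF-IDF ranking. This is an illustration under assumptions: term frequency is taken as the raw count divided by the number of tokens, and IDF uses the smoothed form $\log(N / (1 + df))$; the text fixes neither formula. `select_keywords_tfidf` is a hypothetical name, and the corpus $CP$ is assumed to be already segmented into word lists.

```python
import math
from collections import Counter
from typing import List, Sequence


def select_keywords_tfidf(
    tokens: Sequence[str], corpus: Sequence[Sequence[str]], j_max: int
) -> List[str]:
    """Rank candidate keywords by TF * IDF and return the top j_max."""
    if not tokens or not corpus:
        return []
    counts = Counter(tokens)                 # occurrences of each candidate KW_j'
    total = sum(counts.values())
    doc_sets = [set(doc) for doc in corpus]  # for document-frequency lookups
    n_docs = len(doc_sets)
    scores = {}
    for word, count in counts.items():
        tf = count / total                               # TF_j'
        df = sum(1 for doc in doc_sets if word in doc)
        idf = math.log(n_docs / (1 + df))                # IDF_j' (assumed smoothing)
        scores[word] = tf * idf
    # Descending product sequence S; ties keep first-occurrence order.
    ranked = sorted(scores, key=lambda w: scores[w], reverse=True)
    return ranked[:j_max]
```

For example, with a three-document corpus in which "family" appears everywhere, a repeated, corpus-rare word such as "homestay" outranks "family".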
The method of selecting the keywords is not limited to method 210. As one variant, the predetermined number of candidate keywords may be selected as the keywords $KW_j$ based on the term frequency $TF_{j'}$ alone. Alternatively, the inverse document frequency $IDF_{j'}$ may be scaled by a ratio $r$ to give an adjusted inverse document frequency $r \cdot IDF_{j'}$, and the predetermined number of candidate keywords selected as the keywords $KW_j$ based on the product $TF_{j'} \cdot r \cdot IDF_{j'}$, where $r$ may depend on the type of the candidate keyword.
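For the variant with an adjusted inverse document frequency $r \cdot IDF_{j'}$, a minimal sketch assuming precomputed TF and IDF values and a caller-supplied mapping from keyword type to ratio $r$. The type function and ratio values are hypothetical; the text only says that $r$ may depend on the keyword type.

```python
from typing import Callable, Dict, List


def select_keywords_adjusted(
    tf: Dict[str, float],
    idf: Dict[str, float],
    word_type: Callable[[str], str],
    ratio_by_type: Dict[str, float],
    j_max: int,
    default_ratio: float = 1.0,
) -> List[str]:
    """Rank candidates by TF_j' * r * IDF_j', where r depends on the keyword type."""
    scores = {
        word: tf[word] * ratio_by_type.get(word_type(word), default_ratio) * idf[word]
        for word in tf
    }
    return sorted(scores, key=scores.get, reverse=True)[:j_max]
```

With a hypothetical ratio of 0.1 for numbers, a frequent number such as "2019" drops below ordinary words it would otherwise outrank.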
Any natural language processing model that can generate a word vector characterizing a word can serve as the language model $ML$ in the present disclosure, for example a natural language processing model obtained by training with the word2vec tool on the predetermined corpus. As an example, when training the language model with word2vec, the word embedding size (word_embedding) may be set to 64, the window size to 10, the minimum word frequency to 5, and the number of training iterations to 10. As described above, the language model $ML$ determines the keyword vector $VK_j$ of an input keyword $KW_j$, and also determines the label vector $VL_i$ of an input label $L_i$ (that is, of the category-indicating word $WI_i$).
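A hedged sketch of training the language model $ML$ with the hyperparameters above, assuming the gensim 4.x `Word2Vec` class, whose corresponding arguments are `vector_size`, `window`, `min_count` and `epochs`. The corpus is assumed to be already segmented into lists of words; `train_language_model` and `WORD2VEC_PARAMS` are hypothetical names, and gensim is imported inside the function so the parameter table can be inspected without it.

```python
from typing import Dict, List

# Hyperparameters from the text, mapped to gensim 4.x argument names (assumption).
WORD2VEC_PARAMS: Dict[str, int] = {
    "vector_size": 64,  # word_embedding size
    "window": 10,       # context window size
    "min_count": 5,     # minimum word frequency
    "epochs": 10,       # number of training iterations
}


def train_language_model(tokenized_corpus: List[List[str]]):
    """Train a word2vec model on the pre-segmented corpus CP."""
    from gensim.models import Word2Vec  # requires gensim >= 4.0

    return Word2Vec(sentences=tokenized_corpus, **WORD2VEC_PARAMS)
```

After training, `model.wv[word]` returns the 64-dimensional vector of an in-vocabulary word, which would serve both for keyword vectors $VK_j$ and for label vectors $VL_i$.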
Under different content classification systems, each category (label) may be more sensitive to certain keywords. The category of each keyword can therefore be taken into account when determining the content vector, to improve labeling accuracy. For example, in one embodiment, determining the content vector $VC$ of the target content by a weighted sum of the keyword vectors $VK_j$ includes: determining the category $C_j$ of each keyword; and determining, based on $C_j$, the weight $w_j$ of the corresponding keyword vector $VK_j$. The content vector $VC$ can then be determined using equation (1):
$$VC = \sum_{j=1}^{j_{max}} w_j \, VK_j \tag{1}$$
For example, based on the keyword categories $\{C_j\}$, keyword weights are divided into three levels, taking a first value $v_1$, a second value $v_2$ and a third value $v_3$ respectively, where $v_1 > v_2 > v_3$.
Further, for example, the category group $\{C_j\}$ may consist of: commodity, person name, place name, number, time and other. When the category of keyword $KW_j$ is "commodity", the weight $w_j$ takes the first value, $w_j = v_1$; when it is "other", $w_j$ takes the second value, $w_j = v_2$; and when it is "person name", "place name", "number" or "time", $w_j$ takes the third value, $w_j = v_3$. The category of a keyword can be determined, for example, by looking it up in a keyword-category database.
In one embodiment, the first, second and third values $v_1$, $v_2$, $v_3$ may be set to 2.0, 1.0 and 0.5, respectively.
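Equation (1) with the three-level category weights of this embodiment can be sketched as follows. The English category keys and the fallback to the "other" weight for an unknown category are assumptions; `content_vector` is a hypothetical name.

```python
from typing import Dict, List, Sequence

# Three weight levels from this embodiment: v1 = 2.0, v2 = 1.0, v3 = 0.5.
CATEGORY_WEIGHTS: Dict[str, float] = {
    "commodity": 2.0,    # v1
    "other": 1.0,        # v2
    "person name": 0.5,  # v3
    "place name": 0.5,   # v3
    "number": 0.5,       # v3
    "time": 0.5,         # v3
}


def content_vector(
    keyword_vecs: Sequence[Sequence[float]], categories: Sequence[str]
) -> List[float]:
    """Equation (1): VC = sum_j w_j * VK_j, with w_j chosen from the category C_j."""
    if not keyword_vecs or len(keyword_vecs) != len(categories):
        raise ValueError("need at least one keyword vector and one category per vector")
    dim = len(keyword_vecs[0])
    vc = [0.0] * dim
    for vec, category in zip(keyword_vecs, categories):
        # Assumption: unknown categories fall back to the "other" weight.
        w = CATEGORY_WEIGHTS.get(category, CATEGORY_WEIGHTS["other"])
        for d in range(dim):
            vc[d] += w * vec[d]
    return vc
```

For instance, keyword vectors `[1, 0]` (commodity), `[0, 1]` (other) and `[2, 2]` (place name) combine to `2*[1, 0] + 1*[0, 1] + 0.5*[2, 2] = [3.0, 2.0]`.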
In one embodiment, each label $L_i$ in the label set $\{L_i\}$ is selected from second-level industry categories. Table 1 gives examples of industry categories at each level; only some categories are shown. The first-level industry categories cover overly broad ranges, while the third-level categories are overly fine-grained, so both may be unhelpful for efficiently and conveniently distinguishing the content that interests users. Using second-level industry categories to classify (that is, label) the target content is therefore preferable. Those skilled in the art will understand that finer categories may be chosen to distinguish more precisely the content types that interest users, or that a category system specific to a particular purpose may be designed for that purpose.
Table 1: Examples of industry categories at each level
The present disclosure also provides a recommendation method, which is described below with reference to Fig. 3.
Fig. 3 shows an exemplary flowchart of a recommendation method 300 according to one embodiment of the present disclosure.
At step 302, a candidate content set is determined. Specifically, a candidate content set for recommendation to user $US_a$ is determined based on the content category label set of each of a plurality of contents and the set of categories of interest $\{Lu_{k'}\}$ of user $US_a$, where $a$ is an index identifying the user and the candidate content set contains the candidate contents for recommendation to $US_a$. In detail, a matching check is performed for each of the contents: based on the content category label set $\{LC_m\}$ of a content $C$ and the set of categories of interest $\{Lu_{k'}\}$ of user $US_a$, it is determined whether $C$ is a candidate content to recommend to the user. The plurality of contents includes the target content, and at least one content category label in the content category label set of the target content is determined by the labeling method of the present disclosure. The index $k'$ runs from 1 to $k'_{max}$, where $k'_{max}$ is the number of categories of interest of user $US_a$. It will be readily understood that the target content $CO$ may have 1, 2 or more labels.
When a label $LC_m$ matches user $US_a$, that is, when the category indicated by a label $LC_m$ in the label set of content $C$ is contained in the user's set of categories of interest $\{Lu_{k'}\}$, content $C$ is determined to be a candidate content. Preferably, the largest of the similarities corresponding to the matching labels of the selected content $C$ may be recorded, for that user, as the content relevance score $S_x$ of the content. For example, if for user $US_a$ the selected candidate content $C$ has 2 labels contained in $US_a$'s set of categories of interest, with similarities 0.6 and 0.8, then the content relevance score $S_x$ of $C$ recorded for $US_a$ is 0.8. In addition, each category of interest $Lu_{k'}$ of the user may have an interest score $S_{k'}$, which indicates how interested the user is in content of category $Lu_{k'}$; the sum or the product of the interest score and the content relevance score can serve as the basis for deciding whether to select a candidate content as recommended content. For example, if the content relevance score $S_x$ of $C$ is 0.8 and corresponds to the similarity for the "sports and fitness" label, user $US_a$'s set of categories of interest includes "sports and fitness", and the interest score of that category is $S_{k'} = 0.6$, then recommended content can be selected based on $S_x$ and $S_{k'}$ (for example, based on the magnitude of $S_x \cdot S_{k'}$ or $S_x + S_{k'}$).
When no label matches the user, that is, when none of the categories indicated by the labels in the content category label set of content $C$ is in the user's set of categories of interest, content $C$ is not added to the candidate content set.
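A minimal sketch of the matching check of step 302, assuming each content's labels are stored together with their recorded similarities and each of the user's categories of interest with its interest score $S_{k'}$. Taking $S_{k'}$ from the category that produced $S_x$ follows the "sports and fitness" example above; the data layout and the names `Candidate` and `match_candidates` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Candidate:
    content_id: str
    relevance: float  # S_x: largest similarity among the matching labels
    interest: float   # S_k': user's interest score for that label's category


def match_candidates(
    content_labels: Dict[str, Dict[str, float]],
    user_interests: Dict[str, float],
) -> List[Candidate]:
    """Return the contents whose label set intersects the user's interest categories."""
    candidates = []
    for content_id, labels in content_labels.items():
        matched = {label: sim for label, sim in labels.items() if label in user_interests}
        if not matched:
            continue  # no matching label: not added to the candidate set
        best = max(matched, key=matched.get)
        candidates.append(Candidate(content_id, matched[best], user_interests[best]))
    return candidates
```

A combined score such as `c.relevance * c.interest` (or the sum) can then be used to rank the candidates.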
At step 303, recommended content is selected from the candidate content set, which contains at least one candidate content for recommendation to the user. The number of recommended contents may be 1, 2 or more. The selection rule can be set as needed, for example: selecting the newest content from the candidate set; selecting content at random from the candidate set; selecting the content that has been recommended the fewest times; selecting the content with the highest content relevance score; or choosing several factors, assigning them priorities, and selecting recommended content according to those priorities. The factors may include the time at which a candidate content was added, the number of times it has been recommended, the sum and/or product of the interest score and the content relevance score, its click-through rate, and its content relevance score.
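The priority-based rule of step 303 can be sketched as a multi-key sort. The particular priority order used here (combined score $S_x \cdot S_{k'}$, then click-through rate, then fewest recommendations, then newest) is only one hypothetical choice among those the text allows, and the `RankedCandidate` fields are assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RankedCandidate:
    content_id: str
    added_at: float         # hypothetical: time the content was added (epoch seconds)
    times_recommended: int
    relevance: float        # S_x
    interest: float         # S_k'
    ctr: float              # click-through rate


def select_recommendations(candidates: List[RankedCandidate], n: int) -> List[str]:
    """Pick n contents using a fixed, hypothetical priority order of factors."""
    ranked = sorted(
        candidates,
        key=lambda c: (
            -(c.relevance * c.interest),  # 1) higher combined score first
            -c.ctr,                       # 2) then higher click-through rate
            c.times_recommended,          # 3) then less frequently recommended
            -c.added_at,                  # 4) then newer content
        ),
    )
    return [c.content_id for c in ranked[:n]]
```

A tuple key keeps the priorities explicit: a lower-priority factor only matters when all higher-priority factors tie.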
At step 304, an instruction to display a representation of the recommended content is generated. The representation of the recommended content is displayed to the user and is selectable by the user; it is, for example, a thumbnail of the recommended content. For example, if the user clicks the thumbnail on the terminal they are using, the recommended content is displayed on that terminal.
The recommendation method 300 can be executed on the server side. In that case, it may further include receiving a request for recommended content. The request may be issued by the client used by the user and may include user information of the user, so that the user's set of categories of interest can be determined from the user information.
When the recommendation method 300 is executed on the server side, it may further include sending, to the client used by the user, the instruction to display the representation of the recommended content to the user. After receiving the instruction, the client displays the representation of the recommended content to the user for selection.
Preferably, the labeling method of the present disclosure may be executed for multiple contents to determine their respective content category labels. Further, all content category labels of all candidate contents in the candidate content set may be determined by the labeling method of the present disclosure.
Optionally, a Kafka queue may be used to obtain new content as target content.
The labeling method of the present disclosure is illustrated below with a specific example.
Tally set { LiThere are 10 labels, that is, 10 classification deictic words are respectively as follows: sport and body-building, household services, fresh flower
Gift, wedding photo, medical department, shaping medical treatment, women and children hospital, hotel reservation, people place and airline.
The new content obtained online from the message queue is as follows:
Content title: "What is it like for a family to cook together at a homestay during a trip? Airbnb tells you it's super heart-warming";
Content body: "Travel with your family to different places, enjoy the scenery and eat local specialties, and you can even cook together as a family in different places. Just thinking about it feels warm. Airbnb worldwide, take your pick."
This new content is taken as the target content $CO$ for keyword selection.
The target content $CO$ is segmented, and the 9 candidate keywords with the largest $TF \cdot IDF$ values are selected as keywords. These 9 keywords are: Airbnb, people in journey, homestay (minsu), place, warmth, cooking, scenery, specialties, and global.
The categories determined for the keywords are shown in Table 2.
Table 2: Categories and weights of the keywords
Using the trained word2vec language model $ML$, nine 64-dimensional keyword vectors are obtained for the 9 keywords, and a weighted sum of these vectors according to the keyword categories gives the 64-dimensional content vector $VC$ of the target content:
[-0.14115450160929885, -0.24425549793780627, -0.30044687888376137, -0.05763183483727175,
0.15561235974744236, 0.010583868380962057, 0.013591076247417138, -0.06848938692135165,
-0.02732886928430746, -0.034710140155875834, 0.03750085532692744, 0.046927746483094245,
0.01581604176379293, 0.16177491753452636, -0.237404869703128, -0.06449884472860959,
-0.10758427322849924, -0.07626917726376475, 0.006169830778924875, 0.11237461946713251,
-0.17831536577928542, 0.0819056485434265, -0.12827313774691287, 0.0020619466900970483,
-0.016215964088673797, -0.14129457714696125, -0.0905078577328344, 0.01599747926662087,
-0.13264012880481604, -0.05488182080912134, 0.15804649074807617, -0.15541510850124396,
0.0344278284956769, 0.154474302607422, -0.27187228106139893, -0.04848808005948619,
0.07496522631347169, -0.09970821588166821, -0.21192385737972327, -0.10144228362039891,
-0.03206756311276709, 0.08181443401576366, -0.022456738055021172, 0.07263042977339229,
-0.05359920849368456, -0.012039215785374473, 0.05122092769789547, -0.011626157154404461,
-0.009008863938227746, -0.22059785870647422, 0.004545139343459065, 0.056822009826923224,
0.10528190567950048, -0.16259849732059495, 0.1074273601363384, 0.16346525357742392,
0.0016458175006195614, -0.10910192190291954, 0.22706467011122444, 0.23295105654493278,
0.1703301017317971, 0.017352765286693526, -0.14180094380902827, -0.18815346922446488]
Based on the content vector $VC$ and the label vectors of the 10 labels in $\{L_i\}$, the similarity of the target content $CO$ with respect to each label is determined, where the label vectors are ten 64-dimensional vectors determined by the language model $ML$ from the corresponding category-indicating words. The 10 similarities, in the order of the labels listed above, are: 0.32484788901811973, 0.10955877033307335, 0.18443480388501027, 0.32851210400292546, -0.1871856053931387, 0.057516092361998145, -0.10459164508515512, 0.5691629355855871, 0.8078326422773067, 0.3179727610239934.
Assume that the predetermined similarity threshold Th is 0.5. Only two of the similarities (0.5691629355855871 and 0.8078326422773067) exceed Th, and they belong to the labels "homestay" and "hotel reservation". These two labels are therefore determined to be the content category labels of the target content CO; that is, the content category label set of CO is {homestay, hotel reservation}.
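The similarity and threshold steps can be sketched as below. Cosine similarity is assumed as the similarity measure, and "exceeds Th" is taken to mean strictly greater than Th; both are assumptions made for illustration, and the function names are hypothetical. Applied to the ten similarity values listed above, in the order given, the sketch selects exactly two labels, matching the two labels in the example.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity (assumed measure); returns 0.0 if either vector is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def labels_above_threshold(similarities: Sequence[float], th: float) -> List[int]:
    """Indices of the labels whose similarity is strictly greater than Th."""
    return [i for i, s in enumerate(similarities) if s > th]


# The ten similarity values from the worked example, in the order listed.
EXAMPLE_SIMILARITIES = [
    0.32484788901811973, 0.10955877033307335, 0.18443480388501027,
    0.32851210400292546, -0.1871856053931387, 0.057516092361998145,
    -0.10459164508515512, 0.5691629355855871, 0.8078326422773067,
    0.3179727610239934,
]

print(labels_above_threshold(EXAMPLE_SIMILARITIES, 0.5))  # [7, 8]
```

Because several labels may pass the threshold, the result is a set of labels rather than a single class, which is how the example ends up with two content category labels.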
The present disclosure also provides a computer-readable recording medium storing a program, wherein the program causes a computer to execute the tagging method of the present disclosure.
The present disclosure also provides a computer-readable recording medium storing a program, wherein the program causes a computer to execute the recommendation method of the present disclosure.
The present disclosure also provides a tagging device. Fig. 4 shows an exemplary block diagram of a tagging device 400 according to an embodiment of the present disclosure. The tagging device 400 includes a keyword determination unit 401, a word vector determination unit 402, a content vector determination unit 403, a similarity determination unit 404 and a label determination unit 405. The keyword determination unit 401 is configured to select a plurality of keywords from the text portion of the target content. The word vector determination unit 402 is configured to determine, using a language model, the keyword vector corresponding to each keyword, and to determine the label vector of each label using the language model, where the label set consists of the labels and each label is a category indicator word indicating a candidate category of the target content. The content vector determination unit 403 is configured to determine the content vector of the target content as a weighted sum of the corresponding keyword vectors. The similarity determination unit 404 is configured to determine the similarity of the target content to each label based on the content vector and the label vector of each label in the label set. The label determination unit 405 is configured to determine the content category labels of the target content based on the similarities, and may output them; the target content may have one, two or more content category labels. The tagging device 400 corresponds to the tagging method of the present disclosure; where feasible, more specific details of the tagging device 400 may be the same as the corresponding details of the tagging method.
Preferably, the tagging device 400 can be used to generate all the content category labels of the target content CO.
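A hypothetical sketch of how units 401 to 405 could be wired together is shown below. The class `TaggingDevice`, its fields, and the toy two-dimensional embeddings in the usage example are all assumptions standing in for the trained language model ML. Keyword selection here uses raw token counts as a placeholder for the TF-IDF selection of claim 3, every keyword is placed in a single "other" category for brevity, and cosine similarity is assumed.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

Vector = List[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity (assumed measure); 0.0 if either vector is all zeros."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


@dataclass
class TaggingDevice:
    """Hypothetical sketch of tagging device 400; each method mirrors one unit."""
    embed: Callable[[str], Vector]          # stand-in for language model ML
    keyword_category: Callable[[str], str]  # stand-in for keyword classification
    category_weights: Dict[str, float]
    label_set: Sequence[str]                # category indicator words
    threshold: float = 0.5
    num_keywords: int = 9

    def select_keywords(self, text: str) -> List[str]:
        """Unit 401 (placeholder): most frequent whitespace tokens, ties alphabetical."""
        counts = Counter(text.split())
        ranked = sorted(counts, key=lambda tok: (-counts[tok], tok))
        return ranked[: self.num_keywords]

    def content_vector(self, keywords: Sequence[str]) -> Vector:
        """Units 402 and 403: embed each keyword and take the category-weighted sum."""
        vectors = [self.embed(k) for k in keywords]
        weights = [self.category_weights[self.keyword_category(k)] for k in keywords]
        dim = len(vectors[0])
        return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]

    def similarities(self, vc: Vector) -> Dict[str, float]:
        """Units 402 and 404: embed each label and compare it with the content vector."""
        return {label: cosine(vc, self.embed(label)) for label in self.label_set}

    def tag(self, text: str) -> List[str]:
        """Unit 405: keep every label whose similarity exceeds the threshold."""
        keywords = self.select_keywords(text)
        if not keywords:
            return []
        sims = self.similarities(self.content_vector(keywords))
        return [label for label, s in sims.items() if s > self.threshold]


# Usage with toy 2-dimensional embeddings (hypothetical values).
EMBEDDINGS = {"hotel": [1.0, 0.0], "room": [0.9, 0.1], "booking": [0.8, 0.2],
              "lodging": [1.0, 0.1], "sports": [0.0, 1.0]}
device = TaggingDevice(
    embed=lambda word: EMBEDDINGS.get(word, [0.0, 0.0]),
    keyword_category=lambda word: "other",
    category_weights={"other": 1.0},
    label_set=["lodging", "sports"],
)
print(device.tag("hotel room booking hotel"))  # ['lodging']
```

Injecting the embedding and categorization functions keeps the sketch independent of any particular word2vec library; a real deployment would plug in the trained model ML instead of the toy dictionary.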
The present disclosure also provides a recommendation apparatus. Fig. 5 shows an exemplary block diagram of a recommendation apparatus 500 according to an embodiment of the present disclosure. The recommendation apparatus 500 includes a candidate content set determination unit 501, a selection unit 502 and an instruction generation unit 503. The candidate content set determination unit 501 is configured to determine a candidate content set for recommendation to a user based on the content category label set of each of a plurality of contents and the user's set of content categories of interest, where the plurality of contents include the target content, and at least one content category label in the content category label set of the target content is determined by the tagging method of the present disclosure. The selection unit 502 is configured to select recommended content for the user from the candidate content set. The instruction generation unit 503 is configured to generate an instruction for presenting a representation of the recommended content to the user, where the representation is selectable by the user. The recommendation apparatus 500 corresponds to the recommendation method of the present disclosure; where feasible, more specific details of the recommendation apparatus 500 may be the same as the corresponding details of the recommendation method.
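The three units of the recommendation apparatus can be sketched as plain functions. The candidate rule (a content qualifies if any of its labels is among the user's categories of interest), the ranking by number of matching labels, and the dictionary format of the display instruction are all hypothetical choices made for illustration; this passage of the disclosure does not fix them.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Set


@dataclass
class Content:
    content_id: str
    title: str
    labels: Set[str]  # content category label set, e.g. produced by the tagging method


def candidate_set(contents: Sequence[Content], interests: Set[str]) -> List[Content]:
    """Unit 501 (assumed rule): keep contents sharing at least one label with the interests."""
    return [c for c in contents if c.labels & interests]


def select_recommendations(candidates: Sequence[Content], interests: Set[str],
                           k: int) -> List[Content]:
    """Unit 502 (assumed rule): rank by number of matching labels, ties by content_id."""
    ranked = sorted(candidates, key=lambda c: (-len(c.labels & interests), c.content_id))
    return ranked[:k]


def display_instruction(user_id: str, recommended: Sequence[Content]) -> Dict[str, object]:
    """Unit 503: a hypothetical instruction format listing selectable items to show."""
    return {
        "user_id": user_id,
        "action": "show",
        "items": [{"content_id": c.content_id, "title": c.title, "selectable": True}
                  for c in recommended],
    }


contents = [
    Content("c1", "Lakeside homestay", {"homestay", "hotel reservation"}),
    Content("c2", "Football highlights", {"sports"}),
    Content("c3", "City hotel deals", {"hotel reservation"}),
]
interests = {"homestay", "hotel reservation"}
picked = select_recommendations(candidate_set(contents, interests), interests, k=2)
instruction = display_instruction("user-42", picked)
print([item["content_id"] for item in instruction["items"]])  # ['c1', 'c3']
```

Because newly tagged content enters `candidate_set` as soon as it carries labels, this kind of label matching needs no click history for the new item, which is the cold-start benefit the disclosure describes.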
From the above description of specific embodiments of the present disclosure, those skilled in the art will understand that the method of the present disclosure can achieve at least one of the following effects: the tagging method labels content automatically without manual annotation, which saves time and annotation cost, tags efficiently, and avoids errors introduced by the subjectivity of manual annotation; the tagging method uses a neural network and can assign labels to content efficiently and accurately; the content of the predetermined corpus does not need to be annotated, so when the label system changes, the corpus need not be re-annotated and the language model need not be retrained, which makes the tagging method more robust; the tagging method can tag content online in real time, so content is labeled as it arrives; personalized matching content can be recommended to users, improving the click-through rate of content; and through online real-time tagging, new content can be recommended to users promptly, making cold start of new content easy to achieve.
It should be understood that the term "include", when used herein, specifies the presence of a feature, integer, step or component, but does not preclude the presence or addition of one or more other features, integers, steps or components.
It should be understood that, without departing from the spirit of the present disclosure, features described and/or shown for one embodiment may be used in the same or a similar way in one or more other embodiments, combined with features of other embodiments, or substituted for features of other embodiments.
In addition, the method for present disclosure be not limited to specifications described in time sequencing execute, if from original
It says feasible in reason, can also according to other time sequencings, concurrently or independently execute.Therefore, it is described in this specification
Method execution sequence not to scope of the present disclosure be construed as limiting.
The present disclosure has been described above in conjunction with specific embodiments, but those skilled in the art should understand that these descriptions are exemplary and do not limit the scope of protection of the present disclosure. Those skilled in the art may make various variants and modifications to the present disclosure according to its spirit and principles, and these variants and modifications also fall within the scope of the present disclosure.
Claims (10)
- 1. A tagging method, comprising: selecting a plurality of keywords from a text portion of target content; determining, using a language model, a keyword vector corresponding to each keyword; determining a content vector of the target content by a weighted sum of the corresponding keyword vectors; determining a similarity of the target content to each label based on the content vector and a label vector of each label in a label set; and determining content category labels of the target content based on the similarities; wherein each label in the label set is a category indicator word indicating a candidate category of the target content; and each label vector is a vector determined by the language model based on the respective category indicator word.
- 2. The tagging method according to claim 1, wherein each label in the label set is selected from second-level industry categories.
- 3. The tagging method according to claim 1, wherein selecting a plurality of keywords from the text portion of the target content comprises: segmenting the text portion into words to obtain a plurality of candidate keywords; determining a term frequency of each candidate keyword with respect to the text portion; determining an inverse document frequency of each candidate keyword with respect to a predetermined corpus; and selecting a predetermined number of candidate keywords as the plurality of keywords based on the product of the term frequency and the inverse document frequency of each candidate keyword.
- 4. The tagging method according to claim 1, wherein determining the content vector of the target content by a weighted sum of the corresponding keyword vectors comprises: determining a category of each keyword; and determining the respective weight of each corresponding keyword vector based on the category of that keyword.
- 5. The tagging method according to claim 4, wherein the category is selected from the group consisting of: product, person name, place name, number, time and other; when the category is product, the respective weight is a first value; when the category is other, the respective weight is a second value; when the category is person name, place name, number or time, the respective weight is a third value; the first value is greater than the second value; and the second value is greater than the third value.
- 6. The tagging method according to claim 1, wherein the language model is a natural language processing model obtained by training on a predetermined corpus with the word2vec tool.
- 7. A recommendation method, comprising: determining a candidate content set for recommendation to a user based on the content category label set of each of a plurality of contents and the user's set of content categories of interest; selecting recommended content for the user from the candidate content set; and generating an instruction to present a representation of the recommended content to the user; wherein the representation is selectable by the user; and the plurality of contents include target content, at least one content category label in the content category label set of the target content being determined by the tagging method according to any one of claims 1 to 6.
- 8. The recommendation method according to claim 7, further comprising: using a Kafka queue to obtain new content as the target content.
- 9. A computer-readable recording medium storing a program, wherein the program causes a computer to execute the tagging method according to any one of claims 1 to 6.
- 10. A computer-readable recording medium storing a program, wherein the program causes a computer to execute the recommendation method according to claim 7 or 8.
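The TF-IDF keyword selection of claim 3 could look roughly like this. Word segmentation is assumed to have been done already (the inputs are token lists; Chinese text would normally pass through a word segmenter first), and the smoothed IDF formula log((1 + N) / (1 + df)) + 1 is one common variant chosen for illustration, not necessarily the one used in the disclosure. The function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def build_idf(corpus: Sequence[Sequence[str]]) -> Tuple[Dict[str, float], float]:
    """Smoothed IDF over a tokenized predetermined corpus, plus the IDF for unseen terms."""
    n_docs = len(corpus)
    doc_freq: Counter = Counter()
    for doc in corpus:
        doc_freq.update(set(doc))  # count each term at most once per document
    idf = {term: math.log((1 + n_docs) / (1 + df)) + 1.0 for term, df in doc_freq.items()}
    unseen_idf = math.log(1 + n_docs) + 1.0  # same formula with df = 0
    return idf, unseen_idf


def select_keywords(tokens: Sequence[str], idf: Dict[str, float],
                    unseen_idf: float, n: int) -> List[str]:
    """Return the n candidate keywords with the largest TF x IDF (ties alphabetical)."""
    if not tokens:
        return []
    counts = Counter(tokens)
    total = len(tokens)
    scores = {t: (c / total) * idf.get(t, unseen_idf) for t, c in counts.items()}
    return sorted(scores, key=lambda t: (-scores[t], t))[:n]


corpus = [["hotel", "booking", "the"], ["the", "football", "match"], ["the", "city", "hotel"]]
idf, unseen_idf = build_idf(corpus)
tokens = ["the", "hotel", "hotel", "booking", "the", "lakeside"]
print(select_keywords(tokens, idf, unseen_idf, n=2))  # ['hotel', 'lakeside']
```

In this toy run "the" still outranks "booking" because it appears twice in the text; stop words would typically be filtered during segmentation, which this sketch does not do.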
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910423246.2A CN110147499B (en) | 2019-05-21 | 2019-05-21 | Labeling method, recommendation method and recording medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110147499A true CN110147499A (en) | 2019-08-20 |
CN110147499B CN110147499B (en) | 2021-09-14 |
Family
ID=67592502
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910423246.2A Active CN110147499B (en) | 2019-05-21 | 2019-05-21 | Labeling method, recommendation method and recording medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110147499B (en) |
Citations (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101021838A (en) * | 2007-03-02 | 2007-08-22 | 华为技术有限公司 | Text handling method and system |
CN104965889A (en) * | 2015-06-17 | 2015-10-07 | 腾讯科技(深圳)有限公司 | Content recommendation method and apparatus |
CN106095845A (en) * | 2016-06-02 | 2016-11-09 | 腾讯科技(深圳)有限公司 | File classification method and device |
CN108319630A (en) * | 2017-07-05 | 2018-07-24 | 腾讯科技(深圳)有限公司 | Information processing method, device, storage medium and computer equipment |
CN108694647A (en) * | 2018-05-11 | 2018-10-23 | 北京三快在线科技有限公司 | A kind of method for digging and device of trade company's rationale for the recommendation, electronic equipment |
CN108829822A (en) * | 2018-06-12 | 2018-11-16 | 腾讯科技(深圳)有限公司 | The recommended method and device of media content, storage medium, electronic device |
CN108984658A (en) * | 2018-06-28 | 2018-12-11 | 阿里巴巴集团控股有限公司 | A kind of intelligent answer data processing method and device |
CN109033087A (en) * | 2018-08-07 | 2018-12-18 | 中证征信(深圳)有限公司 | Calculate method, De-weight method, clustering method and the device of text semantic distance |
CN109063133A (en) * | 2018-08-02 | 2018-12-21 | 武汉斗鱼网络科技有限公司 | A kind of adding method, system, equipment and the medium of direct broadcasting room label |
CN109165380A (en) * | 2018-07-26 | 2019-01-08 | 咪咕数字传媒有限公司 | A kind of neural network model training method and device, text label determine method and device |
CN109242604A (en) * | 2018-08-15 | 2019-01-18 | 深圳壹账通智能科技有限公司 | Service recommendation method, electronic equipment and computer readable storage medium |
CN109241277A (en) * | 2018-07-18 | 2019-01-18 | 北京航天云路有限公司 | The method and system of text vector weighting based on news keyword |
CN109325229A (en) * | 2018-09-19 | 2019-02-12 | 中译语通科技股份有限公司 | A method of text similarity is calculated using semantic information |
CN109740152A (en) * | 2018-12-25 | 2019-05-10 | 腾讯科技(深圳)有限公司 | Determination method, apparatus, storage medium and the computer equipment of text classification |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110516030A (en) * | 2019-08-26 | 2019-11-29 | 北京百度网讯科技有限公司 | It is intended to determination method, apparatus, equipment and the computer readable storage medium of word |
CN111104526A (en) * | 2019-11-21 | 2020-05-05 | 新华智云科技有限公司 | Financial label extraction method and system based on keyword semantics |
CN111309919B (en) * | 2020-03-23 | 2024-04-16 | 智者四海(北京)技术有限公司 | Text classification model system and training method thereof |
CN111309919A (en) * | 2020-03-23 | 2020-06-19 | 智者四海(北京)技术有限公司 | System and training method of text classification model |
CN111858915A (en) * | 2020-08-07 | 2020-10-30 | 成都理工大学 | Information recommendation method and system based on label similarity |
CN113313344A (en) * | 2021-04-13 | 2021-08-27 | 武汉烽火众智数字技术有限责任公司 | Label system construction method and system fusing multiple modes |
CN113313344B (en) * | 2021-04-13 | 2023-03-31 | 武汉烽火众智数字技术有限责任公司 | Label system construction method and system fusing multiple modes |
CN113723513A (en) * | 2021-08-31 | 2021-11-30 | 平安国际智慧城市科技股份有限公司 | Multi-label image classification method and device and related equipment |
CN113723513B (en) * | 2021-08-31 | 2024-05-03 | 平安国际智慧城市科技股份有限公司 | Multi-label image classification method and device and related equipment |
CN113961725A (en) * | 2021-10-25 | 2022-01-21 | 北京明略软件系统有限公司 | Automatic label labeling method, system, equipment and storage medium |
CN114827745B (en) * | 2022-04-08 | 2023-11-14 | 海信集团控股股份有限公司 | Video subtitle generation method and electronic equipment |
CN114827745A (en) * | 2022-04-08 | 2022-07-29 | 海信集团控股股份有限公司 | Video subtitle generation method and electronic equipment |
WO2024027125A1 (en) * | 2022-08-03 | 2024-02-08 | 百度在线网络技术(北京)有限公司 | Object recommendation method and apparatus, electronic device, and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN110147499B (en) | 2021-09-14 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | GR01 | Patent grant | |