CN110147499A

CN110147499A - Labeling method, recommendation method, and recording medium

Info

Publication number: CN110147499A
Application number: CN201910423246.2A
Authority: CN
Inventors: 张炜
Original assignee: Wise Four Seas (beijing) Technology Co Ltd
Current assignee: Wise Four Seas (beijing) Technology Co Ltd
Priority date: 2019-05-21
Filing date: 2019-05-21
Publication date: 2019-08-20
Anticipated expiration: 2039-05-21
Also published as: CN110147499B

Abstract

This disclosure relates to a labeling method, a recommendation method, and a recording medium. According to one embodiment of the present disclosure, the labeling method includes: selecting multiple keywords from the text portion of target content; determining the keyword vector of each keyword using a language model; determining the content vector of the target content as a weighted sum of the keyword vectors; determining the similarity of the target content to each label based on the content vector and the label vector of each label in a label set; and determining the content category labels of the target content based on the similarities. Each label in the label set is a category indicator word that indicates a candidate category of the target content, and each label vector is the vector that the language model determines from the corresponding category indicator word. The scheme of the present disclosure helps achieve at least one of the following effects: assigning labels to content accurately, assigning labels to content in real time, and recommending matching content to users.

Description

Labeling method, recommendation method, and recording medium

Technical field

The present disclosure relates generally to information processing and, more particularly, to a labeling method, a recommendation method, and a computer-readable recording medium storing a program that executes these methods.

Background art

In recent years, with the rapid growth of the internet, publishing content on the network and recommending content to users have become increasingly common. How to recommend content to users effectively is an important research direction.

Summary of the invention

A brief overview of the present disclosure is given below to provide a basic understanding of certain aspects of it. This overview is not exhaustive. It is not intended to identify key or important parts of the disclosure, nor to limit its scope. Its sole purpose is to present certain concepts in simplified form as a prelude to the more detailed description that follows.

Content published on the network covers a wide variety of categories, for example sports and fitness, housekeeping services, flowers and gifts, and wedding photography. The categories of content that interest users are equally varied. For example, a user may be interested only in sports and fitness during a certain period, or may be interested in some fields and uninterested in others. Given these facts, in order to recommend content to users effectively and to raise the click-through rate on recommended content, content is labeled so that it is classified, and content carrying a given label can then be recommended to users interested in that type of content. It is therefore desirable to assign labels to content accurately and efficiently.

According to one aspect of the present disclosure, a labeling method is provided, comprising: selecting multiple keywords from the text portion of target content; determining the keyword vector of each keyword using a language model; determining the content vector of the target content as a weighted sum of the keyword vectors; determining the similarity of the target content to each label based on the content vector and the label vector of each label in a label set; and determining the content category labels of the target content based on the similarities. Each label in the label set is a category indicator word that indicates a candidate category of the target content, and each label vector is the vector that the language model determines from the corresponding category indicator word.

According to another aspect of the present disclosure, a recommendation method is provided, comprising: determining a candidate content set to recommend to a user, based on the content category label set of each of multiple contents and the user's set of content categories of interest; selecting, from the candidate content set, recommended content to recommend to the user; and generating an instruction to display a representation of the recommended content to the user, where the representation is for the user to select. The multiple contents include target content, and at least one content category label in the content category label set of the target content is determined by the aforementioned labeling method.

According to another aspect of the present disclosure, a computer-readable recording medium storing a program is provided, where the program causes a computer to execute the aforementioned labeling method.

According to yet another aspect of the present disclosure, a computer-readable recording medium storing a program is provided, where the program causes a computer to execute the aforementioned recommendation method.

The labeling method, recommendation method, and recording medium of the present disclosure help achieve at least one of the following effects: assigning labels to content efficiently, assigning labels to content accurately, assigning labels to content in real time, recommending matching content to users, raising the click-through rate of content, and making cold start of new content easy to achieve.

Brief description of the drawings

Embodiments of the present disclosure are described below with reference to the drawings, which will make the above and other objects, features, and advantages of the disclosure easier to understand. The drawings are intended only to illustrate the principles of the disclosure. The sizes and relative positions of units in the drawings are not necessarily drawn to scale. In the drawings:

Fig. 1 shows an exemplary flowchart of a labeling method according to one embodiment of the present disclosure;

Fig. 2 shows an exemplary flowchart of a method of selecting multiple keywords according to one embodiment of the present disclosure;

Fig. 3 shows an exemplary flowchart of a recommendation method according to one embodiment of the present disclosure;

Fig. 4 shows an exemplary block diagram of a labeling device according to one embodiment of the present disclosure; and

Fig. 5 shows an exemplary block diagram of a recommendation device according to one embodiment of the present disclosure.

Detailed description of the embodiments

Exemplary embodiments of the present disclosure are described below with reference to the drawings. For clarity and conciseness, not all features of an actual implementation are described in the specification. It should be understood, however, that many implementation-specific decisions may be made while developing any such actual implementation in order to achieve the developer's specific goals, and that these decisions may vary from one implementation to another.

It should also be noted that, to avoid obscuring the present disclosure with unnecessary detail, the drawings show only the device structures closely related to the scheme of the disclosure and omit other details that have little to do with it.

It should be understood that the following description with reference to the drawings does not limit the present disclosure to the described implementations. Where feasible, embodiments may be combined with one another, features may be substituted or borrowed between embodiments, and one or more features may be omitted from an embodiment.

One aspect of the present disclosure relates to a labeling method that determines the labels of content. The labeling method is described below by way of example with reference to Fig. 1.

Fig. 1 shows an exemplary flowchart of a labeling method 100 according to one embodiment of the present disclosure. There may be multiple contents awaiting labels, and the labeling method 100 can be applied to them one by one or in parallel (a label of a content is also called a content category label). Here, one of the multiple contents is selected as target content CO to illustrate the labeling method 100.

At step 101, keywords are selected: multiple keywords are taken from the text portion of the target content. Each keyword is denoted KW_j, where j is an index running from 1 to a maximum value j_max, and j_max is the number of keywords selected for the target content CO. Using multiple keywords helps characterize the field or category of the target content CO accurately and comprehensively, and thus helps assign content category labels to it accurately and comprehensively. The target content CO includes a text portion. The target content CO may be multimedia content, an advertisement, an article, product information, or an image. The number of keywords may be 2, 3, 4, 5, 6, 7, 8, 9, 10, or more. For example, a suitable j_max may be chosen according to the length of the text portion of the target content CO, or according to the minimum text length of the text portion of the target content CO. The text portion may include text contained in images or audio in the target content. Text in an image can be obtained, for example, by optical character recognition; text in audio can be obtained, for example, by speech recognition. The text portion may include a title portion and a body portion.

At step 102, the keyword vector VK_j of each keyword KW_j is determined using a language model ML. The language model ML maps an input word to a vector.

At step 103, the content vector VC is determined. The content vector VC is a vector that characterizes the target content CO, and it is determined as a weighted sum of the keyword vectors VK_j.

At step 104, the similarities SI_i are determined. SI_i is the similarity of the target content CO to label L_i in the label set {L_i}, and it is determined from the content vector VC and the label vector VL_i of label L_i. Here i is an index running from 1 to i_max, where i_max is the number of labels in the label set {L_i}, that is, the number of candidate categories of the target content CO. Each label L_i in the label set {L_i} is a category indicator word WI_i that indicates a candidate category of the target content CO. Each label vector VL_i is the vector that the language model ML determines from the corresponding category indicator word WI_i. The similarity SI_i may be the cosine of the angle between the content vector VC and the label vector VL_i, that is, the ratio of the dot product of the two vectors to the product of their norms. In the present disclosure, unless stated otherwise, the notation {e_i} denotes the set of elements e₁, ..., e_max, i.e. i = 1, ..., max, rather than a set containing only the single element e_i; that is, {e_i} denotes a set of one or more elements.
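The cosine similarity of step 104 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the disclosure: the function names `cosine_similarity` and `similarities` are hypothetical, vectors are plain lists of floats, and returning 0.0 when either vector is all zeros is a choice made here.

```python
import math


def cosine_similarity(vc, vl):
    """Cosine of the angle between content vector vc and label vector vl."""
    if len(vc) != len(vl):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(vc, vl))
    norm_product = math.sqrt(sum(a * a for a in vc)) * math.sqrt(sum(b * b for b in vl))
    if norm_product == 0.0:
        # Assumption: a zero vector is treated as unrelated to everything.
        return 0.0
    return dot / norm_product


def similarities(vc, label_vectors):
    """Map each label L_i to its similarity SI_i with the content vector VC."""
    return {label: cosine_similarity(vc, vl) for label, vl in label_vectors.items()}
```

Because the cosine depends only on direction, the scale of the weighted sum that produces VC does not affect SI_i.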

At step 105, the content category labels LC_k of the target content CO are determined based on the similarities SI_i. Here k is an index running from 1 to k_max, where k_max is the number of content category labels of the target content CO drawn from the label set {L_i}. For example, when a similarity SI_i is greater than or equal to a predetermined similarity threshold Th, the corresponding label L_i is assigned to the target content CO as one of its content category labels. Alternatively, the i_max similarities may be sorted in descending order and the labels corresponding to the top k_max similarities assigned to the target content CO as its content category labels. The similarity SI_i indicates how relevant the target content is to the corresponding category. Each similarity SI_i can therefore be recorded so that, at recommendation time, content that matches the user's categories of interest and has a higher category relevance is selected as recommended content. The target content CO may also be assigned other labels by other labeling approaches; these other labels, together with the k_max labels determined by the labeling method of the present disclosure, may form the content category label set {LC_m} of the target content CO, where m is an index and the set contains no repeated elements. The content category label set of the target content CO may also consist entirely of labels determined by the labeling method of the present disclosure, that is, {LC_m} = {LC_k}.
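The two selection rules of step 105 (threshold Th and top k_max) can be sketched as follows. The function names and the example scores are hypothetical and chosen only for illustration; the sketch assumes the similarities are already available as a mapping from label to SI_i.

```python
def labels_by_threshold(sims, threshold):
    """Return the labels whose similarity is >= threshold, most similar first."""
    chosen = [(label, s) for label, s in sims.items() if s >= threshold]
    chosen.sort(key=lambda pair: pair[1], reverse=True)
    return [label for label, _ in chosen]


def labels_by_top_k(sims, k_max):
    """Return the k_max labels with the highest similarity, most similar first."""
    ranked = sorted(sims.items(), key=lambda pair: pair[1], reverse=True)
    return [label for label, _ in ranked[:k_max]]


# Hypothetical scores, not taken from the disclosure.
example_scores = {
    "homestay": 0.81,
    "hotel reservation": 0.57,
    "airline": 0.32,
    "wedding photography": -0.19,
}
```

With the threshold rule the number of labels varies per content and may be zero; with the top-k rule it is fixed, which may assign weakly related labels when no similarity is high.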

To make labeling more timely, new content can be obtained online through a Kafka queue and used as target content, so that newly appearing content is labeled promptly and recommended to users based on the assigned labels, which readily solves the cold-start problem for new content. In one variation, the labeling method 100 further includes: obtaining new content online through a Kafka queue as the target content CO.

In one embodiment, the labeling method 100 further includes: obtaining the text portion by processing the target content.

In the present disclosure, step 101 of the labeling method 100 can be implemented in various ways. Fig. 2 shows one exemplary method of implementing step 101.

Fig. 2 shows an exemplary flowchart of a method 210 of selecting multiple keywords according to one embodiment of the present disclosure.

At step 211, the text portion of the target content is segmented into words to obtain multiple candidate keywords KW_j’, where j’ is an index running from 1 to a maximum value j’_max, and j’_max is the number of candidate keywords. If j’_max<j_max, a specific routine can be executed to handle the target content, for example assigning it a predetermined content category label and/or leaving it for subsequent manual processing. Further, if one or more stop words are present, step 211 also removes them, so that the candidate keywords KW_j’ contain no stop words.

At step 212, the term frequency is determined, namely the term frequency TF_j’ of each candidate keyword KW_j’ with respect to the text portion. This yields j’_max term frequency values.

At step 213, the inverse document frequency is determined, namely the inverse document frequency IDF_j’ of each candidate keyword KW_j’ with respect to a predetermined corpus CP. This yields j’_max inverse document frequency values. The predetermined corpus CP contains a sufficient number of documents, which may be documents screened to support accurate labeling of content. For example, if the text portion of the target content is in Simplified Chinese, the documents in the predetermined corpus CP may be Simplified Chinese documents. Preferably, all documents in the predetermined corpus CP share the same encoding format.

At step 214, the keywords are selected: based on the product TF_j’*IDF_j’ of the term frequency and the inverse document frequency of each candidate keyword KW_j’, a predetermined number of candidate keywords are selected as the keywords KW_j. For example, the products TF_j’*IDF_j’ are sorted in descending order to obtain a product sequence S, and the candidate keywords corresponding to the first j_max products in S are selected as the keywords used in the subsequent steps.
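Steps 211 to 214 can be sketched as below. The sketch makes several assumptions that the text does not fix: the text portion has already been segmented into a token list, the corpus is a list of token lists, the function name `select_keywords` is hypothetical, and the smoothed formula ln((1+N)/(1+df))+1 is only one common IDF variant chosen here to keep the scores positive.

```python
import math
from collections import Counter


def select_keywords(tokens, corpus_docs, stop_words, j_max):
    """Pick up to j_max keywords by descending TF*IDF.

    tokens      -- segmented text portion of the target content
    corpus_docs -- predetermined corpus CP, one token list per document
    stop_words  -- set of words to drop (step 211)
    """
    candidates = [t for t in tokens if t not in stop_words]
    if not candidates:
        return []
    counts = Counter(candidates)
    total = len(candidates)
    doc_sets = [set(doc) for doc in corpus_docs]
    n_docs = len(doc_sets)

    scores = {}
    for word, count in counts.items():
        tf = count / total                                   # step 212
        df = sum(1 for doc in doc_sets if word in doc)
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0        # step 213 (assumed variant)
        scores[word] = tf * idf

    ranked = sorted(scores, key=lambda w: scores[w], reverse=True)  # step 214
    return ranked[:j_max]
```

If fewer than j_max candidates survive stop-word removal, the function simply returns fewer keywords; the special handling the text describes for that case is left to the caller.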

The way of selecting the keywords is not limited to method 210. As one variation, a predetermined number of candidate keywords may be selected as the keywords KW_j based on the term frequency TF_j’ alone. Alternatively, the inverse document frequency IDF_j’ is scaled by a ratio r to obtain an adjusted inverse document frequency r*IDF_j’, and a predetermined number of candidate keywords are selected as the keywords KW_j based on the product TF_j’*r*IDF_j’ of the term frequency and the adjusted inverse document frequency, where r may depend on the type of the candidate keyword.

Any natural language processing model that can generate, from a word, a word vector characterizing that word can serve as the language model ML in the present disclosure, for example a natural language processing model trained on the predetermined corpus with the word2vec tool. As an example, when training the language model with the word2vec tool, the word-embedding size can be set to 64, the window size to 10, the minimum word frequency to 5, and the number of training iterations to 10. As noted above, the language model ML can determine the keyword vector VK_j from an input keyword KW_j, and can likewise determine the label vector VL_i of a label L_i from the input label L_i (that is, the category indicator word WI_i).

Under different content classification schemes, each category (label) may be more sensitive to certain keywords. The category of each keyword can therefore be taken into account when determining the content vector, to improve labeling accuracy. For example, in one embodiment, determining the content vector VC of the target content as a weighted sum of the keyword vectors VK_j includes: determining the category C_j of each keyword; and determining the weight w_j of each keyword vector VK_j based on the category C_j. The content vector VC can be determined using equation (1).
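Equation (1) itself is not reproduced in this text. Going only by the description of VC as a weighted sum of the keyword vectors VK_j with weights w_j, a form consistent with the surrounding definitions would be the following; whether the original equation also carries a normalization factor is not known, so this should be read as an assumption.

```latex
VC = \sum_{j=1}^{j_{\max}} w_j \, VK_j \tag{1}
```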

For example, based on the set of keyword categories {C_j}, the keyword weights are divided into three levels, taking a first value v₁, a second value v₂, and a third value v₃ respectively, where v₁>v₂>v₃.

Further, for example, the category set {C_j} may consist of: commodity, person name, place name, number, time, and other. When the category of keyword KW_j is "commodity", the weight w_j takes the first value, that is, w_j=v₁; when the category of keyword KW_j is "other", the weight w_j takes the second value, that is, w_j=v₂; when the category of keyword KW_j is "person name", "place name", "number", or "time", the weight w_j takes the third value, that is, w_j=v₃. The category of a keyword can be determined, for example, by looking it up in a keyword category database.

In one embodiment, the first, second, and third values v₁, v₂, and v₃ can be set to 2.0, 1.0, and 0.5 respectively.
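The category-weighted sum of step 103 with these example weights can be sketched as follows. The names `CATEGORY_WEIGHTS` and `content_vector` are hypothetical, the keyword category lookup is reduced to a plain dictionary, and treating a keyword with no known category as "other" is an assumption made here.

```python
# Example weights from the text: commodity -> v1 = 2.0, other -> v2 = 1.0,
# person name / place name / number / time -> v3 = 0.5.
CATEGORY_WEIGHTS = {
    "commodity": 2.0,
    "other": 1.0,
    "person name": 0.5,
    "place name": 0.5,
    "number": 0.5,
    "time": 0.5,
}


def content_vector(keyword_vectors, keyword_categories, weights=CATEGORY_WEIGHTS):
    """Weighted sum of keyword vectors VK_j with category-based weights w_j."""
    if not keyword_vectors:
        raise ValueError("at least one keyword vector is required")
    dim = len(next(iter(keyword_vectors.values())))
    vc = [0.0] * dim
    for keyword, vk in keyword_vectors.items():
        # Assumption: keywords missing from the category lookup count as "other".
        w = weights[keyword_categories.get(keyword, "other")]
        for d in range(dim):
            vc[d] += w * vk[d]
    return vc
```

For instance, with two-dimensional toy vectors, a "commodity" keyword [1.0, 0.0], an "other" keyword [0.0, 1.0], and a "place name" keyword [2.0, 2.0] combine to [3.0, 2.0].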

In one embodiment, each label L_i in the label set {L_i} is selected from second-level industry categories. Table 1 gives examples of industry categories at each level; only some categories are shown. As can be seen, first-level industry categories cover too broad a range and third-level industry categories are divided too finely. First-level and third-level categories may therefore be ill-suited to distinguishing the content that interests users efficiently and conveniently, and second-level industry categories are preferred for classifying (that is, labeling) target content. Those skilled in the art will appreciate that, to distinguish the content types that interest users more precisely, finer categories can be chosen, or a classification scheme specific to a particular purpose can be designed.

Table 1: Examples of industry categories at each level

The present disclosure also provides a recommendation method, described below with reference to Fig. 3.

Fig. 3 shows an exemplary flowchart of a recommendation method 300 according to one embodiment of the present disclosure.

At step 302, the candidate content set is determined. Specifically, the candidate content set to recommend to user US_a is determined based on the content category label set of each of multiple contents and the set {Lu_k’} of content categories of interest of user US_a, where a is an index identifying the user; the candidate content set contains the candidate contents to recommend to user US_a. A matching judgment is made for each of the multiple contents: based on the content category label set {LC_m} of a content C and the set {Lu_k’} of user US_a, it is determined whether content C is a candidate content to recommend to the user. The multiple contents include target content, and at least one content category label in the content category label set of the target content is determined by the labeling method of the present disclosure. Here k’ is an index running from 1 to k’_max, and user US_a has k’_max content categories of interest. The number of labels of the target content CO may be 1, 2, or more.

A label LC_m matches user US_a when the content category it indicates is included in the user's set {Lu_k’} of content categories of interest; when some label LC_m in the label set of content C matches, content C is determined to be a candidate content. Preferably, for that user, the maximum of the similarities corresponding to the matching labels of the selected content C is recorded as the content relevance score S_x of the content for the user. For example, if a content C selected as a candidate for user US_a has 2 labels that fall within the content categories of interest of user US_a, with corresponding similarities 0.6 and 0.8, then the content relevance score S_x of content C recorded for user US_a is 0.8. In addition, each content category of interest Lu_k’ of the user may have a corresponding interest score S_k’, which indicates how interested the user is in content of category Lu_k’. The sum or product of the interest score and the content relevance score can serve as the basis for deciding whether to select a candidate content for recommendation. For example, if the content relevance score S_x of content C is 0.8 and corresponds to the similarity for the "sports and fitness" label, and the content categories of interest of user US_a include "sports and fitness" with interest score S_k’=0.6, then the recommended content can be selected based on S_x and S_k’ (for example, based on the magnitude of S_x*S_k’ or S_x+S_k’).

When the labels do not match the user, that is, when none of the content categories indicated by the labels in the content category label set of content C is in the user's set of content categories of interest, content C is not added to the candidate content set.
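The matching judgment of step 302 can be sketched as below. The data layout is an assumption made for illustration: each content is represented by a hypothetical identifier mapped to its labels and their recorded similarities, and the user's content categories of interest are a set of strings. The function name `candidate_contents` is likewise hypothetical.

```python
def candidate_contents(contents, user_interests):
    """Return {content_id: S_x} for every content with at least one matching label.

    contents       -- {content_id: {label: similarity}}
    user_interests -- set of content categories of interest of the user
    S_x is the maximum similarity among the labels that match the user.
    """
    candidates = {}
    for content_id, labels in contents.items():
        matching = [sim for label, sim in labels.items() if label in user_interests]
        if matching:  # contents with no matching label are left out
            candidates[content_id] = max(matching)
    return candidates
```

With a content whose matching labels have similarities 0.6 and 0.8, the recorded score is 0.8, as in the example in the text; a label with a higher similarity that the user is not interested in does not affect the score.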

At step 303, recommended content is selected from the candidate content set, which contains at least one candidate content to recommend to the user. The number of recommended contents may be 1, 2, or more. The selection rule can be determined as needed, for example: selecting the newest content in the candidate content set; selecting recommended content at random from the candidate content set; selecting the content in the candidate content set that has been recommended the fewest times; or selecting the content in the candidate content set with the highest content relevance score. Alternatively, multiple factors can be chosen and given priorities, and the recommended content selected according to those priorities. The factors may include: the time at which the candidate content was included, the number of times the candidate content has been recommended, the sum or product of the interest score and the content relevance score, the click-through rate of the candidate content, and the content relevance score of the candidate content.
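One of the selection rules of step 303, ranking candidates by the product or sum of the content relevance score S_x and the interest score S_k’, can be sketched as follows. The function name `rank_candidates`, the tuple layout, and the example identifiers are hypothetical; the other rules (newest, random, least recommended) are not shown.

```python
def rank_candidates(candidates, interest_scores, top_n=1, combine="product"):
    """Return the top_n content ids ranked by S_x combined with S_k'.

    candidates      -- {content_id: (matched_category, S_x)}
    interest_scores -- {category: S_k'}
    combine         -- "product" for S_x * S_k', "sum" for S_x + S_k'
    """
    if combine not in ("product", "sum"):
        raise ValueError("combine must be 'product' or 'sum'")

    def score(item):
        _, (category, s_x) = item
        s_k = interest_scores[category]
        return s_x * s_k if combine == "product" else s_x + s_k

    ranked = sorted(candidates.items(), key=score, reverse=True)
    return [content_id for content_id, _ in ranked[:top_n]]
```

Using the figures from the text, a content with S_x = 0.8 for "sports and fitness" and an interest score of 0.6 scores 0.48 under the product rule and 1.4 under the sum rule.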

At step 304, an instruction to display a representation of the recommended content is generated; the representation is displayed to the user for the user to select. The representation of the recommended content is, for example, a thumbnail of the recommended content. For example, if the user clicks the thumbnail on the terminal being used, the recommended content is displayed on that terminal.

The recommendation method 300 can be executed on the server side. When it is, the recommendation method 300 may further include: receiving a request for recommended content. The request may be issued by the client used by the user, and may include user information of the user, from which the user's set of content categories of interest is determined.

When the recommendation method 300 is executed on the server side, it may further include: sending, to the client used by the user, the instruction to display the representation of the recommended content to the user. After receiving the instruction, the client displays the representation of the recommended content to the user for selection.

Preferably, the labeling method of the present disclosure is executed on the multiple contents to determine their respective content category labels. Further, all content category labels of all candidate contents in the candidate content set are determined by the labeling method of the present disclosure.

Optionally, a Kafka queue can be used to obtain new content as the target content.

The labeling method of the present disclosure is illustrated below with a specific example.

Tally set { L_iThere are 10 labels, that is, 10 classification deictic words are respectively as follows: sport and body-building, household services, fresh flower Gift, wedding photo, medical department, shaping medical treatment, women and children hospital, hotel reservation, people place and airline.

The new content obtained online from the message queue is as follows:

The content title is: "What is it like for a family to cook together at a homestay while traveling? Airbnb tells you: super heartwarming";

The content body is: "Go to different places with your family, enjoy the scenery, eat the local specialties, and even cook together as a family away from home; just thinking about it feels warm. Airbnb worldwide, yours to choose".

This new content is selected as the target content CO, from which keywords are selected.

The target content CO is segmented into words, and the 9 candidate keywords with the largest TF*IDF values are selected as keywords. These 9 keywords are: Airbnb, on a trip, homestay, place, warmth, cooking, scenery, specialties, and global.

The category determined for each keyword is shown in Table 2.

Table 2: Categories and weights of the keywords

Using the trained word2vec language model ML, nine 64-dimensional keyword vectors are obtained for these 9 keywords, and the 9 keyword vectors are summed with weights according to the keyword categories, giving the 64-dimensional content vector VC of the target content as follows: [-0.14115450160929885, -0.24425549793780627, -0.30044687888376137, -0.05763183483727175, 0.15561235974744236, 0.010583868380962057, 0.013591076247417138, -0.06848938692135165, -0.02732886928430746, -0.034710140155875834, 0.03750085532692744, 0.046927746483094245, 0.01581604176379293, 0.16177491753452636, -0.237404869703128, -0.06449884472860959, -0.10758427322849924, -0.07626917726376475, 0.006169830778924875, 0.11237461946713251, -0.17831536577928542, 0.0819056485434265, -0.12827313774691287, 0.0020619466900970483, -0.016215964088673797, -0.14129457714696125, -0.0905078577328344, 0.01599747926662087, -0.13264012880481604, -0.05488182080912134, 0.15804649074807617, -0.15541510850124396, 0.0344278284956769, 0.154474302607422, -0.27187228106139893, -0.04848808005948619, 0.07496522631347169, -0.09970821588166821, -0.21192385737972327, -0.10144228362039891, -0.03206756311276709, 0.08181443401576366, -0.022456738055021172, 0.07263042977339229, -0.05359920849368456, -0.012039215785374473, 0.05122092769789547, -0.011626157154404461, -0.009008863938227746, -0.22059785870647422, 0.004545139343459065, 0.056822009826923224, 0.10528190567950048, -0.16259849732059495, 0.1074273601363384, 0.16346525357742392, 0.0016458175006195614, -0.10910192190291954, 0.22706467011122444, 0.23295105654493278, 0.1703301017317971, 0.017352765286693526, -0.14180094380902827, -0.18815346922446488].

Based on the content vector VC and the label vectors of the 10 labels in the label set {L_i}, the similarity of the target content CO to each label is determined, where the label vectors of the 10 labels are ten 64-dimensional vectors that the language model ML determines from the corresponding category indicator words. The 10 similarity values are as follows: 0.32484788901811973, 0.10955877033307335, 0.18443480388501027, 0.32851210400292546, -0.1871856053931387, 0.057516092361998145, -0.10459164508515512, 0.5691629355855871, 0.8078326422773067, 0.3179727610239934.

Assume the predetermined similarity threshold Th is 0.5. Then "homestay" and "hotel reservation" are determined to be the content category labels of the target content CO, that is, the content category label set of the target content CO is {homestay, hotel reservation}.

The present disclosure also provides a computer-readable recording medium storing a program, where the program causes a computer to execute the labeling method of the present disclosure.

The present disclosure also provides a computer-readable recording medium storing a program, where the program causes a computer to execute the recommendation method of the present disclosure.

The present disclosure also provides a labeling device. Fig. 4 shows an exemplary block diagram of a labeling device 400 according to one embodiment of the present disclosure. The labeling device 400 includes a keyword determination unit 401, a word vector determination unit 402, a content vector determination unit 403, a similarity determination unit 404, and a label determination unit 405. The keyword determination unit 401 is configured to select multiple keywords from the text portion of target content. The word vector determination unit 402 is configured to determine the keyword vector of each keyword using a language model, and to determine the label vector of each label using the language model, where the label set consists of the labels and each label is a category indicator word that indicates a candidate category of the target content. The content vector determination unit 403 is configured to determine the content vector of the target content as a weighted sum of the keyword vectors. The similarity determination unit 404 is configured to determine the similarity of the target content to each label based on the content vector and the label vector of each label in the label set. The label determination unit 405 is configured to determine the content category labels of the target content based on the similarities, and can output them; the number of content category labels of the target content may be 1, 2, or more. The labeling device 400 corresponds to the labeling method of the present disclosure. Where feasible, more specific details of the labeling device 400 can be the same as the corresponding details of the labeling method of the present disclosure. Preferably, the labeling device 400 can be used to generate all content category labels of the target content CO.

The present disclosure also provides a recommendation device. Fig. 5 shows an exemplary block diagram of a recommendation device 500 according to one embodiment of the present disclosure. The recommendation device 500 includes: a candidate content set determination unit 501, a selection unit 502, and an indication generation unit 503. The candidate content set determination unit 501 is configured to determine a candidate content set for recommendation to a user based on the content category label set of each of multiple contents and the user's content-of-interest category set, wherein the multiple contents include target content, and at least one content category label in the content category label set of the target content is determined by the labeling method of the present disclosure. The selection unit 502 is configured to select, from the candidate content set, recommended content to be recommended to the user. The indication generation unit 503 is configured to generate an indication that shows a representation of the recommended content to the user, wherein the indication is for selection by the user. The recommendation device 500 corresponds to the recommendation method of the present disclosure. Where feasible, more specific details of the recommendation device 500 may be the same as the corresponding details of the recommendation method of the present disclosure.
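The three units of the recommendation device can be sketched in Python as follows. This is only an illustration under stated assumptions: the `Content` class, its `score` field, the overlap rule used to build the candidate set, and the dictionary format of the indication are all hypothetical and are not specified by the disclosure.

```python
from dataclasses import dataclass, field


@dataclass
class Content:
    content_id: str
    title: str
    labels: set = field(default_factory=set)  # content category label set
    score: float = 0.0  # hypothetical ranking score, not from the disclosure


def candidate_set(contents, interests):
    """Unit 501 (assumed rule): keep contents sharing a label with the user's interests."""
    return [c for c in contents if c.labels & interests]


def select_recommended(candidates, n=2):
    """Unit 502 (assumed rule): pick the n highest-scoring candidates."""
    return sorted(candidates, key=lambda c: c.score, reverse=True)[:n]


def make_indications(recommended):
    """Unit 503: build a selectable representation of each recommended content."""
    return [{"content_id": c.content_id, "display": c.title} for c in recommended]


contents = [
    Content("a1", "Phone camera review", {"digital"}, 0.9),
    Content("a2", "Street food guide", {"food"}, 0.8),
    Content("a3", "Laptop buying tips", {"digital"}, 0.4),
]
user_interests = {"digital", "travel"}
print(make_indications(select_recommended(candidate_set(contents, user_interests))))
```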

From the above description of specific embodiments of the present disclosure, those skilled in the art will understand that the solutions of the present disclosure can achieve at least one of the following effects: the labeling method labels content automatically without manual labeling, which saves time and annotation cost, labels efficiently, and avoids the errors introduced by the subjectivity of manual annotation; the labeling method uses a neural network and can assign labels to content efficiently and accurately; the content of the predetermined corpus does not need to be annotated, so when the label system changes, the content of the predetermined corpus does not need to be annotated and the language model does not need to be retrained, and the labeling method is therefore more robust; the labeling method can label content online in real time, so that content is labeled as it arrives; personalized, matching content can be recommended to the user, which improves the click-through rate of the content; and online real-time labeling allows new content to be recommended to users promptly, which makes cold start of new content easy to achieve.

It should be understood that the term "comprising", when used herein, refers to the presence of features, integers, steps, or components, but does not exclude the presence or addition of one or more other features, integers, steps, or components.

It should be understood that, without departing from the spirit of the present disclosure, features described and/or shown for one embodiment may be used in the same or a similar manner in one or more other embodiments, combined with features in other embodiments, or substituted for features in other embodiments.

In addition, the method for present disclosure be not limited to specifications described in time sequencing execute, if from original It says feasible in reason, can also according to other time sequencings, concurrently or independently execute.Therefore, it is described in this specification Method execution sequence not to scope of the present disclosure be construed as limiting.

The present disclosure has been described above in conjunction with specific embodiments, but those skilled in the art should understand that these descriptions are exemplary and do not limit the scope of protection of the present disclosure. Those skilled in the art may make various variations and modifications to the present disclosure in accordance with its spirit and principles, and these variations and modifications also fall within the scope of the present disclosure.

Claims

1. A labeling method, comprising:

selecting multiple keywords from the text portion of target content;

determining, using a language model, the keyword vector corresponding to each keyword;

determining the content vector of the target content as a weighted sum of the corresponding keyword vectors;

determining the similarity of the target content with respect to each label based on the content vector and the label vector of each label in a label set; and

determining the content category label of the target content based on the similarities;

wherein each label in the label set is a category indicator word indicating a candidate category of the target content; and

each label vector is a vector determined by the language model based on the corresponding category indicator word.
2. The labeling method according to claim 1, wherein each label in the label set is selected from second-level industry categories.
3. The labeling method according to claim 1, wherein selecting multiple keywords from the text portion of the target content comprises:

segmenting the text portion into words to obtain multiple candidate keywords;

determining the term frequency of each candidate keyword with respect to the text portion;

determining the inverse document frequency of each candidate keyword with respect to a predetermined corpus; and

selecting, based on the product of the term frequency and the inverse document frequency of each candidate keyword, a predetermined number of candidate keywords as the multiple keywords.
4. The labeling method according to claim 1, wherein determining the content vector of the target content as a weighted sum of the corresponding keyword vectors comprises:

determining the category of each keyword; and

determining the respective weight of each corresponding keyword vector based on the category of each keyword.
5. The labeling method according to claim 4, wherein the category is selected from the group consisting of the following categories: commodity, person name, place name, number, time, and other;

when the category is commodity, the respective weight is a first value;

when the category is other, the respective weight is a second value;

when the category is person name, place name, number, or time, the respective weight is a third value;

the first value is greater than the second value; and

the second value is greater than the third value.
6. The labeling method according to claim 1, wherein the language model is a natural language processing model trained on a predetermined corpus using the word2vec tool.
7. A recommendation method, comprising:

determining a candidate content set for recommendation to a user based on the content category label set of each of multiple contents and the user's content-of-interest category set;

selecting, from the candidate content set, recommended content to be recommended to the user; and

generating an indication that shows a representation of the recommended content to the user;

wherein the indication is for selection by the user; and

the multiple contents include target content, and at least one content category label in the content category label set of the target content is determined by the labeling method according to any one of claims 1 to 6.
8. The recommendation method according to claim 7, further comprising: using a Kafka queue to obtain newly added content as the target content.
9. A computer-readable recording medium storing a program, wherein the program causes a computer to execute the labeling method according to any one of claims 1 to 6.
10. A computer-readable recording medium storing a program, wherein the program causes a computer to execute the recommendation method according to claim 7 or 8.
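
As an illustration only, the keyword selection recited in claim 3 can be sketched in Python. The sketch assumes the text portion has already been segmented into tokens (for Chinese text a word segmenter would be needed first), represents the predetermined corpus as a list of token lists, and uses one common smoothed form of inverse document frequency; the function name `select_keywords`, the smoothing formula, and the alphabetical tie-breaking are assumptions, not details from the claims.

```python
import math
from collections import Counter


def select_keywords(tokens, corpus, k=3):
    """Return the k candidate keywords with the highest TF * IDF product.

    tokens: candidate keywords obtained by segmenting the text portion.
    corpus: predetermined corpus, as a list of token lists (one per document).
    """
    counts = Counter(tokens)
    n_docs = len(corpus)
    doc_sets = [set(doc) for doc in corpus]
    scores = {}
    for word, count in counts.items():
        tf = count / len(tokens)  # term frequency within the text portion
        df = sum(1 for doc in doc_sets if word in doc)  # document frequency
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0  # assumed smoothed IDF
        scores[word] = tf * idf
    # Sort by descending score; break ties alphabetically for determinism.
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]


text_tokens = ["phone", "camera", "phone", "the", "the", "review"]
corpus = [["the", "weather"], ["the", "phone"], ["the", "food"], ["the", "travel"]]
print(select_keywords(text_tokens, corpus, k=2))
```

A word such as "the", which appears in every corpus document, receives the lowest inverse document frequency and so ranks below rarer words of equal term frequency.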