CN110147499A - Labeling method, recommendation method and recording medium - Google Patents

Labeling method, recommendation method and recording medium

Info

Publication number
CN110147499A
Authority
CN
China
Prior art keywords
content
label
vector
classification
user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910423246.2A
Other languages
Chinese (zh)
Other versions
CN110147499B (en)
Inventor
张炜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wise Four Seas (beijing) Technology Co Ltd
Original Assignee
Wise Four Seas (beijing) Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wise Four Seas (beijing) Technology Co Ltd filed Critical Wise Four Seas (beijing) Technology Co Ltd
Priority to CN201910423246.2A priority Critical patent/CN110147499B/en
Publication of CN110147499A publication Critical patent/CN110147499A/en
Application granted granted Critical
Publication of CN110147499B publication Critical patent/CN110147499B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

This disclosure relates to a labeling method, a recommendation method, and a recording medium. According to one embodiment of the present disclosure, the labeling method includes: selecting multiple keywords from the text portion of a target content; determining a keyword vector for each keyword using a language model; determining a content vector of the target content as a weighted sum of the keyword vectors; determining a similarity of the target content to each label based on the content vector and the label vector of each label in a label set; and determining content category labels of the target content based on the similarities. Each label in the label set is a category indicator word indicating a candidate category of the target content, and each label vector is the vector determined by the language model from the corresponding category indicator word. The scheme of the present disclosure helps achieve at least one of the following effects: assigning labels to content accurately, assigning labels to content in real time, and recommending matching content to users.

Description

Labeling method, recommendation method and recording medium
Technical field
The present disclosure relates generally to information processing, and more particularly to a labeling method, a recommendation method, and a computer-readable recording medium storing a program that executes these methods.
Background
In recent years, with the rapid growth of the Internet, publishing content online and recommending content to users have become increasingly common. How to recommend content to users effectively is an important research direction.
Summary of the invention
A brief overview of the present disclosure is given below to provide a basic understanding of certain aspects of it. This overview is not exhaustive. It is not intended to identify key or important parts of the present disclosure, nor to limit its scope. Its sole purpose is to present certain concepts in simplified form as a prelude to the more detailed description that follows.
Content published on the network covers many categories, for example sports and fitness, housekeeping services, flowers and gifts, and wedding photography. The categories of content that interest a user also vary. For example, a user may be interested only in sports and fitness during a certain period, or may be interested in some fields and uninterested in others. Given these facts, to recommend content to users effectively and to raise the click-through rate on recommended content, content can be labeled so that it is classified, and content carrying a given label can then be recommended to users interested in that type of content. Assigning labels to content accurately and efficiently is therefore desirable.
According to one aspect of the present disclosure, a labeling method is provided, comprising: selecting multiple keywords from the text portion of a target content; determining a keyword vector for each keyword using a language model; determining a content vector of the target content as a weighted sum of the keyword vectors; determining a similarity of the target content to each label based on the content vector and the label vector of each label in a label set; and determining content category labels of the target content based on the similarities; wherein each label in the label set is a category indicator word indicating a candidate category of the target content, and each label vector is the vector determined by the language model from the corresponding category indicator word.
According to one aspect of the present disclosure, a recommendation method is provided, comprising: determining a candidate content set for recommendation to a user based on the content category label set of each of multiple contents and the user's set of content categories of interest; selecting, from the candidate content set, recommended content to recommend to the user; and generating an instruction to display a representation of the recommended content to the user; wherein the representation is for selection by the user, the multiple contents include a target content, and at least one content category label in the content category label set of the target content is determined by the aforementioned labeling method.
According to another aspect of the present disclosure, a computer-readable recording medium storing a program is provided, wherein the program causes a computer to execute the aforementioned labeling method.
According to yet another aspect of the present disclosure, a computer-readable recording medium storing a program is provided, wherein the program causes a computer to execute the aforementioned recommendation method.
The labeling method, recommendation method, and recording medium of the present disclosure help achieve at least one of the following effects: assigning labels to content efficiently, assigning labels to content accurately, assigning labels to content in real time, recommending matching content to users, raising the click-through rate of content, and easing the cold start of new content.
Brief description of the drawings
Embodiments of the present disclosure are described below with reference to the drawings, which will make the above and other objects, features, and advantages of the present disclosure easier to understand. The drawings are intended only to illustrate the principles of the present disclosure. The sizes and relative positions of the units are not necessarily drawn to scale. In the drawings:
Fig. 1 shows an exemplary flowchart of a labeling method according to one embodiment of the present disclosure;
Fig. 2 shows an exemplary flowchart of a method of selecting multiple keywords according to one embodiment of the present disclosure;
Fig. 3 shows an exemplary flowchart of a recommendation method according to one embodiment of the present disclosure;
Fig. 4 shows an exemplary block diagram of a labeling device according to one embodiment of the present disclosure; and
Fig. 5 shows an exemplary block diagram of a recommendation device according to one embodiment of the present disclosure.
Detailed description
Exemplary embodiments of the present disclosure are described below with reference to the drawings. For clarity and conciseness, not all features of an actual implementation are described in this specification. It should be understood, however, that many implementation-specific decisions may be made while developing any such implementation in order to achieve the developer's specific goals, and that these decisions may vary from one implementation to another.
It should also be noted that, to avoid obscuring the present disclosure with unnecessary detail, the drawings show only the device structures closely related to the scheme of the present disclosure and omit other details of little relevance.
It should be understood that the following description with reference to the drawings does not limit the present disclosure to the described implementations. Where feasible, embodiments may be combined with one another, features may be substituted or borrowed between embodiments, and one or more features may be omitted from an embodiment.
One aspect of the present disclosure relates to a labeling method that determines the labels of a content. The labeling method of the present disclosure is described below by way of example with reference to Fig. 1.
Fig. 1 shows an exemplary flowchart of a labeling method 100 according to one embodiment of the present disclosure. There may be multiple contents to be labeled, and the labeling method 100 can be applied to them one by one or in parallel (the label of a content is also called a content category label). Here, one of the multiple contents is chosen as the target content CO to illustrate the labeling method 100.
At step 101, multiple keywords are selected from the text portion of the target content. Each keyword is denoted KW_j, where the index j runs from 1 to j_max and j_max is the number of keywords selected for the target content CO. Using multiple keywords helps characterize the fields or categories the target content CO involves accurately and comprehensively, and so helps assign its content category labels accurately and comprehensively. The target content CO includes a text portion. The target content CO may be multimedia content, an advertisement, an article, product information, or an image. The number of keywords may be 2, 3, 4, 5, 6, 7, 8, 9, 10, or more. For example, a suitable j_max may be chosen according to the length of the text portion of the target content CO, or according to the minimum text length of the text portion of the target content CO. The text portion may include text contained in images or audio in the target content. Text in an image can be obtained, for example, by optical character recognition; text in audio can be obtained, for example, by speech recognition. The text portion may include a title part and a body part.
At step 102, the keyword vector VK_j of each keyword KW_j is determined using a language model ML. The language model ML maps an input word to a vector.
At step 103, the content vector VC, which characterizes the target content CO, is determined as a weighted sum of the keyword vectors VK_j.
At step 104, the similarities SI_i are determined. SI_i is the similarity of the target content CO to label L_i in the label set {L_i}, and is determined from the content vector VC and the label vector VL_i of label L_i. The index i runs from 1 to i_max, where i_max is the number of labels in the label set {L_i}, that is, the number of candidate categories of the target content CO. Each label L_i in the label set {L_i} is a category indicator word WI_i indicating a candidate category of the target content CO. Each label vector VL_i is the vector determined by the language model ML from the corresponding category indicator word WI_i. The similarity SI_i may be the cosine of the angle between the content vector VC and the label vector VL_i, that is, the ratio of the dot product of the two vectors to the product of their norms. In the present disclosure, unless stated otherwise, the notation {e_i} denotes the set of elements e_1, ..., e_max, that is, i = 1, ..., max, and not merely a set containing the single element e_i; {e_i} denotes a set of one or more elements.
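The cosine similarity of step 104 can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation: the function names `cosine_similarity` and `label_similarities` are hypothetical, vectors are plain lists of floats, and returning 0.0 for a zero-length vector is an assumption the source does not address.

```python
import math


def cosine_similarity(vc, vl):
    """Cosine of the angle between content vector vc and label vector vl:
    dot product divided by the product of the two norms."""
    if len(vc) != len(vl):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(vc, vl))
    norm_product = math.sqrt(sum(a * a for a in vc)) * math.sqrt(sum(b * b for b in vl))
    if norm_product == 0.0:
        # Assumption: treat a zero vector as having no similarity to anything.
        return 0.0
    return dot / norm_product


def label_similarities(vc, label_vectors):
    """Map each label (category indicator word) to its similarity with vc.

    label_vectors is a dict {label: vector}; in the patent these vectors
    would come from the same language model as the keyword vectors.
    """
    return {label: cosine_similarity(vc, vl) for label, vl in label_vectors.items()}
```

Because the cosine depends only on direction, scaling the content vector (for example by normalizing the weights of the weighted sum) does not change the similarities.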
At step 105, the content category labels LC_k of the target content CO are determined based on the similarities SI_i. The index k runs from 1 to k_max, where k_max is the number of content category labels of the target content CO drawn from the label set {L_i}. For example, when a similarity SI_i is greater than or equal to a predetermined similarity threshold Th, the corresponding label L_i is assigned to the target content CO as one of its content category labels. Alternatively, the i_max similarities can be sorted in descending order and the labels corresponding to the first k_max similarities assigned to the target content CO as its content category labels. The similarity SI_i indicates how relevant the target content is to the corresponding category. Each similarity SI_i can therefore be recorded so that, at recommendation time, content that matches a category of interest to the user and has a higher category relevance is selected as recommended content. The target content CO may also be assigned other labels by other labeling methods; those labels, together with the k_max labels determined by the labeling method of the present disclosure, may form the content category label set {LC_m} of the target content CO, where m is an index and the set contains no repeated elements. The content category label set of the target content CO may also consist entirely of labels determined by the labeling method of the present disclosure, that is, {LC_m} = {LC_k}.
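The two selection rules described for step 105 (a similarity threshold, or the top k_max similarities) can be sketched as follows. The function names are hypothetical, and the input is assumed to be a dict mapping each label to its already computed similarity.

```python
def labels_by_threshold(similarities, threshold):
    """Keep every label whose similarity is >= the predetermined threshold Th."""
    return [label for label, s in similarities.items() if s >= threshold]


def labels_by_top_k(similarities, k_max):
    """Sort the similarities in descending order and keep the first k_max labels."""
    ranked = sorted(similarities.items(), key=lambda item: item[1], reverse=True)
    return [label for label, _ in ranked[:k_max]]


# Toy similarities for three hypothetical labels.
example = {"sports": 0.2, "travel": 0.9, "hotels": 0.6}
thresholded = labels_by_threshold(example, 0.5)  # labels with similarity >= 0.5
top_one = labels_by_top_k(example, 1)            # the single most similar label
```

The threshold rule can return any number of labels (including none), whereas the top-k rule always returns a fixed number; which is preferable depends on whether unlabeled content is acceptable downstream.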
To make content labeling more timely, new content can be obtained online through a Kafka queue as target content, so that newly appearing content is labeled promptly and recommended to users based on the assigned labels, which readily addresses the cold-start problem of new content. In one variation, the labeling method 100 further includes: obtaining new content online through a Kafka queue as the target content CO.
In one embodiment, the labeling method 100 further includes: obtaining the text portion by processing the target content.
In the present disclosure, step 101 of the labeling method 100 can be implemented in various ways. Fig. 2 shows one exemplary way of implementing step 101.
Fig. 2 shows an exemplary flowchart of a method 210 of selecting multiple keywords according to one embodiment of the present disclosure.
At step 211, the text portion of the target content is segmented into words to obtain multiple candidate keywords KW_j', where the index j' runs from 1 to j'_max and j'_max is the number of candidate keywords. If j'_max < j_max, a special routine can be executed to flag the target content, for example by assigning it a predetermined content category label and/or passing it on for manual handling. Further, if there are one or more stop words, step 211 also includes removing them, so that the candidate keywords KW_j' contain no stop words.
At step 212, the term frequency TF_j' of each candidate keyword KW_j' in the text portion is determined, giving j'_max term frequency values.
At step 213, the inverse document frequency IDF_j' of each candidate keyword KW_j' with respect to a predetermined corpus CP is determined, giving j'_max inverse document frequency values. The predetermined corpus CP contains a sufficient number of documents, which may be documents selected to support accurate labeling of content. For example, if the text portion of the target content is in Simplified Chinese, the documents in the predetermined corpus CP may be documents encoded in Simplified Chinese. Preferably, all documents in the predetermined corpus CP have the same encoding format.
At step 214, the multiple keywords are selected: a predetermined number of candidate keywords are chosen as the keywords KW_j based on the product TF_j' * IDF_j' of each candidate keyword's term frequency and inverse document frequency. For example, the products TF_j' * IDF_j' are sorted in descending order to obtain a product sequence S, and the candidate keywords corresponding to the first j_max products in S are selected as the keywords used in the subsequent steps.
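Steps 211 to 214 can be sketched as a single function. This is an illustration under stated assumptions: the text is assumed to be already segmented into tokens (word segmentation itself is not shown), `select_keywords` and its parameters are hypothetical names, and the smoothed IDF formula is one common choice, since the source does not specify how the inverse document frequency is computed.

```python
import math
from collections import Counter


def select_keywords(tokens, corpus, stop_words, j_max):
    """Return up to j_max keywords ranked by TF * IDF.

    tokens:     segmented words of the target content's text portion
    corpus:     predetermined corpus as a list of documents (each a list of words)
    stop_words: set of words to discard (step 211)
    """
    candidates = [t for t in tokens if t not in stop_words]
    if not candidates:
        return []
    counts = Counter(candidates)
    total = len(candidates)
    doc_sets = [set(doc) for doc in corpus]
    n_docs = len(doc_sets)

    scores = {}
    for word, count in counts.items():
        tf = count / total  # step 212: term frequency in the text portion
        df = sum(1 for doc in doc_sets if word in doc)
        # Step 213: smoothed IDF (an assumed formula, not taken from the patent).
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0
        scores[word] = tf * idf

    # Step 214: descending product order; ties broken alphabetically.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:j_max]
```

If fewer than `j_max` candidates survive stop-word removal, this sketch simply returns what it has; the text above instead describes flagging such content for a predetermined label or manual handling.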
The method of selecting multiple keywords is not limited to method 210. In one variation, a predetermined number of candidate keywords can be selected as the keywords KW_j based on the term frequency TF_j' alone. Alternatively, the inverse document frequency IDF_j' can be scaled by a ratio r to obtain an adjusted inverse document frequency r * IDF_j', and a predetermined number of candidate keywords selected as the keywords KW_j based on the product TF_j' * r * IDF_j', where r may depend on the type of the candidate keyword.
Any natural language processing model that can generate, from a word, a word vector characterizing that word can serve as the language model ML in the present disclosure, for example a natural language processing model trained on the predetermined corpus with the word2vec tool. As an example, when training the language model with the word2vec tool, the word embedding size can be set to 64, the window size to 10, the minimum word frequency to 5, and the number of training iterations to 10. As described above, the language model ML can determine the keyword vector VK_j of a keyword KW_j from the input keyword KW_j, and the label vector VL_i of a label L_i from the input label L_i (that is, the category indicator word WI_i).
Under different content classification schemes, each category (label) may be more sensitive to certain keywords. The category of each keyword can therefore be taken into account when determining the content vector, to improve labeling accuracy. For example, in one embodiment, determining the content vector VC of the target content as a weighted sum of the keyword vectors VK_j includes: determining the category C_j of each keyword; and determining the weight w_j of each keyword vector VK_j based on the category C_j. The content vector VC can be determined using equation (1):
$$VC = \sum_{j=1}^{j_{max}} w_j \cdot VK_j \qquad (1)$$
For example, based on the keyword category group {C_j}, keywords are divided into three weighting levels, whose weights take the first, second, and third values v_1, v_2, and v_3 respectively, where v_1 > v_2 > v_3.
Further, for example, the category group {C_j} may consist of: commodity, person name, place name, number, time, and other. When the category of keyword KW_j is "commodity", its weight w_j is the first value, that is, w_j = v_1; when the category is "other", w_j is the second value, that is, w_j = v_2; when the category is "person name", "place name", "number", or "time", w_j is the third value, that is, w_j = v_3. The category of a keyword can be determined, for example, by looking it up in a keyword category database.
In one embodiment, the first, second, and third values v_1, v_2, and v_3 can be set to 2.0, 1.0, and 0.5 respectively.
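The category-weighted sum can be sketched as follows, using the weight values 2.0, 1.0, and 0.5 given above. The names `CATEGORY_WEIGHTS` and `content_vector`, the dict-based inputs, and the fallback to the "other" category for keywords without a known category are assumptions made for illustration.

```python
# Weight per keyword category: v_1 = 2.0, v_2 = 1.0, v_3 = 0.5.
CATEGORY_WEIGHTS = {
    "commodity": 2.0,
    "other": 1.0,
    "person name": 0.5,
    "place name": 0.5,
    "number": 0.5,
    "time": 0.5,
}


def content_vector(keyword_vectors, keyword_categories):
    """Weighted sum of keyword vectors: VC = sum_j w_j * VK_j.

    keyword_vectors:    dict {keyword: vector as a list of floats}
    keyword_categories: dict {keyword: category name}; a keyword missing
                        from this dict is treated as "other" (an assumption).
    """
    if not keyword_vectors:
        raise ValueError("at least one keyword vector is required")
    dim = len(next(iter(keyword_vectors.values())))
    vc = [0.0] * dim
    for keyword, vk in keyword_vectors.items():
        if len(vk) != dim:
            raise ValueError("all keyword vectors must have the same dimension")
        weight = CATEGORY_WEIGHTS[keyword_categories.get(keyword, "other")]
        for d in range(dim):
            vc[d] += weight * vk[d]
    return vc
```

With these weights a "commodity" keyword pulls the content vector four times as strongly as a name, place, number, or time keyword of equal vector length.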
In one embodiment, each label L_i in the label set {L_i} is selected from second-level industry categories. Table 1 gives examples of industry categories at each level, showing only some of the categories. As can be seen, first-level industry categories are too broad and third-level industry categories are too fine-grained, so neither may be well suited to distinguishing the content that interests a user efficiently and conveniently. Classifying (that is, labeling) the target content with second-level industry categories is therefore preferred. Those skilled in the art will appreciate that finer categories can be chosen to distinguish the content types that interest a user more precisely, or that purpose-specific categories can be designed for a particular purpose.
Table 1: Examples of industry categories at each level
The present disclosure also provides a recommendation method, described below with reference to Fig. 3.
Fig. 3 shows an exemplary flowchart of a recommendation method 300 according to one embodiment of the present disclosure.
At step 302, the candidate content set is determined. Specifically, the candidate content set for recommendation to user US_a is determined based on the content category label set of each of multiple contents and the set {Lu_k'} of content categories of interest to user US_a, where a is an index identifying the user and the candidate content set contains the candidate contents for recommendation to user US_a. Each of the multiple contents undergoes a matching judgment: based on the content category label set {LC_m} of a content C and the set {Lu_k'} of content categories of interest to user US_a, it is determined whether content C is a candidate content to recommend to the user. The multiple contents include the target content, and at least one content category label in the content category label set of the target content is determined by the labeling method of the present disclosure. The index k' runs from 1 to k'_max, where user US_a has k'_max content categories of interest. The target content CO may have 1, 2, or more labels.
When a label LC_m matches user US_a, that is, when the content category indicated by some label LC_m in the label set of content C is included in the user's set {Lu_k'} of content categories of interest, content C is determined to be a candidate content. Preferably, for that user, the largest of the similarities corresponding to the matching labels of the selected content C can be recorded as the content relevance score S_x of the content for that user. For example, if content C, selected as a candidate content for user US_a, has 2 labels included in the set of content categories of interest to user US_a, with corresponding similarities 0.6 and 0.8, then the content relevance score S_x recorded for content C and user US_a is 0.8. In addition, each content category of interest Lu_k' of the user can have a corresponding interest score S_k', which indicates how interested the user is in content of category Lu_k'; the sum or product of the interest score and the content relevance score can serve as the basis for deciding whether to select the candidate content for recommendation. For example, if the content relevance score S_x of content C is 0.8 and corresponds to the similarity for the "sports and fitness" label, and the set of content categories of interest to user US_a includes "sports and fitness" with interest score S_k' = 0.6, then recommended content can be selected based on S_x and S_k' (for example, on the size of S_x * S_k' or S_x + S_k').
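The matching judgment and the scores S_x and S_k' can be sketched as below. The function names, the dict-based data layout, and the choice of the product S_x * S_k' as the combined score are illustrative assumptions; the text above equally allows the sum.

```python
def match_content(label_similarities, interest_scores):
    """Return (best_label, S_x) for a content, or None if no label matches.

    label_similarities: dict {content category label: recorded similarity}
    interest_scores:    dict {category of interest: interest score S_k'}
    S_x is the largest similarity among the labels that match the user.
    """
    matched = {lab: s for lab, s in label_similarities.items() if lab in interest_scores}
    if not matched:
        return None
    best_label = max(matched, key=matched.get)
    return best_label, matched[best_label]


def candidate_set(contents, interest_scores):
    """Build the candidate content set as a list of
    (content_id, S_x, combined_score) tuples, skipping non-matching contents."""
    candidates = []
    for content_id, label_similarities in contents.items():
        match = match_content(label_similarities, interest_scores)
        if match is None:
            continue  # no label matches the user: not a candidate
        best_label, s_x = match
        combined = s_x * interest_scores[best_label]  # S_x * S_k'
        candidates.append((content_id, s_x, combined))
    return candidates


# Mirrors the example above: two matching labels with similarities 0.6 and 0.8.
contents = {
    "C": {"sports and fitness": 0.8, "hotel reservation": 0.6},
    "D": {"wedding photography": 0.9},
}
interests = {"sports and fitness": 0.6, "hotel reservation": 0.3}
result = candidate_set(contents, interests)  # only "C" matches, with S_x = 0.8
```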
When no label matches the user, that is, when none of the content categories indicated by the labels in the content category label set of content C is in the user's set of content categories of interest, content C is not added to the candidate content set.
At step 303, recommended content is selected from the candidate content set, which contains at least one candidate content for recommendation to the user. The number of recommended contents may be 1, 2, or more. The selection rule can be set as needed, for example: selecting the newest content in the candidate content set; selecting recommended content at random from the candidate content set; selecting the content in the candidate content set that has been recommended the fewest times; or selecting the content in the candidate content set with the highest content relevance score. Alternatively, multiple factors can be chosen and given priorities, and recommended content selected according to those priorities. The factors may include: the time the candidate content was added, the number of times the candidate content has been recommended, the sum or product of the interest score and the content relevance score, the click-through rate of the candidate content, and the content relevance score of the candidate content.
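Selecting by prioritized factors can be sketched as a multi-key sort. The `Candidate` fields and the particular priority order (relevance first, then fewest recommendations, then newest) are assumptions chosen for illustration; the source leaves both the factors used and their priorities to the implementer.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    content_id: str
    relevance: float        # content relevance score S_x
    times_recommended: int  # how often this content has been recommended
    added_at: float         # time the content was added, as a Unix timestamp


def select_recommendations(candidates, n):
    """Pick n candidates by priority: highest relevance first, then the
    fewest previous recommendations, then the most recently added."""
    ranked = sorted(
        candidates,
        key=lambda c: (-c.relevance, c.times_recommended, -c.added_at),
    )
    return ranked[:n]


pool = [
    Candidate("a", 0.8, 5, 1000.0),
    Candidate("b", 0.8, 2, 900.0),
    Candidate("c", 0.6, 0, 2000.0),
]
chosen = [c.content_id for c in select_recommendations(pool, 2)]
```

Here "a" and "b" tie on relevance, so the second factor decides and "b" (recommended fewer times) ranks first.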
At step 304, an instruction to display a representation of the recommended content is generated. The representation is displayed to the user for the user to select. The representation of the recommended content is, for example, a thumbnail of the recommended content. For example, if the user clicks the thumbnail on the terminal in use, the recommended content is displayed on the terminal.
The recommendation method 300 can be executed by a server. When the recommendation method 300 is executed by a server, it can further include: receiving a request for recommended content. The request can be issued by the client the user is using, and may include the user's information so that the user's set of content categories of interest can be determined from it.
When the recommendation method 300 is executed by a server, it can further include: sending to the client the user is using the instruction to display the representation of the recommended content to the user. After receiving the instruction, the client displays the representation of the recommended content to the user for selection.
Preferably, the labeling method of the present disclosure can be executed on multiple contents to determine their respective content category labels. Further, all content category labels of all candidate contents in the candidate content set are determined by the labeling method of the present disclosure.
Optionally, a Kafka queue can be used to obtain new content as the target content.
The labeling method of the present disclosure is described below using a specific example.
The label set {L_i} has 10 labels, that is, 10 category indicator words: sports and fitness, housekeeping services, flowers and gifts, wedding photography, medical departments, cosmetic medicine, maternity and children's hospitals, hotel reservation, homestay, and airlines.
The new content obtained online from the message queue is as follows:
Content title: "What is it like for a family to cook together at a homestay while traveling? Airbnb tells you: super heartwarming";
Content body: "Travel somewhere new with your family, enjoy the scenery, taste the local specialties, and even cook together at a homestay away from home. Just thinking about it is heartwarming. With Airbnb, the whole world is yours to choose from."
This new content is chosen as the target content CO, from which keywords are selected.
The target content CO is segmented into words, and the 9 candidate keywords with the largest TF*IDF values are selected as keywords: Airbnb, on the trip, homestay, away from home, heartwarming, cooking, scenery, specialties, and worldwide.
The category of each keyword is determined as shown in Table 2.
Table 2: Categories and weights of the keywords
Using the trained word2vec language model ML, nine 64-dimensional keyword vectors are obtained for the 9 keywords; a weighted sum of the 9 keyword vectors according to the keyword categories gives the 64-dimensional content vector VC of the target content: [-0.14115450160929885, -0.24425549793780627, -0.30044687888376137, -0.05763183483727175, 0.15561235974744236, 0.010583868380962057, 0.013591076247417138, -0.06848938692135165, -0.02732886928430746, -0.034710140155875834, 0.03750085532692744, 0.046927746483094245, 0.01581604176379293, 0.16177491753452636, -0.237404869703128, -0.06449884472860959, -0.10758427322849924, -0.07626917726376475, 0.006169830778924875, 0.11237461946713251, -0.17831536577928542, 0.0819056485434265, -0.12827313774691287, 0.0020619466900970483, -0.016215964088673797, -0.14129457714696125, -0.0905078577328344, 0.01599747926662087, -0.13264012880481604, -0.05488182080912134, 0.15804649074807617, -0.15541510850124396, 0.0344278284956769, 0.154474302607422, -0.27187228106139893, -0.04848808005948619, 0.07496522631347169, -0.09970821588166821, -0.21192385737972327, -0.10144228362039891, -0.03206756311276709, 0.08181443401576366, -0.022456738055021172, 0.07263042977339229, -0.05359920849368456, -0.012039215785374473, 0.05122092769789547, -0.011626157154404461, -0.009008863938227746, -0.22059785870647422, 0.004545139343459065, 0.056822009826923224, 0.10528190567950048, -0.16259849732059495, 0.1074273601363384, 0.16346525357742392, 0.0016458175006195614, -0.10910192190291954, 0.22706467011122444, 0.23295105654493278, 0.1703301017317971, 0.017352765286693526, -0.14180094380902827, -0.18815346922446488].
Based on the content vector VC and the label vectors of the 10 labels in the label set {Li}, the similarity of the target content CO to each label is determined, where the label vectors of the 10 labels are ten 64-dimensional vectors determined by the language model ML from the respective category indicator words. The values of the 10 similarities are as follows: 0.32484788901811973, 0.10955877033307335, 0.18443480388501027, 0.32851210400292546, -0.1871856053931387, 0.057516092361998145, -0.10459164508515512, 0.5691629355855871, 0.8078326422773067, 0.3179727610239934.
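A similarity computation of this kind might look like the sketch below. This passage does not name the similarity measure; cosine similarity is assumed here because it is a common choice for word2vec vectors and is consistent with the negative values listed. The label names and 2-dimensional vectors are placeholders.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)

def label_similarities(content_vec, label_vectors):
    """Similarity of the content vector to every label vector."""
    return {label: cosine_similarity(content_vec, vec)
            for label, vec in label_vectors.items()}

# Placeholder label names and toy vectors, not taken from the patent.
vc = [1.0, 0.0]
labels = {"label_a": [2.0, 0.0], "label_b": [0.0, 3.0], "label_c": [-1.0, 0.0]}
sims = label_similarities(vc, labels)
print(sims)
```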
Assume that the predetermined similarity threshold Th is 0.5. Then "people place" and "hotel reservation" are determined to be the content category labels of the target content CO; that is, the content category label set of the target content CO is {people place, hotel reservation}.
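The threshold step can be checked against the ten similarity values of the example. In this sketch the label names `label_1` to `label_10` are placeholders, since the passage does not say which label each similarity belongs to, and a strict "greater than" comparison is assumed.

```python
def select_labels(similarities, threshold):
    """Keep the labels whose similarity exceeds the threshold (strict > is an assumption)."""
    return [label for label, s in similarities.items() if s > threshold]

# The ten similarity values from the example, attached to placeholder names.
values = [
    0.32484788901811973, 0.10955877033307335, 0.18443480388501027,
    0.32851210400292546, -0.1871856053931387, 0.057516092361998145,
    -0.10459164508515512, 0.5691629355855871, 0.8078326422773067,
    0.3179727610239934,
]
similarities = {f"label_{i}": v for i, v in enumerate(values, start=1)}
selected = select_labels(similarities, threshold=0.5)
print(selected)
```

Exactly two values (0.569... and 0.807...) exceed 0.5, which matches the two content category labels obtained in the example.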
The present disclosure also provides a computer-readable recording medium storing a program, wherein the program causes a computer to execute the labeling method of the present disclosure.
The present disclosure also provides a computer-readable recording medium storing a program, wherein the program causes a computer to execute the recommendation method of the present disclosure.
The present disclosure also provides a labeling device. Fig. 4 shows an exemplary block diagram of a labeling device 400 according to one embodiment of the present disclosure. The labeling device 400 includes a keyword determination unit 401, a word vector determination unit 402, a content vector determination unit 403, a similarity determination unit 404, and a label determination unit 405. The keyword determination unit 401 is configured to select a plurality of keywords from the text portion of the target content. The word vector determination unit 402 is configured to determine the corresponding keyword vector of each keyword using a language model, and to determine the label vector of each label using the language model, where the label set consists of labels and each label is a category indicator word indicating a candidate category of the target content. The content vector determination unit 403 is configured to determine the content vector of the target content by a weighted sum of the corresponding keyword vectors. The similarity determination unit 404 is configured to determine the similarity of the target content to each label based on the content vector and the label vector of each label in the label set. The label determination unit 405 is configured to determine the content category labels of the target content based on the similarities. The label determination unit 405 can output the content category labels of the target content, where the number of content category labels of the target content can be 1, 2, or more. The labeling device 400 corresponds to the labeling method of the present disclosure. Where feasible, the more specific details of the labeling device 400 can be the same as the corresponding details of the labeling method of the present disclosure. Preferably, the labeling device 400 can be used to generate all content category labels of the target content CO.
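One way to picture how the five units fit together is the sketch below, in which each unit is a pluggable function. The class name `LabelingDevice`, its field names, and the toy functions are hypothetical; the toy setup uses equal weights and a plain dot product instead of a trained language model and a category-weighted sum.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Vector = List[float]

@dataclass
class LabelingDevice:
    select_keywords: Callable[[str], List[str]]    # role of unit 401
    embed: Callable[[str], Vector]                 # role of unit 402
    combine: Callable[[List[Vector]], Vector]      # role of unit 403
    similarity: Callable[[Vector, Vector], float]  # role of unit 404
    threshold: float                               # used for the role of unit 405

    def label(self, text: str, label_set: Sequence[str]) -> List[str]:
        keywords = self.select_keywords(text)
        keyword_vecs = [self.embed(k) for k in keywords]
        content_vec = self.combine(keyword_vecs)
        # Label vectors come from the same embedding function as keyword vectors.
        sims = {lab: self.similarity(content_vec, self.embed(lab)) for lab in label_set}
        return [lab for lab, s in sims.items() if s > self.threshold]

# Toy lookup table standing in for a trained language model.
toy_vectors = {
    "homestay": [1.0, 0.0],
    "scenery": [1.0, 1.0],
    "hotel reservation": [1.0, 0.0],
    "finance": [0.0, -1.0],
}
device = LabelingDevice(
    select_keywords=lambda text: text.split(),
    embed=lambda word: toy_vectors[word],
    combine=lambda vecs: [sum(col) for col in zip(*vecs)],      # equal weights, for brevity
    similarity=lambda a, b: sum(x * y for x, y in zip(a, b)),   # dot product, for brevity
    threshold=0.5,
)
result = device.label("homestay scenery", ["hotel reservation", "finance"])
print(result)
```

Because label vectors are computed from the label words themselves, changing the label set only changes the `label_set` argument.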
The present disclosure also provides a recommendation device. Fig. 5 shows an exemplary block diagram of a recommendation device 500 according to one embodiment of the present disclosure. The recommendation device 500 includes a candidate content set determination unit 501, a selection unit 502, and an instruction generation unit 503. The candidate content set determination unit 501 is configured to determine a candidate content set for recommendation to a user based on the content category label set of each of a plurality of contents and the set of content categories of interest to the user, where the plurality of contents includes the target content, and at least one content category label in the content category label set of the target content is determined by the labeling method of the present disclosure. The selection unit 502 is configured to select, from the candidate content set, recommended content to be recommended to the user. The instruction generation unit 503 is configured to generate an instruction for displaying a representation of the recommended content to the user, where the representation is for selection by the user. The recommendation device 500 corresponds to the recommendation method of the present disclosure. Where feasible, the more specific details of the recommendation device 500 can be the same as the corresponding details of the recommendation method of the present disclosure.
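The three recommendation steps can be sketched as three small functions. All names, the overlap-based ranking rule, and the shape of the returned instruction are assumptions for illustration; the description does not fix how the selection unit chooses among candidates.

```python
def candidate_set(content_labels, user_interests):
    """Contents whose label set shares at least one category with the user's interests."""
    return [cid for cid, labels in content_labels.items() if labels & user_interests]

def select_recommendations(candidates, content_labels, user_interests, n):
    """Pick n candidates, preferring larger overlap (a hypothetical selection rule)."""
    ranked = sorted(
        candidates,
        key=lambda cid: (-len(content_labels[cid] & user_interests), cid),
    )
    return ranked[:n]

def build_display_instruction(content_ids):
    """Hypothetical instruction format telling a client which items to show."""
    return {"action": "display", "items": list(content_ids)}

# Toy data; the label names reuse the two labels from the worked example.
content_labels = {
    "c1": {"hotel reservation", "people place"},
    "c2": {"finance"},
    "c3": {"hotel reservation"},
}
user_interests = {"hotel reservation", "people place"}
candidates = candidate_set(content_labels, user_interests)
chosen = select_recommendations(candidates, content_labels, user_interests, n=1)
instruction = build_display_instruction(chosen)
print(instruction)
```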
From the above description of specific embodiments of the present disclosure, those skilled in the art will understand that the method of the present disclosure can achieve at least one of the following effects: the labeling method labels content automatically, with no manual labeling, which saves time and labeling cost, labels efficiently, and avoids errors introduced by the subjectivity of manual labeling; the labeling method uses a neural network and can assign labels to content efficiently and accurately; the content of the predetermined corpus does not need to be annotated, so when the label system changes, the content of the predetermined corpus does not need to be annotated and the language model does not need to be retrained, which makes the labeling method more robust; the labeling method can label content online in real time; personalized matching content can be recommended to users, which improves the click-through rate of content; and, through online real-time labeling, new content can be recommended to users promptly, which makes the cold start of new content easier.
It should be understood that the term "comprising", when used herein, specifies the presence of features, integers, steps, or components, but does not exclude the presence or addition of one or more other features, integers, steps, or components.
It should be understood that, without departing from the spirit of the present disclosure, features described and/or shown for one embodiment can be used in the same or a similar way in one or more other embodiments, can be combined with features in other embodiments, or can replace features in other embodiments.
In addition, the method of the present disclosure is not limited to being executed in the chronological order described in the specification; where feasible in principle, it can also be executed in another chronological order, in parallel, or independently. Therefore, the execution order of the method described in this specification does not limit the scope of the present disclosure.
The present disclosure has been described above in conjunction with specific embodiments, but those skilled in the art should understand that these descriptions are exemplary and do not limit the protection scope of the present disclosure. Those skilled in the art can make various variations and modifications to the present disclosure according to its spirit and principles, and these variations and modifications also fall within the scope of the present disclosure.

Claims (10)

  1. A labeling method, comprising:
    selecting a plurality of keywords from a text portion of target content;
    determining a corresponding keyword vector for each keyword using a language model;
    determining a content vector of the target content by a weighted sum of the corresponding keyword vectors;
    determining a similarity of the target content to each label based on the content vector and a label vector of each label in a label set; and
    determining a content category label of the target content based on the similarities;
    wherein each label in the label set is a category indicator word indicating a candidate category of the target content; and
    each label vector is a vector determined by the language model based on the corresponding category indicator word.
  2. The labeling method according to claim 1, wherein each label in the label set is selected from second-level industry categories.
  3. The labeling method according to claim 1, wherein selecting the plurality of keywords from the text portion of the target content comprises:
    segmenting the text portion to obtain a plurality of candidate keywords;
    determining a term frequency of each candidate keyword with respect to the text portion;
    determining an inverse document frequency of each candidate keyword with respect to a predetermined corpus; and
    selecting a predetermined number of candidate keywords as the plurality of keywords based on the product of the term frequency and the inverse document frequency of each candidate keyword.
  4. The labeling method according to claim 1, wherein determining the content vector of the target content by a weighted sum of the corresponding keyword vectors comprises:
    determining a category of each keyword; and
    determining a corresponding weight of each corresponding keyword vector based on the category of each keyword.
  5. The labeling method according to claim 4, wherein the category is selected from the group consisting of: commodity, person name, place name, number, time, and other;
    when the category is commodity, the corresponding weight is a first value;
    when the category is other, the corresponding weight is a second value;
    when the category is person name, place name, number, or time, the corresponding weight is a third value;
    the first value is greater than the second value; and
    the second value is greater than the third value.
  6. The labeling method according to claim 1, wherein the language model is a natural language processing model trained on a predetermined corpus using the word2vec tool.
  7. A recommendation method, comprising:
    determining a candidate content set for recommendation to a user based on the content category label set of each of a plurality of contents and a set of content categories of interest to the user;
    selecting, from the candidate content set, recommended content to be recommended to the user; and
    generating an instruction for displaying a representation of the recommended content to the user;
    wherein the representation is for selection by the user; and
    the plurality of contents includes target content, and at least one content category label in the content category label set of the target content is determined by the labeling method according to any one of claims 1 to 6.
  8. The recommendation method according to claim 7, further comprising: obtaining new content as the target content using a Kafka queue.
  9. A computer-readable recording medium storing a program, wherein the program causes a computer to execute the labeling method according to any one of claims 1 to 6.
  10. A computer-readable recording medium storing a program, wherein the program causes a computer to execute the recommendation method according to claim 7 or 8.
CN201910423246.2A 2019-05-21 2019-05-21 Labeling method, recommendation method and recording medium Active CN110147499B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910423246.2A CN110147499B (en) 2019-05-21 2019-05-21 Labeling method, recommendation method and recording medium

Publications (2)

Publication Number Publication Date
CN110147499A true CN110147499A (en) 2019-08-20
CN110147499B CN110147499B (en) 2021-09-14

Family

ID=67592502

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910423246.2A Active CN110147499B (en) 2019-05-21 2019-05-21 Labeling method, recommendation method and recording medium

Country Status (1)

Country Link
CN (1) CN110147499B (en)

Also Published As

Publication number Publication date
CN110147499B (en) 2021-09-14

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant