CN104699696B - File recommendation method and device - Google Patents


Info

Publication number
CN104699696B
Authority
CN
China
Prior art keywords
title
keyword
weight
frequency
file
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201310652678.3A
Other languages
Chinese (zh)
Other versions
CN104699696A (en
Inventor
尹程果
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Tencent Computer Systems Co Ltd
Original Assignee
Shenzhen Tencent Computer Systems Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Tencent Computer Systems Co Ltd
Priority to CN201310652678.3A (CN104699696B)
Priority to PCT/CN2015/072103 (WO2015081909A1)
Publication of CN104699696A
Application granted
Publication of CN104699696B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/14 Details of searching files based on file metadata
    • G06F16/148 File search processing

Abstract

The invention discloses a file recommendation method and device, belonging to the field of network technology. The method includes: segmenting a first title into words to obtain a first keyword set; obtaining, according to a preset correspondence, at least one second title and its second keyword set, where the preset correspondence maps each keyword to the file names that contain it; taking the keywords shared by the first keyword set and the second keyword set of each second title as matching keywords; obtaining the weight, in the first title, of the matching keywords contained in each second title; determining the second title to be recommended; and recommending the file indicated by the determined second title. By determining weights according to the part of speech of the matching keywords and selecting the second title to be recommended from multiple candidate second titles according to those weights, the invention raises the relevance between the recommended file name and the name of the currently open file, and so improves the recommendation success rate.

Description

File recommendation method and device
Technical field
The present invention relates to the field of network technology, and in particular to a file recommendation method and device.
Background
In everyday online activity, users are constantly confronted with all kinds of information and find it hard to pick out what really interests them. To make this screening easier, a server can recommend information that a user may be interested in, based on the user's browsing history, interests, and so on.
Take video as an example. When recommending videos, the server may recommend the most popular videos in the category to which the currently playing video belongs; for example, when the currently playing video is in the "sports" category, the server recommends the most popular videos in that category. Alternatively, the server calculates the LD (Levenshtein Distance, i.e. edit distance) between the title of each video and the title of the currently playing video, and recommends the videos whose titles have the smallest LD to the title of the currently playing video.
When the most popular video in the category is recommended, its relevance to the currently playing video may be very low, so the recommendation success rate is low. When the server recommends by LD, the distance only measures, mechanically, the character-level editing difference between titles, so the title of the recommended video may be semantically far from the title of the currently playing video. Relevance is again low, and so is the recommendation success rate.
Summary of the invention
To solve the problems in the prior art, embodiments of the present invention provide a file recommendation method and device. The technical solution is as follows:
In a first aspect, a file recommendation method is provided, the method comprising:
segmenting a first title into words to obtain a first keyword set, where the first title is the name of the currently open file and the first keyword set includes at least one keyword obtained by segmenting the first title;
obtaining, according to a preset correspondence, at least one second title and the second keyword set corresponding to each of the at least one second title, where a second title is a file name corresponding to a keyword in the first keyword set, and the preset correspondence includes the correspondence between a keyword and the file names that contain that keyword;
obtaining the keywords that appear in both the first keyword set and the second keyword set of each second title, and taking these identical keywords as matching keywords;
obtaining the weight, in the first title, of the matching keywords that each second title contains;
determining the second title to be recommended according to the weight, in the first title, of the matching keywords that each second title contains;
recommending the file indicated by the determined second title.
In a second aspect, a file recommendation device is provided, the device comprising:
a first segmentation module, configured to segment a first title into words to obtain a first keyword set, where the first title is the name of the currently open file and the first keyword set includes at least one keyword obtained by segmenting the first title;
a second set acquisition module, configured to obtain, according to a preset correspondence, at least one second title and the second keyword set corresponding to each of the at least one second title, where a second title is a file name corresponding to a keyword in the first keyword set, and the preset correspondence includes the correspondence between a keyword and the file names that contain that keyword;
a matching module, configured to obtain the keywords that appear in both the first keyword set and the second keyword set of each second title, and to take these identical keywords as matching keywords;
a weight acquisition module, configured to obtain the weight, in the first title, of the matching keywords that each second title contains;
a title determination module, configured to determine the second title to be recommended according to the weight, in the first title, of the matching keywords that each second title contains;
a recommendation module, configured to recommend the file indicated by the determined second title.
The technical solution provided by the embodiments of the present invention has the following beneficial effects:
The method and device provided by the embodiments of the present invention process the first title of the currently open file to obtain multiple candidate second titles, match each second title against the first title to determine the matching keywords that each second title contains, and determine weights according to the part of speech of the matching keywords. The second title to be recommended is then selected from the candidate second titles according to the weights, and the file it indicates is recommended. This raises the relevance between the recommended file name and the name of the currently open file, and improves the recommendation success rate.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the drawings needed for the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a file recommendation method provided by an embodiment of the present invention;
Fig. 2 is a flowchart of a file recommendation method provided by an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a file recommendation device provided by an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a server provided by an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.
Fig. 1 is a flowchart of a file recommendation method provided by an embodiment of the present invention. The method is executed by a server. Referring to Fig. 1, the method includes:
101. Segment the first title into words to obtain a first keyword set, where the first title is the name of the currently open file and the first keyword set includes at least one keyword obtained by segmenting the first title.
102. According to a preset correspondence, obtain at least one second title and the second keyword set corresponding to each of the at least one second title, where a second title is a file name corresponding to a keyword in the first keyword set, and the preset correspondence includes the correspondence between a keyword and the file names that contain that keyword.
103. Obtain the keywords that appear in both the first keyword set and the second keyword set of each second title, and take these identical keywords as matching keywords.
104. Obtain the weight, in the first title, of the matching keywords that each second title contains.
105. Determine the second title to be recommended according to the weight, in the first title, of the matching keywords that each second title contains.
106. Recommend the file indicated by the determined second title.
The method provided by this embodiment processes the first title of the currently open file to obtain multiple candidate second titles, matches each second title against the first title to determine the matching keywords that each second title contains, and determines weights according to the part of speech of the matching keywords. The second title to be recommended is then selected from the candidate second titles according to the weights, and the file it indicates is recommended. This raises the relevance between the recommended file name and the name of the currently open file, and improves the recommendation success rate.
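Steps 101 to 106 can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `recommend`, the parameters `index`, `weights`, `segment` and `top_n`, and the choice to skip the open file itself are all hypothetical; the title weight is taken as the plain sum of matching-keyword weights, which is one of the options the text describes.

```python
def recommend(first_title, index, weights, segment, top_n=2):
    """Steps 101-106 in miniature.

    index:   keyword -> set of file names containing it (the preset correspondence)
    weights: keyword -> weight of that keyword in the first title
    segment: hypothetical word-segmentation function, title -> list of keywords
    """
    first_keywords = set(segment(first_title))                   # step 101
    candidates = set()
    for kw in first_keywords:                                    # step 102
        candidates |= index.get(kw, set())
    candidates.discard(first_title)  # assumption: never recommend the open file itself

    scored = []
    for second_title in candidates:
        matching = first_keywords & set(segment(second_title))   # step 103
        score = sum(weights.get(kw, 0.0) for kw in matching)     # step 104
        scored.append((score, second_title))
    # Step 105: highest score first; ties broken by title for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for _, title in scored[:top_n]]                # step 106
```

For space-delimited titles, `str.split` can stand in for `segment`; Chinese titles would need a real word segmenter.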
Optionally, obtaining, according to the preset correspondence, at least one second title and the second keyword set corresponding to the at least one second title includes:
obtaining at least one second title according to the preset correspondence;
for each of the at least one second title, segmenting the second title to obtain a second keyword set, where the second keyword set includes at least one keyword obtained by segmenting the second title.
Optionally, before obtaining the weight, in the first title, of the matching keywords that each second title contains, the method further includes:
obtaining the weight of each keyword in the first title according to at least one of the type and the frequency of occurrence of each keyword in the first keyword set.
Optionally, obtaining the weight of each keyword in the first title according to at least one of the type and the frequency of occurrence of each keyword in the first keyword set includes:
assigning a weight to each keyword in descending order of weight level, according to the weight level corresponding to the type of each keyword, so that a keyword with a higher weight level is assigned a greater weight than a keyword with a lower weight level; or
assigning a weight to each keyword in descending order of frequency of occurrence, so that a keyword with a higher frequency of occurrence is assigned a greater weight than a keyword with a lower frequency of occurrence; or
assigning a weight to each keyword in descending order of weight level, according to the weight level corresponding to the type of each keyword, so that a keyword with a higher weight level is assigned a greater weight than a keyword with a lower weight level, and then
adjusting the weight assigned to each keyword according to its frequency of occurrence.
Optionally, the type of a keyword is noun, verb, or function word, and nouns have a higher weight level than verbs and function words;
the frequency of occurrence of a keyword is the frequency with which it appears in the stored file names, or the frequency with which it appears in the stored file names of a specified category, where the specified category is the category to which the currently open file belongs.
Optionally, among nouns, proper names have a higher weight level than other nouns.
Optionally, determining the second title to be recommended according to the weight, in the first title, of the matching keywords that each second title contains includes:
determining the weight of each second title according to the weight, in the first title, of the matching keywords that it contains;
determining, in descending order of the weights of the second titles, a preset number of second titles as the second titles to be recommended.
Optionally, determining the weight of each second title according to the weight, in the first title, of the matching keywords that it contains includes:
taking the sum of the weights, in the first title, of the matching keywords that each second title contains as the weight of that second title; or
determining a time weight for each second title according to the release time of the file it indicates, computing, according to a preset ratio, a weighted sum of that time weight and the sum of the weights, in the first title, of the matching keywords that the second title contains, and taking the weighted sum as the weight of that second title.
All of the above optional solutions may be combined in any way to form optional embodiments of the present invention, which are not described one by one here.
Fig. 2 is a flowchart of a file recommendation method provided by an embodiment of the present invention. The method is executed by a server. Referring to Fig. 2, the method includes:
201. The server segments the first title into words to obtain a first keyword set, where the first title is the name of the currently open file and the first keyword set includes at least one keyword obtained by segmenting the first title.
This embodiment applies to the scenario in which a user has opened a file and the server recommends other files to the user according to the name of the currently open file. The server may be the server associated with the currently open file, or a functional module within that server; the embodiments of the present invention do not limit this.
Further, this embodiment applies to scenarios in which the name of the currently open file is a custom name chosen by the publisher. Unlike names fixed at release, such as film or TV series titles, a publisher-defined name may be very long or very short; it may be a single word or a complex sentence. This embodiment recommends files to the user based on such personalized, publisher-defined names.
The file may be a video file, audio file, text file, or the like provided by the server, such as a network video file provided by a video website server, an audio file provided by an audio website, or a network document provided by a document sharing server; the embodiments of the present invention do not limit this.
Specifically, when the server detects that the user has opened a file, it takes the name of the currently open file as the first title, segments the first title to obtain at least one keyword, and forms the first keyword set from those keywords.
For example, if the first title is "The clothes Liu Dehua wore when attending Zhang Xueyou's concert", segmenting it yields the first keyword set {Liu Dehua, Zhang Xueyou, concert, clothes}.
When segmenting the first title, the server may use a segmentation method based on string matching or one based on statistics; the embodiments of the present invention do not limit this.
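As an illustration of the string-matching option, forward maximum matching is one common dictionary-based segmentation technique. The text does not name a specific algorithm, so the sketch below is only an assumption about what such a segmenter could look like; `forward_max_match`, `dictionary` and `max_len` are hypothetical names.

```python
def forward_max_match(text, dictionary, max_len=4):
    """Greedy dictionary segmentation.

    At each position, take the longest dictionary word that starts there;
    if none matches, emit the single character and move on.
    """
    tokens = []
    i = 0
    while i < len(text):
        longest = min(max_len, len(text) - i)
        for size in range(longest, 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted so the scan cannot stall.
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens
```

A real system would follow this with stop-word and part-of-speech filtering to keep only the keywords; that step is not shown.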
202. The server obtains at least one second title according to the preset correspondence, where a second title is a file name corresponding to a keyword in the first keyword set, and the preset correspondence includes the correspondence between a keyword and the file names that contain that keyword.
The first keyword set includes at least one keyword. For each keyword in the first keyword set, the server queries the preset correspondence to obtain the file names that contain any one or more of the keywords in the first keyword set.
For example, the correspondence among the first title, the keywords in the first keyword set, and the second titles corresponding to each keyword is shown in Table 1.
Table 1
Optionally, before step 202, the method further includes: establishing the preset correspondence according to the file names stored by the server.
Specifically, the server segments the names of all stored files to obtain the keywords that each file name contains; for each keyword, it collects, from the keywords of each file name, the file names that contain that keyword; and it establishes the preset correspondence between the keyword and the file names that contain it.
Further optionally, the server builds an inverted index over the keywords contained in each file name and uses the inverted index as the preset correspondence.
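A minimal sketch of such an inverted index, assuming an in-memory dictionary is sufficient. The function name `build_inverted_index` and the `segment` parameter are hypothetical; a production server would more likely use a search engine or database index.

```python
from collections import defaultdict


def build_inverted_index(file_names, segment):
    """Map each keyword to the set of stored file names that contain it.

    segment is a hypothetical word-segmentation function (name -> list of keywords).
    """
    index = defaultdict(set)
    for name in file_names:
        for keyword in segment(name):
            index[keyword].add(name)
    return index


# Usage with whitespace segmentation as a stand-in for a real segmenter.
stored = ["concert highlights", "concert tickets", "football highlights"]
index = build_inverted_index(stored, str.split)
second_titles = index.get("concert", set())  # candidate second titles for "concert"
```

Using `index.get(...)` rather than `index[...]` for lookups avoids creating empty entries for keywords that were never indexed.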
203. For each of the at least one second title, the server segments the second title to obtain a second keyword set, where the second keyword set includes at least one keyword obtained by segmenting the second title.
Continuing the example in step 202, if the second title is "Liu Dehua concert complete collection", the server segments it to obtain the second keyword set {Liu Dehua, concert, complete collection}.
When segmenting the second title, the server may likewise use a segmentation method based on string matching or one based on statistics; the embodiments of the present invention do not limit this.
204. The server obtains the keywords that appear in both the first keyword set and the second keyword set of each second title, and takes these identical keywords as matching keywords.
Specifically, for a keyword in the first keyword set, the server traverses the second keyword set and checks whether it contains that keyword; if it does, the keyword is taken as a matching keyword. The server repeats this check for every keyword in the first keyword set, obtaining at least one matching keyword. Alternatively, for a keyword in the second keyword set, the server traverses the first keyword set and checks whether it contains that keyword; if it does, the keyword is taken as a matching keyword. The server repeats this check for every keyword in the second keyword set, obtaining at least one matching keyword.
Continuing the examples in steps 201 and 203, the first keyword set is {Liu Dehua, Zhang Xueyou, concert, clothes} and the second keyword set is {Liu Dehua, concert, complete collection}, so the matching keywords are "Liu Dehua" and "concert".
205. According to the weight level corresponding to the type of each keyword in the first keyword set, the server assigns a weight to each keyword in descending order of weight level, so that a keyword with a higher weight level is assigned a greater weight than a keyword with a lower weight level.
In the embodiments of the present invention, the first keyword set and the second keyword set share at least one matching keyword, but the first title and the second title may still differ greatly in meaning. Therefore, to raise the relevance between the second title to be recommended and the first title, weights are assigned to the keywords in the first keyword set and the weight of each second title is determined from them.
Specifically, the server presets the weight level corresponding to each keyword type. After determining the type of each keyword in the first keyword set, the server looks up the weight level preset for that type, sorts the keywords in descending order of weight level, and assigns weights so that a keyword with a higher weight level is assigned a greater weight than a keyword with a lower weight level.
Optionally, the weights assigned to the keywords in the first keyword set sum to 1.
Further optionally, the type of a keyword is noun, verb, or function word; nouns have a higher weight level than verbs and function words, and among nouns, proper names have a higher weight level than other nouns.
For example, in the first title "The clothes Liu Dehua wore when attending Zhang Xueyou's concert", the nouns "Liu Dehua", "Zhang Xueyou", "concert", and "clothes" have a higher weight level than the verbs "attending" and "wore" and the function words "'s" and "when".
The proper names among nouns may be personal names, place names, organization names, brand names, and so on; the embodiments of the present invention do not limit this. Proper names have a higher weight level than other nouns; for example, "Liu Dehua" and "Zhang Xueyou" have a higher weight level than "concert" and "clothes".
Still taking the first title "The clothes Liu Dehua wore when attending Zhang Xueyou's concert" as an example, the server determines that "Liu Dehua" and "Zhang Xueyou" have a higher weight level than "concert" and "clothes", which in turn have a higher weight level than "attending", "wore", "'s", and "when". The server can then assign weight 0.3 to the keyword "Liu Dehua", 0.3 to "Zhang Xueyou", 0.2 to "concert", 0.1 to "clothes", 0.1 to "attending", and 0 to the remaining keywords.
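The text fixes only the ordering of weight levels, not how a level becomes a number. The sketch below assumes one simple scheme: give each type a raw score and normalize so the weights sum to 1. The `LEVEL_SCORE` values and the type labels are hypothetical, the labels are assumed to come from a part-of-speech tagger that is not shown, and the result will not reproduce the exact 0.3/0.3/0.2/0.1/0.1 split of the example.

```python
# Hypothetical raw scores per keyword type; only their ordering comes from the text:
# proper names > other nouns > verbs and function words.
LEVEL_SCORE = {"name": 3, "noun": 2, "verb": 1, "function": 0}


def weights_by_type(keyword_types):
    """keyword_types: keyword -> type label. Returns keyword -> weight, summing to 1."""
    raw = {kw: LEVEL_SCORE[kind] for kw, kind in keyword_types.items()}
    total = sum(raw.values())
    if total == 0:
        # Nothing but zero-score types: no keyword carries weight.
        return {kw: 0.0 for kw in raw}
    return {kw: score / total for kw, score in raw.items()}


weights = weights_by_type({
    "Liu Dehua": "name",
    "concert": "noun",
    "attending": "verb",
    "when": "function",
})
```

Normalizing keeps the optional "weights sum to 1" property regardless of how many keywords the title contains.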
In another embodiment of the present invention, step 205 can be replaced by the following step (1):
(1) Assign a weight to each keyword in descending order of frequency of occurrence, so that a keyword with a higher frequency of occurrence is assigned a greater weight than a keyword with a lower frequency of occurrence.
In the embodiments of the present invention, a keyword in the first keyword set with a higher frequency of occurrence is considered more popular, and the user is likely to be interested in files related to it, so weights can be assigned according to the frequency of occurrence of each keyword in the first keyword set.
Optionally, the frequency of occurrence of a keyword is the frequency with which it appears in the stored file names, or the frequency with which it appears in the stored file names of a specified category, where the specified category is the category to which the currently open file belongs.
The currently open file may belong to a subcategory that in turn belongs to a parent category, so the server can decide which level to use as the specified category according to the required recommendation accuracy.
For example, if the name of the currently open file is "Evergrande wins the championship", which belongs to the football category within the sports category, the server can calculate the frequency of occurrence of the keyword "wins the championship" in the file names of the football category and assign the keyword's weight accordingly, rather than calculating its frequency in the file names of all categories or in the file names of the sports category.
Further, the frequency of occurrence can be TF (Term Frequency) or DF (Document Frequency).
Still taking the first title "The clothes Liu Dehua wore when attending Zhang Xueyou's concert" as an example, the server determines that the first title belongs to the singer category and calculates the frequency of occurrence of the keywords "Liu Dehua", "Zhang Xueyou", "concert", and "clothes" in the file names of the singer category. If the calculated frequencies of "Liu Dehua", "Zhang Xueyou", and "concert" are 0.3, 0.2, and 0.1 respectively, the server can, in descending order of frequency, assign weight 0.5 to "Liu Dehua", 0.4 to "Zhang Xueyou", and 0.1 to "concert", and assign 0 to the remaining keywords.
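One way to realize the frequency-based variant is to compute a document frequency over the file names of the specified category and then normalize. This is a sketch under assumptions: the text requires only that higher frequency yields greater weight, so proportional normalization is one choice among many and does not reproduce the 0.5/0.4/0.1 figures of the example. The names `document_frequency`, `weights_by_frequency` and `segment` are hypothetical.

```python
def document_frequency(keyword, file_names, segment):
    """Fraction of the given file names whose keywords include `keyword`."""
    if not file_names:
        return 0.0
    hits = sum(1 for name in file_names if keyword in segment(name))
    return hits / len(file_names)


def weights_by_frequency(frequencies):
    """Scale frequencies so the weights sum to 1 while preserving their order."""
    total = sum(frequencies.values())
    if total == 0:
        return {kw: 0.0 for kw in frequencies}
    return {kw: freq / total for kw, freq in frequencies.items()}


# File names of the specified category only (hypothetical sample data).
category_names = ["concert live", "concert tickets", "football live", "news"]
freqs = {kw: document_frequency(kw, category_names, str.split)
         for kw in ("concert", "news")}
weights = weights_by_frequency(freqs)
```

Restricting `category_names` to the category of the currently open file is what makes the frequency category-specific.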
Further optionally, the server calculates the frequency of occurrence of each keyword in the file names stored by the server within a preset duration. The preset duration can be set in advance by the server.
Step 205 and step (1) above assign weights according to, respectively, the weight level corresponding to the type of each keyword in the first keyword set and the frequency of occurrence of each keyword. The server can also assign weights by considering both together. That is, in yet another embodiment of the present invention, step 205 can be replaced by the following step (2):
(2) According to the weight level corresponding to the type of each keyword, assign a weight to each keyword in descending order of weight level, so that a keyword with a higher weight level is assigned a greater weight than a keyword with a lower weight level; then adjust the weight assigned to each keyword according to its frequency of occurrence.
In practice, a keyword with a high frequency of occurrence is considered more popular, but the second titles corresponding to it may have low relevance to the first title, and the user will not necessarily be interested in the files indicated by such popular second titles. In the embodiments of the present invention, therefore, after assigning a weight to each keyword according to the weight level of its type, the server can adjust the assigned weights according to each keyword's frequency of occurrence. Considering both the relevance between the second title and the first title and the frequency of occurrence raises the relevance of the finally determined second title to the first title, while still giving priority to recommending files with a higher frequency of occurrence.
Further, the adjustment in step (2), "adjust the weight assigned to each keyword according to its frequency of occurrence", can be done in either of the following ways:
(2-1) Determine an adjustment amount according to the frequency of occurrence of each keyword, and increase or decrease the weight assigned to each keyword by the corresponding amount.
For example, the server assigns the keywords "Liu Dehua", "Zhang Xueyou", "concert", "clothes", and "attending" the weights 0.3, 0.3, 0.2, 0.1, and 0.1, and calculates that within one week their frequencies of occurrence are 0.3, 0.2, 0.1, 0.2, and 0.01 respectively. It then determines adjustment amounts of 0.025, 0.025, -0.1, 0.15, and -0.1 for these keywords. After each keyword is adjusted by its amount, the final weights are 0.325, 0.325, 0.1, 0.25, and 0.
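Mode (2-1) amounts to adding a signed, per-keyword amount to each assigned weight. The sketch below takes the adjustment amounts as input, because the text does not say how they are derived from the frequencies; the function name `apply_adjustments`, the clamp at zero, and the rounding are assumptions.

```python
def apply_adjustments(weights, adjustments):
    """Add each keyword's signed adjustment amount to its assigned weight.

    Assumptions: a weight never drops below 0, and results are rounded to
    six decimals to hide floating-point noise.
    """
    return {
        kw: round(max(0.0, w + adjustments.get(kw, 0.0)), 6)
        for kw, w in weights.items()
    }


weights = {"Liu Dehua": 0.3, "concert": 0.2, "clothes": 0.1}
adjustments = {"Liu Dehua": 0.025, "concert": -0.1, "clothes": 0.15}
adjusted = apply_adjustments(weights, adjustments)
```

If the adjustment amounts sum to zero across all keywords, the total weight is preserved.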
(2-2) According to the frequency of occurrence of each keyword, increase the weight assigned to each keyword whose frequency of occurrence is greater than or equal to a preset threshold by a preset adjustment weight, and decrease the weight assigned to each keyword whose frequency of occurrence is less than the preset threshold by the same preset adjustment weight.
For example, the server sets the preset threshold to 0.2 and the preset adjustment weight to 0.05. Suppose the weights assigned to the keywords "Liu Dehua", "Zhang Xueyou", "concert", "clothes", and "attending" are 0.3, 0.3, 0.2, 0.1, and 0.1, and their calculated frequencies of occurrence are 0.3, 0.2, 0.1, 0.2, and 0.01 respectively. The weights of "Liu Dehua", "Zhang Xueyou", and "clothes", whose frequencies are greater than or equal to 0.2, are increased by 0.05, and the weights of "concert" and "attending", whose frequencies are less than 0.2, are decreased by 0.05. The final weights are therefore 0.35, 0.35, 0.15, 0.15, and 0.05.
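Mode (2-2) is a fixed step up or down depending on which side of the threshold the frequency falls. A minimal sketch, where the function name `adjust_by_threshold` and the rounding are assumptions and the default values are the ones used in the example:

```python
def adjust_by_threshold(weights, frequencies, threshold=0.2, delta=0.05):
    """Raise a weight by delta when the keyword's frequency reaches the threshold,
    otherwise lower it by delta. Keywords with no recorded frequency count as 0."""
    adjusted = {}
    for kw, w in weights.items():
        step = delta if frequencies.get(kw, 0.0) >= threshold else -delta
        adjusted[kw] = round(w + step, 6)  # rounding hides floating-point noise
    return adjusted


weights = {"Liu Dehua": 0.3, "concert": 0.2, "clothes": 0.1, "attending": 0.1}
frequencies = {"Liu Dehua": 0.3, "concert": 0.1, "clothes": 0.2, "attending": 0.01}
adjusted = adjust_by_threshold(weights, frequencies)
```

Unlike mode (2-1), this scheme does not in general preserve the total weight, since the number of keywords above and below the threshold can differ.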
It should be noted that the embodiment of the present invention is illustrated so that the step 205 executes after the step 204 as an example, In fact, the step 205 need to only execute after the step 201, before the step 206, i.e. the step 205 can also be Execute before the step 204, or be performed simultaneously with the step 204, the embodiment of the present invention to execution opportunity of the step 205 not It limits.
206, the server obtains the weight, in the first title, of the matching keywords included in each second title.
In this embodiment of the present invention, the server has already determined the weight in the first title of each keyword in the first keyword set, that is, it has already determined the weight of each matching keyword in the first title. The server therefore determines the matching keywords that each second title includes and the weight in the first title of each of those matching keywords.
Based on Table 1, assume that the first title is "Liu Dehua attends the clothes worn when the concert of schoolmate", and that the server assigns weight 0.3 to the keyword "Liu Dehua", weight 0.3 to the keyword "schoolmate", weight 0.2 to the keyword "concert", weight 0.1 to the keyword "clothes", weight 0.1 to the keyword "attending", and weight 0 to the remaining keywords. The weights in the first title of the matching keywords that each second title includes, as determined by the server, can then be as shown in Table 2.
Table 2
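The lookup described in step 206 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `matching_keyword_weights`, the example second title, and its segmentation into keywords are assumptions made for the sketch.

```python
from typing import Dict, Set


def matching_keyword_weights(first_weights: Dict[str, float],
                             second_keywords: Set[str]) -> Dict[str, float]:
    """Return {matching keyword: its weight in the first title} for one
    second title. A keyword matches when it occurs in both keyword sets."""
    return {kw: first_weights[kw]
            for kw in second_keywords if kw in first_weights}


# Weights of the first title's keywords, as given in the text.
first_weights = {"Liu Dehua": 0.3, "schoolmate": 0.3, "concert": 0.2,
                 "clothes": 0.1, "attending": 0.1}

# Hypothetical second title with an assumed segmentation.
second_titles = {
    "Liu Dehua concert complete collection":
        {"Liu Dehua", "concert", "complete collection"},
}

table = {title: matching_keyword_weights(first_weights, keywords)
         for title, keywords in second_titles.items()}
keyword_sum = sum(table["Liu Dehua concert complete collection"].values())
```

Keywords of a second title that do not occur in the first title simply do not appear in the result, which has the same effect on the sum as giving them weight 0.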
207, the server determines the time weight of each second title according to the issuing time of the file indicated by that second title; according to a preset ratio, it performs a weighted calculation on the sum of the weights in the first title of the matching keywords that each second title includes and on the time weight, obtains a weighted sum, and determines the weighted sum as the weight of each second title.
In this embodiment of the present invention, the file indicated by a second title may be a newly published file or a file published long ago. Files with different issuing times attract different degrees of user interest; that is, the issuing time affects how interested the user is, and therefore affects the recommendation success rate. Therefore, when determining the second titles to be recommended, the issuing time of the file indicated by each second title needs to be considered.
Specifically, the server calculates, for each second title, the sum of the weights in the first title of the matching keywords that the second title includes. It also sorts the second titles according to the issuing times of the files they indicate and, following that order, assigns a time weight to each second title, so that a second title with a later issuing time has a higher time weight than a second title with an earlier issuing time. According to the preset ratio, the server performs a weighted calculation on the sum and the time weight to obtain a weighted sum, which is the weight of each second title.
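One way to assign time weights from the sorted order is sketched below. The text only requires that a later issuing time yields a higher time weight; the linear scale `(rank + 1) / n`, the function name `rank_time_weights`, and the example titles and dates are assumptions.

```python
from datetime import date
from typing import Dict


def rank_time_weights(issuing_dates: Dict[str, date]) -> Dict[str, float]:
    """Sort titles from earliest to latest issuing time and give each a
    time weight that grows with its position in that order."""
    ordered = sorted(issuing_dates, key=issuing_dates.get)
    n = len(ordered)
    # Assumption: weights are spread linearly over (0, 1]; titles with the
    # same issuing date still receive distinct weights in this sketch.
    return {title: (rank + 1) / n for rank, title in enumerate(ordered)}


# Hypothetical titles and issuing dates.
time_weights = rank_time_weights({
    "title A": date(2013, 1, 1),
    "title B": date(2013, 6, 1),
    "title C": date(2013, 12, 1),
})
```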
The preset ratio is the ratio between the sum and the time weight; from this ratio, the weighting coefficients of the sum and of the time weight in the weighted calculation can be determined. The preset ratio may be set in advance by the server, or adjusted by the server during use. For example, when the issuing time of the currently opened file is earlier, the proportion of the time weight is smaller, and when the currently opened file belongs to a strongly time-sensitive type such as "news", the proportion of the time weight is larger. The embodiment of the present invention does not limit this.
Based on Table 2, for the second title "Liu Dehua concert complete collection", assume that the server assigns this second title a time weight of 0.4 and that the preset ratio is 6:4. The server calculates that the sum of the weights of the matching keywords that this second title includes is 0.5, and calculates the weight of this second title as 0.5*0.6+0.4*0.4=0.46.
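The weighted calculation in this example can be reproduced with a short sketch. The function name `title_weight` and the representation of the preset ratio as a pair of numbers are assumptions made for illustration.

```python
from typing import Tuple


def title_weight(keyword_sum: float,
                 time_weight: float,
                 ratio: Tuple[float, float] = (6, 4)) -> float:
    """Combine the keyword weight sum and the time weight according to a
    preset ratio such as 6:4, which yields coefficients 0.6 and 0.4."""
    keyword_part, time_part = ratio
    total = keyword_part + time_part
    return (keyword_sum * keyword_part / total
            + time_weight * time_part / total)


# Values from the example: sum 0.5, time weight 0.4, ratio 6:4.
weight = title_weight(0.5, 0.4, (6, 4))
```

Normalizing by the ratio total means that 6:4 and 3:2 give the same coefficients, so the ratio can be stored in whichever form is convenient.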
Further, the server may preset a correspondence between time weights and the time interval between the issuing time and the current time, that is, determine the time weight corresponding to each time interval. The server can then calculate the time interval between the current time and the issuing time of the file indicated by each second title and, according to the preset correspondence, determine the time weight of each second title.
For example, the server presets the time weight of a second title whose time interval is 1 day to 0.9, the time weight of a second title whose time interval is 2 days to 0.8, ... Then, for a given second title, when the server determines that the time interval between the current time and the issuing time of the file indicated by that second title is 4 days, it determines that the time weight of that second title is 0.6.
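The preset correspondence between time intervals and time weights can be sketched as a lookup table. Only the entries for 1, 2 and 4 days come from the example; the table name, the function name `time_weight_for`, and the fallback value for intervals without an entry are assumptions.

```python
from datetime import date
from typing import Dict

# Entries for 1, 2 and 4 days are taken from the example in the text.
TIME_WEIGHT_BY_DAYS: Dict[int, float] = {1: 0.9, 2: 0.8, 4: 0.6}

# Hypothetical fallback for intervals the table does not cover.
DEFAULT_TIME_WEIGHT = 0.0


def time_weight_for(issued: date, today: date) -> float:
    """Look up the time weight for the number of whole days between the
    issuing date of a file and the current date."""
    days = (today - issued).days
    return TIME_WEIGHT_BY_DAYS.get(days, DEFAULT_TIME_WEIGHT)


# A file issued 4 days before the (hypothetical) current date.
example_weight = time_weight_for(date(2013, 12, 1), date(2013, 12, 5))
```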
It should be noted that step 207 is an optional step. The server may also disregard the influence of the file's issuing time and determine the weight of each second title only according to the weights in the first title of the matching keywords that the second title includes. That is, in another embodiment provided by the present invention, step 207 may be replaced by the following step: determining the sum of the weights in the first title of the matching keywords that each second title includes as the weight of that second title. For example, based on Table 2, for the second title "Liu Dehua concert complete collection", the server calculates that the sum of the weights of the matching keywords that this second title includes is 0.5, and therefore determines that the weight of this second title is 0.5.
208, the server determines, in descending order of the weights of the second titles, a preset number of second titles as the second titles to be recommended.
The preset number may be set in advance by the server, or determined by the server according to the number of files that the recommendation region in the display interface of the currently opened file can show. The embodiment of the present invention does not limit this.
Specifically, the server sorts the second titles in descending order of weight and determines the first preset number of second titles in that order as the second titles to be recommended, so that the files indicated by the top-ranked preset number of second titles are recommended to the user.
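Selecting the top-ranked second titles in step 208 can be sketched with the standard library. The function name `top_titles` and the example titles and weights are hypothetical.

```python
import heapq
from typing import Dict, List


def top_titles(title_weights: Dict[str, float],
               preset_number: int) -> List[str]:
    """Return the `preset_number` titles with the largest weights,
    ordered from the largest weight to the smallest."""
    return heapq.nlargest(preset_number, title_weights,
                          key=title_weights.get)


# Hypothetical weights for four candidate second titles.
candidates = {"title A": 0.46, "title B": 0.30,
              "title C": 0.52, "title D": 0.10}
recommended = top_titles(candidates, 2)
```

For a small candidate list, `sorted(...)[:preset_number]` would work equally well; `heapq.nlargest` merely avoids sorting every candidate when only a few are kept.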
209, the server recommends the files indicated by the determined second titles.
In this embodiment of the present invention, when the server recommends the file indicated by a determined second title, it may provide, on the display interface of the currently opened file, a link address for the determined second title; the link address is used to jump to the file indicated by that second title. In addition, the server may also display a thumbnail generated from the file indicated by the determined second title, or display related information such as the publisher and the issuing time. The embodiment of the present invention does not limit this.
Further, multiple determined second titles may be recommended one after another in order of weight, or in order of issuing time, which is not limited in the embodiment of the present invention.
In the method provided by the embodiment of the present invention, the first title of the currently opened file is processed to obtain multiple candidate second titles; each second title is matched against the first title to determine the matching keywords it includes; weights are determined according to the part of speech of the matching keywords; the second titles to be recommended are determined from the multiple candidate second titles according to the weights; and the files indicated by the determined second titles are recommended. This improves the degree of correlation between the finally recommended file names and the title of the currently opened file, and thus improves the recommendation success rate. Further, the issuing time of the files is taken into account: the second titles to be recommended are determined by calculating the time weight of each second title, which further improves the recommendation success rate.
Fig. 3 is a schematic structural diagram of a file recommendation device provided by an embodiment of the present invention. Referring to Fig. 3, the device includes: a first word segmentation module 301, a second set acquisition module 302, a matching module 303, a weight acquisition module 304, a title determining module 305 and a recommending module 306.
The first word segmentation module 301 is configured to segment a first title to obtain a first keyword set, where the first title is the title of the currently opened file and the first keyword set includes at least one keyword obtained by segmenting the first title;
The second set acquisition module 302 is connected to the first word segmentation module 301 and is configured to obtain, according to a preset correspondence, at least one second title and the second keyword set corresponding to the at least one second title, where a second title is a file name corresponding to a keyword in the first keyword set, and the preset correspondence includes the correspondence between a keyword and the file names containing that keyword;
The matching module 303 is connected to the second set acquisition module 302 and is configured to obtain the keywords that are identical in the first keyword set and in the second keyword set corresponding to each second title, and to use the identical keywords as matching keywords;
The weight acquisition module 304 is connected to the matching module 303 and is configured to obtain the weight in the first title of the matching keywords that each second title includes;
The title determining module 305 is connected to the weight acquisition module 304 and is configured to determine the second titles to be recommended according to the weights in the first title of the matching keywords that each second title includes;
The recommending module 306 is connected to the title determining module 305 and is configured to recommend the files indicated by the determined second titles.
Optionally, the second set acquisition module 302 includes:
A second title acquiring unit, configured to obtain the at least one second title according to the preset correspondence;
A second word segmentation unit, configured to, for each second title of the at least one second title, segment the second title to obtain a second keyword set, where the second keyword set includes at least one keyword obtained by segmenting the second title.
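The preset correspondence used by the second title acquiring unit behaves like an inverted index from keywords to the file names that contain them. The sketch below assumes that titles are already segmented into keyword lists; the function names, the exclusion of the first title from its own candidates, and the example titles are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def build_index(segmented_titles: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Map every keyword to the set of stored titles that contain it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for title, keywords in segmented_titles.items():
        for keyword in keywords:
            index[keyword].add(title)
    return index


def candidate_second_titles(index: Dict[str, Set[str]],
                            first_keywords: Iterable[str],
                            first_title: str) -> Set[str]:
    """Collect every stored title that shares a keyword with the first title."""
    found: Set[str] = set()
    for keyword in first_keywords:
        found |= index.get(keyword, set())
    # Assumption: the currently opened file is not recommended to itself.
    found.discard(first_title)
    return found


# Hypothetical stored titles with assumed segmentations.
stored = {
    "first title": ["Liu Dehua", "attending", "schoolmate", "concert", "clothes"],
    "Liu Dehua concert complete collection": ["Liu Dehua", "concert", "complete collection"],
    "weather forecast": ["weather", "forecast"],
}
index = build_index(stored)
candidates = candidate_second_titles(index, stored["first title"], "first title")
```

Because the stored segmentation is kept alongside the index, the second keyword set of each candidate can be read back from `stored` without segmenting the title again.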
Optionally, the device further includes:
A first weight acquisition module, configured to obtain the weight of each keyword in the first title according to at least one of the type and the frequency of occurrence of each keyword in the first keyword set.
Optionally, the first weight acquisition module includes:
A first weight acquisition unit, configured to assign a weight to each keyword in descending order of weight level, according to the weight level corresponding to the type of each keyword, so that the weight assigned to a keyword with a high weight level is greater than the weight assigned to a keyword with a low weight level; or,
A second weight acquisition unit, configured to assign a weight to each keyword in descending order of the frequency of occurrence of each keyword, so that the weight assigned to a keyword with a high frequency of occurrence is greater than the weight assigned to a keyword with a low frequency of occurrence; or,
A third weight acquisition unit, configured to assign a weight to each keyword in descending order of weight level, according to the weight level corresponding to the type of each keyword, so that the weight assigned to a keyword with a high weight level is greater than the weight assigned to a keyword with a low weight level;
An adjustment unit, configured to adjust the weight assigned to each keyword according to the frequency of occurrence of each keyword.
Optionally, the type of a keyword includes noun, verb or function word, and the weight level of nouns is higher than the weight levels of verbs and function words;
The frequency of occurrence of a keyword is the frequency with which the keyword occurs in the stored file names; alternatively, the frequency of occurrence of a keyword is the frequency with which the keyword occurs in the stored file names of a specified category, where the specified category is the category to which the currently opened file belongs.
Optionally, among nouns, the weight level of personal names is higher than the weight level of other nouns.
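The type-based weight levels can be sketched as follows. The text fixes only the order of the levels (personal names above other nouns, nouns above verbs and function words); the numeric level values, the proportional normalization, and the part-of-speech tags assigned to the example keywords are assumptions.

```python
from typing import Dict

# Hypothetical numeric levels; only their relative order comes from the text.
WEIGHT_LEVEL: Dict[str, int] = {"name": 3, "noun": 2, "verb": 1, "function": 1}


def weights_by_type(keyword_types: Dict[str, str]) -> Dict[str, float]:
    """Assign each keyword a weight proportional to the weight level of
    its type, normalized so that the weights sum to 1."""
    levels = {kw: WEIGHT_LEVEL[kw_type]
              for kw, kw_type in keyword_types.items()}
    total = sum(levels.values())
    return {kw: level / total for kw, level in levels.items()}


# Assumed type tags for the keywords used in the examples.
keyword_types = {"Liu Dehua": "name", "schoolmate": "name",
                 "concert": "noun", "clothes": "noun", "attending": "verb"}
type_weights = weights_by_type(keyword_types)
```

Keywords of the same type receive equal weights in this sketch; an adjustment by frequency of occurrence, as performed by the adjustment unit, could then separate them.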
Optionally, the title determining module 305 includes:
A weight determining unit, configured to determine the weight of each second title according to the weights in the first title of the matching keywords that the second title includes;
A to-be-recommended title determination unit, configured to determine, in descending order of the weights of the second titles, a preset number of second titles as the second titles to be recommended.
Optionally, the weight determining unit is configured to determine the sum of the weights in the first title of the matching keywords that each second title includes as the weight of that second title; or,
The weight determining unit is configured to determine the time weight of each second title according to the issuing time of the file indicated by that second title and, according to a preset ratio, to perform a weighted calculation on the sum of the weights in the first title of the matching keywords that each second title includes and on the time weight, to obtain a weighted sum, and to determine the weighted sum as the weight of each second title.
In the device provided by the embodiment of the present invention, the first title of the currently opened file is processed to obtain multiple candidate second titles; each second title is matched against the first title to determine the matching keywords it includes; weights are determined according to the part of speech of the matching keywords; the second titles to be recommended are determined from the multiple candidate second titles according to the weights; and the files indicated by the determined second titles are recommended. This improves the degree of correlation between the finally recommended file names and the title of the currently opened file, and thus improves the recommendation success rate.
It should be understood that, when the file recommendation device provided by the above embodiment recommends files, the division into the above functional modules is used only as an example. In practical applications, the above functions may be allocated to different functional modules as needed; that is, the internal structure of the server may be divided into different functional modules to complete all or part of the functions described above. In addition, the file recommendation device provided by the above embodiment and the file recommendation method embodiment belong to the same concept; for the specific implementation process, see the method embodiment, which is not repeated here.
Fig. 4 is a schematic structural diagram of a server provided by an embodiment of the present invention. The server 400 may vary considerably depending on configuration or performance, and may include one or more central processing units (CPUs) 422 (for example, one or more processors), a memory 432, and one or more storage media 430 (for example, one or more mass storage devices) that store application programs 442 or data 444. The memory 432 and the storage medium 430 may provide transient or persistent storage. The program stored in the storage medium 430 may include one or more modules (not marked in the figure), and each module may include a series of instruction operations on the server. Further, the central processing unit 422 may be configured to communicate with the storage medium 430 and to execute, on the server 400, the series of instruction operations in the storage medium 430.
The server 400 may also include one or more power supplies 426, one or more wired or wireless network interfaces 450, one or more input/output interfaces 458, and/or one or more operating systems 441, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, etc.
The steps performed by the server in the above embodiments may be based on the server structure shown in Fig. 4.
Those of ordinary skill in the art will understand that all or part of the steps of the above embodiments may be implemented by hardware, or by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, and the storage medium may be a read-only memory, a magnetic disk, an optical disc, or the like.
The foregoing is merely a preferred embodiment of the present invention and is not intended to limit the present invention. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims (14)

1. A file recommendation method, characterized in that the method comprises:
Segmenting a first title to obtain a first keyword set, wherein the first title is the title of a currently opened file, the first keyword set comprises at least one keyword obtained by segmenting the first title, and the first title is a title customized by a publisher;
Obtaining, according to a preset correspondence, at least one second title and a second keyword set corresponding to the at least one second title, wherein the second title is a file name corresponding to a keyword in the first keyword set, and the preset correspondence comprises the correspondence between a keyword and the file names containing the keyword;
Obtaining the keywords that are identical in the first keyword set and in the second keyword set corresponding to each second title, and using the identical keywords as matching keywords;
Obtaining the weight of each keyword in the first title according to at least one of the type and the frequency of occurrence of each keyword in the first keyword set;
Obtaining the weight in the first title of the matching keywords that each second title comprises;
Determining a second title to be recommended according to the weights in the first title of the matching keywords that each second title comprises;
Recommending the file indicated by the determined second title.
2. The method according to claim 1, characterized in that obtaining, according to the preset correspondence, the at least one second title and the second keyword set corresponding to the at least one second title comprises:
Obtaining the at least one second title according to the preset correspondence;
For each second title of the at least one second title, segmenting the second title to obtain a second keyword set, wherein the second keyword set comprises at least one keyword obtained by segmenting the second title.
3. The method according to claim 1, characterized in that obtaining the weight of each keyword in the first title according to at least one of the type and the frequency of occurrence of each keyword in the first keyword set comprises:
Assigning a weight to each keyword in descending order of weight level, according to the weight level corresponding to the type of each keyword, so that the weight assigned to a keyword with a high weight level is greater than the weight assigned to a keyword with a low weight level; or,
Assigning a weight to each keyword in descending order of the frequency of occurrence of each keyword, so that the weight assigned to a keyword with a high frequency of occurrence is greater than the weight assigned to a keyword with a low frequency of occurrence; or,
Assigning a weight to each keyword in descending order of weight level, according to the weight level corresponding to the type of each keyword, so that the weight assigned to a keyword with a high weight level is greater than the weight assigned to a keyword with a low weight level;
Adjusting the weight assigned to each keyword according to the frequency of occurrence of each keyword.
4. The method according to claim 1, characterized in that the type of the keyword comprises noun, verb or function word, and the weight level of nouns is higher than the weight levels of verbs and function words;
The frequency of occurrence of the keyword is the frequency with which the keyword occurs in the stored file names; alternatively, the frequency of occurrence of the keyword is the frequency with which the keyword occurs in the stored file names of a specified category, wherein the specified category is the category to which the currently opened file belongs.
5. The method according to claim 4, characterized in that, among nouns, the weight level of personal names is higher than the weight level of other nouns.
6. The method according to claim 1, characterized in that determining the second title to be recommended according to the weights in the first title of the matching keywords that each second title comprises comprises:
Determining the weight of each second title according to the weights in the first title of the matching keywords that the second title comprises;
Determining, in descending order of the weights of the second titles, a preset number of second titles as the second titles to be recommended.
7. The method according to claim 6, characterized in that determining the weight of each second title according to the weights in the first title of the matching keywords that the second title comprises comprises:
Determining the sum of the weights in the first title of the matching keywords that each second title comprises as the weight of that second title; or,
Determining the time weight of each second title according to the issuing time of the file indicated by that second title; performing, according to a preset ratio, a weighted calculation on the sum of the weights in the first title of the matching keywords that each second title comprises and on the time weight to obtain a weighted sum; and determining the weighted sum as the weight of each second title.
8. A file recommendation device, characterized in that the device comprises:
A first word segmentation module, configured to segment a first title to obtain a first keyword set, wherein the first title is the title of a currently opened file, the first keyword set comprises at least one keyword obtained by segmenting the first title, and the first title is a title customized by a publisher;
A second set acquisition module, configured to obtain, according to a preset correspondence, at least one second title and a second keyword set corresponding to the at least one second title, wherein the second title is a file name corresponding to a keyword in the first keyword set, and the preset correspondence comprises the correspondence between a keyword and the file names containing the keyword;
A matching module, configured to obtain the keywords that are identical in the first keyword set and in the second keyword set corresponding to each second title, and to use the identical keywords as matching keywords;
A first weight acquisition module, configured to obtain the weight of each keyword in the first title according to at least one of the type and the frequency of occurrence of each keyword in the first keyword set;
A weight acquisition module, configured to obtain the weight in the first title of the matching keywords that each second title comprises;
A title determining module, configured to determine a second title to be recommended according to the weights in the first title of the matching keywords that each second title comprises;
A recommending module, configured to recommend the file indicated by the determined second title.
9. The device according to claim 8, characterized in that the second set acquisition module comprises:
A second title acquiring unit, configured to obtain the at least one second title according to the preset correspondence;
A second word segmentation unit, configured to, for each second title of the at least one second title, segment the second title to obtain a second keyword set, wherein the second keyword set comprises at least one keyword obtained by segmenting the second title.
10. The device according to claim 8, characterized in that the first weight acquisition module comprises:
A first weight acquisition unit, configured to assign a weight to each keyword in descending order of weight level, according to the weight level corresponding to the type of each keyword, so that the weight assigned to a keyword with a high weight level is greater than the weight assigned to a keyword with a low weight level; or,
A second weight acquisition unit, configured to assign a weight to each keyword in descending order of the frequency of occurrence of each keyword, so that the weight assigned to a keyword with a high frequency of occurrence is greater than the weight assigned to a keyword with a low frequency of occurrence; or,
A third weight acquisition unit, configured to assign a weight to each keyword in descending order of weight level, according to the weight level corresponding to the type of each keyword, so that the weight assigned to a keyword with a high weight level is greater than the weight assigned to a keyword with a low weight level;
An adjustment unit, configured to adjust the weight assigned to each keyword according to the frequency of occurrence of each keyword.
11. The device according to claim 8, characterized in that the type of the keyword comprises noun, verb or function word, and the weight level of nouns is higher than the weight levels of verbs and function words;
The frequency of occurrence of the keyword is the frequency with which the keyword occurs in the stored file names; alternatively, the frequency of occurrence of the keyword is the frequency with which the keyword occurs in the stored file names of a specified category, wherein the specified category is the category to which the currently opened file belongs.
12. The device according to claim 11, characterized in that, among nouns, the weight level of personal names is higher than the weight level of other nouns.
13. The device according to claim 8, characterized in that the title determining module comprises:
A weight determining unit, configured to determine the weight of each second title according to the weights in the first title of the matching keywords that the second title comprises;
A to-be-recommended title determination unit, configured to determine, in descending order of the weights of the second titles, a preset number of second titles as the second titles to be recommended.
14. The device according to claim 13, characterized in that the weight determining unit is configured to determine the sum of the weights in the first title of the matching keywords that each second title comprises as the weight of that second title; or,
The weight determining unit is configured to determine the time weight of each second title according to the issuing time of the file indicated by that second title; to perform, according to a preset ratio, a weighted calculation on the sum of the weights in the first title of the matching keywords that each second title comprises and on the time weight to obtain a weighted sum; and to determine the weighted sum as the weight of each second title.
CN201310652678.3A 2013-12-05 2013-12-05 File recommendation method and device Active CN104699696B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201310652678.3A CN104699696B (en) 2013-12-05 2013-12-05 File recommendation method and device
PCT/CN2015/072103 WO2015081909A1 (en) 2013-12-05 2015-02-02 File recommendation method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310652678.3A CN104699696B (en) 2013-12-05 2013-12-05 File recommendation method and device

Publications (2)

Publication Number Publication Date
CN104699696A CN104699696A (en) 2015-06-10
CN104699696B true CN104699696B (en) 2018-12-28

Family

ID=53272920

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310652678.3A Active CN104699696B (en) 2013-12-05 2013-12-05 File recommendation method and device

Country Status (2)

Country Link
CN (1) CN104699696B (en)
WO (1) WO2015081909A1 (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106331778B (en) 2015-07-06 2020-08-14 腾讯科技(深圳)有限公司 Video recommendation method and device
US10387431B2 (en) 2015-08-24 2019-08-20 Google Llc Video recommendation based on video titles
CN105205159B (en) * 2015-09-29 2020-06-02 陈中和 Device and method for automatically feeding back information
CN106708858A (en) 2015-11-13 2017-05-24 阿里巴巴集团控股有限公司 Information recommendation method and device
CN107832405A (en) * 2017-11-03 2018-03-23 北京小度互娱科技有限公司 The method and apparatus for calculating the correlation between title
CN110020132B (en) * 2017-11-03 2023-04-11 腾讯科技(北京)有限公司 Keyword recommendation method and device, computing equipment and storage medium
CN108256010A (en) * 2018-01-03 2018-07-06 阿里巴巴集团控股有限公司 Content recommendation method and device
CN109144954B (en) * 2018-09-18 2021-03-16 北京字节跳动网络技术有限公司 Resource recommendation method and device for editing document and electronic equipment
CN109240991B (en) * 2018-09-26 2021-07-30 Oppo广东移动通信有限公司 File recommendation method and device, storage medium and intelligent terminal
CN112256843B (en) * 2020-12-22 2021-04-20 华东交通大学 News keyword extraction method and system based on TF-IDF method optimization

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102760124A (en) * 2011-04-25 2012-10-31 阿里巴巴集团控股有限公司 Pushing method and system for recommended data
CN102789453A (en) * 2011-05-16 2012-11-21 阿里巴巴集团控股有限公司 Advertising information release method and device
CN102799589A (en) * 2011-05-25 2012-11-28 阿里巴巴集团控股有限公司 Information pushing method and device
CN103164405A (en) * 2011-12-08 2013-06-19 盛乐信息技术(上海)有限公司 Generation method for relevant video data bank, recommendation method and recommendation system for relevant videos
CN103365899A (en) * 2012-04-01 2013-10-23 腾讯科技(深圳)有限公司 Question recommending method and question recommending system both in questions-and-answers community
CN103425687A (en) * 2012-05-21 2013-12-04 阿里巴巴集团控股有限公司 Retrieval method and system based on queries

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101079894A (en) * 2006-12-21 2007-11-28 腾讯科技(深圳)有限公司 A system and method for pushing network information
US20110218994A1 (en) * 2010-03-05 2011-09-08 International Business Machines Corporation Keyword automation of video content
CN103106208B (en) * 2011-11-11 2017-09-15 中国移动通信集团公司 A kind of streaming medium content in mobile Internet recommends method and system
CN103186550A (en) * 2011-12-27 2013-07-03 盛乐信息技术(上海)有限公司 Method and system for generating video-related video list

Also Published As

Publication number Publication date
CN104699696A (en) 2015-06-10
WO2015081909A1 (en) 2015-06-11

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant