CN104699696B - File recommendation method and device - Google Patents
File recommendation method and device
- Publication number
- CN104699696B, CN201310652678.3A
- Authority
- CN
- China
- Prior art keywords
- title
- keyword
- weight
- frequency
- file
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/14—Details of searching files based on file metadata
- G06F16/148—File search processing
Abstract
The invention discloses a file recommendation method and device, belonging to the field of network technology. The method includes: segmenting a first title to obtain a first keyword set; obtaining, according to a preset correspondence, at least one second title and a second keyword set, where the preset correspondence includes correspondences between keywords and the file titles that contain them; taking the keywords shared by the first keyword set and the second keyword set of each second title as matching keywords; obtaining the weight, in the first title, of the matching keywords included in each second title; determining the second title to be recommended; and recommending the file indicated by the determined second title. Because the weights are determined according to the parts of speech of the matching keywords, and the second title to be recommended is selected from multiple candidate second titles according to those weights, the invention improves the relevance between the finally recommended file title and the title of the currently open file, and thus improves the recommendation success rate.
Description
Technical field
The present invention relates to the field of network technology, and in particular to a file recommendation method and device.
Background art
In daily online life, users face all kinds of information at every moment and find it difficult to filter out the information they are really interested in. To make this easier, a server can recommend information a user may be interested in based on the user's browsing history, interests, hobbies, and so on.
Taking video as an example, the server may recommend to the user the most popular videos under the type to which the currently playing video belongs; for example, when the currently playing video is of the "sports" type, the server recommends the most popular videos of the "sports" type. Alternatively, the server calculates the LD (Levenshtein Distance, i.e., edit distance) between the title of each video and the title of the currently playing video, and recommends to the user the video whose title has the smallest LD to the title of the currently playing video.
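For reference, the LD-based prior-art approach just described can be sketched as follows. This is a minimal illustration assuming plain character-level Levenshtein distance with unit costs; the function names are hypothetical. It compares only character edits, which is the limitation the next paragraph points out.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or keep) ca
        prev = curr
    return prev[-1]


def recommend_by_ld(current_title, candidate_titles):
    """Prior-art style: pick the candidate whose title has the smallest LD to the current title."""
    return min(candidate_titles, key=lambda t: levenshtein(current_title, t))
```

For example, `levenshtein("kitten", "sitting")` is 3.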
When the most popular video under the type of the currently playing video is recommended, that video may have very low relevance to the currently playing video, which results in a low recommendation success rate. When the server recommends videos by computing the LD, the LD only mechanically measures the text-editing difference between video titles, so the finally recommended video title may be semantically very far from the title of the currently playing video; this likewise leads to low relevance and, in turn, a very low recommendation success rate.
Summary of the invention
To solve the problems of the prior art, embodiments of the present invention provide a file recommendation method and device. The technical solutions are as follows:
In a first aspect, a file recommendation method is provided, the method comprising:
segmenting a first title to obtain a first keyword set, where the first title is the title of the currently open file and the first keyword set includes at least one keyword obtained by segmenting the first title;
obtaining, according to a preset correspondence, at least one second title and the second keyword set corresponding to each of the at least one second title, where a second title is a file title corresponding to a keyword in the first keyword set, and the preset correspondence includes correspondences between keywords and the file titles that contain those keywords;
obtaining the keywords that are identical in the first keyword set and the second keyword set corresponding to each second title, and taking the identical keywords as matching keywords;
obtaining the weight, in the first title, of the matching keywords included in each second title;
determining the second title to be recommended according to the weight, in the first title, of the matching keywords included in each second title; and
recommending the file indicated by the determined second title.
In a second aspect, a file recommendation device is provided, the device comprising:
a first segmentation module, configured to segment a first title to obtain a first keyword set, where the first title is the title of the currently open file and the first keyword set includes at least one keyword obtained by segmenting the first title;
a second-set obtaining module, configured to obtain, according to a preset correspondence, at least one second title and the second keyword set corresponding to each of the at least one second title, where a second title is a file title corresponding to a keyword in the first keyword set, and the preset correspondence includes correspondences between keywords and the file titles that contain those keywords;
a matching module, configured to obtain the keywords that are identical in the first keyword set and the second keyword set corresponding to each second title, and to take the identical keywords as matching keywords;
a weight obtaining module, configured to obtain the weight, in the first title, of the matching keywords included in each second title;
a title determining module, configured to determine the second title to be recommended according to the weight, in the first title, of the matching keywords included in each second title; and
a recommending module, configured to recommend the file indicated by the determined second title.
The technical solutions provided by the embodiments of the present invention have the following beneficial effects:
With the method and device provided by the embodiments of the present invention, the title of the currently open file (the first title) is processed to obtain multiple candidate second titles; each second title is matched against the first title to determine the matching keywords it includes; weights are determined according to the parts of speech of the matching keywords; the second title to be recommended is determined from the candidates according to those weights; and the file indicated by the determined second title is recommended. This improves the relevance between the finally recommended file title and the title of the currently open file, and thus improves the recommendation success rate.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings required for describing the embodiments are briefly introduced below. Evidently, the drawings in the following description show only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a file recommendation method according to an embodiment of the present invention;
Fig. 2 is a flowchart of a file recommendation method according to an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a file recommendation device according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a server according to an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. Evidently, the described embodiments are only some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
Fig. 1 is a flowchart of a file recommendation method according to an embodiment of the present invention. The method is executed by a server. Referring to Fig. 1, the method includes:
101. Segment a first title to obtain a first keyword set, where the first title is the title of the currently open file and the first keyword set includes at least one keyword obtained by segmenting the first title.
102. Obtain, according to a preset correspondence, at least one second title and the second keyword set corresponding to each of the at least one second title, where a second title is a file title corresponding to a keyword in the first keyword set, and the preset correspondence includes correspondences between keywords and the file titles that contain those keywords.
103. Obtain the keywords that are identical in the first keyword set and the second keyword set corresponding to each second title, and take the identical keywords as matching keywords.
104. Obtain the weight, in the first title, of the matching keywords included in each second title.
105. Determine the second title to be recommended according to the weight, in the first title, of the matching keywords included in each second title.
106. Recommend the file indicated by the determined second title.
With the method provided by this embodiment, the title of the currently open file is processed to obtain multiple candidate second titles; each second title is matched against the first title to determine the matching keywords it includes; weights are determined according to the parts of speech of the matching keywords; the second title to be recommended is determined from the candidates according to the weights; and the file indicated by the determined second title is recommended. This improves the relevance between the finally recommended file title and the title of the currently open file, and thus improves the recommendation success rate.
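Steps 101 to 106 can be sketched end to end as below. This is a minimal sketch under stated assumptions, not the patented implementation: titles are assumed to be pre-segmented into whitespace-separated keywords (`segment` is a hypothetical stand-in for a real segmenter), the keyword weights in the first title are supplied by the caller, a second title's weight is simply the sum of its matching-keyword weights, and the currently open file is excluded from the candidates.

```python
from collections import defaultdict


def segment(title):
    # Hypothetical stand-in for a real word segmenter: keywords are whitespace-separated.
    return title.split()


def recommend(first_title, stored_titles, keyword_weight, top_n=1):
    first_set = set(segment(first_title))  # step 101: first keyword set

    # Preset correspondence: keyword -> stored titles that contain it.
    index = defaultdict(set)
    for title in stored_titles:
        for kw in segment(title):
            index[kw].add(title)

    # Step 102: every stored title sharing at least one keyword with the first title.
    candidates = set()
    for kw in first_set:
        candidates |= index.get(kw, set())
    candidates.discard(first_title)  # assumption: never recommend the open file itself

    scores = {}
    for title in candidates:
        matches = first_set & set(segment(title))  # step 103: matching keywords
        # Step 104: weights of the matching keywords in the first title, summed per title.
        scores[title] = sum(keyword_weight.get(kw, 0.0) for kw in matches)

    # Step 105: highest weight first (ties broken alphabetically, an assumption).
    ranked = sorted(candidates, key=lambda t: (-scores[t], t))
    return ranked[:top_n]  # step 106: titles whose files are recommended
```

With stored titles such as "LiuDehua concert collection" and "schoolmate party", a first title "LiuDehua schoolmate concert clothes", and weights 0.3, 0.3, 0.2, 0.1 for its four keywords, the first candidate scores 0.5 and is ranked ahead of the second (0.3).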
Optionally, obtaining, according to the preset correspondence, the at least one second title and the second keyword set corresponding to the at least one second title includes:
obtaining the at least one second title according to the preset correspondence; and
for each second title of the at least one second title, segmenting the second title to obtain a second keyword set, where the second keyword set includes at least one keyword obtained by segmenting the second title.
Optionally, before the weight, in the first title, of the matching keywords included in each second title is obtained, the method further includes:
obtaining the weight of each keyword in the first title according to at least one of the type and the frequency of occurrence of each keyword in the first keyword set.
Optionally, obtaining the weight of each keyword in the first title according to at least one of the type and the frequency of occurrence of each keyword in the first keyword set includes:
according to the weight rank corresponding to the type of each keyword, assigning weights to the keywords in descending order of weight rank, so that a keyword with a higher weight rank is assigned a greater weight than a keyword with a lower weight rank; or,
assigning weights to the keywords in descending order of frequency of occurrence, so that a keyword with a higher frequency of occurrence is assigned a greater weight than a keyword with a lower frequency of occurrence; or,
according to the weight rank corresponding to the type of each keyword, assigning weights to the keywords in descending order of weight rank, so that a keyword with a higher weight rank is assigned a greater weight than a keyword with a lower weight rank; and
adjusting the weight assigned to each keyword according to the frequency of occurrence of that keyword.
Optionally, the type of a keyword is noun, verb, or function word, and nouns have a higher weight rank than verbs and function words;
the frequency of occurrence of a keyword is the frequency with which the keyword occurs in stored file titles; alternatively, it is the frequency with which the keyword occurs in stored file titles of a specified category, the specified category being the category to which the currently open file belongs.
Optionally, among nouns, proper names have a higher weight rank than other nouns.
Optionally, determining the second title to be recommended according to the weight, in the first title, of the matching keywords included in each second title includes:
determining the weight of each second title according to the weight, in the first title, of the matching keywords included in that second title; and
sorting the second titles in descending order of weight, and determining a preset number of top-ranked second titles as the second titles to be recommended.
Optionally, determining the weight of each second title according to the weight, in the first title, of the matching keywords included in that second title includes:
determining the sum of the weights, in the first title, of the matching keywords included in each second title as the weight of that second title; or,
determining a time weight for each second title according to the release time of the file indicated by that second title; computing, according to a preset ratio, a weighted sum of the time weight and the sum of the weights, in the first title, of the matching keywords included in that second title; and determining the weighted sum as the weight of that second title.
Any combination of the above optional technical solutions may form an optional embodiment of the present invention; these are not described one by one here.
Fig. 2 is a flowchart of a file recommendation method according to an embodiment of the present invention. The method is executed by a server. Referring to Fig. 2, the method includes:
201. The server segments a first title to obtain a first keyword set, where the first title is the title of the currently open file and the first keyword set includes at least one keyword obtained by segmenting the first title.
This embodiment applies to the scenario in which a user opens a file and the server recommends other files to the user according to the title of the currently open file. The server may be a server associated with the currently open file, or a functional module within such a server; this is not limited in the embodiments of the present invention.
Further, this embodiment applies to the scenario in which the title of the currently open file is defined by its publisher. Unlike titles fixed at release, such as movie or TV-series titles, a publisher-defined title may be very long or very short: it may be a single word or a complex sentence. This embodiment recommends files to the user based on such personalized, publisher-defined titles.
The file may be a video file, an audio file, a text file, or the like provided by the server, for example a network video file provided by a video-website server, an audio file provided by an audio website, or a network document provided by a document-sharing server; this is not limited in the embodiments of the present invention.
Specifically, when the server detects that a user has opened a file, it takes the title of the currently open file as the first title, segments the first title to obtain at least one keyword, and forms the first keyword set from the at least one keyword.
For example, if the first title is "Liu Dehua attends the clothes worn when the concert of schoolmate", segmenting it yields the first keyword set {Liu Dehua, schoolmate, concert, clothes}.
When segmenting the first title, the server may use a string-matching-based segmentation method or a statistics-based segmentation method; this is not limited in the embodiments of the present invention.
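As one illustration of a string-matching-based method, the sketch below implements forward maximum matching: at each position it takes the longest dictionary word that matches, falling back to a single character. The description does not say which string-matching algorithm is used; forward maximum matching and the sample dictionary are assumptions for illustration.

```python
def forward_max_match(text, dictionary, max_len=None):
    """Dictionary-based forward maximum matching word segmentation."""
    if max_len is None:
        max_len = max((len(w) for w in dictionary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest window first and shrink it until a dictionary word matches;
        # a single character is always accepted so the loop always advances.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens
```

With a hypothetical dictionary containing "刘德华", "演唱会", "演唱", and "全集", the title "刘德华演唱会全集" ("Liu Dehua concert complete collection") segments into ["刘德华", "演唱会", "全集"]; the longer "演唱会" wins over "演唱".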
202. The server obtains at least one second title according to a preset correspondence, where a second title is a file title corresponding to a keyword in the first keyword set, and the preset correspondence includes correspondences between keywords and the file titles that contain those keywords.
The first keyword set includes at least one keyword. By querying the preset correspondence for each keyword in the first keyword set, the server obtains the file titles that contain any one or more of those keywords.
For example, the correspondence between the first title, the keywords in the first keyword set, and the second titles corresponding to each keyword is shown in Table 1.
Table 1
Optionally, before step 202, the method further includes: establishing the preset correspondence according to the file titles stored on the server.
Specifically, the server segments the titles of all stored files to obtain the keywords included in each file title; for each keyword, it obtains the file titles that contain that keyword; and it establishes the preset correspondence between the keyword and the file titles that contain it.
Further optionally, the server builds an inverted index over the keywords included in the file titles and uses the inverted index as the preset correspondence.
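A minimal sketch of such an inverted index, assuming an in-memory dictionary from keyword to the set of stored titles; the function names are hypothetical, and `segment` is any segmenter passed in by the caller (here `str.split` for simplicity).

```python
from collections import defaultdict


def build_inverted_index(titles, segment):
    """Preset correspondence: map each keyword to the set of stored titles containing it."""
    index = defaultdict(set)
    for title in titles:
        for keyword in segment(title):
            index[keyword].add(title)
    return dict(index)


def lookup_second_titles(first_keywords, index):
    """Step 202: union of the titles that contain any keyword of the first keyword set."""
    result = set()
    for keyword in first_keywords:
        result |= index.get(keyword, set())
    return result
```

Looking up a keyword costs one dictionary access, so the candidate second titles are found without scanning every stored title.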
203. For each second title of the at least one second title, the server segments the second title to obtain a second keyword set, where the second keyword set includes at least one keyword obtained by segmenting the second title.
Continuing the example of step 202, if a second title is "Liu Dehua concert complete collection", the server segments it to obtain the second keyword set {Liu Dehua, concert, complete collection}.
When segmenting the second title, the server may likewise use a string-matching-based or statistics-based segmentation method; this is not limited in the embodiments of the present invention.
204. The server obtains the keywords that are identical in the first keyword set and the second keyword set corresponding to each second title, and takes the identical keywords as matching keywords.
Specifically, for a keyword in the first keyword set, the server traverses the second keyword set to determine whether it contains that keyword; if it does, the keyword is taken as a matching keyword. Repeating this check for every keyword in the first keyword set yields at least one matching keyword. Alternatively, for a keyword in the second keyword set, the server traverses the first keyword set to determine whether it contains that keyword; if it does, the keyword is taken as a matching keyword, and repeating this check for every keyword in the second keyword set yields at least one matching keyword.
Continuing the examples of steps 201 and 203, with the first keyword set {Liu Dehua, schoolmate, concert, clothes} and the second keyword set {Liu Dehua, concert, complete collection}, the matching keywords are "Liu Dehua" and "concert".
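Step 204 amounts to a set intersection. The sketch below (function name hypothetical) replaces the linear traversal described above with a hash-set membership test, which yields the same matching keywords, and keeps them in the order of the first keyword set.

```python
def matching_keywords(first_keywords, second_keywords):
    """Keywords present in both keyword sets, in the order of the first set."""
    second = set(second_keywords)  # O(1) membership checks instead of a linear scan
    return [kw for kw in first_keywords if kw in second]
```

Applied to the example sets, it returns ["Liu Dehua", "concert"].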
205. According to the weight rank corresponding to the type of each keyword in the first keyword set, the server assigns weights to the keywords in descending order of weight rank, so that a keyword with a higher weight rank is assigned a greater weight than a keyword with a lower weight rank.
In this embodiment, the first keyword set and the second keyword set share at least one matching keyword, yet the first title and the second title may still differ greatly in meaning. Therefore, to improve the relevance between the second title to be recommended and the first title, weights are assigned to the keywords in the first keyword set and the weight of each second title is determined from them when the second title to be recommended is selected.
Specifically, the server presets a weight rank for each keyword type. After determining the type of each keyword in the first keyword set, the server determines each keyword's weight rank from the preset ranks, sorts the keywords in descending order of weight rank, and assigns weights so that a keyword with a higher weight rank is assigned a greater weight than a keyword with a lower weight rank.
Optionally, the weights assigned to the keywords in the first keyword set sum to 1.
Further optionally, the type of a keyword is noun, verb, or function word; nouns have a higher weight rank than verbs and function words, and among nouns, proper names have a higher weight rank than other nouns.
For example, for the first title "Liu Dehua attends the clothes worn when the concert of schoolmate", the nouns "Liu Dehua", "schoolmate", "concert", and "clothes" have a higher weight rank than the verbs "attend" and "wear" and the function words "的" (possessive particle) and "时" ("when").
A proper name among nouns may be a person's name, a place name, an organization name, a brand name, or the like; this is not limited in the embodiments of the present invention. Proper names have a higher weight rank than other nouns; for example, "Liu Dehua" and "schoolmate" have a higher weight rank than "concert" and "clothes".
Still taking the first title "Liu Dehua attends the clothes worn when the concert of schoolmate" as an example, the server determines that "Liu Dehua" and "schoolmate" have a higher weight rank than "concert" and "clothes", which in turn have a higher weight rank than "attend", "wear", "的", and "时". The server may then assign a weight of 0.3 to "Liu Dehua", 0.3 to "schoolmate", 0.2 to "concert", 0.1 to "clothes", 0.1 to "attend", and 0 to the remaining keywords.
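One way to satisfy the constraints of step 205 (a keyword with a higher weight rank gets a greater weight, and optionally the weights sum to 1) is to give each type a score and normalize the scores. The type names, the scores in `RANK_SCORE`, and the proportional normalization are assumptions for illustration; they do not reproduce the hand-picked 0.3/0.3/0.2/0.1/0.1/0 figures in the example above, which the description does not derive from a formula.

```python
# Hypothetical rank scores: a higher score means a higher weight rank.
RANK_SCORE = {"proper_noun": 3, "noun": 2, "verb": 1, "function_word": 0}


def weights_by_type(keyword_types):
    """Map keyword -> weight so that higher-ranked types get larger weights summing to 1."""
    if not keyword_types:
        return {}
    scores = {kw: RANK_SCORE[t] for kw, t in keyword_types.items()}
    total = sum(scores.values())
    if total == 0:
        # Only function words: fall back to equal weights so the sum is still 1.
        return {kw: 1 / len(scores) for kw in scores}
    return {kw: s / total for kw, s in scores.items()}
```

For the example title, tagging "Liu Dehua" and "schoolmate" as proper nouns, "concert" and "clothes" as nouns, "attend" and "wear" as verbs, and "的" and "时" as function words gives a score total of 12, so "Liu Dehua" receives 0.25 and the function words receive 0.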
In another embodiment of the present invention, step 205 may be replaced by the following step (1):
(1) Assign weights to the keywords in descending order of frequency of occurrence, so that a keyword with a higher frequency of occurrence is assigned a greater weight than a keyword with a lower frequency of occurrence.
In this embodiment, a keyword in the first keyword set with a higher frequency of occurrence is considered more popular, and the user is likely to be interested in files related to it; weights can therefore be assigned according to the frequency of occurrence of each keyword in the first keyword set.
Optionally, the frequency of occurrence of a keyword is the frequency with which it occurs in stored file titles; alternatively, it is the frequency with which it occurs in stored file titles of a specified category, the specified category being the category to which the currently open file belongs.
The currently open file may belong to a subcategory that in turn belongs to a parent category; the server may determine which category to use as the specified category according to the required recommendation accuracy.
For example, when the title of the currently open file is "Evergrande wins the championship", the file belongs to the football subcategory of the sports category. The server may calculate the frequency of occurrence of the keyword "win the championship" in file titles of the football category and assign a weight to that keyword accordingly, rather than using its frequency of occurrence in file titles of all categories or in file titles of the sports category.
Further, the frequency of occurrence may be TF (term frequency) or DF (document frequency).
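The description names TF and DF without giving formulas. The sketch below assumes common normalized definitions, computed over the titles of the specified category passed in by the caller: DF as the fraction of titles whose keywords include the keyword, and TF as the keyword's share of all keyword occurrences. Both definitions and the function names are assumptions.

```python
from collections import Counter


def document_frequency(keyword, titles, segment):
    """Assumed DF: fraction of titles whose keyword set contains `keyword`."""
    if not titles:
        return 0.0
    hits = sum(1 for t in titles if keyword in set(segment(t)))
    return hits / len(titles)


def term_frequency(keyword, titles, segment):
    """Assumed TF: occurrences of `keyword` divided by all keyword occurrences in `titles`."""
    counts = Counter(kw for t in titles for kw in segment(t))
    total = sum(counts.values())
    return counts[keyword] / total if total else 0.0
```

Restricting `titles` to the football category, rather than to all categories, is what narrows the statistic to the specified category.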
Still taking the first title "Liu Dehua attends the clothes worn when the concert of schoolmate" as an example, the server determines that the first title belongs to the singer category and calculates the frequency of occurrence of the keywords "Liu Dehua", "schoolmate", "concert", and "clothes" in file titles of the singer category. If the calculated frequencies of "Liu Dehua", "schoolmate", and "concert" are 0.3, 0.2, and 0.1 respectively, the server may, in descending order of frequency, assign a weight of 0.5 to "Liu Dehua", 0.4 to "schoolmate", and 0.1 to "concert", and a weight of 0 to the remaining keywords.
Further optionally, the server calculates the frequency of occurrence of each keyword over the file titles it stored within a preset duration, which may be preset by the server.
Step 205 and step (1) above assign weights according to, respectively, the weight rank corresponding to the type of each keyword in the first keyword set and the frequency of occurrence of each keyword. The server may also assign weights by considering both the weight rank of each keyword's type and its frequency of occurrence. That is, in another embodiment of the present invention, step 205 may be replaced by the following step (2):
(2) According to the weight rank corresponding to the type of each keyword, assign weights to the keywords in descending order of weight rank, so that a keyword with a higher weight rank is assigned a greater weight than a keyword with a lower weight rank; then adjust the weight assigned to each keyword according to its frequency of occurrence.
In practice, a keyword with a high frequency of occurrence is considered popular, but the second titles corresponding to such a keyword may have very low relevance to the first title, and the user is not necessarily interested in the files indicated by those popular second titles. In this embodiment, the server therefore first assigns weights according to the weight rank of each keyword's type and then adjusts them according to each keyword's frequency of occurrence. Considering both the relevance between a second title and the first title and the frequency of occurrence of the second title improves the relevance between the finally determined second title to be recommended and the first title, while still giving priority to recommending files with a higher frequency of occurrence.
Further, "adjusting the weight assigned to each keyword according to its frequency of occurrence" in step (2) may be performed in either of the following ways:
(2-1) Determine an adjustment amount according to the frequency of occurrence of each keyword, and increase or decrease the weight assigned to each keyword by the determined amount.
For example, suppose the server assigns weights of 0.3, 0.3, 0.2, 0.1, and 0.1 to the keywords "Liu Dehua", "schoolmate", "concert", "clothes", and "attend", and calculates their frequencies of occurrence within one week as 0.3, 0.2, 0.1, 0.2, and 0.01, respectively. If the adjustment amounts determined for these keywords are 0.025, 0.025, -0.1, 0.15, and -0.1, then after adjustment the finally assigned weights are 0.325, 0.325, 0.1, 0.25, and 0.
(2-2) According to the frequency of occurrence of each keyword, increase by a preset adjustment weight the weight assigned to each keyword whose frequency of occurrence is greater than or equal to a preset threshold, and decrease by the preset adjustment weight the weight assigned to each keyword whose frequency of occurrence is below the preset threshold.
For example, the server sets the preset threshold to 0.2 and the preset adjustment weight to 0.05. When the server has assigned weights of 0.3, 0.3, 0.2, 0.1, and 0.1 to "Liu Dehua", "schoolmate", "concert", "clothes", and "attend", and has calculated their frequencies of occurrence as 0.3, 0.2, 0.1, 0.2, and 0.01, respectively, it increases by 0.05 the weights of "Liu Dehua", "schoolmate", and "clothes", whose frequencies are greater than or equal to 0.2, and decreases by 0.05 the weights of "concert" and "attend", whose frequencies are below 0.2. The finally assigned weights are therefore 0.35, 0.35, 0.15, 0.15, and 0.05.
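Both adjustment modes can be sketched in a few lines. How (2-1) derives the adjustment amounts from the frequencies is not specified, so this sketch takes the amounts as an input; clamping adjusted weights at 0 is also an assumption, and the function names are hypothetical.

```python
def adjust_by_amount(weights, amounts):
    """(2-1): add a per-keyword adjustment amount (positive or negative) to each weight."""
    # Derivation of `amounts` from frequencies is unspecified; it is supplied by the caller.
    return {kw: max(0.0, w + amounts.get(kw, 0.0)) for kw, w in weights.items()}


def adjust_by_threshold(weights, frequencies, threshold, step):
    """(2-2): raise weights of keywords at or above the threshold by `step`, lower the rest."""
    adjusted = {}
    for kw, w in weights.items():
        delta = step if frequencies.get(kw, 0.0) >= threshold else -step
        adjusted[kw] = max(0.0, w + delta)  # clamping at 0 is an assumption
    return adjusted
```

Applied to the (2-2) example values (threshold 0.2, step 0.05), `adjust_by_threshold` yields 0.35, 0.35, 0.15, 0.15, and 0.05 for "Liu Dehua", "schoolmate", "concert", "clothes", and "attend"; with the (2-1) amounts, `adjust_by_amount` yields 0.325, 0.325, 0.1, 0.25, and 0.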
It should be noted that this embodiment is described with step 205 performed after step 204 as an example. In fact, step 205 need only be performed after step 201 and before step 206; that is, step 205 may also be performed before, or simultaneously with, step 204. The timing of step 205 is not limited in the embodiments of the present invention.
206. The server obtains the weight, in the first title, of the matching keywords included in each second title.
In this embodiment, the server has already determined the weight of each keyword of the first keyword set in the first title, that is, the weight of each matching keyword in the first title. The server therefore determines the matching keywords included in each second title and the weight of each of those matching keywords in the first title.
Based on Table 1, assume that the first title is "Liu Dehua attends the clothes worn when the concert of schoolmate" and that the server has assigned a weight of 0.3 to "Liu Dehua", 0.3 to "schoolmate", 0.2 to "concert", 0.1 to "clothes", 0.1 to "attend", and 0 to the remaining keywords. The weights, in the first title, of the matching keywords included in each second title, as determined by the server, may then be as shown in Table 2.
Table 2
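Because Table 2 itself is not reproduced here, the following Python sketch illustrates step 206 on a single second title. Matching keywords are computed as the set intersection of the two keyword sets, and each is looked up in the first title's weights. The segmentation of "Liu Dehua Concert Complete Collection" into three keywords and the placeholder keyword "wear" are hypothetical.

```python
def matching_keyword_weights(first_keyword_weights, second_keywords):
    """Return {matching keyword: its weight in the first title}.

    Matching keywords are those present in both the first keyword set and
    the second title's keyword set (set intersection).
    """
    matches = set(first_keyword_weights) & set(second_keywords)
    return {kw: first_keyword_weights[kw] for kw in sorted(matches)}


# Weights from the example above; "wear" stands in for a remaining keyword (weight 0).
first_weights = {"Liu Dehua": 0.3, "schoolmate": 0.3, "concert": 0.2,
                 "clothes": 0.1, "attending": 0.1, "wear": 0.0}
# Hypothetical segmentation of the second title "Liu Dehua Concert Complete Collection".
second_keywords = ["Liu Dehua", "concert", "complete collection"]

matched = matching_keyword_weights(first_weights, second_keywords)
print(matched)                # {'Liu Dehua': 0.3, 'concert': 0.2}
print(sum(matched.values()))  # 0.5
```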
207: The server determines a time weight for each second title according to the publication time of the file indicated by that second title. According to a preset ratio, the server computes a weighted sum of the time weight and the sum of the weights, in the first title, of the matching keywords included in that second title, and takes this weighted sum as the weight of the second title.
In this embodiment, the file indicated by a second title may be a newly published file or a file published earlier. Files published at different times interest users to different degrees; that is, publication time affects how interested a user is, and in turn the recommendation success rate. Therefore, the publication time of the file indicated by each second title needs to be considered when determining the second titles to be recommended.
Specifically, the server calculates, for each second title, the sum of the weights in the first title of the matching keywords it includes. The server also ranks the second titles by the publication time of the files they indicate and, following that order, assigns each second title a time weight, so that a second title published later receives a higher time weight than one published earlier. The server then combines the sum and the time weight according to the preset ratio to obtain a weighted sum, which is the weight of that second title.
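The ranking-based assignment of time weights can be sketched as below. The patent only requires that a later-published title receive a higher time weight; the even spacing between a lowest weight of 0.1 and a highest of 1.0, and the function name `rank_time_weights`, are assumptions made for illustration.

```python
from datetime import date


def rank_time_weights(publish_dates, low=0.1, high=1.0):
    """Assign each second title a time weight by publication order.

    Titles are sorted oldest to newest; weights rise evenly from `low` to `high`,
    so a later-published title always gets a higher time weight. The even spacing
    and the low/high bounds are assumptions for this sketch.
    """
    ordered = sorted(publish_dates, key=publish_dates.get)  # oldest first
    if len(ordered) == 1:
        return {ordered[0]: high}
    step = (high - low) / (len(ordered) - 1)
    return {title: round(low + i * step, 4) for i, title in enumerate(ordered)}


dates = {"title A": date(2013, 11, 1),
         "title B": date(2013, 12, 1),
         "title C": date(2013, 10, 1)}
print(rank_time_weights(dates))  # {'title C': 0.1, 'title A': 0.55, 'title B': 1.0}
```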
Here, the preset ratio is the ratio between the sum and the time weight; it determines the weighting coefficients applied to the sum and to the time weight. The preset ratio may be preset by the server or adjusted by the server during use. For example, the earlier the currently opened file was published, the smaller the share of the time weight; when the currently opened file is of a highly time-sensitive type such as "news", the time weight takes a larger share. The embodiments of the present invention do not limit this.
Based on Table 2, consider the second title "Liu Dehua Concert Complete Collection". Assume the server assigns this second title a time weight of 0.4 and the preset ratio is 6:4. The server calculates the sum of the weights of the matching keywords included in this second title as 0.5, and the weight of this second title as 0.5*0.6 + 0.4*0.4 = 0.46.
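The weighted sum in this example can be written as a small Python function. The name `second_title_weight` and the representation of the preset ratio as a pair `(a, b)` normalized to `a/(a+b)` and `b/(a+b)` are assumptions for this sketch.

```python
def second_title_weight(keyword_weight_sum, time_weight, ratio=(6, 4)):
    """Weighted sum of the matching-keyword weight sum and the time weight.

    A preset ratio a:b is turned into coefficients a/(a+b) and b/(a+b).
    """
    a, b = ratio
    return (keyword_weight_sum * a + time_weight * b) / (a + b)


print(round(second_title_weight(0.5, 0.4), 4))  # 0.46
```

Passing `ratio=(1, 0)` makes the time weight irrelevant, so the result is just the keyword weight sum (0.5 here), which matches the variant in which step 207 is skipped.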
Further, the server may preset a correspondence between time weights and the time interval between the publication time and the current time, that is, determine the time weight corresponding to each time interval. The server can then calculate, for each second title, the interval between the publication time of the indicated file and the current time, and determine the time weight of that second title from the preset correspondence.
For example, the server presets the time weight of a second title with a time interval of 1 day as 0.9, with an interval of 2 days as 0.8, and so on. When the server determines that the interval between the publication time of the file indicated by a second title and the current time is 4 days, it determines the time weight of that second title to be 0.6.
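A lookup-table version of this preset correspondence might look like the following. Only the 1-day (0.9) and 2-day (0.8) entries come from the text; continuing at 0.1 per day, using 1.0 for a same-day interval, and returning 0.0 beyond the table are assumptions.

```python
from datetime import datetime

# Hypothetical preset correspondence: whole-day interval -> time weight.
# 1 day -> 0.9 and 2 days -> 0.8 come from the example; continuing the
# pattern at 0.1 per day (and 1.0 for day 0) is an assumption.
INTERVAL_TO_TIME_WEIGHT = {days: round(1.0 - 0.1 * days, 2) for days in range(11)}


def time_weight(published, now):
    """Look up the time weight for the interval between publication and now."""
    days = max(0, (now - published).days)
    # Intervals longer than the table covers get weight 0.0 (assumption).
    return INTERVAL_TO_TIME_WEIGHT.get(days, 0.0)


print(time_weight(datetime(2013, 12, 1, 9, 0), datetime(2013, 12, 5, 10, 0)))  # 0.6
```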
It should be noted that step 207 is optional. The server may also ignore the influence of publication time and determine the weight of each second title solely from the weights, in the first title, of the matching keywords it includes. That is, in another embodiment of the present invention, step 207 may be replaced by the following step: determining the sum of the weights, in the first title, of the matching keywords included in each second title as the weight of that second title. For example, based on Table 2, for the second title "Liu Dehua Concert Complete Collection", the server calculates the sum of the weights of its matching keywords as 0.5, so the weight of this second title is 0.5.
208: The server ranks the second titles in descending order of weight and determines a preset number of top-ranked second titles as the second titles to be recommended.
The preset number may be preset by the server, or determined by the server according to the number of files that the recommendation area of the display interface of the currently opened file can show. The embodiments of the present invention do not limit this.
Specifically, the server sorts the second titles in descending order of weight and determines the preset number of second titles at the top of the ranking as the second titles to be recommended, so that the files indicated by those second titles are recommended to the user.
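Step 208 is a top-N selection, which could be sketched with the standard-library `heapq.nlargest`. The function name and the sample titles other than "Liu Dehua Concert Complete Collection" are hypothetical.

```python
import heapq


def select_titles_to_recommend(title_weights, preset_number):
    """Return the `preset_number` second titles with the largest weights, best first."""
    return heapq.nlargest(preset_number, title_weights, key=title_weights.get)


weights = {"Liu Dehua Concert Complete Collection": 0.46,
           "Hypothetical title X": 0.30,
           "Hypothetical title Y": 0.52}
print(select_titles_to_recommend(weights, 2))
# ['Hypothetical title Y', 'Liu Dehua Concert Complete Collection']
```

`heapq.nlargest` avoids fully sorting a long candidate list when only a few titles are needed; for short lists, `sorted(..., reverse=True)[:n]` gives the same result.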
209: The server recommends the files indicated by the determined second titles.
In this embodiment, when recommending the files indicated by the determined second titles, the server may provide link addresses of the determined second titles on the display interface of the currently opened file; each link address jumps to the file indicated by the corresponding second title. In addition, the server may display related information such as thumbnails generated from the files indicated by the determined second titles, or their publisher, publication time, and so on. The embodiments of the present invention do not limit this.
Further, when multiple second titles are determined, they may be recommended in order of weight or in order of publication time; the embodiments of the present invention do not limit this.
The method provided by this embodiment processes the first title of the currently opened file to obtain multiple candidate second titles, matches each second title against the first title to determine the matching keywords it includes, and determines weights according to the part of speech of the matching keywords. It then determines, according to the weights, the second titles to be recommended from the candidates and recommends the files they indicate. This improves the relevance between the titles of the finally recommended files and the title of the currently opened file, and therefore the recommendation success rate. Furthermore, taking the publication time of files into account by calculating a time weight for each second title when determining the second titles to be recommended further improves the recommendation success rate.
Fig. 3 is a schematic structural diagram of a file recommendation apparatus provided by an embodiment of the present invention. Referring to Fig. 3, the apparatus includes a first word-segmentation module 301, a second-set obtaining module 302, a matching module 303, a weight obtaining module 304, a title determining module 305, and a recommending module 306, wherein:
the first word-segmentation module 301 is configured to segment a first title into words to obtain a first keyword set, where the first title is the title of the currently opened file and the first keyword set includes at least one keyword obtained by segmenting the first title;
the second-set obtaining module 302 is connected to the first word-segmentation module 301 and is configured to obtain, according to a preset correspondence, at least one second title and the second keyword set corresponding to the at least one second title, where a second title is a file name corresponding to a keyword in the first keyword set, and the preset correspondence includes correspondences between keywords and the file names containing those keywords;
the matching module 303 is connected to the second-set obtaining module 302 and is configured to obtain the keywords common to the first keyword set and the second keyword set corresponding to each second title, and to take these common keywords as matching keywords;
the weight obtaining module 304 is connected to the matching module 303 and is configured to obtain the weight, in the first title, of the matching keywords included in each second title;
the title determining module 305 is connected to the weight obtaining module 304 and is configured to determine the second titles to be recommended according to the weight, in the first title, of the matching keywords included in each second title;
the recommending module 306 is connected to the title determining module 305 and is configured to recommend the files indicated by the determined second titles.
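The preset correspondence used by modules 301 and 302 behaves like an inverted index from keywords to the stored file names that contain them. The Python sketch below illustrates this under stated assumptions: word segmentation is stubbed out by splitting on "|", since a real system would use a Chinese word segmenter; excluding the currently opened file from the candidates is an assumption; and all function names and sample titles are hypothetical.

```python
from collections import defaultdict


def segment(title):
    """Stand-in word segmenter (assumption): titles are pre-segmented with '|'.
    A real system would call a Chinese word segmenter here."""
    return [word for word in title.split("|") if word]


def build_preset_correspondence(stored_file_names):
    """Keyword -> set of stored file names containing that keyword (an inverted index)."""
    index = defaultdict(set)
    for name in stored_file_names:
        for keyword in segment(name):
            index[keyword].add(name)
    return index


def second_keyword_sets(first_title, index):
    """Candidate second titles (names sharing a keyword with the first title),
    each mapped to its own second keyword set."""
    candidates = set()
    for keyword in set(segment(first_title)):
        candidates |= index.get(keyword, set())
    candidates.discard(first_title)  # skip the currently opened file itself (assumption)
    return {title: set(segment(title)) for title in candidates}


stored = ["Liu Dehua|concert|complete collection",
          "Another Singer|concert",
          "cooking|tips"]
first = "Liu Dehua|attending|schoolmate|concert|wear|clothes"
index = build_preset_correspondence(stored)
print(sorted(second_keyword_sets(first, index)))
# ['Another Singer|concert', 'Liu Dehua|concert|complete collection']
```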
Optionally, the second-set obtaining module 302 includes:
a second-title obtaining unit, configured to obtain the at least one second title according to the preset correspondence;
a second word-segmentation unit, configured to segment each second title of the at least one second title into words to obtain a second keyword set, the second keyword set including at least one keyword obtained by segmenting that second title.
Optionally, the apparatus further includes:
a first weight obtaining module, configured to obtain the weight of each keyword in the first title according to at least one of the type and the frequency of occurrence of each keyword in the first keyword set.
Optionally, the first weight obtaining module includes:
a first weight obtaining unit, configured to assign a weight to each keyword in descending order of the weight level corresponding to the keyword's type, so that a keyword with a higher weight level is assigned a greater weight than a keyword with a lower weight level; or,
a second weight obtaining unit, configured to assign a weight to each keyword in descending order of the keyword's frequency of occurrence, so that a keyword with a higher frequency of occurrence is assigned a greater weight than a keyword with a lower frequency of occurrence; or,
a third weight obtaining unit, configured to assign a weight to each keyword in descending order of the weight level corresponding to the keyword's type, so that a keyword with a higher weight level is assigned a greater weight than a keyword with a lower weight level; and
an adjustment unit, configured to adjust the weight assigned to each keyword according to the keyword's frequency of occurrence.
Optionally, the type of a keyword includes noun, verb, or function word, and nouns have a higher weight level than verbs and function words;
the frequency of occurrence of a keyword is the frequency with which the keyword occurs in stored file names, or the frequency with which the keyword occurs in stored file names of a specified category, the specified category being the category to which the currently opened file belongs.
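The two definitions of frequency of occurrence (over all stored file names, or only over those in the currently opened file's category) can be sketched as one function with an optional category filter. Measuring frequency as the fraction of stored names whose keyword set contains the keyword is an assumption, since the text does not define the exact formula; the function name and sample data are hypothetical.

```python
def keyword_frequency(keyword, stored_titles, category=None):
    """Share of stored file names (optionally only those in `category`) whose
    keyword set contains `keyword`.

    stored_titles: list of (keyword_set, category) pairs. Measuring frequency as
    the fraction of names containing the keyword is an assumption.
    """
    pool = [keywords for keywords, cat in stored_titles
            if category is None or cat == category]
    if not pool:
        return 0.0
    return sum(1 for keywords in pool if keyword in keywords) / len(pool)


stored = [({"Liu Dehua", "concert"}, "music"),
          ({"Liu Dehua", "film"}, "movies"),
          ({"weather", "forecast"}, "news"),
          ({"concert", "tickets"}, "music")]
print(keyword_frequency("concert", stored))           # 0.5
print(keyword_frequency("concert", stored, "music"))  # 1.0
```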
Optionally, among nouns, personal names have a higher weight level than other nouns.
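One way to turn these weight levels into keyword weights is to make each weight proportional to its level, which guarantees that a higher level always yields a larger weight. The text fixes only the ordering (personal name above other nouns, nouns above verbs and function words); the numeric levels, ranking verbs above function words, and the proportional scheme are assumptions, and this scheme does not reproduce the specific 0.3/0.3/0.2 values of the earlier example.

```python
# Hypothetical weight levels. The text fixes only: person name > other nouns,
# and nouns > verbs and function words. Ranking verbs above function words
# and the numeric values themselves are assumptions.
WEIGHT_LEVELS = {"person_name": 3, "noun": 2, "verb": 1, "function_word": 0}


def weights_by_type(keyword_types):
    """Give each keyword a weight proportional to its type's weight level,
    so a higher level always yields a larger weight (weights sum to 1)."""
    total = sum(WEIGHT_LEVELS[t] for t in keyword_types.values())
    if total == 0:
        return {kw: 0.0 for kw in keyword_types}
    return {kw: WEIGHT_LEVELS[t] / total for kw, t in keyword_types.items()}


types = {"Liu Dehua": "person_name", "concert": "noun",
         "attending": "verb", "when": "function_word"}
print(weights_by_type(types))
```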
Optionally, the title determining module includes:
a weight determining unit, configured to determine the weight of each second title according to the weight, in the first title, of the matching keywords included in that second title;
a to-be-recommended title determining unit, configured to rank the second titles in descending order of weight and determine a preset number of top-ranked second titles as the second titles to be recommended.
Optionally, the weight determining unit is configured to determine the sum of the weights, in the first title, of the matching keywords included in each second title as the weight of that second title; or,
the weight determining unit is configured to determine the time weight of each second title according to the publication time of the file indicated by that second title, compute, according to a preset ratio, a weighted sum of the time weight and the sum of the weights in the first title of the matching keywords included in that second title, and determine the weighted sum as the weight of that second title.
The apparatus provided by this embodiment processes the first title of the currently opened file to obtain multiple candidate second titles, matches each second title against the first title to determine the matching keywords it includes, and determines weights according to the part of speech of the matching keywords. It then determines, according to the weights, the second titles to be recommended from the candidates and recommends the files they indicate, which improves the relevance between the titles of the finally recommended files and the title of the currently opened file, and therefore the recommendation success rate.
It should be noted that when the file recommendation apparatus provided by the above embodiment recommends files, the division into the functional modules described above is used only as an example. In practical applications, the above functions may be allocated to different functional modules as needed; that is, the internal structure of the server may be divided into different functional modules to complete all or part of the functions described above. In addition, the file recommendation apparatus provided by the above embodiment and the file recommendation method embodiments belong to the same concept; for the specific implementation process, see the method embodiments, which are not repeated here.
Fig. 4 is a schematic structural diagram of a server provided by an embodiment of the present invention. The server 400 may vary considerably with configuration or performance, and may include one or more central processing units (CPUs) 422 (for example, one or more processors), a memory 432, and one or more storage media 430 (for example, one or more mass storage devices) that store application programs 442 or data 444. The memory 432 and the storage medium 430 may provide transient or persistent storage. The programs stored in the storage medium 430 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations for the server. Further, the central processing unit 422 may be configured to communicate with the storage medium 430 and execute, on the server 400, the series of instruction operations in the storage medium 430.
The server 400 may further include one or more power supplies 426, one or more wired or wireless network interfaces 450, one or more input/output interfaces 458, and/or one or more operating systems 441, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and so on.
The steps performed by the server in the above embodiments may be based on the server structure shown in Fig. 4.
A person of ordinary skill in the art will understand that all or part of the steps of the above embodiments may be implemented by hardware, or by a program instructing relevant hardware. The program may be stored in a computer-readable storage medium, which may be a read-only memory, a magnetic disk, an optical disc, or the like.
The foregoing are merely preferred embodiments of the present invention and are not intended to limit the present invention. Any modification, equivalent replacement, improvement, or the like made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.
Claims (14)
1. A file recommendation method, characterized in that the method comprises:
segmenting a first title into words to obtain a first keyword set, wherein the first title is the title of the currently opened file, the first keyword set includes at least one keyword obtained by segmenting the first title, and the first title is a title customized by the publisher;
obtaining, according to a preset correspondence, at least one second title and a second keyword set corresponding to the at least one second title, wherein a second title is a file name corresponding to a keyword in the first keyword set, and the preset correspondence includes correspondences between keywords and file names containing the keywords;
obtaining the keywords common to the first keyword set and the second keyword set corresponding to each second title, and taking the common keywords as matching keywords;
obtaining the weight of each keyword in the first title according to at least one of the type and the frequency of occurrence of each keyword in the first keyword set;
obtaining the weight, in the first title, of the matching keywords included in each second title;
determining the second title to be recommended according to the weight, in the first title, of the matching keywords included in each second title;
recommending the file indicated by the determined second title.
2. The method according to claim 1, wherein obtaining, according to the preset correspondence, the at least one second title and the second keyword set corresponding to the at least one second title comprises:
obtaining the at least one second title according to the preset correspondence;
for each second title of the at least one second title, segmenting the second title into words to obtain a second keyword set, the second keyword set including at least one keyword obtained by segmenting the second title.
3. The method according to claim 1, wherein obtaining the weight of each keyword in the first title according to at least one of the type and the frequency of occurrence of each keyword in the first keyword set comprises:
assigning a weight to each keyword in descending order of the weight level corresponding to the type of the keyword, so that a keyword with a higher weight level is assigned a greater weight than a keyword with a lower weight level; or,
assigning a weight to each keyword in descending order of the frequency of occurrence of the keyword, so that a keyword with a higher frequency of occurrence is assigned a greater weight than a keyword with a lower frequency of occurrence; or,
assigning a weight to each keyword in descending order of the weight level corresponding to the type of the keyword, so that a keyword with a higher weight level is assigned a greater weight than a keyword with a lower weight level; and
adjusting the weight assigned to each keyword according to the frequency of occurrence of the keyword.
4. The method according to claim 1, wherein the type of the keyword includes noun, verb, or function word, and nouns have a higher weight level than verbs and function words;
the frequency of occurrence of the keyword is the frequency with which the keyword occurs in stored file names, or the frequency of occurrence of the keyword is the frequency with which the keyword occurs in stored file names of a specified category, the specified category being the category to which the currently opened file belongs.
5. The method according to claim 4, wherein, among nouns, personal names have a higher weight level than other nouns.
6. The method according to claim 1, wherein determining the second title to be recommended according to the weight, in the first title, of the matching keywords included in each second title comprises:
determining the weight of each second title according to the weight, in the first title, of the matching keywords included in that second title;
determining, in descending order of the weight of each second title, a preset number of second titles as the second titles to be recommended.
7. The method according to claim 6, wherein determining the weight of each second title according to the weight, in the first title, of the matching keywords included in that second title comprises:
determining the sum of the weights, in the first title, of the matching keywords included in each second title as the weight of that second title; or,
determining a time weight of each second title according to the publication time of the file indicated by that second title; computing, according to a preset ratio, a weighted sum of the time weight and the sum of the weights, in the first title, of the matching keywords included in that second title; and determining the weighted sum as the weight of that second title.
8. A file recommendation apparatus, characterized in that the apparatus comprises:
a first word-segmentation module, configured to segment a first title into words to obtain a first keyword set, wherein the first title is the title of the currently opened file, the first keyword set includes at least one keyword obtained by segmenting the first title, and the first title is a title customized by the publisher;
a second-set obtaining module, configured to obtain, according to a preset correspondence, at least one second title and a second keyword set corresponding to the at least one second title, wherein a second title is a file name corresponding to a keyword in the first keyword set, and the preset correspondence includes correspondences between keywords and file names containing the keywords;
a matching module, configured to obtain the keywords common to the first keyword set and the second keyword set corresponding to each second title, and to take the common keywords as matching keywords;
a first weight obtaining module, configured to obtain the weight of each keyword in the first title according to at least one of the type and the frequency of occurrence of each keyword in the first keyword set;
a weight obtaining module, configured to obtain the weight, in the first title, of the matching keywords included in each second title;
a title determining module, configured to determine the second title to be recommended according to the weight, in the first title, of the matching keywords included in each second title;
a recommending module, configured to recommend the file indicated by the determined second title.
9. The apparatus according to claim 8, wherein the second-set obtaining module comprises:
a second-title obtaining unit, configured to obtain the at least one second title according to the preset correspondence;
a second word-segmentation unit, configured to segment each second title of the at least one second title into words to obtain a second keyword set, the second keyword set including at least one keyword obtained by segmenting the second title.
10. The apparatus according to claim 8, wherein the first weight obtaining module comprises:
a first weight obtaining unit, configured to assign a weight to each keyword in descending order of the weight level corresponding to the type of the keyword, so that a keyword with a higher weight level is assigned a greater weight than a keyword with a lower weight level; or,
a second weight obtaining unit, configured to assign a weight to each keyword in descending order of the frequency of occurrence of the keyword, so that a keyword with a higher frequency of occurrence is assigned a greater weight than a keyword with a lower frequency of occurrence; or,
a third weight obtaining unit, configured to assign a weight to each keyword in descending order of the weight level corresponding to the type of the keyword, so that a keyword with a higher weight level is assigned a greater weight than a keyword with a lower weight level; and
an adjustment unit, configured to adjust the weight assigned to each keyword according to the frequency of occurrence of the keyword.
11. The apparatus according to claim 8, wherein the type of the keyword includes noun, verb, or function word, and nouns have a higher weight level than verbs and function words;
the frequency of occurrence of the keyword is the frequency with which the keyword occurs in stored file names, or the frequency of occurrence of the keyword is the frequency with which the keyword occurs in stored file names of a specified category, the specified category being the category to which the currently opened file belongs.
12. The apparatus according to claim 11, wherein, among nouns, personal names have a higher weight level than other nouns.
13. The apparatus according to claim 8, wherein the title determining module comprises:
a weight determining unit, configured to determine the weight of each second title according to the weight, in the first title, of the matching keywords included in that second title;
a to-be-recommended title determining unit, configured to determine, in descending order of the weight of each second title, a preset number of second titles as the second titles to be recommended.
14. The apparatus according to claim 13, wherein the weight determining unit is configured to determine the sum of the weights, in the first title, of the matching keywords included in each second title as the weight of that second title;
or,
the weight determining unit is configured to determine a time weight of each second title according to the publication time of the file indicated by that second title; compute, according to a preset ratio, a weighted sum of the time weight and the sum of the weights, in the first title, of the matching keywords included in that second title; and determine the weighted sum as the weight of that second title.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310652678.3A CN104699696B (en) | 2013-12-05 | 2013-12-05 | File recommendation method and device |
PCT/CN2015/072103 WO2015081909A1 (en) | 2013-12-05 | 2015-02-02 | File recommendation method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310652678.3A CN104699696B (en) | 2013-12-05 | 2013-12-05 | File recommendation method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104699696A CN104699696A (en) | 2015-06-10 |
CN104699696B true CN104699696B (en) | 2018-12-28 |
Family
ID=53272920
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201310652678.3A Active CN104699696B (en) | 2013-12-05 | 2013-12-05 | File recommendation method and device |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN104699696B (en) |
WO (1) | WO2015081909A1 (en) |
Families Citing this family (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106331778B (en) | 2015-07-06 | 2020-08-14 | 腾讯科技(深圳)有限公司 | Video recommendation method and device |
US10387431B2 (en) | 2015-08-24 | 2019-08-20 | Google Llc | Video recommendation based on video titles |
CN105205159B (en) * | 2015-09-29 | 2020-06-02 | 陈中和 | Device and method for automatically feeding back information |
CN106708858A (en) | 2015-11-13 | 2017-05-24 | 阿里巴巴集团控股有限公司 | Information recommendation method and device |
CN107832405A (en) * | 2017-11-03 | 2018-03-23 | 北京小度互娱科技有限公司 | The method and apparatus for calculating the correlation between title |
CN110020132B (en) * | 2017-11-03 | 2023-04-11 | 腾讯科技(北京)有限公司 | Keyword recommendation method and device, computing equipment and storage medium |
CN108256010A (en) * | 2018-01-03 | 2018-07-06 | 阿里巴巴集团控股有限公司 | Content recommendation method and device |
CN109144954B (en) * | 2018-09-18 | 2021-03-16 | 北京字节跳动网络技术有限公司 | Resource recommendation method and device for editing document and electronic equipment |
CN109240991B (en) * | 2018-09-26 | 2021-07-30 | Oppo广东移动通信有限公司 | File recommendation method and device, storage medium and intelligent terminal |
CN112256843B (en) * | 2020-12-22 | 2021-04-20 | 华东交通大学 | News keyword extraction method and system based on TF-IDF method optimization |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102760124A (en) * | 2011-04-25 | 2012-10-31 | 阿里巴巴集团控股有限公司 | Pushing method and system for recommended data |
CN102789453A (en) * | 2011-05-16 | 2012-11-21 | 阿里巴巴集团控股有限公司 | Advertising information release method and device |
CN102799589A (en) * | 2011-05-25 | 2012-11-28 | 阿里巴巴集团控股有限公司 | Information pushing method and device |
CN103164405A (en) * | 2011-12-08 | 2013-06-19 | 盛乐信息技术(上海)有限公司 | Generation method for relevant video data bank, recommendation method and recommendation system for relevant videos |
CN103365899A (en) * | 2012-04-01 | 2013-10-23 | 腾讯科技(深圳)有限公司 | Question recommending method and question recommending system both in questions-and-answers community |
CN103425687A (en) * | 2012-05-21 | 2013-12-04 | 阿里巴巴集团控股有限公司 | Retrieval method and system based on queries |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101079894A (en) * | 2006-12-21 | 2007-11-28 | 腾讯科技(深圳)有限公司 | A system and method for pushing network information |
US20110218994A1 (en) * | 2010-03-05 | 2011-09-08 | International Business Machines Corporation | Keyword automation of video content |
CN103106208B (en) * | 2011-11-11 | 2017-09-15 | 中国移动通信集团公司 | A kind of streaming medium content in mobile Internet recommends method and system |
CN103186550A (en) * | 2011-12-27 | 2013-07-03 | 盛乐信息技术(上海)有限公司 | Method and system for generating video-related video list |
- 2013-12-05: CN201310652678.3A (CN), patent CN104699696B, status Active
- 2015-02-02: PCT/CN2015/072103 (WO), publication WO2015081909A1, status Application Filing
Also Published As
Publication number | Publication date |
---|---|
CN104699696A (en) | 2015-06-10 |
WO2015081909A1 (en) | 2015-06-11 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104699696B (en) | | File recommendation method and device |
CN107832437B (en) | | Audio/video pushing method, device, equipment and storage medium |
CN105653705B (en) | | Hot event searching method and device |
CN104598505B (en) | | Multimedia resource recommendation method and device |
Bonnin et al. | | Automated generation of music playlists: Survey and experiments |
TWI636416B (en) | | Method and system for multi-phase ranking for content personalization |
US8244751B2 (en) | | Information processing apparatus and presenting method of related items |
US9251532B2 (en) | | Method and apparatus for providing search capability and targeted advertising for audio, image, and video content over the internet |
US8972419B2 (en) | | Item selecting apparatus, item selecting method and item selecting program |
CN104504059B (en) | | Multimedia resource recommendation method |
CN102084645B (en) | | Related scene addition device and related scene addition method |
CN108235141A (en) | | Method, apparatus, server and storage medium for converting live video into fragmented video on demand |
CN104021140B (en) | | Internet video processing method and device |
US10482142B2 (en) | | Information processing device, information processing method, and program |
CN106454536B (en) | | Method and device for determining information recommendation degree |
Hopfgartner et al. | | Semantic user profiling techniques for personalised multimedia recommendation |
KR100518724B1 (en) | | Methods for constructing multimedia database and providing multimedia-search service and apparatus therefor |
CN103761263A (en) | | Method for recommending information for users |
US20130262458A1 (en) | | Information processing device and program |
JP6538866B2 (en) | | Identify content appropriate for children algorithmically without human intervention |
CN109922357A (en) | | Video recommendation method and device |
CN106815284A (en) | | News video recommendation method and recommendation apparatus |
KR101682659B1 (en) | | Method for customized news alarm based on keyword and management server for news search for the same |
Schneider et al. | | Five decades of US, UK, German and Dutch music charts show that cultural processes are accelerating |
CN108140034B (en) | | Selecting content items based on received terms using a topic model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||