WO2015081909A1

WO2015081909A1 - File recommendation method and device

Info

Publication number: WO2015081909A1
Application number: PCT/CN2015/072103
Authority: WO
Inventors: 尹程果
Original assignee: 腾讯科技（深圳）有限公司
Priority date: 2013-12-05
Filing date: 2015-02-02
Publication date: 2015-06-11
Also published as: CN104699696B; CN104699696A

Abstract

The present invention relates to the field of network technology, and discloses a file recommendation method and device. The method comprises: tokenizing a first title to obtain a first keyword set; obtaining, according to pre-set correspondence between keywords and file titles containing the keywords, at least one second title corresponding to the at least one keyword contained in the first keyword set, and obtaining a second keyword set corresponding to the at least one second title; obtaining keywords that appear both in the first keyword set and in the second keyword sets corresponding to each second title, and using the keywords as the matching keywords; obtaining the weight of the matching keywords in the first title; determining the second title to be recommended; recommending the file indicated by the second title that has been determined. The present invention determines the second title to be recommended by determining the weight of matching keywords, thereby enhancing the relevance between the file title to be recommended and the title of the currently open file, and improving the recommendation success rate.

Description

Document recommendation method and device

The present application claims the priority of the Chinese Patent Application, filed on Dec. 5, 2013, the entire disclosure of which is hereby incorporated by reference.

Technical field

The present invention relates to the field of network technologies, and in particular, to a file recommendation method and apparatus.

Background technique

In daily online activities, users are always faced with all kinds of information, but it is difficult to screen out the information that they are really interested in. In order to facilitate user screening, the server may recommend information that may be of interest to the user according to the user's browsing history, interests, and the like.

Taking video as an example, when recommending a video, the server can recommend the most popular videos of the type to which the currently playing video belongs. For example, when the currently playing video is a "sports" video, the server recommends the most popular "sports" videos to the user. Alternatively, the server calculates the edit distance (Levenshtein distance, LD) between the name of each video and the name of the currently playing video, and recommends to the user the video whose name has the smallest LD from the name of the currently playing video.

When the most popular video of the type to which the currently playing video belongs is recommended, the relevance of that video to the currently playing video may be low, which may result in a low recommendation success rate. When the server instead recommends videos by calculating the LD, the LD can only mechanically measure the editing-level difference between video names, so the name of the finally recommended video may be semantically far from the name of the currently playing video. This also makes the relevance between the videos very low, which in turn leads to a low recommendation success rate.
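To illustrate why the LD is a purely character-level measure, the following minimal sketch computes the edit distance with the standard dynamic-programming recurrence. The function name `levenshtein` and the two sample names are illustrative assumptions and do not come from the prior art described above.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # delete ch_a
                               current[j - 1] + 1,       # insert ch_b
                               previous[j - 1] + cost))  # substitute or keep
        previous = current
    return previous[-1]


# Two hypothetical names that differ by a single character.
print(levenshtein("cat show", "car show"))  # prints 1
```

The two sample names are one edit apart although they describe unrelated topics, which is the kind of mismatch between edit distance and semantics that the background section points out.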

Summary of the invention

In order to solve the problem of the prior art, the embodiment of the present invention provides a file recommendation method and device, and the technical solution is as follows.

In a first aspect, a file recommendation method is provided, the method comprising:

Performing word segmentation on a first name to obtain a first keyword set, where the first name is the name of a currently open file, and the first keyword set includes at least one keyword obtained by segmenting the first name;

Acquiring, according to a preset correspondence between keywords and file names containing the keywords, at least one second name corresponding to the at least one keyword included in the first keyword set, and acquiring a second keyword set corresponding to the at least one second name;

Obtaining the keywords that appear both in the first keyword set and in the second keyword set corresponding to each second name, and using them as matching keywords;

Obtaining the weight, in the first name, of the matching keywords included in each second name;

Determining a second name to be recommended according to the weight, in the first name, of the matching keywords included in each second name;

Recommending the file indicated by the determined second name.

In a second aspect, a file recommendation apparatus is provided, the apparatus comprising:

a first word segmentation module, configured to perform word segmentation on a first name to obtain a first keyword set, where the first name is the name of a currently open file, and the first keyword set includes at least one keyword obtained by segmenting the first name;

a second set obtaining module, configured to acquire, according to a preset correspondence between keywords and file names containing the keywords, at least one second name corresponding to the at least one keyword included in the first keyword set, and to acquire a second keyword set corresponding to the at least one second name;

a matching module, configured to obtain the keywords that appear both in the first keyword set and in the second keyword set corresponding to each second name, and to use them as matching keywords;

a weight obtaining module, configured to obtain the weight, in the first name, of the matching keywords included in each second name;

a name determining module, configured to determine a second name to be recommended according to the weight, in the first name, of the matching keywords included in each second name;

a recommendation module, configured to recommend the file indicated by the determined second name.

The beneficial effects brought by the technical solutions provided by the embodiments of the present invention are:

With the method and the device provided by the embodiments of the present invention, the first name of the currently open file is processed to obtain a plurality of candidate second names; the first name is matched against each second name to determine the matching keywords included in each second name; the weight of each matching keyword is determined according to its part of speech; a second name to be recommended is determined from the plurality of candidate second names according to the weights of the matching keywords; and the file indicated by the determined second name is recommended. This improves the relevance of the recommended file name to the name of the currently open file and improves the recommendation success rate.

DRAWINGS

In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly described below. Evidently, the drawings in the following description show only some embodiments of the present invention, and those of ordinary skill in the art may obtain other drawings from them without creative effort.

FIG. 1 is a flowchart of a file recommendation method according to an embodiment of the present invention;

FIG. 2 is a flowchart of a file recommendation method according to an embodiment of the present invention;

FIG. 3 is a schematic structural diagram of a file recommendation apparatus according to an embodiment of the present invention;

FIG. 4 is a schematic structural diagram of a server according to an embodiment of the present invention.

Detailed description

The technical solutions in the embodiments of the present invention are clearly and completely described in the following with reference to the accompanying drawings in the embodiments of the present invention. It is obvious that the described embodiments are a part of the embodiments of the present invention, but not all embodiments. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without creative efforts are within the scope of the present invention.

FIG. 1 is a flowchart of a file recommendation method according to an embodiment of the present invention. The method in this embodiment is executed by a server. Referring to FIG. 1, the method includes:

101. Perform word segmentation on the first name to obtain a first keyword set, where the first name is a name of a currently open file, and the first keyword set includes at least one keyword obtained by the first name segmentation.

102. Acquire, according to the preset correspondence between keywords and file names containing the keywords, at least one second name corresponding to the at least one keyword included in the first keyword set, and acquire the second keyword set corresponding to the at least one second name.

103. Acquire the same keyword in the first keyword set and the second keyword set corresponding to each second name, and use the same keyword as the matching keyword.

104. Acquire the weight, in the first name, of the matching keywords included in each second name.

105. Determine the second name to be recommended according to the weight, in the first name, of the matching keywords included in each second name.

106. Recommend the file indicated by the determined second name.

The method provided by the embodiment of the present invention processes the first name of the currently open file to obtain a plurality of candidate second names, matches the first name against each second name to determine the matching keywords included in each second name, and determines the weight of each matching keyword according to its part of speech. A second name to be recommended is then determined from the plurality of candidate second names according to the weights of the matching keywords, and the file indicated by the determined second name is recommended. This improves the relevance of the finally recommended file name to the name of the currently open file and improves the recommendation success rate.

Optionally, obtaining the second keyword set corresponding to the at least one second name includes:

For each second name in the at least one second name, the second name is segmented to obtain a second keyword set, and the second keyword set includes at least one keyword obtained by segmenting the second name.

Optionally, before obtaining the weight of the matching keyword included in each second name in the first name, the method further includes:

Obtaining a weight of each keyword in the first keyword set in the first name according to at least one of a type and an appearance frequency of each keyword in the first keyword set;

Obtaining the weight, in the first name, of the matching keywords included in each second name includes: for each matching keyword included in a second name, using the weight in the first name of the corresponding keyword in the first keyword set as the weight of that matching keyword in the first name.

Optionally, according to at least one of a type and an appearance frequency of each keyword in the first keyword set, obtaining weights of each keyword in the first keyword set in the first name includes:

Assigning a weight to each keyword in descending order of the weight level corresponding to the type of each keyword in the first keyword set, so that a keyword with a high weight level is assigned a greater weight than a keyword with a low weight level; or,

Assigning a weight to each keyword in descending order of the frequency of occurrence of each keyword in the first keyword set, so that a keyword with a high frequency of occurrence is assigned a greater weight than a keyword with a low frequency of occurrence; or,

Assigning a weight to each keyword in descending order of the weight level corresponding to the type of each keyword in the first keyword set, so that a keyword with a high weight level is assigned a greater weight than a keyword with a low weight level; and adjusting the weight assigned to each keyword according to the frequency of occurrence of each keyword.

Optionally, the type of the keyword includes a noun, a verb or a function word, and the weight level of the noun is higher than the weight level of the verb and the function word;

The frequency of occurrence of a keyword is the frequency at which the keyword appears in the stored file names, or the frequency at which the keyword appears in the stored file names of a specified category, where the specified category is the category to which the currently open file belongs.

Optionally, among nouns, the weight level of names is higher than the weight level of other nouns.

Optionally, determining the second name to be recommended according to the weight, in the first name, of the matching keywords included in each second name includes:

Determining the weight of each second name according to the weight, in the first name, of the matching keywords included in that second name;

Determining, in descending order of the weight of each second name, a preset number of second names as the second names to be recommended.

Optionally, determining the weight of each second name according to the weight, in the first name, of the matching keywords included in that second name includes:

Determining the sum of the weights, in the first name, of the matching keywords included in each second name as the weight of that second name; or

Determining a time weight of each second name according to the publishing time of the file indicated by that second name, weighting, according to a preset ratio, the sum value of the weights in the first name of the matching keywords included in the second name and the time weight to obtain a weighted sum value, and determining the weighted sum value as the weight of that second name.
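The first alternative above (a plain sum of matching-keyword weights) followed by selection of a preset number of names can be sketched as follows. This is a minimal illustration, not the claimed implementation: the function names, the keyword labels "A" to "C", and the candidate names "name 1" to "name 3" are hypothetical.

```python
def score_second_names(first_weights, second_keyword_sets):
    """Sum, for each second name, the first-name weights of its matching keywords."""
    scores = {}
    for name, keywords in second_keyword_sets.items():
        scores[name] = sum(
            weight for keyword, weight in first_weights.items() if keyword in keywords
        )
    return scores


def top_names(scores, preset_number):
    """Return the preset number of names in descending order of weight."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:preset_number]


# Hypothetical data: keyword weights in the first name and candidate second names.
first_weights = {"A": 0.5, "B": 0.3, "C": 0.2}
second_keyword_sets = {
    "name 1": {"A", "C", "X"},  # matches A and C
    "name 2": {"B", "Y"},       # matches B
    "name 3": {"Z"},            # no matching keyword
}
scores = score_second_names(first_weights, second_keyword_sets)
recommended = top_names(scores, preset_number=2)
```

Keywords that occur only in a second name (such as "X") contribute nothing to its score, because weights are defined only for keywords of the first name.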

All of the above optional technical solutions may be combined in any manner to form optional embodiments of the present invention, which are not described one by one here.

FIG. 2 is a flowchart of a file recommendation method according to an embodiment of the present invention. The method in this embodiment is executed by a server. Referring to FIG. 2, the method includes the following steps.

201. The server performs segmentation on the first name to obtain a first keyword set, where the first name is a name of a currently open file, and the first keyword set includes at least one keyword obtained by the first name word segmentation.

The embodiment of the present invention applies to a scenario in which a user has opened a file and the server recommends other files according to the name of the currently open file. The server may be the server associated with the currently open file, or a functional module within that server, which is not limited in this embodiment of the present invention.

Further, the embodiment of the present invention can be applied to a scenario in which the name of the currently open file is defined by its publisher. Unlike a name that is already fixed at release, such as the title of a movie or a TV series, a publisher-defined name may be very long or very short, and may be a single word or a complicated sentence. The embodiment recommends files to the user based on such publisher-defined names.

The file may be a video file, an audio file, or a text file provided by a server, such as a network video file provided by a video website server, an audio file provided by an audio website, or a network document provided by a document sharing server, which is not limited in this embodiment of the present invention.

Specifically, when detecting that the user opens a file, the server obtains the name of the currently open file as the first name and segments the first name to obtain at least one keyword of the first name; the at least one keyword forms the first keyword set.

Here, segmenting the first name means dividing the first name into one or more words or morphemes.

For example, if the first name is “The clothing worn by Andy Lau when attending Jacky Cheung’s concert”, segmenting the first name yields the first keyword set {Andy Lau, Jacky Cheung, concert, clothing}.

When segmenting the first name, the server may use a string-matching-based word segmentation method or a statistics-based word segmentation method, which is not limited in this embodiment of the present invention.
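One common dictionary-driven (string-matching) approach is forward maximum matching. The sketch below is an illustration only: the embodiment does not prescribe this algorithm, and the function name, the `max_len` parameter, the small dictionary, and the Chinese sample string (an assumed rendering of a name such as "Andy Lau concert complete works") are all assumptions.

```python
def forward_max_match(text, dictionary, max_len=4):
    """Split text by repeatedly taking the longest dictionary word at the cursor."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical dictionary: 刘德华 (Andy Lau), 演唱会 (concert), 全集 (complete works).
dictionary = {"刘德华", "演唱会", "全集"}
keywords = forward_max_match("刘德华演唱会全集", dictionary)
```

Characters that are not covered by any dictionary word come out as single-character tokens, which a later step could treat as function words or discard.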

202. The server acquires at least one second name corresponding to the at least one keyword included in the first keyword set according to a preset correspondence between a keyword and a file name including the keyword.

The first keyword set includes at least one keyword. For each keyword in the first keyword set, the server queries the preset correspondence to obtain the file names that contain that keyword, and thereby obtains the file names containing any one or more keywords in the first keyword set.

For example, the correspondence between the first name, the keywords in the first keyword set, and the second name corresponding to each keyword is as shown in Table 1.

Table 1

Optionally, before step 202, the method further includes: establishing the preset correspondence according to the file names stored by the server.

Specifically, the server segments the names of all stored files to obtain the keywords included in each file name; for each keyword, the server obtains, according to the keywords included in each file name, the file names that contain the keyword, and establishes the preset correspondence between the keyword and the file names containing it.

Further optionally, the server establishes an inverted index for the keywords included in each file name, and determines the established inverted index as the preset correspondence.
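An inverted index of this kind, together with the lookup in step 202, can be sketched as follows. The function names are hypothetical, the stored names other than "The Complete Works of Andy Lau Concert" are invented for illustration, and excluding the open file from its own candidates is an assumption that the description does not state.

```python
from collections import defaultdict


def build_inverted_index(segmented_names):
    """Map each keyword to the set of file names that contain it."""
    index = defaultdict(set)
    for name, keywords in segmented_names.items():
        for keyword in keywords:
            index[keyword].add(name)
    return index


def lookup_second_names(first_name, first_keywords, index):
    """Collect every stored name that shares at least one keyword with the first name."""
    candidates = set()
    for keyword in first_keywords:
        candidates |= index.get(keyword, set())
    candidates.discard(first_name)  # assumption: do not recommend the open file itself
    return candidates


# Hypothetical stored names, already segmented into keywords.
segmented_names = {
    "The Complete Works of Andy Lau Concert": ["Andy Lau", "concert", "complete works"],
    "Jacky Cheung interview": ["Jacky Cheung", "interview"],
    "Cooking at home": ["cooking", "home"],
}
index = build_inverted_index(segmented_names)
second_names = lookup_second_names(
    "The clothing worn by Andy Lau when attending Jacky Cheung's concert",
    ["Andy Lau", "Jacky Cheung", "concert", "clothing"],
    index,
)
```

Because the index is keyed by keyword, the lookup touches only the entries for keywords in the first keyword set rather than scanning every stored name.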

203. For each second name in the at least one second name, the server segments the second name to obtain a second keyword set, where the second keyword set includes at least one keyword obtained by segmenting the second name.

Based on the example of step 202, one of the second names is "The Complete Works of Andy Lau Concert", and the server segments the second name to obtain the second keyword set {Andy Lau, concert, complete works}.

Here too, the server may use a string-matching-based word segmentation method or a statistics-based word segmentation method, which is not limited in the embodiment of the present invention.

204. The server acquires the same keyword in the first keyword set and the second keyword set corresponding to each second name, and uses the same keyword as the matching keyword.

Specifically, for one keyword in the first keyword set, the server traverses the second keyword set and determines whether the second keyword set includes the keyword; when it does, the keyword is used as a matching keyword. This judgment is performed on each keyword in the first keyword set, thereby acquiring at least one matching keyword. Alternatively, for one keyword in the second keyword set, the server traverses the first keyword set and determines whether the first keyword set includes the keyword; when it does, the keyword is used as a matching keyword. This judgment is performed on each keyword in the second keyword set, thereby acquiring at least one matching keyword.

Based on the examples of step 201 and step 203, the first keyword set is {Andy Lau, Jacky Cheung, concert, clothing} and the second keyword set is {Andy Lau, concert, complete works}, so the matching keywords are "Andy Lau" and "concert".
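The matching in step 204 amounts to a set intersection. A minimal sketch, with a hypothetical function name, is shown below; it keeps the order of the first keyword set and uses a set for the membership test instead of a nested traversal.

```python
def matching_keywords(first_keywords, second_keywords):
    """Return the keywords of the first set that also occur in the second set."""
    second = set(second_keywords)  # constant-time membership test
    return [keyword for keyword in first_keywords if keyword in second]


first_set = ["Andy Lau", "Jacky Cheung", "concert", "clothing"]
second_set = ["Andy Lau", "concert", "complete works"]
matches = matching_keywords(first_set, second_set)
```

Traversing from the second keyword set instead yields the same keywords, only possibly in a different order.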

205. The server assigns a weight to each keyword in descending order of the weight level corresponding to the type of each keyword in the first keyword set, so that a keyword with a high weight level is assigned a greater weight than a keyword with a low weight level.

In the embodiment of the present invention, the first keyword set and each second keyword set share at least one matching keyword, but the first name and the second name may still differ semantically. Therefore, when selecting the second name to be recommended, weights are assigned to the keywords in the first keyword set and the weight of each second name is determined from them, so as to improve the relevance of the finally determined second name to the first name.

Specifically, the server presets a weight level for each keyword type. After determining the type of each keyword in the first keyword set, the server determines the weight level of each keyword according to the weight level preset for its type, sorts the keywords in descending order of weight level, and assigns weights so that a keyword with a high weight level is assigned a greater weight than a keyword with a low weight level.

Optionally, the sum of the weights assigned to each keyword in the first keyword set is 1.

Further optionally, the type of a keyword includes noun, verb, or function word. The weight level of nouns is higher than the weight levels of verbs and function words, and among nouns the weight level of names is higher than that of other nouns.

For example, the first name is “The clothing worn by Andy Lau when attending Jacky Cheung’s concert”. The weight levels of the nouns “Andy Lau”, “Jacky Cheung”, “concert” and “clothing” are higher than the weight levels of the verbs “attendance” and “wearing” and the function words “of” and “time”.

A name among the nouns may be a person name, a place name, an organization name, a brand name, and the like, which is not limited by the embodiment of the present invention. The weight level of names is higher than that of other nouns. For example, the weight levels of "Andy Lau" and "Jacky Cheung" are higher than the weight levels of "concert" and "clothing".

For example, the first name is “The clothing worn by Andy Lau when attending Jacky Cheung’s concert”. The server determines that the weight levels of “Andy Lau” and “Jacky Cheung” are higher than those of “concert” and “clothing”, and that the weight levels of “concert” and “clothing” are higher than those of "attendance", "wearing", "of", and "time". The server can then assign a weight of 0.3 to the keyword "Andy Lau", 0.3 to the keyword "Jacky Cheung", 0.2 to the keyword "concert", 0.1 to the keyword "clothing", 0.1 to the keyword "attendance", and 0 to the remaining keywords.
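The description fixes only the ordering of weight levels, not a formula for the weights themselves. The sketch below is one possible reading under stated assumptions: each type is given a hypothetical numeric score (`LEVEL_SCORE`, including the choice to rank verbs above function words), and weights are the scores normalized to sum to 1. It therefore respects the ordering but does not reproduce the exact example weights given above.

```python
# Hypothetical scores per keyword type; only their ordering follows the description.
LEVEL_SCORE = {"name": 3, "noun": 2, "verb": 1, "function": 0}


def weights_by_type(keyword_types):
    """Assign each keyword a weight proportional to the score of its type."""
    scores = {keyword: LEVEL_SCORE[kind] for keyword, kind in keyword_types.items()}
    total = sum(scores.values())
    if total == 0:
        return {keyword: 0.0 for keyword in scores}
    return {keyword: score / total for keyword, score in scores.items()}


keyword_types = {
    "Andy Lau": "name",
    "Jacky Cheung": "name",
    "concert": "noun",
    "clothing": "noun",
    "attendance": "verb",
    "wearing": "verb",
    "of": "function",
    "time": "function",
}
type_weights = weights_by_type(keyword_types)
```

With this normalization, keywords of the same type receive equal weights; a server that wants to distinguish them, as the example above does for "concert" and "clothing", would need an additional tie-breaking rule.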

In another embodiment of the present invention, step 205 may be replaced by the following step (1):

(1) The server assigns a weight to each keyword in descending order of the frequency of occurrence of each keyword, so that a keyword with a high frequency of occurrence is assigned a greater weight than a keyword with a low frequency of occurrence.

In the embodiment of the present invention, a keyword in the first keyword set with a higher frequency of occurrence may be considered more popular, and the user is more likely to be interested in files related to such a keyword. Weights are therefore assigned according to the frequency of occurrence of each keyword in the first keyword set.

Optionally, the frequency of occurrence of a keyword is the frequency at which the keyword appears in the stored file names, or the frequency at which the keyword appears in the stored file names of a specified category, where the specified category is the category to which the currently open file belongs.

The currently open file may belong to a subcategory, and the subcategory may in turn belong to a parent category; the server may determine the specified category to which the currently open file belongs according to the required recommendation precision.

For example, if the name of the currently open file is "Hengda wins the championship" and the file belongs to the football category under the sports category, the server may calculate the frequency of occurrence of the keyword "wins the championship" in the file names of the football category and assign a weight to that keyword accordingly, instead of calculating its frequency of occurrence in the file names of all categories or in the file names of the sports category.

Further, the frequency of occurrence may be a term frequency (TF) or a document frequency (DF).

For example, if the first name is "The clothing worn by Andy Lau when attending Jacky Cheung's concert", the server determines that the first name belongs to the singer category and then calculates the frequency of occurrence of the keywords "Andy Lau", "Jacky Cheung", "concert", and "clothing" in the file names of the singer category. If the calculated frequencies of occurrence of the keywords "Andy Lau", "Jacky Cheung", and "concert" are 0.3, 0.2, and 0.1 respectively, the server can, in descending order of frequency of occurrence, assign a weight of 0.5 to the keyword "Andy Lau", 0.4 to the keyword "Jacky Cheung", 0.1 to the keyword "concert", and 0 to the remaining keywords.

Further optionally, the server calculates the frequency of occurrence of each keyword in the file names stored by the server within a preset duration. The preset duration can be preset by the server.
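A frequency-based assignment as in step (1) can be sketched as follows, using document frequency over the stored names of one category. The function names and the four sample names are hypothetical, and making each weight proportional to the frequency is an assumption: the description requires only that a more frequent keyword receive a greater weight.

```python
def document_frequency(keyword, segmented_names):
    """Fraction of stored names (given as keyword sets) that contain the keyword."""
    if not segmented_names:
        return 0.0
    hits = sum(1 for keywords in segmented_names if keyword in keywords)
    return hits / len(segmented_names)


def weights_by_frequency(first_keywords, segmented_names):
    """Assign weights proportional to each keyword's frequency in the category."""
    freq = {kw: document_frequency(kw, segmented_names) for kw in first_keywords}
    total = sum(freq.values())
    if total == 0:
        return {kw: 0.0 for kw in freq}
    return {kw: f / total for kw, f in freq.items()}


# Hypothetical names stored under the category of the currently open file.
category_names = [
    {"Andy Lau", "concert"},
    {"Andy Lau", "interview"},
    {"Jacky Cheung", "concert"},
    {"Andy Lau"},
]
freq_weights = weights_by_frequency(
    ["Andy Lau", "Jacky Cheung", "concert", "clothing"], category_names
)
```

Restricting `category_names` to a subcategory or to names stored within a preset duration changes only the input list, not the computation.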

Step 205 and step (1) above assign weights according to, respectively, the weight level corresponding to the type of each keyword in the first keyword set and the frequency of occurrence of each keyword. In fact, the server can also assign weights by considering both the weight level corresponding to the type of each keyword and its frequency of occurrence. That is, in another embodiment of the present invention, step 205 may be replaced by the following step (2):

(2) The server assigns a weight to each keyword in descending order of the weight level corresponding to the type of each keyword, so that a keyword with a high weight level is assigned a greater weight than a keyword with a low weight level; the server then adjusts the weight assigned to each keyword according to the frequency of occurrence of each keyword.

In practical applications, a keyword with a high frequency of occurrence can be considered more popular, but a second name corresponding to such a keyword may have low relevance to the first name, and the user is not necessarily interested in the file indicated by that popular second name. In the embodiment of the present invention, the server may therefore assign a weight to each keyword according to the weight level corresponding to its type and then adjust the assigned weight according to the keyword's frequency of occurrence. By considering both the relevance of the second name to the first name and the frequency of occurrence, the server can improve the relevance of the finally determined second name to the first name while still giving priority to recommending files with a higher frequency of occurrence.

Further, the adjustment in step (2) of the weight assigned to each keyword according to the frequency of occurrence of each keyword may use either of the following methods:

(2-1) According to the frequency of occurrence of each keyword, the adjustment range is determined, and the weights assigned to each keyword are correspondingly increased or decreased according to the determined adjustment range.

For example, the server assigns weights of 0.3, 0.3, 0.2, 0.1, and 0.1 to the keywords "Andy Lau", "Jacky Cheung", "concert", "clothing", and "attendance", and calculates that, during fashion week, the frequencies of occurrence of these keywords are 0.3, 0.2, 0.1, 0.2, and 0.01 respectively. The server then determines that the adjustment ranges for "Andy Lau", "Jacky Cheung", "concert", "clothing", and "attendance" are 0.025, 0.025, -0.1, 0.15, and -0.1. After each keyword is adjusted by its adjustment range, the finally determined weights are 0.325, 0.325, 0.1, 0.25, and 0.

(2-2) According to the frequency of occurrence of each keyword, the weight assigned to a keyword whose frequency of occurrence is greater than or equal to a preset threshold is increased by a preset adjustment weight, and the weight assigned to a keyword whose frequency of occurrence is less than the preset threshold is reduced by the preset adjustment weight.

For example, the server determines that the preset threshold is 0.2 and the preset adjustment weight is 0.05. The server assigns weights of 0.3, 0.3, 0.2, 0.1, and 0.1 to the keywords "Andy Lau", "Jacky Cheung", "concert", "clothing", and "attendance", and calculates that the frequencies of occurrence of these keywords are 0.3, 0.2, 0.1, 0.2, and 0.01 respectively. The weights assigned to the keywords "Andy Lau", "Jacky Cheung", and "clothing", whose frequencies are greater than or equal to 0.2, are increased by 0.05, and the weights assigned to the keywords "concert" and "attendance", whose frequencies are less than 0.2, are reduced by 0.05. The finally determined weights are 0.35, 0.35, 0.15, 0.15, and 0.05.
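Both adjustment methods can be sketched as small functions over a keyword-to-weight mapping. The function names are hypothetical, and clamping a weight at zero is an assumption. The description does not say how the adjustment range in (2-1) is derived from the frequency, so the sketch takes the ranges as an input.

```python
def adjust_by_range(weights, adjustment_ranges):
    """Method (2-1): add a per-keyword adjustment (positive or negative) to each weight."""
    return {
        kw: max(0.0, w + adjustment_ranges.get(kw, 0.0))  # assumption: never below zero
        for kw, w in weights.items()
    }


def adjust_by_threshold(weights, frequencies, threshold, adjustment_weight):
    """Method (2-2): raise the weights of frequent keywords and lower the others."""
    adjusted = {}
    for kw, w in weights.items():
        if frequencies.get(kw, 0.0) >= threshold:
            adjusted[kw] = w + adjustment_weight
        else:
            adjusted[kw] = max(0.0, w - adjustment_weight)  # assumption: never below zero
    return adjusted


# Inputs taken from the examples in the description.
weights = {"Andy Lau": 0.3, "Jacky Cheung": 0.3, "concert": 0.2,
           "clothing": 0.1, "attendance": 0.1}
frequencies = {"Andy Lau": 0.3, "Jacky Cheung": 0.2, "concert": 0.1,
               "clothing": 0.2, "attendance": 0.01}
ranges = {"Andy Lau": 0.025, "Jacky Cheung": 0.025, "concert": -0.1,
          "clothing": 0.15, "attendance": -0.1}

by_range = adjust_by_range(weights, ranges)
by_threshold = adjust_by_threshold(weights, frequencies, threshold=0.2,
                                   adjustment_weight=0.05)
```

Neither method renormalizes the weights, so after adjustment their sum is not guaranteed to remain 1.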

It should be noted that the embodiment of the present invention is described by taking the case in which step 205 is performed after step 204 as an example. In fact, step 205 only needs to be performed after step 201 and before step 206; the specific execution time of step 205 is not limited by the embodiment of the present invention.

206. The server acquires the weight, in the first name, of the matching keywords included in each second name.

In the embodiment of the present invention, the server has already determined the weight in the first name of each keyword in the first keyword set, which means that the weight of each matching keyword in the first name has also been determined. The server then determines the matching keywords included in each second name and the weight of each of those matching keywords in the first name.

Based on Table 1, assume that the first name is "The clothing worn by Andy Lau when attending Jacky Cheung's concert" and that the server assigns a weight of 0.3 to the keyword "Andy Lau", 0.3 to the keyword "Jacky Cheung", 0.2 to the keyword "concert", 0.1 to the keyword "clothing", 0.1 to the keyword "attendance", and 0 to the remaining keywords. The weights in the first name of the matching keywords included in each second name, as determined by the server, can then be as shown in Table 2.

Table 2

207. The server determines, according to a publishing time of the file indicated by each second name, a time weight of each second name. According to a preset ratio, a matching keyword included in each second name is at the first The sum value of the weights in the name and the time weight are weighted to obtain a weighted sum value, and the weighted sum value is determined as the weight of each of the second names.

In the embodiment of the present invention, the file indicated by a second name may be a newly published file or a file that was published some time ago. Files with different publishing times attract different degrees of user interest; that is, the publishing time affects the user's degree of interest, which in turn affects the recommendation success rate. Therefore, when determining the second name to be recommended, the publishing time of the file indicated by each second name needs to be considered.

Specifically, the server calculates the sum of the weights in the first name of the matching keywords included in each second name, sorts the second names according to the publishing time of the file indicated by each second name, and assigns a time weight to each second name according to the sorting order, such that the time weight of a second name with a later publishing time is higher than the time weight of a second name with an earlier publishing time. The server then weights the sum value and the time weight according to the preset ratio to obtain a weighted sum value, that is, the weight of each second name.
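One way to assign time weights by sorting order is sketched below. The description only requires that later publishing times receive higher time weights; the linear mapping from rank to weight, the function name `rank_time_weights`, and the placeholder second names and dates are all assumptions made for illustration.

```python
from datetime import datetime


def rank_time_weights(publish_times):
    """Sort second names newest first and give earlier ranks higher time weights."""
    newest_first = sorted(publish_times, key=publish_times.get, reverse=True)
    count = len(newest_first)
    # Assumed linear scheme: rank 0 gets 1.0, the last rank gets 1/count.
    return {name: (count - rank) / count for rank, name in enumerate(newest_first)}


# Hypothetical second names and publishing times.
publish_times = {
    "second name A": datetime(2015, 1, 30),
    "second name B": datetime(2015, 2, 1),
    "second name C": datetime(2015, 1, 15),
}

time_weights = rank_time_weights(publish_times)
print(time_weights)
```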

The preset ratio refers to the ratio between the sum value and the time weight; the weighting coefficients of the sum value and the time weight in the weighting calculation are determined from this ratio. The preset ratio may be preset by the server or adjusted by the server during use. For example, when the currently open file was published relatively long ago, the proportion of the time weight is smaller, and when the currently open file is of a more time-sensitive type, the proportion of the time weight is larger. This is not limited by the embodiment of the present invention.

Based on Table 2, for the second name "Andy Lau Concert Complete Works", assume that the server assigns a time weight of 0.4 to the second name and that the preset ratio is 6:4. The server calculates that the sum of the weights of the matching keywords included in the second name is 0.5, so the weight of the second name is 0.5*0.6+0.4*0.4=0.46.
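The weighting calculation in this example can be sketched as follows. The function name `second_name_weight` and the representation of the preset ratio as a pair of numbers are assumptions made for illustration.

```python
def second_name_weight(keyword_sum, time_weight, ratio=(6, 4)):
    """Combine the keyword-weight sum and the time weight according to a preset ratio."""
    keyword_part, time_part = ratio
    total = keyword_part + time_part
    # A ratio of 6:4 yields weighting coefficients 0.6 and 0.4.
    return keyword_sum * keyword_part / total + time_weight * time_part / total


# Values from the example: keyword-weight sum 0.5, time weight 0.4, ratio 6:4.
weight = second_name_weight(0.5, 0.4)
print(round(weight, 2))  # 0.46
```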

Further, the server may preset a correspondence between the time weight and the time interval between the publishing time and the current time, that is, determine the time weight corresponding to each time interval. The server may then calculate the time interval between the publishing time of the file indicated by each second name and the current time, and determine the time weight of each second name according to the preset correspondence.

For example, the server presets the time weight of a second name with a time interval of 1 day to 0.9, the time weight of a second name with a time interval of 2 days to 0.8, and so on. For a given second name, when the server determines that the time interval between the publishing time of the file indicated by the second name and the current time is 4 days, the time weight of the second name is determined to be 0.6.
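The interval-based lookup can be sketched as follows. The table contains only the three interval values stated in the example; the names `INTERVAL_WEIGHTS` and `interval_time_weight`, the dates, and the choice to return `None` for an interval missing from the table are assumptions.

```python
from datetime import datetime

# Preset correspondence between time interval (in days) and time weight.
# Only the values stated in the example are listed here.
INTERVAL_WEIGHTS = {1: 0.9, 2: 0.8, 4: 0.6}


def interval_time_weight(published, now, table=INTERVAL_WEIGHTS):
    """Look up the time weight for the whole-day interval between publication and now."""
    days = (now - published).days
    return table.get(days)  # None when the interval has no preset weight


# Hypothetical dates that are 4 days apart.
time_weight = interval_time_weight(datetime(2015, 1, 29), datetime(2015, 2, 2))
print(time_weight)  # 0.6
```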

It should be noted that step 207 is an optional step. The server may also disregard the impact of the file publishing time and determine the weight of each second name only according to the weights in the first name of the matching keywords included in each second name. In another embodiment provided by the present invention, step 207 may be replaced by the following step: the sum of the weights in the first name of the matching keywords included in each second name is determined as the weight of each second name. For example, based on Table 2, for the second name "Andy Lau Concert Complete Works", the server calculates that the sum of the weights of the matching keywords included in the second name is 0.5, and therefore determines the weight of the second name to be 0.5.

208. The server determines, according to a weight of each second name, a preset number of second names as the second name to be recommended.

The preset number may be preset by the server, or may be determined by the server according to the number of files that can be displayed in the recommendation area of the display interface of the currently open file, which is not limited by the embodiment of the present invention.

Specifically, the server sorts the second names in descending order of weight and determines the top preset number of second names as the second names to be recommended, so that the files indicated by those second names are recommended to the user.
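The selection in step 208 can be sketched as follows. The function name `top_second_names` is an assumption, and apart from the 0.46 computed earlier for "Andy Lau Concert Complete Works", the second names and weights are hypothetical placeholders.

```python
def top_second_names(name_weights, preset_number):
    """Sort second names by weight, largest first, and keep the top preset number."""
    ranked = sorted(name_weights.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:preset_number]]


name_weights = {
    "Andy Lau Concert Complete Works": 0.46,
    "second name B": 0.30,  # hypothetical
    "second name C": 0.52,  # hypothetical
}

to_recommend = top_second_names(name_weights, preset_number=2)
print(to_recommend)
```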

209. The server recommends the file indicated by the determined second name.

In the embodiment of the present invention, when the server recommends the file indicated by the determined second name, a link address of the determined second name may be provided on the display interface of the currently open file, and the link address is used to jump to the file indicated by the determined second name. In addition, the server may also display a thumbnail generated from the file indicated by the determined second name, or display related information such as the publisher and the publishing time, which is not limited in this embodiment of the present invention.

Further, when there are multiple determined second names, they may be recommended in order of weight or in order of publishing time, which is not limited in the embodiment of the present invention.

The method provided by the embodiment of the present invention processes the first name of the currently open file to obtain a plurality of candidate second names, matches the first name against each second name to determine the matching keywords included in each second name, and determines the weights of the matching keywords according to their parts of speech. The second name to be recommended is then determined from the plurality of candidate second names according to the weights of the matching keywords, and the file indicated by the determined second name is recommended. This improves the relevance between the name of the finally recommended file and the name of the currently open file, and thus increases the recommendation success rate. Further, taking the publishing time of the file into account by calculating a time weight for each second name when determining the second name to be recommended further improves the recommendation success rate.

FIG. 3 is a schematic structural diagram of a file recommendation apparatus according to an embodiment of the present invention. Referring to FIG. 3, the apparatus includes a first word segmentation module 301, a second set obtaining module 302, a matching module 303, a weight obtaining module 304, a name determining module 305, and a recommendation module 306.

The first word segmentation module 301 is configured to perform word segmentation on the first name to obtain a first keyword set, where the first name is the name of the currently open file, and the first keyword set includes at least one keyword obtained by segmenting the first name.

The second set obtaining module 302 is connected to the first word segmentation module 301, and is configured to obtain, according to a preset correspondence between keywords and file names including the keywords, at least one second name corresponding to the at least one keyword included in the first keyword set, and to obtain the second keyword set corresponding to the at least one second name.

The matching module 303 is connected to the second set obtaining module 302, and is configured to obtain the keywords that are the same in the first keyword set and in the second keyword set corresponding to each second name, and to use those keywords as matching keywords.

The weight obtaining module 304 is connected to the matching module 303, and is configured to obtain a weight of the matching keyword included in each second name in the first name.

The name determining module 305 is connected to the weight obtaining module 304, and is configured to determine a second name to be recommended according to the weight of the matching keyword included in each second name in the first name.

The recommendation module 306 is coupled to the name determination module 305 for recommending the file indicated by the determined second name.

Optionally, the second set obtaining module 302 includes:

a second name obtaining unit, configured to obtain, according to a preset correspondence between keywords and file names including the keywords, at least one second name corresponding to the at least one keyword included in the first keyword set; and

a second word segmentation unit, configured to perform, for each second name in the at least one second name, word segmentation on the second name to obtain a second keyword set, where the second keyword set includes at least one keyword obtained by segmenting the second name.
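The two units of the second set obtaining module can be sketched as follows. The preset correspondence is modelled here as a dictionary from keyword to file names, and word segmentation is passed in as a function; both choices, together with the function names and the toy data, are assumptions made for illustration.

```python
def candidate_second_names(first_keywords, keyword_index):
    """Collect, in first-seen order, every file name indexed under a first-name keyword."""
    names = []
    for keyword in first_keywords:
        for name in keyword_index.get(keyword, []):
            if name not in names:
                names.append(name)
    return names


def second_keyword_sets(second_names, segment):
    """Segment each second name into its own keyword set."""
    return {name: set(segment(name)) for name in second_names}


# Hypothetical correspondence between keywords and file names.
keyword_index = {
    "Andy Lau": ["Andy Lau Concert Complete Works"],
    "concert": ["Andy Lau Concert Complete Works"],
}
# Hypothetical segmentation results, standing in for a real word segmenter.
segments = {"Andy Lau Concert Complete Works": ["Andy Lau", "concert", "complete works"]}

names = candidate_second_names(["Andy Lau", "Zhang Xueyou", "concert"], keyword_index)
keyword_sets = second_keyword_sets(names, segments.__getitem__)
print(names)
print(keyword_sets)
```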

Optionally, the device further includes:

a first weight obtaining module, configured to obtain, according to at least one of the type and the occurrence frequency of each keyword in the first keyword set, the weight of each keyword in the first keyword set in the first name.

Optionally, the first weight obtaining module includes:

a first weight obtaining unit, configured to assign a weight to each keyword in descending order of the weight level corresponding to the type of each keyword in the first keyword set, so that the weight assigned to a keyword with a high weight level is greater than the weight assigned to a keyword with a low weight level; or,

a second weight obtaining unit, configured to assign a weight to each keyword in descending order of the occurrence frequency of each keyword in the first keyword set, so that the weight assigned to a keyword with a high occurrence frequency is greater than the weight assigned to a keyword with a low occurrence frequency; or,

a third weight obtaining unit, configured to assign a weight to each keyword in descending order of the weight level corresponding to the type of each keyword in the first keyword set, so that the weight assigned to a keyword with a high weight level is greater than the weight assigned to a keyword with a low weight level; and

an adjusting unit, configured to adjust the weight assigned to each keyword according to the occurrence frequency of each keyword in the first keyword set.

Optionally, the type of the keyword includes a noun, a verb, or a function word, and the weight level of nouns is higher than the weight level of verbs and function words.
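A type-based weight assignment can be sketched as follows. The text only states that nouns rank above verbs and function words; the numeric levels, the proportional normalization, the function name `assign_weights_by_type`, and the part-of-speech tags given to the sample keywords are all assumptions.

```python
# Assumed numeric weight levels; only "noun above verb and function word" is given.
WEIGHT_LEVELS = {"noun": 2, "verb": 1, "function word": 1}


def assign_weights_by_type(keyword_types, levels=WEIGHT_LEVELS):
    """Assign each keyword a weight proportional to the weight level of its type."""
    total = sum(levels[kind] for kind in keyword_types.values())
    return {kw: levels[kind] / total for kw, kind in keyword_types.items()}


# Hypothetical part-of-speech tags for three keywords.
keyword_types = {"Andy Lau": "noun", "concert": "noun", "attendance": "verb"}

type_weights = assign_weights_by_type(keyword_types)
print(type_weights)
```

Under these assumed levels the weights sum to 1 and every noun outweighs every verb; any other monotonic mapping from level to weight would satisfy the stated condition equally well.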

Optionally, the name determining module 305 includes:

a weight determining unit, configured to determine the weight of each second name according to the weight in the first name of the matching keywords included in each second name; and

a to-be-recommended name determining unit, configured to determine, in descending order of the weight of each second name, a preset number of second names as the second names to be recommended.

Optionally, the weight determining unit is configured to determine the sum of the weights in the first name of the matching keywords included in each second name as the weight of each second name; or

the weight determining unit is configured to determine a time weight of each second name according to the publishing time of the file indicated by each second name, to weight, according to a preset ratio, the sum of the weights in the first name of the matching keywords included in each second name and the time weight to obtain a weighted sum value, and to determine the weighted sum value as the weight of each second name.

The device provided by the embodiment of the present invention processes the first name of the currently open file to obtain a plurality of candidate second names, matches the first name against each second name to determine the matching keywords included in each second name, and determines the weights of the matching keywords according to their parts of speech. The second name to be recommended is then determined from the plurality of candidate second names according to the weights of the matching keywords, and the file indicated by the determined second name is recommended. This improves the relevance between the name of the finally recommended file and the name of the currently open file, and thus improves the recommendation success rate.

It should be noted that, when the file recommendation device provided by the foregoing embodiment recommends a file, the division into the functional modules described above is used only as an example. In an actual application, the functions may be allocated to different functional modules as needed; that is, the internal structure of the server may be divided into different functional modules to perform all or part of the functions described above. In addition, the file recommendation device and the file recommendation method provided by the foregoing embodiments belong to the same concept; the specific implementation process is described in detail in the method embodiment and is not repeated here.

FIG. 4 is a schematic structural diagram of a server according to an embodiment of the present invention. The server 400 may vary considerably depending on its configuration or performance, and may include one or more central processing units (CPUs) 422 (for example, one or more processors), memory 432, and one or more storage media 430 (for example, one or more mass storage devices) that store an application 442 or data 444. The memory 432 and the storage medium 430 may provide temporary or persistent storage. The program stored on the storage medium 430 may include one or more modules (not shown), each of which may include a series of instruction operations for the server. Further, the central processing unit 422 may be configured to communicate with the storage medium 430 and to execute, on the server 400, the series of instruction operations in the storage medium 430.

The server 400 may also include one or more power sources 426, one or more wired or wireless network interfaces 450, one or more input and output interfaces 458, and/or one or more operating systems 441, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and so on.

The steps performed by the server in the above embodiments may be based on the server structure shown in FIG. 4.

Specifically, in the embodiment of the present invention, the processor 422 included in the server may execute program instructions stored in the memory 432 to perform the following functions: performing word segmentation on the first name to obtain a first keyword set, where the first name is the name of the currently open file and the first keyword set includes at least one keyword obtained by segmenting the first name; obtaining, according to a preset correspondence between keywords and file names including the keywords, at least one second name corresponding to the at least one keyword included in the first keyword set, and obtaining a second keyword set corresponding to the at least one second name; obtaining the keywords that are the same in the first keyword set and in the second keyword set corresponding to each second name, and using those keywords as matching keywords; obtaining the weight in the first name of the matching keywords included in each second name; determining the second name to be recommended according to the weight in the first name of the matching keywords included in each second name; and recommending the file indicated by the determined second name.
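The sequence of functions listed above can be sketched end to end as follows. This is an illustrative sketch only: word segmentation is stubbed with a lookup table, the time weight is omitted so that each second name is weighted by the sum of its matching-keyword weights, the open file itself is excluded from the candidates, and ties are broken by name. All function and variable names, and the file name "Zhang Xueyou new album", are assumptions.

```python
def recommend(first_name, segment, keyword_index, first_weights, preset_number):
    """Return up to preset_number second names ranked by matching-keyword weight."""
    first_keywords = set(segment(first_name))

    # Second names: every indexed file name that shares a keyword with the first name.
    candidates = set()
    for keyword in first_keywords:
        candidates.update(keyword_index.get(keyword, []))
    candidates.discard(first_name)  # assumption: never recommend the open file itself

    # Weight of a second name: sum of the first-name weights of its matching keywords.
    scores = {}
    for name in candidates:
        matching = first_keywords & set(segment(name))
        scores[name] = sum(first_weights.get(kw, 0.0) for kw in matching)

    ranked = sorted(scores, key=lambda name: (-scores[name], name))
    return ranked[:preset_number]


FIRST = "the costume worn by Andy Lau when attending Zhang Xueyou's concert"
SEGMENTS = {  # stand-in for a real word segmenter
    FIRST: ["Andy Lau", "Zhang Xueyou", "concert", "clothing", "attendance"],
    "Andy Lau Concert Complete Works": ["Andy Lau", "concert", "complete works"],
    "Zhang Xueyou new album": ["Zhang Xueyou", "album"],  # hypothetical file name
}
FIRST_WEIGHTS = {"Andy Lau": 0.3, "Zhang Xueyou": 0.3, "concert": 0.2,
                 "clothing": 0.1, "attendance": 0.1}

# Build the keyword-to-file-name correspondence from the segmented names.
keyword_index = {}
for name, keywords in SEGMENTS.items():
    for kw in keywords:
        keyword_index.setdefault(kw, []).append(name)

recommended = recommend(FIRST, SEGMENTS.__getitem__, keyword_index, FIRST_WEIGHTS, 2)
print(recommended)
```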

Optionally, obtaining the second keyword set corresponding to the at least one second name includes: for each second name of the at least one second name, performing word segmentation on the second name to obtain a second keyword set, where the second keyword set includes at least one keyword obtained by segmenting the second name.

Optionally, before the weight in the first name of the matching keywords included in each second name is obtained, the method further includes: obtaining the weight of each keyword in the first keyword set in the first name according to at least one of the type and the occurrence frequency of each keyword in the first keyword set. Obtaining the weight in the first name of the matching keywords included in each second name then includes: obtaining, from the weights of the keywords in the first keyword set in the first name, the weight in the first name of the matching keywords included in each second name.

Optionally, obtaining the weight of each keyword in the first keyword set in the first name according to at least one of the type and the occurrence frequency of each keyword in the first keyword set includes: assigning a weight to each keyword in descending order of the weight level corresponding to the type of each keyword in the first keyword set, so that the weight assigned to a keyword with a high weight level is greater than the weight assigned to a keyword with a low weight level; or assigning a weight to each keyword in descending order of the occurrence frequency of each keyword in the first keyword set, so that the weight assigned to a keyword with a high occurrence frequency is greater than the weight assigned to a keyword with a low occurrence frequency; or assigning a weight to each keyword in descending order of the weight level corresponding to the type of each keyword in the first keyword set, so that the weight assigned to a keyword with a high weight level is greater than the weight assigned to a keyword with a low weight level, and adjusting the weight assigned to each keyword according to the occurrence frequency of each keyword in the first keyword set.

Optionally, determining the second name to be recommended according to the weight in the first name of the matching keywords included in each second name includes: determining the weight of each second name according to the weight in the first name of the matching keywords included in that second name; and determining, in descending order of the weight of each second name, a preset number of second names as the second names to be recommended.

Optionally, determining the weight of each second name according to the weight in the first name of the matching keywords included in each second name includes: determining the sum of the weights in the first name of the matching keywords included in each second name as the weight of each second name; or determining a time weight of each second name according to the publishing time of the file indicated by each second name, weighting, according to a preset ratio, the sum of the weights in the first name of the matching keywords included in each second name and the time weight to obtain a weighted sum value, and determining the weighted sum value as the weight of each second name.

A person skilled in the art may understand that all or part of the steps of the above embodiments may be implemented by hardware, or by a program instructing related hardware. The program may be stored in a computer-readable storage medium, and the storage medium may be a read-only memory, a magnetic disk, an optical disk, or the like.

The above are only preferred embodiments of the present invention and are not intended to limit the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims

1. A file recommendation method, characterized in that the method comprises:

performing word segmentation on a first name to obtain a first keyword set, wherein the first name is the name of a currently open file, and the first keyword set includes at least one keyword obtained by segmenting the first name;

obtaining, according to a preset correspondence between keywords and file names including the keywords, at least one second name corresponding to the at least one keyword included in the first keyword set, and obtaining a second keyword set corresponding to the at least one second name;

obtaining the keywords that are the same in the first keyword set and in the second keyword set corresponding to each second name, and using those keywords as matching keywords;

obtaining the weight in the first name of the matching keywords included in each second name;

determining a second name to be recommended according to the weight in the first name of the matching keywords included in each second name; and

recommending the file indicated by the determined second name.

2. The method according to claim 1, wherein obtaining the second keyword set corresponding to the at least one second name comprises:

for each second name of the at least one second name, performing word segmentation on the second name to obtain a second keyword set, wherein the second keyword set includes at least one keyword obtained by segmenting the second name.

3. The method according to claim 1, wherein before obtaining the weight in the first name of the matching keywords included in each second name, the method further comprises:

obtaining the weight of each keyword in the first keyword set in the first name according to at least one of the type and the occurrence frequency of each keyword in the first keyword set;

wherein obtaining the weight in the first name of the matching keywords included in each second name comprises: obtaining, from the weights of the keywords in the first keyword set in the first name, the weight in the first name of the matching keywords included in each second name.

4. The method according to claim 3, wherein obtaining the weight of each keyword in the first keyword set in the first name according to at least one of the type and the occurrence frequency of each keyword in the first keyword set comprises:

assigning a weight to each keyword in descending order of the weight level corresponding to the type of each keyword in the first keyword set, so that the weight assigned to a keyword with a high weight level is greater than the weight assigned to a keyword with a low weight level; or,

assigning a weight to each keyword in descending order of the occurrence frequency of each keyword in the first keyword set, so that the weight assigned to a keyword with a high occurrence frequency is greater than the weight assigned to a keyword with a low occurrence frequency; or,

assigning a weight to each keyword in descending order of the weight level corresponding to the type of each keyword in the first keyword set, so that the weight assigned to a keyword with a high weight level is greater than the weight assigned to a keyword with a low weight level; and adjusting the weight assigned to each keyword according to the occurrence frequency of each keyword in the first keyword set.

5. The method according to claim 3, wherein the type of the keyword comprises a noun, a verb, or a function word, and the weight level of nouns is higher than the weight level of verbs and function words; and

the occurrence frequency of the keyword is the frequency at which the keyword appears in stored file names, or the occurrence frequency of the keyword is the frequency at which the keyword appears in stored file names of a specified category, wherein the specified category is the category to which the currently open file belongs.

6. The method according to claim 5, wherein, among nouns, names have a higher weight level than other nouns.

7. The method according to claim 1, wherein determining the second name to be recommended according to the weight in the first name of the matching keywords included in each second name comprises:

determining the weight of each second name according to the weight in the first name of the matching keywords included in each second name; and

determining, in descending order of the weight of each second name, a preset number of second names as the second names to be recommended.

8. The method according to claim 7, wherein determining the weight of each second name according to the weight in the first name of the matching keywords included in each second name comprises:

determining the sum of the weights in the first name of the matching keywords included in each second name as the weight of each second name; or

determining a time weight of each second name according to the publishing time of the file indicated by each second name, weighting, according to a preset ratio, the sum of the weights in the first name of the matching keywords included in each second name and the time weight to obtain a weighted sum value, and determining the weighted sum value as the weight of each second name.

9. A file recommendation device, characterized in that the device comprises:

a first word segmentation module, configured to perform word segmentation on a first name to obtain a first keyword set, wherein the first name is the name of a currently open file, and the first keyword set includes at least one keyword obtained by segmenting the first name;

a second set obtaining module, configured to obtain, according to a preset correspondence between keywords and file names including the keywords, at least one second name corresponding to the at least one keyword included in the first keyword set, and to obtain a second keyword set corresponding to the at least one second name;

a matching module, configured to obtain the keywords that are the same in the first keyword set and in the second keyword set corresponding to each second name, and to use those keywords as matching keywords;

a weight obtaining module, configured to obtain the weight in the first name of the matching keywords included in each second name;

a name determining module, configured to determine a second name to be recommended according to the weight in the first name of the matching keywords included in each second name; and

a recommendation module, configured to recommend the file indicated by the determined second name.

10. The device according to claim 9, wherein the second set obtaining module comprises:

a second name obtaining unit, configured to obtain, according to a preset correspondence between keywords and file names including the keywords, at least one second name corresponding to the at least one keyword included in the first keyword set; and

a second word segmentation unit, configured to perform, for each second name in the at least one second name, word segmentation on the second name to obtain a second keyword set, wherein the second keyword set includes at least one keyword obtained by segmenting the second name.

11. The device according to claim 9, wherein the device further comprises:

a first weight obtaining module, configured to obtain, according to at least one of the type and the occurrence frequency of each keyword in the first keyword set, the weight of each keyword in the first keyword set in the first name.

12. The device according to claim 11, wherein the first weight obtaining module comprises:

a first weight obtaining unit, configured to assign a weight to each keyword in descending order of the weight level corresponding to the type of each keyword in the first keyword set, so that the weight assigned to a keyword with a high weight level is greater than the weight assigned to a keyword with a low weight level; or,

a second weight obtaining unit, configured to assign a weight to each keyword in descending order of the occurrence frequency of each keyword in the first keyword set, so that the weight assigned to a keyword with a high occurrence frequency is greater than the weight assigned to a keyword with a low occurrence frequency; or,

a third weight obtaining unit, configured to assign a weight to each keyword in descending order of the weight level corresponding to the type of each keyword in the first keyword set, so that the weight assigned to a keyword with a high weight level is greater than the weight assigned to a keyword with a low weight level; and

an adjusting unit, configured to adjust the weight assigned to each keyword according to the occurrence frequency of each keyword in the first keyword set.

13. The device according to claim 11, wherein the type of the keyword comprises a noun, a verb, or a function word, and the weight level of nouns is higher than the weight level of verbs and function words; and

the occurrence frequency of the keyword is the frequency at which the keyword appears in stored file names, or the occurrence frequency of the keyword is the frequency at which the keyword appears in stored file names of a specified category, wherein the specified category is the category to which the currently open file belongs.

14. The device according to claim 13, wherein, among nouns, names have a higher weight level than other nouns.

15. The device according to claim 9, wherein the name determining module comprises:

a weight determining unit, configured to determine the weight of each second name according to the weight in the first name of the matching keywords included in each second name; and

a to-be-recommended name determining unit, configured to determine, in descending order of the weight of each second name, a preset number of second names as the second names to be recommended.

16. The device according to claim 15, wherein the weight determining unit is configured to determine the sum of the weights in the first name of the matching keywords included in each second name as the weight of each second name; or,

the weight determining unit is configured to determine a time weight of each second name according to the publishing time of the file indicated by each second name, to weight, according to a preset ratio, the sum of the weights in the first name of the matching keywords included in each second name and the time weight to obtain a weighted sum value, and to determine the weighted sum value as the weight of each second name.