WO2015081909A1 - File recommendation method and device - Google Patents

File recommendation method and device

Info

Publication number
WO2015081909A1
WO2015081909A1 PCT/CN2015/072103 CN2015072103W WO2015081909A1 WO 2015081909 A1 WO2015081909 A1 WO 2015081909A1 CN 2015072103 W CN2015072103 W CN 2015072103W WO 2015081909 A1 WO2015081909 A1 WO 2015081909A1
Authority
WO
WIPO (PCT)
Prior art keywords
keyword
name
weight
names
file
Prior art date
Application number
PCT/CN2015/072103
Other languages
French (fr)
Chinese (zh)
Inventor
尹程果
Original Assignee
腾讯科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司 filed Critical 腾讯科技(深圳)有限公司
Publication of WO2015081909A1 publication Critical patent/WO2015081909A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/14Details of searching files based on file metadata
    • G06F16/148File search processing

Definitions

  • the present invention relates to the field of network technologies, and in particular, to a file recommendation method and apparatus.
  • the server may recommend information that may be of interest to the user according to the user's browsing history, interests, and the like.
  • when recommending a video, the server can recommend the most popular videos of the type to which the currently playing video belongs. For example, when the currently playing video is a "sports" type video, the server recommends the most popular "sports" type videos to the user. Alternatively, the server calculates the edit distance (Levenshtein Distance, LD) between the name of each video and the name of the currently playing video, and recommends to the user the video whose name has the smallest LD from the name of the currently playing video.
  • when the server recommends videos by calculating the LD, the LD can only mechanically measure the editing-level difference between video names, so the name of the finally recommended video may be semantically far from the name of the currently playing video. The recommended video then has low relevance to the currently playing video, which leads to a low recommendation success rate.
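The limitation described above can be made concrete with a minimal sketch of the edit distance. The function name `levenshtein` and the sample names are illustrative assumptions, not part of the patent text.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: the minimum number of single-character insertions,
    deletions, or substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute ca with cb
        prev = curr
    return prev[-1]


# Hypothetical names: one character apart, yet about unrelated subjects.
print(levenshtein("cat video", "car video"))
```

Two names can be one edit apart and still concern entirely different subjects, which is the weakness the keyword-based method is meant to address.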
  • the embodiment of the present invention provides a file recommendation method and device, and the technical solution is as follows.
  • a file recommendation method comprising:
  • the first name is a name of a currently open file
  • the first keyword set includes at least one keyword obtained by the first name word segmentation
  • the file indicated by the determined second name is recommended.
  • a file recommendation apparatus comprising:
  • a first word segmentation module configured to perform word segmentation on a first name to obtain a first keyword set, where the first name is the name of a currently open file, and the first keyword set includes at least one keyword obtained by segmenting the first name;
  • a second set obtaining module configured to acquire, according to a preset correspondence between a keyword and the file names including the keyword, at least one second name corresponding to the at least one keyword included in the first keyword set, and to acquire a second keyword set corresponding to the at least one second name;
  • a matching module configured to acquire the same keyword in the first keyword set and the second keyword set corresponding to each second name, and use the same keyword as a matching keyword
  • a weight obtaining module configured to obtain a weight of the matching keyword included in each of the second names in the first name
  • a name determining module configured to determine a second name to be recommended according to a weight of the matching keyword included in each second name in the first name
  • a recommendation module for recommending the file indicated by the determined second name.
  • the method and the device provided by the embodiment of the present invention process the first name of the currently open file to obtain a plurality of candidate second names, match the first name against each second name to determine the matching keywords included in each second name, and determine the weight of each matching keyword according to its part of speech. The second name to be recommended is then determined from the plurality of candidate second names according to the weights of the matching keywords, and the file indicated by the determined second name is recommended. This improves the relevance of the recommended file name to the name of the currently open file, and improves the recommendation success rate.
  • FIG. 1 is a flowchart of a file recommendation method according to an embodiment of the present invention
  • FIG. 2 is a flowchart of a file recommendation method according to an embodiment of the present invention.
  • FIG. 3 is a schematic structural diagram of a file recommendation apparatus according to an embodiment of the present invention.
  • FIG. 4 is a schematic structural diagram of a server according to an embodiment of the present invention.
  • FIG. 1 is a flowchart of a file recommendation method according to an embodiment of the present invention.
  • the execution body of the embodiment of the present invention is a server. Referring to FIG. 1, the method includes:
  • the method provided by the embodiment of the present invention processes the first name of the currently open file to obtain a plurality of candidate second names, matches the first name against each second name to determine the matching keywords included in each second name, and determines the weight of each matching keyword according to its part of speech. The second name to be recommended is then determined from the plurality of candidate second names according to the weights of the matching keywords, and the file indicated by the determined second name is recommended. This improves the relevance of the finally recommended file name to the name of the currently open file, and improves the recommendation success rate.
  • obtaining the second keyword set corresponding to the at least one second name includes:
  • the second name is segmented to obtain a second keyword set, and the second keyword set includes at least one keyword obtained by the second name participle.
  • the method before obtaining the weight of the matching keyword included in each second name in the first name, the method further includes:
  • obtaining the weight in the first name of the matching keyword included in each second name includes: for each second name, taking the weight in the first name of the keyword in the first keyword set that is a matching keyword of that second name as the weight in the first name of the matching keyword.
  • obtaining weights of each keyword in the first keyword set in the first name includes:
  • weights are assigned to each keyword in order of weight level, so that a keyword with a high weight level is assigned a greater weight than a keyword with a low weight level; or
  • weights are assigned to each keyword in order of weight level, so that a keyword with a high weight level is assigned a greater weight than a keyword with a low weight level, and the weight assigned to each keyword is then adjusted according to the frequency of occurrence of the keyword.
  • the type of the keyword includes a noun, a verb or a function word, and the weight level of the noun is higher than the weight level of the verb and the function word;
  • the frequency of occurrence of the keyword is the frequency at which the keyword appears in the stored file names, or the frequency at which the keyword appears in the stored file names of a specified category, where the specified category is the category to which the currently open file belongs.
  • among nouns, the weight level of a name is higher than the weight level of other nouns.
  • determining, according to the weight of the matching keyword included in each second name in the first name, determining the second name to be recommended includes:
  • a preset number of second names is determined as the second name to be recommended in descending order of the weight of each of the second names.
  • determining, according to the weight of the matching keyword included in each second name in the first name, determining the weight of each second name includes:
  • FIG. 2 is a flowchart of a file recommendation method according to an embodiment of the present invention.
  • the executive body of the embodiment of the invention is a server. Referring to FIG. 2, the method includes the following steps.
  • the server performs segmentation on the first name to obtain a first keyword set, where the first name is a name of a currently open file, and the first keyword set includes at least one keyword obtained by the first name word segmentation.
  • the embodiment of the present invention is applied to a scenario in which a user has opened a file, and the server recommends other files according to the name of the currently opened file.
  • the server may be the server associated with the currently open file, or a function module in the server associated with the currently open file, which is not limited in this embodiment of the present invention.
  • the embodiment of the present invention can be applied to a scenario in which the name of the currently open file is a publisher-defined name.
  • a publisher-defined name may be very long or very short, and may be a simple word or a complicated sentence. The embodiment of the present invention recommends files to users based on the publisher's customized, personalized name.
  • the file may be a video file, an audio file, or a text file provided by the server, such as a network video file provided by a video website server, an audio file provided by an audio website, or a network document provided by a document sharing server, which is not limited in this embodiment of the present invention.
  • when detecting that the user opens a file, the server obtains the name of the currently open file as the first name, and performs word segmentation on the first name to obtain at least one keyword of the first name; the at least one keyword constitutes the first keyword set.
  • segmenting the first name means dividing the first name into one or several words or morphemes.
  • for example, the first name is “The clothing worn by Andy Lau when attending Jacky Cheung’s concert”, and segmenting the first name yields the first keyword set {Andy Lau, Jacky Cheung, concert, clothing}.
  • when segmenting the first name, the server may use a string-matching-based word segmentation method or a statistics-based word segmentation method.
  • the embodiment of the present invention does not limit this.
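As an illustration of one dictionary-driven segmentation approach, the following is a minimal sketch of forward maximum matching. The patent does not prescribe this particular algorithm; the function name, the dictionary contents, and the unspaced sample string are all hypothetical.

```python
def forward_max_match(text: str, dictionary: set[str], max_len: int = 7) -> list[str]:
    """Segment text by repeatedly taking the longest dictionary word
    that starts at the current position (forward maximum matching)."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a dictionary hit;
        # a single character is always accepted as a fallback.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                words.append(piece)
                i += size
                break
    return words


# Hypothetical dictionary and unspaced input, for illustration only.
vocabulary = {"andy", "lau", "concert"}
print(forward_max_match("andylauconcert", vocabulary))
```

A production system would more likely rely on an established segmentation library and a much larger dictionary; the sketch only shows the matching principle.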
  • the server acquires at least one second name corresponding to the at least one keyword included in the first keyword set according to a preset correspondence between a keyword and a file name including the keyword.
  • the first keyword set includes at least one keyword. For each keyword in the first keyword set, the server queries the preset correspondence to obtain the file names that include the keyword; each file name that includes any one or more keywords in the first keyword set is obtained as a second name.
  • the correspondence between the first name, the keywords in the first keyword set, and the second name corresponding to each keyword is as shown in Table 1.
  • the method further includes: establishing the preset correspondence according to the file name that is stored by the server.
  • the server performs word segmentation on the names of all stored files to obtain the keywords included in each file name. For each keyword, the server obtains the file names that include the keyword according to the keywords included in each file name, and establishes the preset correspondence between the keyword and the file names including the keyword.
  • the server establishes an inverted index for the keywords included in each file name, and determines the established inverted index as the preset correspondence.
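The inverted index and the lookup of candidate second names can be sketched as follows. The function names and the stored file names other than "The Complete Works of Andy Lau Concert" are hypothetical, and excluding the first name itself from the candidates is an assumption rather than something the text states.

```python
from collections import defaultdict


def build_inverted_index(name_keywords: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map each keyword to the set of file names that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for name, keywords in name_keywords.items():
        for keyword in keywords:
            index[keyword].add(name)
    return dict(index)


def candidate_names(index: dict[str, set[str]],
                    first_keywords: list[str],
                    first_name: str) -> set[str]:
    """Collect every file name containing at least one first-set keyword."""
    candidates: set[str] = set()
    for keyword in first_keywords:
        candidates |= index.get(keyword, set())
    candidates.discard(first_name)  # assumption: do not recommend the open file
    return candidates


# Stored names with already-segmented keywords (partly hypothetical data).
stored = {
    "The Complete Works of Andy Lau Concert": ["Andy Lau", "concert", "complete works"],
    "Jacky Cheung classic songs": ["Jacky Cheung", "classic", "songs"],
    "Cooking at home": ["cooking", "home"],
}
index = build_inverted_index(stored)
first = ["Andy Lau", "Jacky Cheung", "concert", "clothing"]
print(candidate_names(index, first, "currently open file name"))
```

Building the index once over all stored names keeps each lookup proportional to the number of keywords in the first name rather than to the number of stored files.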
  • for each second name, the server performs word segmentation on the second name to obtain a second keyword set, where the second keyword set includes at least one keyword obtained by segmenting the second name.
  • for example, one of the second names is "The Complete Works of Andy Lau Concert", and the server segments the second name to obtain the second keyword set {Andy Lau, concert, complete works}.
  • when segmenting the second name, the server may also use a string-matching-based word segmentation method or a statistics-based word segmentation method, which is not limited in the embodiment of the present invention.
  • the server acquires the same keyword in the first keyword set and the second keyword set corresponding to each second name, and uses the same keyword as the matching keyword.
  • for a keyword in the first keyword set, the server traverses the second keyword set and determines whether the keyword is included in the second keyword set; when the keyword is included in the second keyword set, the keyword is used as a matching keyword. The above judgment is performed on each keyword in the first keyword set, thereby acquiring at least one matching keyword.
  • alternatively, for a keyword in the second keyword set, the server traverses the first keyword set and determines whether the keyword is included in the first keyword set; when the keyword is included in the first keyword set, the keyword is used as a matching keyword. The above judgment is performed on each keyword in the second keyword set, thereby acquiring at least one matching keyword.
  • for example, when the first keyword set is {Andy Lau, Jacky Cheung, concert, clothing} and the second keyword set is {Andy Lau, concert, complete works}, the matching keywords are "Andy Lau" and "concert".
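The matching step amounts to a set intersection. A minimal sketch follows; the function name `matching_keywords` is an illustrative assumption, and the keyword sets are the ones from the example in the text.

```python
def matching_keywords(first_set: list[str], second_set: list[str]) -> list[str]:
    """Return the keywords of the first set that also occur in the second set,
    preserving their order in the first set."""
    second = set(second_set)  # constant-time membership checks
    return [keyword for keyword in first_set if keyword in second]


first = ["Andy Lau", "Jacky Cheung", "concert", "clothing"]
second = ["Andy Lau", "concert", "complete works"]
print(matching_keywords(first, second))  # ['Andy Lau', 'concert']
```

Traversing either set gives the same matching keywords, so the two traversal orders described in the text differ only in iteration order.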
  • the server assigns a weight to each keyword in the first keyword set according to the weight level corresponding to the type of the keyword, in descending order of weight level, so that a keyword with a high weight level is assigned a greater weight than a keyword with a low weight level.
  • the first keyword set and the second keyword set include at least one identical matching keyword, but the first name and the second name may still differ semantically. Therefore, to improve the relevance between the second name to be recommended and the first name, weights are assigned to the keywords in the first keyword set, and the weight of each second name is determined from those weights.
  • the server presets the weight level corresponding to each keyword type. After determining the type of each keyword in the first keyword set, the server determines the weight level of each keyword according to the weight level corresponding to each type, sorts the keywords in descending order of weight level, and assigns weights so that a keyword with a high weight level is assigned a greater weight than a keyword with a low weight level.
  • the sum of the weights assigned to each keyword in the first keyword set is 1.
  • the type of the keyword includes a noun, a verb or a function word
  • the weight level of the noun is higher than the weight level of the verb and the function word
  • the weight level of the name in the noun is higher than the weight level of the other noun.
  • for example, the first name is “The clothing worn by Andy Lau when attending Jacky Cheung’s concert”. The weight levels of the nouns “Andy Lau”, “Jacky Cheung”, “concert” and “clothing” are higher than that of the verb “attendance”.
  • the name in the noun may be a person name, a place name, an organization name, a brand name, and the like, which is not limited by the embodiment of the present invention.
  • the weight level of a name is higher than the weight level of other nouns. For example, the weight level of "Andy Lau" and "Jacky Cheung" is higher than the weight level of "concert" and "clothing".
  • continuing the example, the server determines that the weight level of “Andy Lau” and “Jacky Cheung” is higher than that of “concert” and “clothing”, and that the weight level of “concert” and “clothing” is higher than that of the remaining keywords such as "attendance", "wearing" and "time". The server can then assign a weight of 0.3 to the keyword "Andy Lau", 0.3 to the keyword "Jacky Cheung", 0.2 to the keyword "concert", 0.1 to the keyword "clothing", 0.1 to the keyword "attendance", and 0 to the remaining keywords.
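One possible way to turn weight levels into weights that sum to 1 is sketched below. The text fixes only the ordering of levels, so the numeric scores in `LEVEL_SCORE`, the relative ranking of verbs above function words, and the proportional normalization are all assumptions; the sketch does not reproduce the 0.3/0.3/0.2/0.1/0.1 allocation of the example.

```python
# Hypothetical score per keyword type. Only the ordering
# (names > other nouns > verbs and function words) comes from the text.
LEVEL_SCORE = {"name": 3, "noun": 2, "verb": 1, "function": 0}


def assign_weights_by_type(keyword_types: dict[str, str]) -> dict[str, float]:
    """Assign each keyword a weight proportional to the score of its type,
    normalized so that all weights sum to 1."""
    scores = {kw: LEVEL_SCORE[kind] for kw, kind in keyword_types.items()}
    total = sum(scores.values())
    if total == 0:
        return {kw: 0.0 for kw in scores}
    return {kw: score / total for kw, score in scores.items()}


types = {
    "Andy Lau": "name",
    "Jacky Cheung": "name",
    "concert": "noun",
    "clothing": "noun",
    "attendance": "verb",
    "when": "function",
}
for keyword, weight in assign_weights_by_type(types).items():
    print(f"{keyword}: {weight:.3f}")
```

Any monotone mapping from level to weight would satisfy the requirement that a higher level receives a greater weight; proportional normalization is simply the shortest one to write down.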
  • step 205 may be replaced by the following step (1):
  • a keyword with a higher frequency of occurrence in the first keyword set is more popular, and the user is more likely to be interested in files related to such a keyword. The server may therefore assign a weight to each keyword in the first keyword set according to its frequency of occurrence, so that a keyword with a higher frequency of occurrence is assigned a greater weight.
  • the frequency of occurrence of the keyword is the frequency at which the keyword appears in the stored file names, or the frequency at which the keyword appears in the stored file names of a specified category.
  • the specified category is the category to which the currently open file belongs.
  • the current open file may belong to a certain subcategory, and the subcategory also belongs to a certain parent category, and the server may determine the specified category to which the currently open file belongs according to different requirements of the recommended precision.
  • for example, the server can calculate the frequency of occurrence of the keyword "winning the championship" in the file names of the football category and assign a weight to the keyword accordingly, instead of calculating the frequency of occurrence of the keyword in the file names of all categories or in the file names of the sports category.
  • the frequency of occurrence may be a term frequency (TF) or a document frequency (DF).
  • for example, the server determines that the first name belongs to the singer category, and then calculates the frequency of occurrence of the keywords “Andy Lau”, “Jacky Cheung”, “concert” and “clothing” in the file names of the singer category. If the calculated frequencies of occurrence of the keywords "Andy Lau", "Jacky Cheung" and "concert" are 0.3, 0.2 and 0.1 respectively, the server can assign weights according to the frequency of occurrence as follows: the keyword “Andy Lau” is assigned a weight of 0.5, the keyword “Jacky Cheung” is assigned a weight of 0.4, the keyword “concert” is assigned a weight of 0.1, and the remaining keywords are assigned a weight of 0.
  • the server calculates the frequency of occurrence of each keyword in the file names stored by the server within a preset duration.
  • the preset duration can be preset by the server.
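The frequency-based variant can be sketched as follows, taking the document frequency as the share of stored file names (within one category) that contain a keyword. The function names and sample data are hypothetical, and the proportional mapping from frequency to weight is an assumption: the text requires only that a higher frequency yields a greater weight, and its own example (0.5, 0.4, 0.1) is not a proportional allocation.

```python
def document_frequency(keyword: str, names_keywords: list[set[str]]) -> float:
    """Fraction of stored file names whose keyword set contains the keyword."""
    if not names_keywords:
        return 0.0
    hits = sum(1 for keywords in names_keywords if keyword in keywords)
    return hits / len(names_keywords)


def weights_from_frequency(frequencies: dict[str, float]) -> dict[str, float]:
    """Assumed rule: weight proportional to frequency, normalized to sum to 1."""
    total = sum(frequencies.values())
    if total == 0:
        return {kw: 0.0 for kw in frequencies}
    return {kw: freq / total for kw, freq in frequencies.items()}


# Hypothetical keyword sets of file names stored in the "singer" category.
singer_names = [
    {"Andy Lau", "concert"},
    {"Andy Lau", "interview"},
    {"Jacky Cheung", "concert"},
    {"album", "review"},
]
freqs = {kw: document_frequency(kw, singer_names)
         for kw in ("Andy Lau", "Jacky Cheung", "clothing")}
print(weights_from_frequency(freqs))
```

Restricting `names_keywords` to the category of the currently open file, or to names stored within the preset duration, changes only the input list and not the computation.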
  • step 205 and step (1) assign weights according to, respectively, the weight level corresponding to the type of each keyword in the first keyword set and the frequency of occurrence of each keyword. The server can also comprehensively consider both the weight level corresponding to the type of each keyword and its frequency of occurrence when assigning weights. That is, in another embodiment provided by the present invention, step 205 may be replaced by the following step (2):
  • keywords with a high frequency of occurrence can be considered more popular, but the relevance between the first name and the second name corresponding to a high-frequency keyword may be low, and the user is not necessarily interested in the file indicated by such a popular second name.
  • the server may therefore first assign a weight to each keyword according to the weight level corresponding to the type of the keyword, and then adjust the weight assigned to each keyword according to the frequency of occurrence of the keyword.
  • the weight assigned to each keyword may be adjusted according to the frequency of occurrence of each keyword in either of the following ways:
  • an adjustment range is determined for each keyword according to its frequency of occurrence, and the weight assigned to each keyword is increased or decreased by the determined adjustment range.
  • for example, the server assigns weights of 0.3, 0.3, 0.2, 0.1 and 0.1 to the keywords “Andy Lau”, “Jacky Cheung”, “concert”, “clothing” and “attendance”, and calculates that, during a fashion week, the frequencies of occurrence of these keywords are 0.3, 0.2, 0.1, 0.2 and 0.01 respectively. The server determines that the adjustment ranges for the keywords “Andy Lau”, “Jacky Cheung”, “concert”, “clothing” and “attendance” are 0.025, 0.025, -0.1, 0.15 and -0.1 respectively. After each weight is adjusted by its adjustment range, the final assigned weights are 0.325, 0.325, 0.1, 0.25 and 0.
  • the weight assigned to a keyword whose frequency of occurrence is greater than or equal to a preset threshold is increased by a preset adjustment weight, and the weight assigned to a keyword whose frequency of occurrence is less than the preset threshold is decreased by the preset adjustment weight.
  • for example, the server determines that the preset threshold is 0.2 and the preset adjustment weight is 0.05, assigns weights of 0.3, 0.3, 0.2, 0.1 and 0.1 to the keywords “Andy Lau”, “Jacky Cheung”, “concert”, “clothing” and “attendance”, and calculates that the frequencies of occurrence of these keywords are 0.3, 0.2, 0.1, 0.2 and 0.01 respectively. The weights assigned to the keywords “Andy Lau”, “Jacky Cheung” and “clothing”, whose frequencies are greater than or equal to 0.2, are increased by 0.05, and the weights assigned to the keywords “concert” and “attendance”, whose frequencies are less than 0.2, are decreased by 0.05. The final assigned weights are 0.35, 0.35, 0.15, 0.15 and 0.05.
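The threshold-based adjustment can be sketched as below. The function name `adjust_by_threshold` is hypothetical, and clamping a decreased weight at 0 is an assumption that the text does not state.

```python
def adjust_by_threshold(weights: dict[str, float],
                        frequencies: dict[str, float],
                        threshold: float = 0.2,
                        delta: float = 0.05) -> dict[str, float]:
    """Raise the weight of keywords whose frequency reaches the threshold by
    delta, and lower the weight of all other keywords by delta."""
    adjusted = {}
    for keyword, weight in weights.items():
        if frequencies.get(keyword, 0.0) >= threshold:
            adjusted[keyword] = weight + delta
        else:
            adjusted[keyword] = max(0.0, weight - delta)  # assumed floor at 0
    return adjusted


weights = {"Andy Lau": 0.3, "Jacky Cheung": 0.3, "concert": 0.2,
           "clothing": 0.1, "attendance": 0.1}
frequencies = {"Andy Lau": 0.3, "Jacky Cheung": 0.2, "concert": 0.1,
               "clothing": 0.2, "attendance": 0.01}
for keyword, weight in adjust_by_threshold(weights, frequencies).items():
    print(f"{keyword}: {weight:.2f}")
```

Because the same `delta` is added to some keywords and subtracted from others, the adjusted weights generally no longer sum to 1; a caller that needs normalized weights would have to rescale them afterwards.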
  • the embodiment of the present invention is described by taking step 205 being performed after step 204 as an example. In practice, step 205 only needs to be performed after step 201 and before step 206; the execution timing of step 205 is not otherwise limited by the embodiment of the present invention.
  • the server acquires a weight of the matching keyword included in each second name in the first name.
  • the server has determined the weight of each keyword in the first keyword set in the first name, that is, the weight of each matching keyword in the first name has been determined. Then the server determines a matching keyword included in each second name, and a weight of the matching keyword included in each second name in the first name.
  • for example, the server assigns a weight of 0.3 to the keyword “Andy Lau”, 0.3 to the keyword “Jacky Cheung”, 0.2 to the keyword “concert”, 0.1 to the keyword “clothing”, 0.1 to the keyword “attendance”, and 0 to the remaining keywords. The weights in the first name of the matching keywords included in each second name determined by the server can then be as shown in Table 2.
  • the server determines a time weight of each second name according to the publishing time of the file indicated by the second name. According to a preset ratio, the sum of the weights in the first name of the matching keywords included in each second name and the time weight of that second name are weighted to obtain a weighted sum, and the weighted sum is determined as the weight of the second name.
  • the file indicated by a second name may be newly released or may have been released long ago. Files with different publishing times attract different degrees of user interest; that is, the publishing time affects the user's degree of interest, which in turn affects the recommendation success rate. Therefore, when determining the second name to be recommended, the publishing time of the file indicated by each second name needs to be considered.
  • the server calculates the sum of the weights in the first name of the matching keywords included in each second name, sorts the second names according to the publishing time of the file indicated by each second name, and assigns a time weight to each second name according to the sorted order, so that a second name with a later publishing time has a higher time weight than a second name with an earlier publishing time.
  • the server weights the sum and the time weight according to the preset ratio to obtain a weighted sum, that is, the weight of each second name.
  • the preset ratio is the ratio between the sum and the time weight; the weighting coefficients of the sum and the time weight used in the weighted calculation are determined from this ratio.
  • the preset ratio may be preset by the server, or adjusted by the server during use. For example, when the currently open file was published relatively early, the time weight takes a smaller proportion, and when the currently open file is of a more time-sensitive type, the time weight takes a larger proportion, which is not limited by the embodiment of the present invention.
  • for example, the server calculates that the sum of the weights of the matching keywords included in the second name is 0.5.
  • alternatively, the server may preset a correspondence between time weights and time intervals between the publishing time and the current time, that is, determine the time weight corresponding to each time interval. The server may calculate the time interval between the publishing time of the file indicated by each second name and the current time, and determine the time weight of each second name according to the preset correspondence.
  • for example, the server presets that the time weight of a second name with a time interval of 1 day is 0.9, the time weight of a second name with a time interval of 2 days is 0.8, and so on. For a given second name, when the server determines that the time interval between the publishing time of the file indicated by the second name and the current time is 4 days, the time weight of the second name is determined to be 0.6.
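A minimal sketch of the time weight lookup and the weighted sum follows. The table `TIME_WEIGHTS` contains only the three intervals the text states; the fallback value for other intervals, the 0.7:0.3 ratio, the function names, and the sample dates are all hypothetical.

```python
from datetime import date

# Interval in days -> time weight. Only these three points appear in the text.
TIME_WEIGHTS = {1: 0.9, 2: 0.8, 4: 0.6}


def time_weight(published: date, today: date, default: float = 0.0) -> float:
    """Look up the time weight for the interval between publication and today.
    Intervals missing from the table fall back to a hypothetical default."""
    interval_days = (today - published).days
    return TIME_WEIGHTS.get(interval_days, default)


def combined_weight(keyword_sum: float, t_weight: float,
                    ratio: tuple[float, float] = (0.7, 0.3)) -> float:
    """Weighted sum of the keyword-weight sum and the time weight, with
    coefficients derived from the preset ratio (the ratio is an assumed value)."""
    a, b = ratio
    return (a * keyword_sum + b * t_weight) / (a + b)


t = time_weight(date(2015, 1, 1), date(2015, 1, 5))  # 4 days apart
print(t)  # 0.6
print(round(combined_weight(0.5, t), 4))
```

Dividing by `a + b` lets the ratio be given in any scale, for example `(7, 3)` instead of `(0.7, 0.3)`.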
  • step 207 is an optional step. The server may also disregard the impact of the file publishing time and determine the weight of each second name only according to the weights in the first name of the matching keywords included in the second name.
  • that is, step 207 may be replaced by the following step: the sum of the weights in the first name of the matching keywords included in each second name is determined as the weight of that second name. For example, based on Table 2, when the second name is "The Complete Works of Andy Lau Concert", the server calculates that the sum of the weights of the matching keywords included in the second name is 0.5, that is, the weight of the second name is determined to be 0.5.
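The keyword-only weight of a second name can be sketched as a sum over its matching keywords. The function name `name_weight` is an illustrative assumption; the weights and keyword set are the ones used in the example.

```python
def name_weight(first_weights: dict[str, float], second_keywords: list[str]) -> float:
    """Sum the first-name weights of the keywords that also occur
    in the second name's keyword set."""
    second = set(second_keywords)
    return sum(weight for keyword, weight in first_weights.items()
               if keyword in second)


first_weights = {"Andy Lau": 0.3, "Jacky Cheung": 0.3, "concert": 0.2,
                 "clothing": 0.1, "attendance": 0.1}
second_keywords = ["Andy Lau", "concert", "complete works"]
print(name_weight(first_weights, second_keywords))  # 0.5
```

Keywords of the second name that do not appear in the first name, such as "complete works", contribute nothing to the weight.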
  • the server determines, according to a weight of each second name, a preset number of second names as the second name to be recommended.
  • the preset number may be preset by the server, or may be determined by the server according to the number of files that can be displayed in the recommended area of the currently open file display interface, which is not limited by the embodiment of the present invention.
  • the server sorts the second names in descending order of weight, and determines the top preset number of second names as the second names to be recommended, so that the files indicated by the top preset number of second names are recommended to the user.
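The selection of the top preset number of second names can be sketched as a sort followed by a slice. The function name `top_names`, the candidate names other than "The Complete Works of Andy Lau Concert", and their weights are hypothetical.

```python
def top_names(name_weights: dict[str, float], preset_number: int) -> list[str]:
    """Return the preset number of names with the largest weights,
    in descending order of weight."""
    ranked = sorted(name_weights.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:preset_number]]


# Hypothetical candidate weights for illustration.
candidates = {
    "The Complete Works of Andy Lau Concert": 0.5,
    "Jacky Cheung classic songs": 0.3,
    "Concert lighting tutorial": 0.2,
}
print(top_names(candidates, 2))
```

Python's sort is stable, so candidates with equal weights keep their original relative order; a secondary key such as publishing time could be added if ties need a defined order.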
  • the server recommends the file indicated by the determined second name.
  • when recommending the file indicated by the determined second name, the server may provide, on the display interface of the currently open file, a link address of the determined second name; the link address is used to jump to the file indicated by the determined second name.
  • the server may also display a thumbnail generated from the file indicated by the determined second name, or display related information such as the publisher and the publishing time, which is not limited in this embodiment of the present invention.
  • the files may be recommended sequentially in descending order of weight, or sequentially according to publishing time, which is not limited in the embodiment of the present invention.
  • the method provided by the embodiment of the present invention processes the first name of the currently open file to obtain a plurality of candidate second names, matches the first name against each second name to determine the matching keywords included in each second name, and determines the weight of each matching keyword according to its part of speech. The second name to be recommended is then determined from the plurality of candidate second names according to the weights of the matching keywords, and the file indicated by the determined second name is recommended. This improves the relevance of the finally recommended file name to the name of the currently open file, and improves the recommendation success rate. Further, taking the publishing time of the file into account and determining the second name to be recommended by calculating the time weight of each second name further improves the recommendation success rate.
  • FIG. 3 is a schematic structural diagram of a file recommendation apparatus according to an embodiment of the present invention.
  • the apparatus includes: a first word segmentation module 301, a second set obtaining module 302, a matching module 303, a weight obtaining module 304, a name determining module 305, and a recommendation module 306.
  • the first word segmentation module 301 is configured to perform word segmentation on the first name to obtain a first keyword set, where the first name is the name of a currently open file, and the first keyword set includes at least one keyword obtained by segmenting the first name.
  • the second set obtaining module 302 is connected to the first word segmentation module 301, and is configured to acquire, according to a preset correspondence between a keyword and the file names including the keyword, at least one second name corresponding to the at least one keyword included in the first keyword set, and to acquire the second keyword set corresponding to the at least one second name.
  • the matching module 303 is connected to the second set obtaining module 302, and is configured to acquire the keywords that are the same in the first keyword set and the second keyword set corresponding to each second name, and to use those keywords as matching keywords.
  • the weight obtaining module 304 is connected to the matching module 303, and is configured to obtain a weight of the matching keyword included in each second name in the first name.
  • the name determining module 305 is connected to the weight obtaining module 304, and is configured to determine a second name to be recommended according to the weight of the matching keyword included in each second name in the first name.
  • the recommendation module 306 is coupled to the name determination module 305 for recommending the file indicated by the determined second name.
  • the second set obtaining module 302 includes:
  • a second name obtaining unit, configured to acquire, according to the preset correspondence between keywords and the file names containing those keywords, at least one second name corresponding to the at least one keyword included in the first keyword set; and
  • a second word segmentation unit, configured to segment each second name of the at least one second name to obtain a second keyword set, where the second keyword set includes at least one keyword obtained by segmenting that second name.
  • the device further includes:
  • a first weight obtaining module, configured to obtain the weight in the first name of each keyword in the first keyword set according to at least one of the type and the occurrence frequency of that keyword.
  • the first weight obtaining module includes:
  • a first weight obtaining unit, configured to assign a weight to each keyword in the first keyword set in descending order of the weight level corresponding to the keyword's type, so that a keyword with a higher weight level is assigned a greater weight than a keyword with a lower weight level; or,
  • a second weight obtaining unit, configured to assign a weight to each keyword in the first keyword set in descending order of occurrence frequency, so that a keyword with a higher occurrence frequency is assigned a greater weight than a keyword with a lower occurrence frequency; or,
  • a third weight obtaining unit, configured to assign a weight to each keyword in the first keyword set in descending order of the weight level corresponding to the keyword's type, so that a keyword with a higher weight level is assigned a greater weight than a keyword with a lower weight level; and
  • an adjusting unit, configured to adjust the weight assigned to each keyword according to the occurrence frequency of that keyword in the first keyword set.
  • the type of a keyword includes noun, verb, or function word, and the weight level of nouns is higher than the weight levels of verbs and function words;
  • the occurrence frequency of a keyword is the frequency with which the keyword appears in the stored file names, or the frequency with which it appears in the stored file names of a specified category, the specified category being the category to which the currently open file belongs.
  • among nouns, the weight level of personal names is higher than the weight level of other nouns.
  • the name determining module 305 includes:
  • a weight determining unit, configured to determine the weight of each second name according to the weight, in the first name, of the matching keywords included in that second name; and
  • a to-be-recommended name determining unit, configured to determine, in descending order of the weights of the second names, a preset number of second names as the second names to be recommended.
  • the weight determining unit is configured to determine the sum of the weights, in the first name, of the matching keywords included in each second name as the weight of that second name; or,
  • the weight determining unit is configured to determine a time weight for each second name according to the publication time of the file it indicates, to compute, according to a preset ratio, a weighted sum of the time weight and the sum of the weights in the first name of the matching keywords included in that second name, and to determine the weighted sum as the weight of that second name.
  • the device provided by this embodiment of the present invention processes the first name of the currently open file to obtain a plurality of candidate second names, matches the first name against each second name to determine the matching keywords included in each second name, and determines the weight of each matching keyword according to its part of speech. The second name to be recommended is then determined from the candidate second names according to the weights of the matching keywords, and the file indicated by the determined second name is recommended. This improves the relevance of the finally recommended file name to the name of the currently open file, and thus the recommendation success rate.
  • when the file recommendation device provided by the foregoing embodiment recommends a file, the division into the functional modules described above is used only as an example. In practice, the functions may be assigned to different functional modules as needed; that is, the internal structure of the server may be divided into different functional modules to perform all or part of the functions described above.
  • the file recommendation device provided by the foregoing embodiment and the file recommendation method embodiments belong to the same concept; the specific implementation process is described in detail in the method embodiments and is not repeated here.
  • FIG. 4 is a schematic structural diagram of a server according to an embodiment of the present invention.
  • the server 400 may vary considerably depending on its configuration or performance, and may include one or more central processing units (CPUs) 422 (for example, one or more processors), memory 432, and one or more storage media 430 (for example, one or more mass storage devices) that store application programs 442 or data 444.
  • the memory 432 and the storage medium 430 may be transient storage or persistent storage.
  • the program stored on storage medium 430 may include one or more modules (not shown), each of which may include a series of instruction operations on the server.
  • central processor 422 can be configured to communicate with storage medium 430 and to execute, on server 400, the series of instruction operations in storage medium 430.
  • Server 400 may also include one or more power sources 426, one or more wired or wireless network interfaces 450, one or more input and output interfaces 458, and/or one or more operating systems 441, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and so on.
  • the steps performed by the server in the above embodiments may be based on the server structure shown in FIG. 4.
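  • the processor 422 included in the server may execute program instructions stored in the memory 432 to perform the following functions: segmenting the first name to obtain a first keyword set, where the first name is the name of the currently open file and the first keyword set includes at least one keyword obtained by segmenting the first name; acquiring, according to a preset correspondence between keywords and the file names containing those keywords, at least one second name corresponding to the at least one keyword included in the first keyword set, and acquiring the second keyword set corresponding to the at least one second name; acquiring the keywords common to the first keyword set and the second keyword set corresponding to each second name, and using the common keywords as matching keywords; obtaining the weight, in the first name, of the matching keywords included in each second name; determining the second name to be recommended according to those weights; and recommending the file indicated by the determined second name.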
  • acquiring the second keyword set corresponding to the at least one second name includes: segmenting each second name of the at least one second name to obtain a second keyword set, where the second keyword set includes at least one keyword obtained by segmenting that second name.
  • before obtaining the weight, in the first name, of the matching keywords included in each second name, the method further includes: obtaining the weight in the first name of each keyword in the first keyword set according to at least one of the type and the occurrence frequency of that keyword. Obtaining the weight, in the first name, of the matching keywords included in each second name then includes: taking the weights of those matching keywords from the weights in the first name of the keywords in the first keyword set.
  • obtaining the weight in the first name of each keyword in the first keyword set according to at least one of the type and the occurrence frequency of that keyword includes: assigning a weight to each keyword in descending order of the weight level corresponding to the keyword's type, so that a keyword with a higher weight level is assigned a greater weight than a keyword with a lower weight level; or, assigning a weight to each keyword in descending order of occurrence frequency, so that a keyword with a higher occurrence frequency is assigned a greater weight than a keyword with a lower occurrence frequency; or, assigning a weight to each keyword in descending order of the weight level corresponding to the keyword's type, so that a keyword with a higher weight level is assigned a greater weight than a keyword with a lower weight level, and adjusting the weight assigned to each keyword according to its occurrence frequency.
  • determining the second name to be recommended according to the weight, in the first name, of the matching keywords included in each second name includes: determining the weight of each second name according to the weight, in the first name, of the matching keywords included in that second name; and determining, in descending order of the weights of the second names, a preset number of second names as the second names to be recommended.
  • determining the weight of each second name according to the weight, in the first name, of the matching keywords included in that second name includes: determining the sum of the weights, in the first name, of the matching keywords included in each second name as the weight of that second name; or, determining a time weight for each second name according to the publication time of the file it indicates, computing, according to a preset ratio, a weighted sum of the time weight and the sum of the weights in the first name of the matching keywords included in that second name, and determining the weighted sum as the weight of that second name.
  • a person of ordinary skill in the art will understand that all or some of the steps of the above embodiments may be implemented by hardware, or by a program instructing the relevant hardware, and that the program may be stored in a computer-readable storage medium.
  • the storage medium may be a read-only memory, a magnetic disk, an optical disc, or the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Library & Information Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to the field of network technology, and discloses a file recommendation method and device. The method comprises: tokenizing a first title to obtain a first keyword set; obtaining, according to pre-set correspondence between keywords and file titles containing the keywords, at least one second title corresponding to the at least one keyword contained in the first keyword set, and obtaining a second keyword set corresponding to the at least one second title; obtaining keywords that appear both in the first keyword set and in the second keyword sets corresponding to each second title, and using the keywords as the matching keywords; obtaining the weight of the matching keywords in the first title; determining the second title to be recommended; recommending the file indicated by the second title that has been determined. The present invention determines the second title to be recommended by determining the weight of matching keywords, thereby enhancing the relevance between the file title to be recommended and the title of the currently open file, and improving the recommendation success rate.

Description

File recommendation method and device
This application claims priority to Chinese Patent Application No. 201310652678.3, filed with the Chinese Patent Office on December 5, 2013 and entitled "File recommendation method and device", which is incorporated herein by reference in its entirety.
Technical field
The present invention relates to the field of network technologies, and in particular to a file recommendation method and device.
Background
In everyday online activity, users constantly face all kinds of information, yet find it hard to pick out what genuinely interests them. To make this easier, a server may recommend information that a user is likely to be interested in, based on the user's browsing history, interests, and so on.
Taking video as an example, when recommending videos the server may recommend the most popular videos of the type to which the currently playing video belongs. For example, when the currently playing video is of the "sports" type, the server recommends the most popular "sports" videos to the user. Alternatively, the server calculates the edit distance (Levenshtein distance, LD) between the name of each video and the name of the currently playing video, and recommends to the user the video whose name has the smallest LD to the name of the currently playing video.
When the most popular videos of the current video's type are recommended, their relevance to the currently playing video may be low, which leads to a low recommendation success rate. When the server recommends videos by calculating LD, LD only mechanically measures character-level editing differences between video names, so the name of the finally recommended video may differ greatly in meaning from the name of the currently playing video. This likewise results in low relevance and therefore a very low recommendation success rate.
Summary of the invention
To solve the problems of the prior art, embodiments of the present invention provide a file recommendation method and device. The technical solutions are as follows.
In a first aspect, a file recommendation method is provided, the method comprising:
segmenting a first name into words to obtain a first keyword set, where the first name is the name of the currently open file and the first keyword set includes at least one keyword obtained by segmenting the first name;
acquiring, according to a preset correspondence between keywords and the file names containing those keywords, at least one second name corresponding to the at least one keyword included in the first keyword set, and acquiring a second keyword set corresponding to the at least one second name;
acquiring the keywords that are common to the first keyword set and the second keyword set corresponding to each second name, and using the common keywords as matching keywords;
obtaining the weight, in the first name, of the matching keywords included in each second name;
determining a second name to be recommended according to the weight, in the first name, of the matching keywords included in each second name; and
recommending the file indicated by the determined second name.
In a second aspect, a file recommendation device is provided, the device comprising:
a first word segmentation module, configured to segment a first name into words to obtain a first keyword set, where the first name is the name of the currently open file and the first keyword set includes at least one keyword obtained by segmenting the first name;
a second set obtaining module, configured to acquire, according to a preset correspondence between keywords and the file names containing those keywords, at least one second name corresponding to the at least one keyword included in the first keyword set, and to acquire a second keyword set corresponding to the at least one second name;
a matching module, configured to acquire the keywords that are common to the first keyword set and the second keyword set corresponding to each second name, and to use the common keywords as matching keywords;
a weight obtaining module, configured to obtain the weight, in the first name, of the matching keywords included in each second name;
a name determining module, configured to determine a second name to be recommended according to the weight, in the first name, of the matching keywords included in each second name; and
a recommendation module, configured to recommend the file indicated by the determined second name.
The beneficial effects of the technical solutions provided by the embodiments of the present invention are as follows:
The method and device provided by the embodiments of the present invention process the first name of the currently open file to obtain a plurality of candidate second names, match the first name against each second name to determine the matching keywords included in each second name, and determine the weight of each matching keyword according to its part of speech. The second name to be recommended is then determined from the candidate second names according to the weights of the matching keywords, and the file indicated by the determined second name is recommended. This improves the relevance of the recommended file name to the name of the currently open file, and thus the recommendation success rate.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the drawings needed for describing the embodiments are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a flowchart of a file recommendation method according to an embodiment of the present invention;
FIG. 2 is a flowchart of a file recommendation method according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a file recommendation device according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a server according to an embodiment of the present invention.
Detailed description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. Evidently, the described embodiments are some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
FIG. 1 is a flowchart of a file recommendation method according to an embodiment of the present invention. This embodiment is executed by a server. Referring to FIG. 1, the method includes:
101. Segment the first name into words to obtain a first keyword set, where the first name is the name of the currently open file and the first keyword set includes at least one keyword obtained by segmenting the first name.
102. Acquire, according to a preset correspondence between keywords and the file names containing those keywords, at least one second name corresponding to the at least one keyword included in the first keyword set, and acquire the second keyword set corresponding to the at least one second name.
103. Acquire the keywords that are common to the first keyword set and the second keyword set corresponding to each second name, and use the common keywords as matching keywords.
104. Obtain the weight, in the first name, of the matching keywords included in each second name.
105. Determine the second name to be recommended according to the weight, in the first name, of the matching keywords included in each second name.
106. Recommend the file indicated by the determined second name.
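Steps 101 to 106 can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the names `recommend`, `segment`, `index`, and `keyword_weight` are hypothetical, word segmentation is passed in as a function, the weight of a second name is taken to be the plain sum of its matching keywords' weights, and the currently open file is excluded from the candidates.

```python
from typing import Callable, Dict, List, Set


def recommend(first_name: str,
              segment: Callable[[str], Set[str]],
              index: Dict[str, Set[str]],
              keyword_weight: Dict[str, float],
              top_n: int = 1) -> List[str]:
    """Hypothetical sketch of steps 101-106; all names are illustrative."""
    # 101: segment the name of the currently open file.
    first_keywords = segment(first_name)

    # 102: look up candidate second names via the keyword -> file-name mapping.
    candidates: Set[str] = set()
    for kw in first_keywords:
        candidates |= index.get(kw, set())
    candidates.discard(first_name)  # assumption: never recommend the open file

    scores: Dict[str, float] = {}
    for second_name in candidates:
        # 103: matching keywords are those shared by both keyword sets.
        matching = first_keywords & segment(second_name)
        # 104: read each matching keyword's weight in the first name,
        # then sum them to score the second name (one option for step 105).
        scores[second_name] = sum(keyword_weight.get(kw, 0.0) for kw in matching)

    # 105-106: the top_n highest-scoring names are the ones to recommend.
    ranked = sorted(scores, key=lambda name: (-scores[name], name))
    return ranked[:top_n]


# Toy usage with whitespace "segmentation" (an assumption made for brevity).
def split_words(name: str) -> Set[str]:
    return set(name.lower().split())


names = ["andy lau concert", "jacky cheung concert clothing", "cooking show"]
index: Dict[str, Set[str]] = {}
for n in names:
    for w in split_words(n):
        index.setdefault(w, set()).add(n)

weights = {"andy": 3.0, "lau": 3.0, "concert": 2.0, "clothing": 2.0}
result = recommend("andy lau concert clothing", split_words, index, weights, top_n=2)
print(result)
```

In this toy run, "cooking show" never becomes a candidate because it shares no keyword with the open file's name, which is the effect the keyword-to-name lookup in step 102 is meant to have.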
The method provided by this embodiment of the present invention processes the first name of the currently open file to obtain a plurality of candidate second names, matches the first name against each second name to determine the matching keywords included in each second name, and determines the weight of each matching keyword according to its part of speech. The second name to be recommended is then determined from the candidate second names according to the weights of the matching keywords, and the file indicated by the determined second name is recommended. This improves the relevance of the finally recommended file name to the name of the currently open file, and thus the recommendation success rate.
Optionally, acquiring the second keyword set corresponding to the at least one second name includes:
for each second name of the at least one second name, segmenting the second name to obtain a second keyword set, where the second keyword set includes at least one keyword obtained by segmenting that second name.
Optionally, before obtaining the weight, in the first name, of the matching keywords included in each second name, the method further includes:
obtaining the weight in the first name of each keyword in the first keyword set according to at least one of the type and the occurrence frequency of that keyword; and
obtaining the weight, in the first name, of the matching keywords included in each second name includes: for each keyword in the first keyword set that is a matching keyword of a second name, using the weight of that keyword in the first name as the weight of the matching keyword in the first name.
Optionally, obtaining the weight in the first name of each keyword in the first keyword set according to at least one of the type and the occurrence frequency of that keyword includes:
assigning a weight to each keyword in descending order of the weight level corresponding to the keyword's type, so that a keyword with a higher weight level is assigned a greater weight than a keyword with a lower weight level; or,
assigning a weight to each keyword in descending order of occurrence frequency, so that a keyword with a higher occurrence frequency is assigned a greater weight than a keyword with a lower occurrence frequency; or,
assigning a weight to each keyword in descending order of the weight level corresponding to the keyword's type, so that a keyword with a higher weight level is assigned a greater weight than a keyword with a lower weight level, and adjusting the weight assigned to each keyword according to its occurrence frequency.
Optionally, the type of a keyword includes noun, verb, or function word, and the weight level of nouns is higher than the weight levels of verbs and function words;
the occurrence frequency of a keyword is the frequency with which the keyword appears in the stored file names, or the frequency with which it appears in the stored file names of a specified category, the specified category being the category to which the currently open file belongs.
Optionally, among nouns, the weight level of personal names is higher than the weight level of other nouns.
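The third option (a weight by type level, then an adjustment by occurrence frequency) might look like the sketch below. The text fixes only the ordering of the levels (personal name above other nouns, nouns above verbs and function words); the numeric values in `LEVEL_WEIGHT`, the function name `keyword_weights`, and the multiplicative form of the adjustment are assumptions made for illustration.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical level weights. Only the ordering comes from the text; the
# numbers, and the choice to separate verbs from function words, are assumed.
LEVEL_WEIGHT = {
    "person_name": 4.0,
    "noun": 3.0,
    "verb": 1.0,
    "function_word": 0.5,
}


def keyword_weights(keywords: Iterable[Tuple[str, str]],
                    frequency: Dict[str, float]) -> Dict[str, float]:
    """Assign a weight by type level, then adjust it by occurrence frequency.

    `keywords` holds (keyword, type) pairs; `frequency` maps a keyword to the
    fraction of stored file names that contain it (0.0 to 1.0).
    """
    weights: Dict[str, float] = {}
    for word, kind in keywords:
        base = LEVEL_WEIGHT[kind]
        # Assumed adjustment: scale the base weight up with frequency, in line
        # with the frequency-only option where frequent keywords weigh more.
        weights[word] = base * (1.0 + frequency.get(word, 0.0))
    return weights


w = keyword_weights(
    [("Andy Lau", "person_name"), ("concert", "noun"), ("attend", "verb")],
    {"Andy Lau": 0.25, "concert": 0.5},
)
print(w)
```

Restricting `frequency` to file names of the current file's category, rather than all stored names, would give the per-category variant of the occurrence frequency.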
Optionally, determining the second name to be recommended according to the weight, in the first name, of the matching keywords included in each second name includes:
determining the weight of each second name according to the weight, in the first name, of the matching keywords included in that second name; and
determining, in descending order of the weights of the second names, a preset number of second names as the second names to be recommended.
Optionally, determining the weight of each second name according to the weight, in the first name, of the matching keywords included in that second name includes:
determining the sum of the weights, in the first name, of the matching keywords included in each second name as the weight of that second name; or,
determining a time weight for each second name according to the publication time of the file it indicates, computing, according to a preset ratio, a weighted sum of the time weight and the sum of the weights in the first name of the matching keywords included in that second name, and determining the weighted sum as the weight of that second name.
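The time-weighted option can be illustrated as below. The passage above does not say how the time weight is derived from the publication time or how the preset ratio is applied, so both are assumptions here: `time_weight` is a hypothetical exponential decay that favours newer files, and `name_weight` blends the two terms as `ratio * keyword_sum + (1 - ratio) * time_weight`.

```python
from datetime import datetime


def time_weight(published: datetime, now: datetime,
                half_life_days: float = 30.0) -> float:
    """Hypothetical time weight in (0, 1]: halves every `half_life_days`."""
    age_days = max((now - published).total_seconds() / 86400.0, 0.0)
    return 0.5 ** (age_days / half_life_days)


def name_weight(keyword_weight_sum: float, t_weight: float,
                ratio: float = 0.8) -> float:
    """Blend the keyword-weight sum and the time weight by a preset ratio.

    The linear form below is an assumed reading of "according to a preset ratio".
    """
    return ratio * keyword_weight_sum + (1.0 - ratio) * t_weight


now = datetime(2015, 2, 1)
# Two candidates with the same keyword-weight sum but different publication times.
fresh = name_weight(4.0, time_weight(datetime(2015, 2, 1), now))
old = name_weight(4.0, time_weight(datetime(2015, 1, 2), now))
print(fresh, old)
```

With equal keyword-weight sums, the more recently published file ends up with the larger weight and is therefore ranked first.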
All of the above optional technical solutions may be combined in any manner to form optional embodiments of the present invention, which are not described one by one here.
FIG. 2 is a flowchart of a file recommendation method according to an embodiment of the present invention. This embodiment is executed by a server. Referring to FIG. 2, the method includes the following steps.
201. The server segments the first name into words to obtain a first keyword set, where the first name is the name of the currently open file and the first keyword set includes at least one keyword obtained by segmenting the first name.
This embodiment applies to a scenario in which a user has opened a file and the server recommends other files to the user according to the name of the currently open file. The server may be the server associated with the currently open file or a functional module in that server, which is not limited in this embodiment.
Further, this embodiment may be applied where the name of the currently open file is a name customized by its publisher. Unlike names that are fixed at release, such as film or television series titles, a publisher-defined name may be very long or very short; it may be a single word or a complex sentence. This embodiment can recommend files to the user based on such personalized, publisher-defined names.
The file may be a video file, an audio file, a text file, or the like provided by the server, such as an online video file provided by a video website server, an audio file provided by an audio website, or an online document provided by a document sharing server, which is not limited in this embodiment.
Specifically, when the server detects that the user opens a file, it takes the name of the currently open file as the first name and segments the first name to obtain at least one keyword of the first name; the at least one keyword forms the first keyword set.
Segmenting the first name means splitting it into one or several words or morphemes.
For example, if the first name is "刘德华出席张学友的演唱会时穿的服装" ("The clothes Andy Lau wore when attending Jacky Cheung's concert"), segmenting it yields the first keyword set {刘德华 (Andy Lau), 张学友 (Jacky Cheung), 演唱会 (concert), 服装 (clothing)}.
When segmenting the first name, the server may use a segmentation method based on string matching or a segmentation method based on statistics, which is not limited in this embodiment.
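As one concrete instance of a string-matching-based method, forward maximum matching scans the name from left to right and, at each position, takes the longest word found in a dictionary. The sketch below is illustrative only: the function name and the dictionary are hypothetical, and the dictionary is assumed to contain just the words that should be kept as keywords, so characters that begin no dictionary word are skipped.

```python
from typing import List, Set


def forward_max_match(text: str, dictionary: Set[str]) -> List[str]:
    """Greedy left-to-right segmentation against a dictionary.

    At each position the longest dictionary word is taken; a character that
    starts no dictionary word is skipped.
    """
    max_len = max((len(word) for word in dictionary), default=0)
    keywords: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest possible candidate first, then shorter ones.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in dictionary:
                keywords.append(candidate)
                i += length
                break
        else:
            i += 1  # no dictionary word starts here; skip one character
    return keywords


# Hypothetical keyword dictionary for the example name used above.
dictionary = {"刘德华", "张学友", "演唱会", "服装"}
first_keywords = forward_max_match("刘德华出席张学友的演唱会时穿的服装", dictionary)
print(first_keywords)  # ['刘德华', '张学友', '演唱会', '服装']
```

A statistics-based segmenter would replace the dictionary lookup with a model of how likely adjacent characters are to form a word; the rest of the method only needs the resulting keyword set.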
202. The server obtains, according to a preset correspondence between keywords and the file names containing those keywords, at least one second name corresponding to the at least one keyword included in the first keyword set.
The first keyword set includes at least one keyword. For each keyword in the first keyword set, the server queries the preset correspondence, thereby obtaining the file names that contain any one or more of the keywords in the first keyword set.
For example, the correspondence among the first name, the keywords in the first keyword set, and the second names corresponding to each keyword is shown in Table 1.
Table 1
Figure PCTCN2015072103-appb-000001
Figure PCTCN2015072103-appb-000002
Optionally, before step 202, the method further includes: establishing the preset correspondence according to the file names already stored by the server.
Specifically, the server segments the names of all stored files to obtain the keywords contained in each file name; for a given keyword, it determines, from the keywords contained in each file name, the file names that contain the keyword; and it establishes the preset correspondence between the keyword and the file names containing it.
Further optionally, the server builds an inverted index over the keywords contained in each file name and uses the inverted index as the preset correspondence.
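The inverted index and the lookup of step 202 can be sketched as follows. This is an illustration under stated assumptions: the names `build_inverted_index` and `second_names` are hypothetical, file names are assumed to be segmented already, and every stored name other than "Andy Lau Concert Complete Collection" is invented for the example because Table 1 is only available as an image.

```python
from collections import defaultdict


def build_inverted_index(segmented_names):
    """Map each keyword to the set of file names that contain it."""
    index = defaultdict(set)
    for name, keywords in segmented_names.items():
        for keyword in keywords:
            index[keyword].add(name)
    return index


def second_names(index, first_keywords):
    """Collect every stored name that shares at least one keyword."""
    names = set()
    for keyword in first_keywords:
        names |= index.get(keyword, set())
    return names


# Hypothetical stored file names with their keywords (English stand-ins).
SEGMENTED = {
    "Andy Lau Concert Complete Collection": ["Andy Lau", "concert", "complete collection"],
    "Jacky Cheung classic songs": ["Jacky Cheung", "classic songs"],
    "Spring clothing trends": ["spring", "clothing", "trends"],
    "Evergrande wins the championship": ["Evergrande", "wins the championship"],
}
index = build_inverted_index(SEGMENTED)
candidates = second_names(index, ["Andy Lau", "Jacky Cheung", "concert", "clothing"])
```

Because the index is built once from the stored names, each lookup touches only the entries for the keywords of the first name rather than scanning every file name.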
203. For each second name of the at least one second name, the server segments the second name to obtain a second keyword set, the second keyword set including at least one keyword obtained by segmenting the second name.
Continuing the example of step 202, if one of the second names is "刘德华演唱会全集" ("Andy Lau Concert Complete Collection"), the server segments this second name to obtain the second keyword set {Andy Lau, concert, complete collection}.
When segmenting the second name, the server may likewise use a string-matching-based segmentation method or a statistics-based segmentation method; the embodiments of the present invention do not limit this.
204. The server obtains the keywords that appear both in the first keyword set and in the second keyword set corresponding to each second name, and takes these shared keywords as matching keywords.
Specifically, for a keyword in the first keyword set, the server traverses the second keyword set and determines whether the second keyword set includes the keyword; if it does, the keyword is taken as a matching keyword. The server repeats this check for every keyword in the first keyword set to obtain at least one matching keyword. Alternatively, for a keyword in the second keyword set, the server traverses the first keyword set and determines whether the first keyword set includes the keyword; if it does, the keyword is taken as a matching keyword. The server repeats this check for every keyword in the second keyword set to obtain at least one matching keyword.
Based on the examples of steps 201 and 203, the first keyword set is {Andy Lau, Jacky Cheung, concert, clothing} and the second keyword set is {Andy Lau, concert, complete collection}, so the matching keywords are "Andy Lau" and "concert".
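Either traversal described for step 204 produces the intersection of the two keyword sets, so a minimal sketch can use a set intersection directly. The function name `matching_keywords` is an assumption for illustration, and the English strings stand in for the Chinese keywords.

```python
def matching_keywords(first_set, second_set):
    """Return the keywords present in both keyword sets."""
    return set(first_set) & set(second_set)


first_set = {"Andy Lau", "Jacky Cheung", "concert", "clothing"}
second_set = {"Andy Lau", "concert", "complete collection"}
matches = matching_keywords(first_set, second_set)
```

The result does not depend on which set is traversed, which is why the description can offer both directions as alternatives.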
205. The server assigns a weight to each keyword in the first keyword set according to the weight level corresponding to the keyword's type, in descending order of weight level, so that a keyword with a higher weight level is assigned a greater weight than a keyword with a lower weight level.
In the embodiments of the present invention, the first keyword set and the second keyword set include at least one matching keyword, yet the first name and the second name may still differ greatly in meaning. Therefore, to increase the relevance of the second name to be recommended to the first name, weights are assigned to the keywords in the first keyword set, and the weight of each second name is determined accordingly when the second name to be recommended is selected.
Specifically, the server presets the weight level corresponding to each keyword type. After determining the type of each keyword in the first keyword set, the server determines each keyword's weight level from the preset weight level of its type, sorts the keywords in descending order of weight level, and assigns weights so that a keyword with a higher weight level is assigned a greater weight than a keyword with a lower weight level.
Optionally, the weights assigned to the keywords in the first keyword set sum to 1.
Further optionally, the keyword types include nouns, verbs, and function words; the weight level of nouns is higher than that of verbs and function words, and among nouns, proper names have a higher weight level than other nouns.
For example, for the first name "The outfit Andy Lau wore when attending Jacky Cheung's concert", the nouns "Andy Lau", "Jacky Cheung", "concert", and "clothing" have a higher weight level than the verbs "attend" and "wear" and the function words "的" ("of") and "时" ("when").
The proper names among nouns may be personal names, place names, organization names, trademark names, and so on; the embodiments of the present invention do not limit this. Proper names have a higher weight level than other nouns; for example, "Andy Lau" and "Jacky Cheung" have a higher weight level than "concert" and "clothing".
Still taking the first name "The outfit Andy Lau wore when attending Jacky Cheung's concert" as an example, the server determines that "Andy Lau" and "Jacky Cheung" have a higher weight level than "concert" and "clothing", which in turn have a higher weight level than "attend", "wear", "的" ("of"), and "时" ("when"). The server may then assign a weight of 0.3 to the keyword "Andy Lau", 0.3 to "Jacky Cheung", 0.2 to "concert", 0.1 to "clothing", and 0.1 to "attend", and a weight of 0 to the remaining keywords.
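One way to turn weight levels into weights that sum to 1 is to give each level a numeric score and normalize. This is only a sketch: the `LEVEL_SCORE` values, the type labels, and the function name `weights_by_type` are assumptions, and the description does not say how the server derives the specific values 0.3, 0.3, 0.2, 0.1, and 0.1.

```python
# Hypothetical scores: a higher weight level gets a higher score.
LEVEL_SCORE = {"proper_name": 3, "noun": 2, "verb": 1, "function_word": 0}


def weights_by_type(keyword_types, level_score=LEVEL_SCORE):
    """Assign each keyword a weight proportional to its type's level score."""
    scores = {kw: level_score[kw_type] for kw, kw_type in keyword_types.items()}
    total = sum(scores.values())
    if total == 0:
        return {kw: 0.0 for kw in scores}
    # Normalizing makes the weights sum to 1 while preserving the level order.
    return {kw: score / total for kw, score in scores.items()}


weights = weights_by_type({
    "Andy Lau": "proper_name",
    "Jacky Cheung": "proper_name",
    "concert": "noun",
    "clothing": "noun",
    "attend": "verb",
    "of": "function_word",
})
```

Under this scheme keywords of the same level always receive equal weights, so it cannot reproduce an assignment that separates two ordinary nouns; a server wanting that would need an extra tie-breaking criterion.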
In another embodiment provided by the embodiments of the present invention, step 205 may be replaced by the following step (1):
(1) Assign a weight to each keyword in descending order of the keywords' frequency of occurrence, so that a keyword with a higher frequency of occurrence is assigned a greater weight than a keyword with a lower frequency of occurrence.
In the embodiments of the present invention, a keyword in the first keyword set that occurs more frequently can be regarded as more popular, and the user is likely to be interested in files related to such a keyword. Weights can therefore be assigned according to the frequency of occurrence of each keyword in the first keyword set.
Optionally, the frequency of occurrence of a keyword is the frequency with which the keyword appears in the stored file names, or the frequency with which the keyword appears in the stored file names of a specified category, the specified category being the category to which the currently open file belongs.
The currently open file may belong to a subcategory that in turn belongs to a parent category; the server may determine the specified category to which the currently open file belongs according to the required recommendation precision.
For example, if the currently open file is named "恒大夺冠" ("Evergrande wins the championship") and belongs to the football category within the sports category, the server may compute the frequency of the keyword "wins the championship" in the file names of the football category in order to assign a weight to that keyword, rather than computing its frequency in the file names of all categories or in the file names of the sports category.
Further, the frequency of occurrence may be the term frequency (TF) or the document frequency (DF).
Still taking the first name "The outfit Andy Lau wore when attending Jacky Cheung's concert" as an example, the server determines that the first name belongs to the singer category and computes the frequency of occurrence of the keywords "Andy Lau", "Jacky Cheung", "concert", and "clothing" in the file names of the singer category. If the computed frequencies of "Andy Lau", "Jacky Cheung", and "concert" are 0.3, 0.2, and 0.1 respectively, the server may, in descending order of frequency, assign a weight of 0.5 to "Andy Lau", 0.4 to "Jacky Cheung", and 0.1 to "concert", and a weight of 0 to the remaining keywords.
Further optionally, the server computes the frequency of occurrence of each keyword in the file names stored by the server within a preset duration. The preset duration may be set in advance by the server.
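The frequency-based alternative can be sketched by computing a document frequency over the file names of the specified category and normalizing. The names `document_frequency` and `frequency_weights`, the proportional normalization, and the sample category data are assumptions for illustration; the description only requires that a more frequent keyword receive a greater weight.

```python
def document_frequency(keyword, segmented_names):
    """Fraction of file names (given as keyword lists) containing the keyword."""
    if not segmented_names:
        return 0.0
    hits = sum(1 for keywords in segmented_names if keyword in keywords)
    return hits / len(segmented_names)


def frequency_weights(keywords, segmented_names):
    """Weights proportional to document frequency, normalized to sum to 1."""
    freqs = {kw: document_frequency(kw, segmented_names) for kw in keywords}
    total = sum(freqs.values())
    if total == 0:
        return {kw: 0.0 for kw in keywords}
    return {kw: freq / total for kw, freq in freqs.items()}


# Hypothetical file names in the "singer" category, already segmented.
SINGER_NAMES = [
    ["Andy Lau", "concert"],
    ["Andy Lau", "new song"],
    ["Jacky Cheung", "concert"],
    ["Andy Lau", "interview"],
]
freq_weights = frequency_weights(["Andy Lau", "Jacky Cheung", "concert"], SINGER_NAMES)
```

Restricting `segmented_names` to names stored within the preset duration, or to a narrower subcategory, changes only the input list and not the computation.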
Step 205 and step (1) above assign weights according to, respectively, the weight level corresponding to the type of each keyword in the first keyword set and the frequency of occurrence of each keyword. In fact, the server may also assign weights by jointly considering both the weight level corresponding to each keyword's type and its frequency of occurrence. That is, in yet another embodiment provided by the embodiments of the present invention, step 205 may instead be replaced by the following step (2):
(2) Assign a weight to each keyword according to the weight level corresponding to the keyword's type, in descending order of weight level, so that a keyword with a higher weight level is assigned a greater weight than a keyword with a lower weight level; then adjust the weight assigned to each keyword according to the keyword's frequency of occurrence.
In practice, a frequently occurring keyword can be regarded as more popular, but the second names corresponding to such a keyword may have low relevance to the first name, and the user is not necessarily interested in the files indicated by those popular second names. In the embodiments of the present invention, the server may therefore first assign a weight to each keyword according to the weight level corresponding to its type and then adjust that weight according to the keyword's frequency of occurrence. By jointly considering the relevance between the second name and the first name and the frequency of occurrence of the second name, this approach both increases the relevance of the finally determined second name to the first name and gives priority to recommending files that occur more frequently.
Further, "adjusting the weight assigned to each keyword according to the keyword's frequency of occurrence" in step (2) may be performed in either of the following ways:
(2-1) Determine an adjustment amount according to the frequency of occurrence of each keyword, and increase or decrease the weight assigned to each keyword by the determined adjustment amount.
For example, the server assigns weights of 0.3, 0.3, 0.2, 0.1, and 0.1 to the keywords "Andy Lau", "Jacky Cheung", "concert", "clothing", and "attend", and computes that during Fashion Week these keywords occur with frequencies of 0.3, 0.2, 0.1, 0.2, and 0.01 respectively. The server then determines adjustment amounts of 0.025, 0.025, -0.1, 0.15, and -0.1 for the five keywords and, after applying them, finally assigns weights of 0.325, 0.325, 0.1, 0.25, and 0.
(2-2) According to the frequency of occurrence of each keyword, increase by a preset adjustment weight the weight assigned to each keyword whose frequency is greater than or equal to a preset threshold, and decrease by the preset adjustment weight the weight assigned to each keyword whose frequency is less than the preset threshold.
For example, the server sets the preset threshold to 0.2 and the preset adjustment weight to 0.05. Suppose the server has assigned weights of 0.3, 0.3, 0.2, 0.1, and 0.1 to the keywords "Andy Lau", "Jacky Cheung", "concert", "clothing", and "attend", and has computed their frequencies of occurrence as 0.3, 0.2, 0.1, 0.2, and 0.01 respectively. The weights of "Andy Lau", "Jacky Cheung", and "clothing", whose frequencies are greater than or equal to 0.2, are increased by 0.05, and the weights of "concert" and "attend", whose frequencies are less than 0.2, are decreased by 0.05, so the finally assigned weights are 0.35, 0.35, 0.15, 0.15, and 0.05.
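The threshold rule of (2-2) can be sketched as a single pass over the initially assigned weights. The function name `adjust_by_threshold` and its parameter names are assumptions for illustration; the input values are the ones used in the example for this step.

```python
def adjust_by_threshold(weights, frequencies, threshold, delta):
    """Add delta when the frequency reaches the threshold, else subtract it."""
    adjusted = {}
    for keyword, weight in weights.items():
        if frequencies[keyword] >= threshold:
            adjusted[keyword] = weight + delta
        else:
            adjusted[keyword] = weight - delta
    return adjusted


initial = {"Andy Lau": 0.3, "Jacky Cheung": 0.3, "concert": 0.2,
           "clothing": 0.1, "attend": 0.1}
frequencies = {"Andy Lau": 0.3, "Jacky Cheung": 0.2, "concert": 0.1,
               "clothing": 0.2, "attend": 0.01}
adjusted = adjust_by_threshold(initial, frequencies, threshold=0.2, delta=0.05)
```

This sketch does not renormalize or clamp the result, so the adjusted weights need not sum to 1; whether to do so is left open by the description.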
It should be noted that the embodiments of the present invention are described with step 205 performed after step 204 as an example. In fact, step 205 only needs to be performed after step 201 and before step 206; that is, step 205 may also be performed before step 204 or concurrently with step 204. The embodiments of the present invention do not limit when step 205 is performed.
206. The server obtains the weight, in the first name, of the matching keywords included in each second name.
In the embodiments of the present invention, the server has already determined the weight of each keyword of the first keyword set in the first name, and thus the weight of each matching keyword in the first name. The server then determines the matching keywords included in each second name and the weight of those matching keywords in the first name.
Based on Table 1, assume that the first name is "The outfit Andy Lau wore when attending Jacky Cheung's concert" and that the server assigns a weight of 0.3 to the keyword "Andy Lau", 0.3 to "Jacky Cheung", 0.2 to "concert", 0.1 to "clothing", and 0.1 to "attend", and a weight of 0 to the remaining keywords. The weights in the first name of the matching keywords included in each second name, as determined by the server, may then be as shown in Table 2.
Table 2
Figure PCTCN2015072103-appb-000003
207. The server determines a time weight for each second name according to the publication time of the file indicated by that second name and computes, according to a preset ratio, a weighted sum of the sum of the weights in the first name of the matching keywords included in the second name and the time weight; the weighted sum is determined as the weight of the second name.
In the embodiments of the present invention, the file indicated by a second name may be newly published or may have been published long ago. Users' interest varies with publication time, so publication time affects how interested users are and, in turn, the recommendation success rate. Therefore, the publication time of the file indicated by each second name needs to be taken into account when determining the second name to be recommended.
Specifically, the server computes the sum of the weights in the first name of the matching keywords included in each second name, sorts the second names by the publication time of the files they indicate, and assigns each second name a time weight according to this order, so that a second name with a later publication time has a higher time weight than one with an earlier publication time. The server then computes a weighted sum of the keyword-weight sum and the time weight according to the preset ratio; this weighted sum is the weight of the second name.
The preset ratio is the ratio between the keyword-weight sum and the time weight, from which the weighting coefficients of the sum and of the time weight in the weighted calculation can be determined. The preset ratio may be set in advance by the server or adjusted by the server during use. For example, when the currently open file was published long ago, the time weight takes a smaller share, whereas when the currently open file is of a highly time-sensitive type such as "news", the time weight takes a larger share. The embodiments of the present invention do not limit this.
Based on Table 2, for the second name "Andy Lau Concert Complete Collection", assume that the server assigns a time weight of 0.4 to this second name and that the preset ratio is 6:4. The server computes the sum of the weights of the matching keywords included in the second name as 0.5, so the weight of the second name is 0.5*0.6 + 0.4*0.4 = 0.46.
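The weighted calculation of step 207 can be sketched as below. The function name `name_weight` and the representation of the preset ratio as a pair of integers are assumptions for illustration; the input values are those of the example above.

```python
def name_weight(matched, first_weights, time_weight, ratio=(6, 4)):
    """Combine the matched-keyword weight sum and the time weight by a ratio."""
    keyword_sum = sum(first_weights.get(keyword, 0.0) for keyword in matched)
    keyword_part, time_part = ratio
    # A ratio of 6:4 corresponds to coefficients 0.6 and 0.4.
    return (keyword_part * keyword_sum + time_part * time_weight) / (keyword_part + time_part)


first_weights = {"Andy Lau": 0.3, "Jacky Cheung": 0.3, "concert": 0.2,
                 "clothing": 0.1, "attend": 0.1}
weight = name_weight({"Andy Lau", "concert"}, first_weights, time_weight=0.4)
```

Passing a ratio whose second component is 0, for example `(1, 0)`, reduces the result to the plain keyword-weight sum, which corresponds to ignoring publication time altogether.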
Further, the server may preset a correspondence between time weights and the time interval from the publication time to the current time, that is, determine the time weight corresponding to each time interval. The server can then compute the time interval between the publication time of the file indicated by each second name and the current time, and determine the time weight of each second name according to the preset correspondence.
For example, the server presets a time weight of 0.9 for a second name whose time interval is 1 day, a time weight of 0.8 for a second name whose time interval is 2 days, and so on. For a given second name, when the server determines that the time interval between the publication time of the file indicated by the second name and the current time is 4 days, it determines that the time weight of the second name is 0.6.
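The sample correspondence (1 day to 0.9, 2 days to 0.8, 4 days to 0.6) is consistent with a weight that drops by 0.1 per day. The sketch below assumes that linear rule and clamps the result at 0; both the rule and the function name `time_weight` are assumptions, since the description only requires some preset correspondence between intervals and weights.

```python
from datetime import datetime


def time_weight(published, now, step=0.1):
    """Time weight that decreases by `step` per whole day since publication."""
    days = max((now - published).days, 0)
    # Clamp at zero so very old files never get a negative weight.
    return max(0.0, 1.0 - step * days)


now = datetime(2015, 1, 5)
four_days = time_weight(datetime(2015, 1, 1), now)
one_day = time_weight(datetime(2015, 1, 4), now)
```

A lookup table keyed by interval would serve equally well and would allow a non-linear correspondence.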
It should be noted that step 207 above is optional. The server may instead disregard the publication time of the files and determine the weight of each second name solely from the weights in the first name of the matching keywords included in that second name. That is, in another embodiment provided by the embodiments of the present invention, step 207 may be replaced by the following step: determining the sum of the weights in the first name of the matching keywords included in each second name as the weight of that second name. For example, based on Table 2, for the second name "Andy Lau Concert Complete Collection", the server computes the sum of the weights of the matching keywords included in the second name as 0.5 and therefore determines that the weight of the second name is 0.5.
208. The server determines, in descending order of the weights of the second names, a preset number of second names as the second names to be recommended.
The preset number may be set in advance by the server, or may be determined by the server according to the number of files that can be displayed in the recommendation area of the display interface of the currently open file; the embodiments of the present invention do not limit this.
Specifically, the server sorts the second names in descending order of weight and determines the top preset number of second names as the second names to be recommended, so that the files indicated by those second names are recommended to the user.
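Selecting the second names to be recommended amounts to a sort followed by a cut at the preset number. In this sketch the function name `top_names` and the sample weights are assumptions for illustration.

```python
def top_names(name_weights, preset_number):
    """Return the preset number of names with the largest weights."""
    ranked = sorted(name_weights.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:preset_number]]


# Hypothetical weights for three candidate second names.
name_weights = {"name A": 0.46, "name B": 0.30, "name C": 0.50}
recommended = top_names(name_weights, preset_number=2)
```

If fewer candidates exist than the preset number, the slice simply returns all of them.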
209. The server recommends the files indicated by the determined second names.
In the embodiments of the present invention, when recommending the file indicated by a determined second name, the server may provide a link for the determined second name on the display interface of the currently open file, the link being used to jump to the file indicated by the determined second name. In addition, the server may display a thumbnail generated from the file indicated by the determined second name, or display related information such as the publisher and the publication time; the embodiments of the present invention do not limit this.
Further, when there are multiple determined second names, they may be recommended in order of weight or in order of publication time; the embodiments of the present invention do not limit this.
In the method provided by the embodiments of the present invention, the first name of the currently open file is processed to obtain multiple candidate second names; the first name is matched against each second name to determine the matching keywords included in each second name; the weights of the matching keywords are determined according to their part of speech; the second names to be recommended are determined from the candidate second names according to the weights of the matching keywords; and the files indicated by the determined second names are recommended. This increases the relevance between the finally recommended file names and the name of the currently open file, and thus increases the recommendation success rate. Further, taking the publication time of the files into account by computing a time weight for each second name when determining the second names to be recommended increases the recommendation success rate even more.
FIG. 3 is a schematic structural diagram of a file recommendation apparatus according to an embodiment of the present invention. Referring to FIG. 3, the apparatus includes: a first word segmentation module 301, a second set acquisition module 302, a matching module 303, a weight acquisition module 304, a name determination module 305, and a recommendation module 306.
The first word segmentation module 301 is configured to segment a first name to obtain a first keyword set, the first name being the name of the currently open file, and the first keyword set including at least one keyword obtained by segmenting the first name.
The second set acquisition module 302 is connected to the first word segmentation module 301 and is configured to obtain, according to a preset correspondence between keywords and the file names containing those keywords, at least one second name corresponding to the at least one keyword included in the first keyword set, and to obtain the second keyword set corresponding to the at least one second name.
The matching module 303 is connected to the second set acquisition module 302 and is configured to obtain the keywords that appear both in the first keyword set and in the second keyword set corresponding to each second name, and to take these shared keywords as matching keywords.
The weight acquisition module 304 is connected to the matching module 303 and is configured to obtain the weight, in the first name, of the matching keywords included in each second name.
The name determination module 305 is connected to the weight acquisition module 304 and is configured to determine the second name to be recommended according to the weight, in the first name, of the matching keywords included in each second name.
The recommendation module 306 is connected to the name determination module 305 and is configured to recommend the file indicated by the determined second name.
Optionally, the second set acquisition module 302 includes:
a second name acquisition unit, configured to obtain, according to the preset correspondence between keywords and the file names containing those keywords, at least one second name corresponding to the at least one keyword included in the first keyword set;
a second word segmentation unit, configured to segment, for each second name of the at least one second name, the second name to obtain a second keyword set, the second keyword set including at least one keyword obtained by segmenting the second name.
Optionally, the apparatus further includes:
a first weight acquisition module, configured to obtain the weight in the first name of each keyword in the first keyword set according to at least one of the type and the frequency of occurrence of each keyword in the first keyword set.
Optionally, the first weight acquisition module includes:
a first weight acquisition unit, configured to assign a weight to each keyword in the first keyword set according to the weight level corresponding to the keyword's type, in descending order of weight level, so that a keyword with a higher weight level is assigned a greater weight than a keyword with a lower weight level; or,
a second weight acquisition unit, configured to assign a weight to each keyword in the first keyword set in descending order of the keywords' frequency of occurrence, so that a keyword with a higher frequency of occurrence is assigned a greater weight than a keyword with a lower frequency of occurrence; or,
a third weight acquisition unit, configured to assign a weight to each keyword in the first keyword set according to the weight level corresponding to the keyword's type, in descending order of weight level, so that a keyword with a higher weight level is assigned a greater weight than a keyword with a lower weight level;
an adjustment unit, configured to adjust the weight assigned to each keyword according to the frequency of occurrence of each keyword in the first keyword set.
Optionally, the type of a keyword is noun, verb, or function word, and the weight level of nouns is higher than the weight levels of verbs and function words;
the occurrence frequency of a keyword is the frequency with which the keyword appears in the stored file names, or the frequency with which the keyword appears in the stored file names of a specified category, where the specified category is the category to which the currently open file belongs.
Optionally, among nouns, the weight level of personal names is higher than the weight level of other nouns.
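The three weighting alternatives above can be illustrated as follows. Only the orderings come from the text (personal names above other nouns, nouns above verbs and function words, and higher frequency giving a greater weight). The numeric levels in `TYPE_LEVEL`, the rank-based frequency weights, the multiplicative adjustment with its `scale` parameter, and all function names are assumptions made for this sketch.

```python
# Hypothetical numeric levels; the text fixes only their ordering.
TYPE_LEVEL = {"name": 3, "noun": 2, "verb": 1, "function": 1}


def weights_by_type(keyword_types):
    """First alternative: weight follows the weight level of the keyword type."""
    return {kw: float(TYPE_LEVEL[t]) for kw, t in keyword_types.items()}


def weights_by_frequency(frequencies):
    """Second alternative: the more frequent keyword gets the greater weight.
    Weights are simply ranks (1 = least frequent), which is an assumption."""
    ordered = sorted(frequencies, key=frequencies.get)
    return {kw: float(rank) for rank, kw in enumerate(ordered, start=1)}


def weights_by_type_then_frequency(keyword_types, frequencies, scale=0.5):
    """Third alternative: type-based weight, then adjusted by frequency.
    The adjustment rule (scale by relative frequency) is an assumption."""
    base = weights_by_type(keyword_types)
    total = sum(frequencies.values()) or 1
    return {
        kw: w * (1 + scale * frequencies.get(kw, 0) / total)
        for kw, w in base.items()
    }


types = {"kobe": "name", "dunk": "noun", "watch": "verb", "the": "function"}
freq = {"kobe": 5, "dunk": 20, "watch": 50, "the": 25}

by_type = weights_by_type(types)
by_freq = weights_by_frequency(freq)
adjusted = weights_by_type_then_frequency(types, freq)
```

In the third variant the type level sets the baseline and frequency only separates keywords that share a level, so `watch` ends up above `the` even though both start from the same level.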
Optionally, the name determining module 305 includes:
a weight determining unit, configured to determine the weight of each second name according to the weights, in the first name, of the matching keywords included in that second name; and
a to-be-recommended name determining unit, configured to determine, in descending order of the weights of the second names, a preset number of second names as the second names to be recommended.
Optionally, the weight determining unit is configured to determine, as the weight of each second name, the sum of the weights, in the first name, of the matching keywords included in that second name; or,
the weight determining unit is configured to determine a time weight of each second name according to the publication time of the file indicated by that second name, to compute, according to a preset ratio, a weighted sum of the time weight and the sum of the weights, in the first name, of the matching keywords included in that second name, and to determine the weighted sum as the weight of that second name.
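The name determining module can be sketched as a scoring and top-N selection step. The text specifies only that the weight of a second name is either the sum of its matching keywords' weights or a weighted combination of that sum with a time weight; the linear recency decay in `time_weight`, the default `ratio` of 0.8, the alphabetical tie-break, and every function name here are assumptions for illustration.

```python
from datetime import date


def name_weight(first_weights, second_keywords):
    """Sum of the first-name weights of the matching keywords."""
    matching = set(first_weights) & set(second_keywords)
    return sum(first_weights[kw] for kw in matching)


def time_weight(published, today, horizon_days=365):
    """Hypothetical recency weight: 1.0 today, falling linearly to 0.0."""
    age_days = (today - published).days
    return max(0.0, 1.0 - age_days / horizon_days)


def recommend(first_weights, candidates, top_n,
              published=None, today=None, ratio=0.8):
    """Rank second names by weight and keep the preset number top_n.
    If publication dates are given, blend in the time weight by `ratio`."""
    scores = {}
    for name, keywords in candidates.items():
        score = name_weight(first_weights, keywords)
        if published is not None:
            recency = time_weight(published[name], today)
            score = ratio * score + (1 - ratio) * recency
        scores[name] = score
    # Descending by weight; ties broken alphabetically (an assumption).
    ranked = sorted(scores, key=lambda n: (-scores[n], n))
    return ranked[:top_n]


first_weights = {"nba": 3.0, "finals": 2.0, "game": 2.0, "seven": 1.0}
candidates = {
    "NBA finals highlights": {"nba", "finals", "highlights"},
    "finals recap show": {"finals", "recap", "show"},
    "game night": {"game", "night"},
}
top_two = recommend(first_weights, candidates, top_n=2)

published = {
    "NBA finals highlights": date(2014, 2, 1),
    "finals recap show": date(2015, 2, 1),
    "game night": date(2014, 2, 1),
}
with_time = recommend(first_weights, candidates, top_n=3,
                      published=published, today=date(2015, 2, 1))
```

With the time weight enabled, two names with the same keyword score are separated by recency, so the more recently published file ranks first.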
The apparatus provided in this embodiment of the present invention processes the first name of the currently open file to obtain a plurality of candidate second names, matches the first name against each second name to determine the matching keywords included in each second name, and determines the weights of the matching keywords according to their parts of speech. It then determines, according to the weights of the matching keywords, the second name to be recommended from among the candidate second names, and recommends the file indicated by the determined second name. This increases the relevance between the name of the finally recommended file and the name of the currently open file, and improves the recommendation success rate.
It should be noted that, when the file recommendation apparatus provided in the foregoing embodiment recommends a file, the division into the functional modules described above is only an example. In practice, the functions may be allocated to different functional modules as needed; that is, the internal structure of the server may be divided into different functional modules to perform all or some of the functions described above. In addition, the file recommendation apparatus and the file recommendation method provided in the foregoing embodiments belong to the same concept; for the specific implementation process, see the method embodiment, which is not repeated here.
FIG. 4 is a schematic structural diagram of a server according to an embodiment of the present invention. The server 400 may vary considerably depending on configuration or performance, and may include one or more central processing units (CPUs) 422 (for example, one or more processors), a memory 432, and one or more storage media 430 (for example, one or more mass storage devices) that store application programs 442 or data 444. The memory 432 and the storage medium 430 may provide transient or persistent storage. The program stored in the storage medium 430 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations on the server. Further, the central processing unit 422 may be configured to communicate with the storage medium 430 and to execute, on the server 400, the series of instruction operations in the storage medium 430.
The server 400 may also include one or more power supplies 426, one or more wired or wireless network interfaces 450, one or more input/output interfaces 458, and/or one or more operating systems 441, such as Windows Server™, Mac OS X™, Unix™, Linux™, or FreeBSD™.
The steps performed by the server in the foregoing embodiments may be based on the server structure shown in FIG. 4.
Specifically, in this embodiment of the present invention, the processor 422 included in the server may execute program instructions stored in the memory 432 to perform the following functions: performing word segmentation on a first name to obtain a first keyword set, where the first name is the name of the currently open file and the first keyword set includes at least one keyword obtained by segmenting the first name; obtaining, according to a preset correspondence between keywords and the file names that contain those keywords, at least one second name corresponding to the at least one keyword included in the first keyword set, and obtaining the second keyword set corresponding to the at least one second name; obtaining the keywords that are the same in the first keyword set and in the second keyword set corresponding to each second name, and using those keywords as matching keywords; obtaining the weights, in the first name, of the matching keywords included in each second name; determining the second name to be recommended according to the weights, in the first name, of the matching keywords included in each second name; and recommending the file indicated by the determined second name.
Optionally, obtaining the second keyword set corresponding to the at least one second name includes: for each second name of the at least one second name, performing word segmentation on the second name to obtain a second keyword set, where the second keyword set includes at least one keyword obtained by segmenting the second name.
Optionally, before the weights, in the first name, of the matching keywords included in each second name are obtained, the functions further include: obtaining the weight, in the first name, of each keyword in the first keyword set according to at least one of the type and the occurrence frequency of that keyword. Obtaining the weights, in the first name, of the matching keywords included in each second name then includes: obtaining those weights from the weights, in the first name, of the keywords in the first keyword set.
Optionally, obtaining the weight, in the first name, of each keyword in the first keyword set according to at least one of the type and the occurrence frequency of that keyword includes: assigning a weight to each keyword in descending order of weight level, according to the weight level corresponding to the type of each keyword in the first keyword set, so that a keyword with a higher weight level is assigned a greater weight than a keyword with a lower weight level; or, assigning a weight to each keyword in descending order of the occurrence frequency of each keyword in the first keyword set, so that a keyword with a higher occurrence frequency is assigned a greater weight than a keyword with a lower occurrence frequency; or, assigning a weight to each keyword in descending order of weight level, according to the weight level corresponding to the type of each keyword in the first keyword set, so that a keyword with a higher weight level is assigned a greater weight than a keyword with a lower weight level, and adjusting the weight assigned to each keyword according to the occurrence frequency of each keyword in the first keyword set.
Optionally, determining the second name to be recommended according to the weights, in the first name, of the matching keywords included in each second name includes: determining the weight of each second name according to the weights, in the first name, of the matching keywords included in that second name; and determining, in descending order of the weights of the second names, a preset number of second names as the second names to be recommended.
Optionally, determining the weight of each second name according to the weights, in the first name, of the matching keywords included in that second name includes: determining, as the weight of each second name, the sum of the weights, in the first name, of the matching keywords included in that second name; or, determining a time weight of each second name according to the publication time of the file indicated by that second name, computing, according to a preset ratio, a weighted sum of the time weight and the sum of the weights, in the first name, of the matching keywords included in that second name, and determining the weighted sum as the weight of that second name.
A person of ordinary skill in the art will understand that all or some of the steps of the foregoing embodiments may be implemented by hardware, or by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, and the storage medium may be a read-only memory, a magnetic disk, an optical disc, or the like.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims (16)

  1. A file recommendation method, wherein the method comprises:
    performing word segmentation on a first name to obtain a first keyword set, wherein the first name is the name of a currently open file and the first keyword set includes at least one keyword obtained by segmenting the first name;
    obtaining, according to a preset correspondence between keywords and the file names that contain those keywords, at least one second name corresponding to the at least one keyword included in the first keyword set, and obtaining a second keyword set corresponding to the at least one second name;
    obtaining the keywords that are the same in the first keyword set and in the second keyword set corresponding to each second name, and using those keywords as matching keywords;
    obtaining the weights, in the first name, of the matching keywords included in each second name;
    determining a second name to be recommended according to the weights, in the first name, of the matching keywords included in each second name; and
    recommending the file indicated by the determined second name.
  2. The method according to claim 1, wherein obtaining the second keyword set corresponding to the at least one second name comprises:
    for each second name of the at least one second name, performing word segmentation on the second name to obtain a second keyword set, wherein the second keyword set includes at least one keyword obtained by segmenting the second name.
  3. The method according to claim 1, wherein before obtaining the weights, in the first name, of the matching keywords included in each second name, the method further comprises:
    obtaining the weight, in the first name, of each keyword in the first keyword set according to at least one of the type and the occurrence frequency of that keyword;
    and wherein obtaining the weights, in the first name, of the matching keywords included in each second name comprises: obtaining those weights from the weights, in the first name, of the keywords in the first keyword set.
  4. The method according to claim 3, wherein obtaining the weight, in the first name, of each keyword in the first keyword set according to at least one of the type and the occurrence frequency of that keyword comprises:
    assigning a weight to each keyword in descending order of weight level, according to the weight level corresponding to the type of each keyword in the first keyword set, so that a keyword with a higher weight level is assigned a greater weight than a keyword with a lower weight level; or,
    assigning a weight to each keyword in descending order of the occurrence frequency of each keyword in the first keyword set, so that a keyword with a higher occurrence frequency is assigned a greater weight than a keyword with a lower occurrence frequency; or,
    assigning a weight to each keyword in descending order of weight level, according to the weight level corresponding to the type of each keyword in the first keyword set, so that a keyword with a higher weight level is assigned a greater weight than a keyword with a lower weight level; and adjusting the weight assigned to each keyword according to the occurrence frequency of each keyword in the first keyword set.
  5. The method according to claim 3, wherein the type of a keyword is noun, verb, or function word, and the weight level of nouns is higher than the weight levels of verbs and function words;
    and the occurrence frequency of a keyword is the frequency with which the keyword appears in the stored file names, or the frequency with which the keyword appears in the stored file names of a specified category, wherein the specified category is the category to which the currently open file belongs.
  6. The method according to claim 5, wherein, among nouns, the weight level of personal names is higher than the weight level of other nouns.
  7. The method according to claim 1, wherein determining the second name to be recommended according to the weights, in the first name, of the matching keywords included in each second name comprises:
    determining the weight of each second name according to the weights, in the first name, of the matching keywords included in that second name; and
    determining, in descending order of the weights of the second names, a preset number of second names as the second names to be recommended.
  8. The method according to claim 7, wherein determining the weight of each second name according to the weights, in the first name, of the matching keywords included in that second name comprises:
    determining, as the weight of each second name, the sum of the weights, in the first name, of the matching keywords included in that second name; or,
    determining a time weight of each second name according to the publication time of the file indicated by that second name, computing, according to a preset ratio, a weighted sum of the time weight and the sum of the weights, in the first name, of the matching keywords included in that second name, and determining the weighted sum as the weight of that second name.
  9. A file recommendation apparatus, wherein the apparatus comprises:
    a first word segmentation module, configured to perform word segmentation on a first name to obtain a first keyword set, wherein the first name is the name of a currently open file and the first keyword set includes at least one keyword obtained by segmenting the first name;
    a second set obtaining module, configured to obtain, according to a preset correspondence between keywords and the file names that contain those keywords, at least one second name corresponding to the at least one keyword included in the first keyword set, and to obtain a second keyword set corresponding to the at least one second name;
    a matching module, configured to obtain the keywords that are the same in the first keyword set and in the second keyword set corresponding to each second name, and to use those keywords as matching keywords;
    a weight obtaining module, configured to obtain the weights, in the first name, of the matching keywords included in each second name;
    a name determining module, configured to determine a second name to be recommended according to the weights, in the first name, of the matching keywords included in each second name; and
    a recommendation module, configured to recommend the file indicated by the determined second name.
  10. The apparatus according to claim 9, wherein the second set obtaining module comprises:
    a second name obtaining unit, configured to obtain, according to the preset correspondence between keywords and the file names that contain those keywords, the at least one second name corresponding to the at least one keyword included in the first keyword set; and
    a second word segmentation unit, configured to perform, for each second name of the at least one second name, word segmentation on the second name to obtain a second keyword set, wherein the second keyword set includes at least one keyword obtained by segmenting the second name.
  11. The apparatus according to claim 9, wherein the apparatus further comprises:
    a first weight obtaining module, configured to obtain the weight, in the first name, of each keyword in the first keyword set according to at least one of the type and the occurrence frequency of that keyword.
  12. The apparatus according to claim 11, wherein the first weight obtaining module comprises:
    a first weight obtaining unit, configured to assign a weight to each keyword in descending order of weight level, according to the weight level corresponding to the type of each keyword in the first keyword set, so that a keyword with a higher weight level is assigned a greater weight than a keyword with a lower weight level; or,
    a second weight obtaining unit, configured to assign a weight to each keyword in descending order of the occurrence frequency of each keyword in the first keyword set, so that a keyword with a higher occurrence frequency is assigned a greater weight than a keyword with a lower occurrence frequency; or,
    a third weight obtaining unit, configured to assign a weight to each keyword in descending order of weight level, according to the weight level corresponding to the type of each keyword in the first keyword set, so that a keyword with a higher weight level is assigned a greater weight than a keyword with a lower weight level; and
    an adjusting unit, configured to adjust the weight assigned to each keyword according to the occurrence frequency of each keyword in the first keyword set.
  13. The apparatus according to claim 11, wherein the type of a keyword is noun, verb, or function word, and the weight level of nouns is higher than the weight levels of verbs and function words;
    and the occurrence frequency of a keyword is the frequency with which the keyword appears in the stored file names, or the frequency with which the keyword appears in the stored file names of a specified category, wherein the specified category is the category to which the currently open file belongs.
  14. The apparatus according to claim 13, wherein, among nouns, the weight level of personal names is higher than the weight level of other nouns.
  15. The apparatus according to claim 9, wherein the name determining module comprises:
    a weight determining unit, configured to determine the weight of each second name according to the weights, in the first name, of the matching keywords included in that second name; and
    a to-be-recommended name determining unit, configured to determine, in descending order of the weights of the second names, a preset number of second names as the second names to be recommended.
  16. The apparatus according to claim 15, wherein the weight determining unit is configured to determine, as the weight of each second name, the sum of the weights, in the first name, of the matching keywords included in that second name; or,
    the weight determining unit is configured to determine a time weight of each second name according to the publication time of the file indicated by that second name, to compute, according to a preset ratio, a weighted sum of the time weight and the sum of the weights, in the first name, of the matching keywords included in that second name, and to determine the weighted sum as the weight of that second name.
PCT/CN2015/072103 2013-12-05 2015-02-02 File recommendation method and device WO2015081909A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201310652678.3A CN104699696B (en) 2013-12-05 2013-12-05 File recommendation method and device
CN201310652678.3 2013-12-05

Publications (1)

Publication Number Publication Date
WO2015081909A1 true WO2015081909A1 (en) 2015-06-11

Family

ID=53272920

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/072103 WO2015081909A1 (en) 2013-12-05 2015-02-02 File recommendation method and device

Country Status (2)

Country Link
CN (1) CN104699696B (en)
WO (1) WO2015081909A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3322194A4 (en) * 2015-07-06 2018-05-30 Tencent Technology (Shenzhen) Company Limited Video recommendation method, server and storage medium
CN109144954A (en) * 2018-09-18 2019-01-04 天津字节跳动科技有限公司 Edit resource recommendation method, device and the electronic equipment of document
EP3341861A4 (en) * 2015-08-24 2019-03-06 Google LLC Video recommendation based on video titles

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105205159B (en) * 2015-09-29 2020-06-02 陈中和 Device and method for automatically feeding back information
CN106708858A (en) 2015-11-13 2017-05-24 阿里巴巴集团控股有限公司 Information recommendation method and device
CN110020132B (en) * 2017-11-03 2023-04-11 腾讯科技(北京)有限公司 Keyword recommendation method and device, computing equipment and storage medium
CN107832405A (en) * 2017-11-03 2018-03-23 北京小度互娱科技有限公司 The method and apparatus for calculating the correlation between title
CN108256010A (en) * 2018-01-03 2018-07-06 阿里巴巴集团控股有限公司 Content recommendation method and device
CN109240991B (en) * 2018-09-26 2021-07-30 Oppo广东移动通信有限公司 File recommendation method and device, storage medium and intelligent terminal
CN112256843B (en) * 2020-12-22 2021-04-20 华东交通大学 News keyword extraction method and system based on TF-IDF method optimization

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101079894A (en) * 2006-12-21 2007-11-28 腾讯科技(深圳)有限公司 A system and method for pushing network information
US20110218994A1 (en) * 2010-03-05 2011-09-08 International Business Machines Corporation Keyword automation of video content
CN103106208A (en) * 2011-11-11 2013-05-15 中国移动通信集团公司 Streaming media content recommendation method and system in mobile internet
CN103186550A (en) * 2011-12-27 2013-07-03 盛乐信息技术(上海)有限公司 Method and system for generating video-related video list

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102760124B (en) * 2011-04-25 2014-11-12 阿里巴巴集团控股有限公司 Pushing method and system for recommended data
CN102789453B (en) * 2011-05-16 2015-12-02 阿里巴巴集团控股有限公司 Advertising message put-on method and device
CN102799589B (en) * 2011-05-25 2016-05-11 阿里巴巴集团控股有限公司 A kind of information-pushing method and device
CN103164405A (en) * 2011-12-08 2013-06-19 盛乐信息技术(上海)有限公司 Generation method for relevant video data bank, recommendation method and recommendation system for relevant videos
CN103365899B (en) * 2012-04-01 2017-10-20 深圳市世纪光速信息技术有限公司 Question recommendation method and system in a question-and-answer community
CN103425687A (en) * 2012-05-21 2013-12-04 阿里巴巴集团控股有限公司 Retrieval method and system based on queries

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3322194A4 (en) * 2015-07-06 2018-05-30 Tencent Technology (Shenzhen) Company Limited Video recommendation method, server and storage medium
EP3341861A4 (en) * 2015-08-24 2019-03-06 Google LLC Video recommendation based on video titles
US10387431B2 (en) 2015-08-24 2019-08-20 Google Llc Video recommendation based on video titles
CN109144954A (en) * 2018-09-18 2019-01-04 天津字节跳动科技有限公司 Resource recommendation method and device for editing document and electronic equipment
CN109144954B (en) * 2018-09-18 2021-03-16 北京字节跳动网络技术有限公司 Resource recommendation method and device for editing document and electronic equipment

Also Published As

Publication number Publication date
CN104699696B (en) 2018-12-28
CN104699696A (en) 2015-06-10

Similar Documents

Publication Publication Date Title
WO2015081909A1 (en) File recommendation method and device
US9704185B2 (en) Product recommendation using sentiment and semantic analysis
WO2019105432A1 (en) Text recommendation method and apparatus, and electronic device
JP6838098B2 (en) Knowledge panel contextualizing
CN105653705B (en) Hot event searching method and device
US20160188997A1 (en) Selecting a High Valence Representative Image
KR101506380B1 (en) Infinite browse
WO2018072071A1 (en) Knowledge map building system and method
TWI522819B (en) Methods and apparatus for performing an internet search
US20150293928A1 (en) Systems and Methods for Generating Personalized Video Playlists
CN104598505B (en) Multimedia resource recommendation method and device
US8244751B2 (en) Information processing apparatus and presenting method of related items
TWI540448B (en) Methods and apparatus for selecting a search engine to which to provide a search query
WO2015081915A1 (en) File recommendation method and device
JP6405704B2 (en) Information processing apparatus, information processing method, and program
CN104021140B (en) Internet video processing method and device
US20200192951A1 (en) Personalized search result rankings
US9563704B1 (en) Methods, systems, and media for presenting suggestions of related media content
US11797590B2 (en) Generating structured data for rich experiences from unstructured data streams
US20200257724A1 (en) Methods, devices, and storage media for content retrieval
JP2023062173A (en) Video generation method and apparatus of the same, and neural network training method and apparatus of the same
US20130317951A1 (en) Auto-annotation of video content for scrolling display
WO2023278256A1 (en) Navigating content by relevance
CN109189899A (en) Content interest acquisition and content data recommendation method, device, equipment/terminal/server and storage medium
CN108140034B (en) Selecting content items based on received terms using a topic model

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 15725968

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN EP: public notification in the EP bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205N DATED 26.10.2016)

122 EP: PCT application non-entry in European phase

Ref document number: 15725968

Country of ref document: EP

Kind code of ref document: A1