CN104408115B

CN104408115B - Semantic link-based heterogeneous resource recommendation method and apparatus on a TV platform

Info

Publication number: CN104408115B
Application number: CN201410687895.0A
Authority: CN
Inventors: 郑玄; 陈洁
Original assignee: Samsung Electronics China R&D Center; Samsung Electronics Co Ltd
Current assignee: Samsung Electronics China R&D Center; Samsung Electronics Co Ltd
Priority date: 2014-11-25
Filing date: 2014-11-25
Publication date: 2017-09-22
Anticipated expiration: 2034-11-25
Also published as: CN104408115A; KR102314645B1; KR20160062667A

Abstract

The invention discloses a semantic link-based resource recommendation method and device on a TV platform. The method comprises: extracting the text information of all media resources in a background media resource library; extracting candidate feature words of each media resource from its text information, calculating the weights of the candidate feature words, filtering the candidate feature words by weight to obtain feature words, and generating a feature word weight matrix T of the background media resource library; and, if the current media resource watched by the user is a media resource in the background media resource library, using a clustering method and the feature word weight matrix T to calculate the clustering similarity between each media resource in the library and the current media resource, and selecting the L media resources with the highest clustering similarity to generate a media resource recommendation list.

Description

Semantic link-based heterogeneous resource recommendation method and device on television platform

Technical Field

The invention relates to the technical field of multimedia, in particular to a semantic link-based heterogeneous resource recommendation method and device on a television platform.

Background

When watching a television program on a television platform, a user is often interested in some information in the current program and wants to watch other media resources related to it. Several methods for recommending media resources already address this need. Typically, keywords of the resource currently being watched are extracted to represent the user's interests, the keywords are used as a vector of user features, and resources highly similar to the current resource are recommended to the user.

However, these existing methods have many shortcomings. Most of them recommend among resources of the same type, and recommendation across heterogeneous resources is rarely applied. The few heterogeneous methods that exist are one-way, that is, from one type of resource to another, such as recommending video sources related to a television program or recommending products related to a television program; methods that recommend across many resource types are scarce. The words that play a key role in recommendation are partly recognizable automatically and partly not, and must be constructed manually, which is cumbersome. The methods are limited to morphological (surface-form) information and lack semantic information. Finally, they depend on manual labeling and make no use of user feedback, so their recommendation results are often unsatisfactory for the user.

Disclosure of Invention

In view of the above, the invention provides a semantic link-based method and device for recommending heterogeneous resources on a television platform, which can automatically and intelligently recommend heterogeneous resources according to the resource a user is currently watching, without any additional operation by the user.

The technical scheme provided by the invention is as follows:

A semantic link-based heterogeneous resource recommendation method on a television platform comprises the following steps:

extracting text information of all media resources in a background media resource library;

extracting candidate characteristic words of each media resource according to the text information of the media resource, calculating the weight of the candidate characteristic words, filtering the candidate characteristic words according to the weight to obtain characteristic words, and generating a characteristic word weight matrix T of a background media resource library;

if the current media resource watched by the user is a media resource in the background media resource library, using a clustering method and the feature word weight matrix T to calculate the clustering similarity between each media resource in the library and the current media resource, and selecting the L media resources with the highest clustering similarity to generate a media resource recommendation list, where L is an integer greater than 0.

A semantic link-based heterogeneous resource recommendation device on a television platform comprises:

the text information extraction module is used for extracting the text information of all the media resources in the background media resource library;

the characteristic word extraction module is used for extracting candidate characteristic words of each media resource according to the text information of the media resource, calculating the weight of the candidate characteristic words, filtering the candidate characteristic words according to the weight to obtain characteristic words, and generating a characteristic word weight matrix T of a background media resource library;

the media resource recommendation list generation module is used for, if the current media resource watched by the user is a media resource in the background media resource library, using a clustering method and the feature word weight matrix T to calculate the clustering similarity between each media resource in the library and the current media resource, and selecting the L media resources with the highest clustering similarity to generate a media resource recommendation list, where L is an integer greater than 0.

In summary, the semantic link-based heterogeneous resource recommendation method and device on a television platform provided by the invention draw on large volumes of data to map various heterogeneous resources into the same semantic space, automatically construct semantic relations between heterogeneous resources, and generate semantic links such as text-to-video and video-to-text between them, from which a heterogeneous resource recommendation list is generated.

Drawings

FIG. 1 is a flow chart of a first embodiment of the method of the present invention;

FIG. 2 is a flow chart of a second embodiment of the method of the present invention;

FIG. 3 is a structural diagram of an apparatus according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be further described in detail with reference to the accompanying drawings and specific embodiments.

When a user watches a current media resource on the television platform, the semantic link-based heterogeneous resource recommendation method provides the user with L background media resources that are highly relevant to it, selected according to the clustering similarity between the heterogeneous resources in the background media resource library and the current media resource, so that the user can conveniently watch related background media resources.

Method embodiment one

FIG. 1 is a flowchart of the first method embodiment of the present invention. As shown in FIG. 1, it includes the following steps:

Step 101: Extract the text information of all media resources in the background media resource library.

In this step, text information is first extracted from every media resource in the background media resource library. Each media resource in the library is denoted $D_i$, where $i$ is a positive integer with $1 \le i \le N$, and N is the number of media resources in the background media resource library.

The media resources in the background media resource library fall into two broad categories: news texts and video resources. For news texts, the text information is extracted directly. For video resources, the text information lies in the video title and the subtitles. The title is relatively easy to obtain, and the subtitles are obtained in one of two ways: if the play stream carries its own subtitles, they are extracted from the stream; otherwise the video images are processed, the subtitles are located in the image and extracted, and the result is merged into the corresponding video description text.

Once the text information of all media resources in the background media resource library has been extracted, each media resource is represented in text form.

Step 102: Extract the candidate feature words of each media resource in the background media resource library.

In step 101, the text information of each media resource in the background media resource library is obtained, and in this step, the text information obtained in step 101 is further processed to obtain candidate feature words of each media resource, and the candidate feature words of the media resources can represent the content of the media resource to a certain extent.

First, a lexical analysis tool segments the text information of each media resource into words according to part of speech, giving a word segmentation sequence for each media resource. Because the lexical analysis tool segments text only by part-of-speech judgement, and considers neither how important each segment is for representing the media resource nor the semantic relations between segments in their context, the segmentation may yield segments with no practical meaning, such as "in" and "will". It may also split a word string that should stay whole into two or more segments: for example, "search fox video" (a literal rendering of the name Sohu Video) may be split into the three segments "search", "fox" and "video", whereas the whole string "search fox video" should be used to represent the media resource.

Because of these limitations of the lexical analysis tool, the segments cannot be used directly as candidate feature words. Instead, they are matched against a hot word dictionary, which corrects them: consecutive segments that together form a longer hot word in the dictionary are merged according to the longest matching word string, and the merged results are used as candidate feature words of the media resource. For example, if the segmentation sequence of a media resource contains the three segments "search", "fox" and "video", and the hot word dictionary contains the four hot words "search", "fox", "video" and "search fox video", the three segments are merged into the longest word string "search fox video", which becomes a candidate feature word of the media resource. In a specific implementation, the segmentation sequence of each media resource can be matched against the hot word dictionary with a dictionary tree (trie). Segments corrected by the hot word dictionary better match human reading habits.
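The longest-match merging with a dictionary tree can be illustrated with a minimal sketch. It assumes, for illustration only, that each hot word is stored as the sequence of segments the lexical analysis tool would produce for it, and that merged segments are joined with a separator (a space here; an empty string would suit Chinese). The names `TrieNode`, `build_trie` and `merge_longest` are hypothetical.

```python
from typing import Dict, List


class TrieNode:
    """One node of a dictionary tree (trie) whose edges are word segments."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.is_hot_word = False


def build_trie(hot_words: List[List[str]]) -> TrieNode:
    """Insert each hot word, given as its sequence of segments, into a trie."""
    root = TrieNode()
    for segments in hot_words:
        node = root
        for seg in segments:
            node = node.children.setdefault(seg, TrieNode())
        node.is_hot_word = True
    return root


def merge_longest(segments: List[str], root: TrieNode, sep: str = " ") -> List[str]:
    """Greedy left-to-right longest match of a segmentation sequence against the trie."""
    result: List[str] = []
    i = 0
    while i < len(segments):
        node, end, j = root, -1, i
        # Walk as far as the trie allows, remembering the last complete hot word.
        while j < len(segments) and segments[j] in node.children:
            node = node.children[segments[j]]
            j += 1
            if node.is_hot_word:
                end = j
        if end == -1:
            result.append(segments[i])  # no hot word starts here: keep the segment
            i += 1
        else:
            result.append(sep.join(segments[i:end]))  # merge into one candidate word
            i = end
    return result
```

With the hot words "search", "fox", "video" and "search fox video", the sequence `["watch", "search", "fox", "video", "now"]` becomes `["watch", "search fox video", "now"]`. Greedy longest match is a common choice for this kind of dictionary correction; it is not guaranteed to be the exact matching rule of the original implementation.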

The hot word dictionary is a hot word set, the hot words in the hot word dictionary can represent semantic information of a background media resource library, and the construction method comprises the following steps:

(1) According to the language of the text information of all media resources in the background media resource library, choose the separators of that language and split the text information into clauses, for example Chinese punctuation marks such as "。", "！" and "？", or English punctuation marks such as "," and "?".

(2) Calculate the frequency of each repeated word string in the background media resource library, defined as the number of clauses in the library in which the string appears, and take every repeated word string whose frequency exceeds a frequency threshold as a candidate word string, forming a candidate word string set.

(3) And filtering the candidate word strings, and taking the candidate word strings reserved after filtering as hot words to construct a hot word dictionary.
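Steps (1) and (2) can be sketched as follows. This is a minimal illustration under stated assumptions: the separator character class, the treatment of "repeated word strings" as character n-grams of length `min_len` to `max_len` (a natural fit for unsegmented Chinese text), and the threshold values are all illustrative choices, not values given in the text.

```python
import re
from collections import Counter
from typing import Iterable, List, Set

# Assumed clause separators: Chinese and English sentence/clause punctuation.
CLAUSE_SEPARATORS = r"[。！？，,!?.]"


def split_clauses(texts: Iterable[str]) -> List[str]:
    """Step (1): split every resource's text into non-empty clauses."""
    clauses: List[str] = []
    for text in texts:
        clauses.extend(c.strip() for c in re.split(CLAUSE_SEPARATORS, text) if c.strip())
    return clauses


def candidate_strings(texts: Iterable[str], min_len: int = 2, max_len: int = 4,
                      freq_threshold: int = 1) -> Set[str]:
    """Step (2): strings whose clause frequency exceeds freq_threshold.

    Clause frequency = number of clauses in which the string occurs at least once.
    """
    counts: Counter = Counter()
    for clause in split_clauses(texts):
        seen = {
            clause[i:i + n]
            for n in range(min_len, max_len + 1)
            for i in range(len(clause) - n + 1)
        }
        counts.update(seen)  # a set, so each clause counts a string only once
    return {s for s, f in counts.items() if f > freq_threshold}
```

Counting each string once per clause, rather than once per occurrence, is what makes the count match the clause-based frequency defined in step (2).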

The specific filtering method can be realized by the following three steps:

a. Collect a stop word list and use it to filter the candidate word strings, i.e., delete from the candidate word string set every candidate word string that appears in the stop word list.

b. Calculate the weight of each candidate word string, expressed as its term frequency-inverse document frequency (TF-IDF) value, and delete from the candidate word string set every candidate word string whose weight is below a weight threshold.

c. Build prior knowledge according to the types of noise data among the candidate word strings (for example, noise strings made up of time information, numbers, measure words and the like appear frequently in text information), and delete such noise strings from the candidate word string set.
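The three filtering steps a to c can be combined in one pass. In this minimal sketch the TF-IDF weights are assumed to be precomputed and passed in, and the regular expressions in `NOISE_PATTERNS` are hypothetical stand-ins for the "prior knowledge" about noise strings (pure numbers, clock times, and a number followed by a measure word); a real system would build these rules from its own data.

```python
import re
from typing import Dict, Set

# Hypothetical noise rules standing in for the prior knowledge of step c.
NOISE_PATTERNS = [
    re.compile(r"^\d+$"),                 # pure numbers, e.g. "2014"
    re.compile(r"^\d{1,2}[:：]\d{2}$"),    # clock times, e.g. "10:30"
    re.compile(r"^\d+(年|月|日|点|个|次)$"),  # number + time unit / measure word
]


def filter_candidates(candidates: Set[str], stop_words: Set[str],
                      weights: Dict[str, float], weight_threshold: float) -> Set[str]:
    """Return the candidate word strings kept as hot words."""
    kept = set()
    for s in candidates:
        if s in stop_words:                          # step a: stop word list
            continue
        if weights.get(s, 0.0) < weight_threshold:   # step b: TF-IDF below threshold
            continue
        if any(p.match(s) for p in NOISE_PATTERNS):  # step c: noise strings
            continue
        kept.add(s)
    return kept
```

The strings that survive all three checks form the hot word dictionary used in the longest-match merging above.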

Step 103: Further extract the feature words of each media resource in the background media resource library.

This step extracts the feature words of each media resource in the background media resource library, so that each media resource is represented by at least one feature word. The feature words of a media resource are extracted as follows:

Calculate the weight of each candidate feature word of each media resource obtained in step 102, again expressed as its TF-IDF value; delete the candidate feature words whose weight is below the weight threshold; further filter the remaining candidate feature words (those whose weight is not below the threshold) with the stop word list; and keep the surviving candidate feature words as the feature words of the media resource.
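A minimal sketch of this weighting and filtering step follows. The text names TF-IDF but not its exact variant, so the sketch assumes tf = count / document length and idf = log(N / df); each document is the list of candidate feature words of one media resource. The function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Set


def tf_idf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """TF-IDF of every candidate feature word in every document (assumed variant)."""
    n_docs = len(docs)
    df = Counter(w for doc in docs for w in set(doc))  # document frequency
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({w: (c / len(doc)) * math.log(n_docs / df[w])
                        for w, c in counts.items()})
    return weights


def select_feature_words(docs: List[List[str]], weight_threshold: float,
                         stop_words: Set[str]) -> List[Dict[str, float]]:
    """Keep words whose weight reaches the threshold and that are not stop words."""
    return [{w: v for w, v in doc_w.items()
             if v >= weight_threshold and w not in stop_words}
            for doc_w in tf_idf(docs)]
```

For example, with the two documents `["news", "sports", "sports"]` and `["news", "movie"]`, "news" occurs in every document and gets weight 0, so only "sports" and "movie" survive a threshold of 0.1.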

The feature words of all media resources in the background media resource library together form the feature words of the library, expressed as the feature word vector $C = [c_1, \ldots, c_j, \ldots, c_M]$, where $c_j$ is the $j$-th feature word of the background media resource library and M is the number of its feature words. The feature words of the library include the feature words of every media resource, and no two of them are identical.

An M × N feature word weight matrix T is set up: its M rows correspond to the feature words $c_j$ of the background media resource library and its N columns to the media resources $D_i$ of the library. Element $t_{ji}$ of T is the weight of feature word $c_j$ in media resource $D_i$: when $c_j$ is a feature word of $D_i$, $t_{ji}$ is the TF-IDF value of $c_j$ in $D_i$; when $c_j$ is not a feature word of $D_i$, $t_{ji} = 0$.
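Building C and T from the per-resource feature words can be sketched as below. The sketch assumes that `doc_features[i]` maps each feature word of the (i+1)-th media resource to its TF-IDF value, and it orders C alphabetically, which is an arbitrary choice: the text does not fix the order of the feature words.

```python
from typing import Dict, List, Tuple


def build_weight_matrix(doc_features: List[Dict[str, float]]) -> Tuple[List[str], List[List[float]]]:
    """Return the feature word vector C and the M x N weight matrix T.

    T[j][i] is the TF-IDF value of C[j] in resource i, or 0 when C[j]
    is not a feature word of that resource.
    """
    C = sorted({w for feats in doc_features for w in feats})  # distinct feature words
    T = [[feats.get(c, 0.0) for feats in doc_features] for c in C]
    return C, T
```

For two resources with feature words `{"a": 0.5}` and `{"b": 0.2, "a": 0.1}`, this gives `C == ["a", "b"]` and `T == [[0.5, 0.1], [0.0, 0.2]]`.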

Step 104: Perform singular value decomposition on the feature word weight matrix T.

To mine the semantic relations among the feature words of the background media resource library, singular value decomposition is performed on the feature word weight matrix T, giving three matrices S, V and $U^T$ that contain these semantic relations, with $T = S V U^T$. Here $U^T$ is the reduced-dimension weight matrix obtained from T by singular value decomposition. Singular value decomposition performs topic extraction: words belonging to the same topic receive consistent weights within a certain range, so the decomposition reveals the latent semantic relations among the feature words in T.
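In NumPy terms, the patent's $T = S V U^T$ corresponds to `numpy.linalg.svd(T, full_matrices=False)`, which returns the left singular vectors, the singular values in descending order, and the transposed right singular vectors. The sketch below assumes, as is usual for latent semantic analysis, that "dimension reduction" means keeping only the k largest singular values; the value of k is not given in the text.

```python
import numpy as np


def truncated_svd(T: np.ndarray, k: int):
    """Return S (M x k), V (k x k, diagonal) and U^T (k x N) with T ~= S @ V @ U^T."""
    S, sigma, Ut = np.linalg.svd(T, full_matrices=False)  # sigma sorted descending
    return S[:, :k], np.diag(sigma[:k]), Ut[:k, :]
```

Column j of the returned `Ut` is the k-dimensional representation of media resource $D_j$; when k is at least the rank of T, `S @ V @ Ut` reproduces T exactly.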

Step 105: Judge whether the current media resource watched by the user is a media resource in the background media resource library; if not, execute step 106; if so, execute step 107.

Step 106: Calculate the weight vector of the current media resource.

In this step, the text information of the current media resource watched by the user is obtained first, in the same way as the text information of each media resource in the background media resource library in step 101, so the details are not repeated here. Candidate feature words of the current media resource are then extracted (in the same way as the candidate feature words in step 102) and matched against the feature word vector C: any candidate feature word of the current media resource that is not an element of C is deleted. The weights of the remaining candidate feature words are calculated, again as TF-IDF values; candidate feature words whose weight is below the weight threshold are deleted, those whose weight is not below the threshold are further filtered with the stop word list, and the candidate feature words that remain are taken as the feature words of the current media resource.

Construct the weight vector Y of the current media resource, an M × 1 matrix whose element $y_j$ ($1 \le j \le M$) is the weight of feature word $c_j$ in the current media resource: when $c_j$ is a feature word of the current media resource, $y_j$ is the TF-IDF value of $c_j$ in the current media resource; when $c_j$ is not a feature word of the current media resource, $y_j = 0$.

Y is then transformed as follows: $Y1 = Y^T S V^{-1}$, where $Y^T$ is the transpose of Y and $V^{-1}$ is the inverse of V.
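The transformation $Y1 = Y^T S V^{-1}$ is the usual "fold-in" of a new document into the reduced space. Since $T = S V U^T$ and the columns of S are orthonormal, applying it to a column of T that is already in the library returns exactly that resource's column of $U^T$ (when no singular values were dropped), so Y1 can be compared directly with the columns of $U^T$. A minimal sketch, with the hypothetical name `fold_in`:

```python
import numpy as np


def fold_in(Y: np.ndarray, S: np.ndarray, V: np.ndarray) -> np.ndarray:
    """Map the M x 1 weight vector Y into the reduced space: Y1 = Y^T S V^{-1}.

    The result is a 1 x k row vector comparable with a (transposed) column of U^T.
    """
    return Y.T @ S @ np.linalg.inv(V)
```

Because V is diagonal, `np.linalg.inv(V)` could equally be replaced by dividing by the singular values; the explicit inverse simply mirrors the formula.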

Step 107: Generate the media resource recommendation list with a clustering method.

In order to enable the media recommendation list to more accurately capture the interest of the user, the media resource recommendation list is generated by adopting a clustering method, so that the requirements of the user on diversity and relevance are met.

In this step, the feature words of the current media resource are defined as specific feature words, and the media resources in the background media resource library whose weights on all the specific feature words are non-zero form the background media resource set.

The background media resource set is clustered with the K-means algorithm, with K equal to the number of specific feature words, dividing the background media resource set into K classes.
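Forming the background media resource set and clustering it can be sketched as below. Two points are assumptions, because the text leaves them open: each candidate resource is represented for clustering by its column of $U^T$, and K-means is the plain Lloyd iteration initialised with the first K points (deterministic, but a production system would more likely use k-means++ or several restarts). The function names are hypothetical.

```python
import numpy as np


def candidate_set(T: np.ndarray, specific_rows: list) -> list:
    """Indices of resources whose weight is non-zero on every specific feature word."""
    mask = np.all(T[specific_rows, :] != 0, axis=0)
    return [int(i) for i in np.flatnonzero(mask)]


def kmeans(points: np.ndarray, k: int, iters: int = 100) -> np.ndarray:
    """Plain Lloyd's K-means; returns one cluster label per row of points."""
    k = min(k, len(points))
    centers = points[:k].astype(float).copy()
    labels = np.zeros(len(points), dtype=int)
    for step in range(iters):
        # Distance from every point to every center, shape (n_points, k).
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if step > 0 and np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = points[labels == c]
            if len(members):
                centers[c] = members.mean(axis=0)
    return labels
```

Under these assumptions, a typical call would be `kmeans(Ut[:, cand].T, K)` with `cand = candidate_set(T, specific_rows)` and K the number of specific feature words.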

The K classes are traversed to calculate the clustering similarity between each background media resource and the current media resource. The clustering similarity between background media resource $D_j$ in a class and the current media resource D' is calculated by the following formula:

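where the similarity $\mathrm{Sim}(D_j, D')$ between background media resource $D_j$ and the current media resource D' is calculated as the cosine similarity:

$$\mathrm{Sim}(D_j, D') = \frac{\sum_{k} u_{jk}\, y_k}{\sqrt{\sum_{k} u_{jk}^2}\,\sqrt{\sum_{k} y_k^2}}$$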

where, if the current media resource D' is not a resource in the background media resource library, $u_{jk}$ is the $k$-th element of the column of $U^T$ corresponding to $D_j$ (row $k$, column $j$ of $U^T$), and $y_k$ is the $k$-th element of Y1 corresponding to D'; if the current media resource D' is a resource in the background media resource library, i.e. $D' = D_d$ with $d \ne j$ and $1 \le d \le N$, then $u_{jk}$ is defined as above and $y_k$ is the element in row $k$, column $d$ of $U^T$, i.e. the $k$-th element of the column corresponding to D'.

All background media resources in the K classes are sorted by clustering similarity, and the first L are selected to form a recommendation list, which is returned to the user as the L background media resources most relevant to the current media resource, where L is an integer greater than 0.
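The similarity and ranking step can be sketched as follows. The formula that turns $\mathrm{Sim}(D_j, D')$ into the clustering similarity is not reproduced in this text, so this sketch is explicitly a simplification: it ranks the candidates by the cosine similarity $\mathrm{Sim}(D_j, D')$ alone. Column j of `Ut` stands for $D_j$; `y` is Y1 for a new resource, or column d of `Ut` when $D' = D_d$, in which case index d is excluded from the list. The names `cosine` and `recommend` are hypothetical.

```python
import numpy as np


def cosine(u: np.ndarray, y: np.ndarray) -> float:
    """Sim(D_j, D') = sum_k u_jk y_k / (||u_j|| ||y||); 0.0 if either vector is zero."""
    denom = np.linalg.norm(u) * np.linalg.norm(y)
    return float(u @ y / denom) if denom else 0.0


def recommend(Ut: np.ndarray, y: np.ndarray, candidates: list, L: int,
              exclude: int = -1) -> list:
    """Indices of the top-L candidates by cosine similarity to y (simplified ranking)."""
    scored = [(cosine(Ut[:, j], y), j) for j in candidates if j != exclude]
    scored.sort(key=lambda p: p[0], reverse=True)
    return [j for _, j in scored[:L]]
```

To follow the method fully, the cosine score would be replaced by the clustering similarity of each candidate within its class before sorting.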

Step 108: Update the background media resource library.

In this step, if the current media resource watched by the user is a media resource in the background media resource library, the library does not need to be updated and its feature word weight matrix T stays unchanged. If it is not, the current media resource D' is added to the background media resource library as $D_{N+1}$, so the updated library contains N+1 media resources, and the feature word weight matrix T is updated accordingly into an M × (N+1) matrix by appending one column to the original T, whose elements are the weight vector Y computed in step 106. When a media resource recommendation list is later generated for another current media resource, the library already contains these N+1 media resources, so steps 101 to 103 need not be executed again and execution starts directly from step 104.
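The library update amounts to appending Y as a new column of T, after which step 104 recomputes the decomposition on the enlarged matrix. A minimal sketch, with the hypothetical name `add_resource`:

```python
import numpy as np


def add_resource(T: np.ndarray, Y: np.ndarray) -> np.ndarray:
    """Append the M x 1 weight vector Y of D' as column N+1 of the M x N matrix T."""
    return np.hstack([T, Y.reshape(-1, 1)])
```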

The method completes heterogeneous resource recommendation of the current media resources watched by the user on the television platform, and the recommendation list obtained by the scheme meets the requirement of the user on information diversification.

Method embodiment two

Further, to increase the semantic relevance between the heterogeneous resources recommended to the user and the current media resource, this embodiment uses implicit user feedback, such as how often and in what order different users click the media resources in the media resource recommendation list, to adjust the feature word weights of the clicked media resources, so that recommendation lists calculated later come closer to the users' interests. FIG. 2 is a flowchart of this embodiment, taking the adjustment of media resource $R_l$ in the media resource recommendation list as an example, where $l$ is a positive integer and $1 \le l \le L$. As shown in FIG. 2, the following steps are performed each time the user clicks a media resource in the media resource recommendation list:

Step 201: Calculate a single user's score for the media resource.

The user can click and watch one or more media resources in the recommendation list according to personal interest, and clicking media resources in a recommendation list produces a click order over the clicked media resources. The user's click order for media resource $R_l$ is denoted $\mathrm{rank}(R_l)$; since $R_l$ belongs to a recommendation list of L media resources, its click order necessarily satisfies $1 \le \mathrm{rank}(R_l) \le L$. A single user's score for $R_l$ is computed from the click order by a formula in which Score_max is a constant that caps the maximum score a single user can give a media resource.

Step 202: Calculate the current total score of the media resource.

The current total score of media resource $R_l$ is defined as the sum of all users' current scores for $R_l$. If P users have clicked $R_l$ so far, each of them produces a score for $R_l$, and the sum of these P scores, $\mathrm{Score}(R_l)$, is the current total score of $R_l$.

Step 203: Judge whether the current total score of the media resource is greater than the score threshold; if not, execute step 204; if so, execute step 205.

In this step, P is the number of users who have currently clicked media resource $R_l$. If the current total score of $R_l$ is not greater than the score threshold, few users have clicked $R_l$ and/or users clicked it late in their click order, which suggests that $R_l$ is not very attractive to users in general, so the feature word weights of $R_l$ are only fine-tuned. If the current total score of $R_l$ is greater than the score threshold, many users have clicked $R_l$ and/or clicked it early in their click order, which suggests that $R_l$ is attractive to a wide range of users, so the feature word weights of $R_l$ are adjusted to a greater extent.

Step 204: Fine-tune the weight of each feature word of the media resource.

In this step, each feature word weight of $R_l$ is fine-tuned by a formula in which $t_j$ is the weight of the $j$-th feature word of media resource $R_l$, i.e., the corresponding element of $R_l$ in the feature word weight matrix T, and α is a weight adjustment parameter, an empirical constant. After every feature word weight of $R_l$ has been calculated with this formula, the feature word weight matrix T of the background media resource library is updated.

Step 205: Add all feature words of the media resource to the high-frequency feature word set and adjust the weight of each feature word of the media resource.

In this step, since the current total score of media resource $R_l$ is greater than the score threshold, $R_l$ is generally attractive to users, so all feature words of $R_l$ are added to the high-frequency feature word set. The feature words in this set are pairwise distinct, i.e., the set contains no repeated feature words. Each feature word weight of $R_l$ is then adjusted according to

$$f(t_j) = t_j \times \left(1 + \frac{\mathrm{Score}(R_l)}{\beta + 1}\right)$$

where $t_j$ is the weight of the $j$-th feature word of media resource $R_l$, i.e., the corresponding element of $R_l$ in the feature word weight matrix T; $f(t_j)$ is the adjusted weight of that feature word of $R_l$; β is a weight adjustment parameter, an empirical constant; and x is the number of feature words contained in the high-frequency feature word set. After every feature word weight of $R_l$ has been calculated with this formula, the feature word weight matrix T of the background media resource library is updated.
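Steps 202 and 205 can be sketched directly from the definitions above: the total score is the sum of the per-user scores, and the strong adjustment multiplies every feature word weight of $R_l$ by $1 + \mathrm{Score}(R_l)/(\beta + 1)$. The single-user scoring formula, the score threshold and the fine-tuning formula of step 204 are not reproduced in this text, so they are left out; the per-user scores and β are simply passed in, and the function names are hypothetical.

```python
from typing import Dict, List, Set


def total_score(user_scores: List[float]) -> float:
    """Step 202: Score(R_l) is the sum of the P single-user scores for R_l."""
    return sum(user_scores)


def boost_weights(weights: Dict[str, float], score: float, beta: float,
                  high_freq_words: Set[str]) -> Dict[str, float]:
    """Step 205: add R_l's feature words to the high-frequency set and
    apply f(t_j) = t_j * (1 + Score(R_l) / (beta + 1)) to each weight."""
    high_freq_words.update(weights)  # a set, so no feature word is stored twice
    factor = 1 + score / (beta + 1)
    return {w: t * factor for w, t in weights.items()}
```

For instance, with per-user scores 1.0 and 2.0 and β = 2, the factor is 1 + 3/3 = 2, so weights 0.5 and 0.2 become 1.0 and 0.4. The returned weights would then be written back into the column of T that belongs to $R_l$.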

By adjusting the feature word weight matrix T according to the click counts and click orders of different users, the process above makes the feature word weights of the background media resources follow the users' click feedback, which gives users a more reasonable ranking of popular media resources and improves recommendation performance.

The invention also discloses a device for the semantic link-based resource recommendation method on a television platform. FIG. 3 is a structural diagram of the device; as shown in FIG. 3, the device comprises:

the text information extraction module 310 is configured to extract text information of all media resources in the background media resource library;

the feature word extraction module 320 is configured to extract candidate feature words of each media resource according to text information of the media resource, calculate a weight of the candidate feature words, filter the candidate feature words according to the weight to obtain feature words, and generate a feature word weight matrix T of the background media resource library;

the media resource recommendation list generation module 330 is configured to, if the current media resource watched by the user is a media resource in the background media resource library, use a clustering method and the feature word weight matrix T to calculate the clustering similarity between each media resource in the library and the current media resource, and select the L media resources with the highest clustering similarity to generate a media resource recommendation list.

The feature word extraction module 320 further includes:

the word segmentation sequence sub-module 321 is configured to, for each media resource in the background media resource library, segment text information of each media resource into word segmentation sequences according to different parts of speech by using a lexical analysis tool;

the candidate feature word extraction sub-module 322 is configured to match the word segmentation sequence of each media resource against the hot word dictionary, merge consecutive word segments that together form a longer hot word according to the longest matching word string, and use the merged segments as candidate feature words of the media resource;

the feature word weight matrix generation sub-module 323 is configured to calculate the weight of each candidate feature word, the weight being its term frequency-inverse document frequency (TF-IDF) value, to filter the candidate feature words whose weight is not below the weight threshold with a stop word list, and to take the filtered candidate feature words as the feature words of the media resource;

construct the feature words of the background media resource library from the feature words of all its media resources and represent them by the vector $C = [c_1, \ldots, c_j, \ldots, c_M]$, where M is the number of feature words of the background media resource library; the feature words of the library include the feature words of every media resource in it, and no two of them are identical;

set up an M × N feature word weight matrix T whose M rows correspond to the feature words $c_j$ of the background media resource library and whose N columns correspond to the media resources $D_i$ of the library; element $t_{ji}$ of T is the weight of feature word $c_j$ in media resource $D_i$: when $c_j$ is a feature word of $D_i$, $t_{ji}$ is the TF-IDF value of $c_j$ in $D_i$; when $c_j$ is not a feature word of $D_i$, $t_{ji} = 0$.

The feature word weight matrix generation submodule 323 is further configured to:

perform singular value decomposition on the feature word weight matrix T to obtain three matrices S, V and U^T that capture semantic relations, with T = SVU^T, where U^T is the feature word weight matrix obtained after singular value decomposition and dimensionality reduction of T.
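The decomposition T = SVU^T with dimensionality reduction corresponds to a truncated SVD. The sketch below assumes NumPy's `numpy.linalg.svd`, which returns T = S · diag(σ) · Uᵗ; keeping the r largest singular values gives the reduced factors. The retained rank r is not specified in the text and is an assumed parameter; `svd_reduce` is a hypothetical name. In this layout the reduced U^T is r × N with one column per media resource, so indexing it by resource (as the similarity formula's u_jk does) means taking its transpose.

```python
import numpy as np


def svd_reduce(T: np.ndarray, r: int):
    """Truncated SVD of T in the text's notation T = S V U^T.

    numpy returns T = S @ diag(sigma) @ Ut with sigma sorted in descending order;
    keeping the r largest singular values gives the rank-r approximation
    S_r @ V_r @ Ut_r.
    """
    S, sigma, Ut = np.linalg.svd(T, full_matrices=False)
    r = max(1, min(r, sigma.size))  # clamp r to a valid rank
    return S[:, :r], np.diag(sigma[:r]), Ut[:r, :]
```

With r equal to the full rank, `S @ V @ Ut` reproduces T; smaller r discards the weakest latent semantic dimensions.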

If the current media resource watched by the user is not a media resource in the background media resource library, the device further comprises a current media resource feature word weight calculation module 340, configured to obtain the text information of the current media resource watched by the user, extract the feature words of the current media resource from that text information, calculate the weight of each feature word, and construct a weight vector Y of the current media resource, where Y is an M × 1 matrix whose element y_j (1 ≤ j ≤ M) is the weight of feature word c_j in the current media resource: when c_j is a feature word of the current media resource, y_j is the TF-IDF value of c_j in the current media resource; when c_j is not a feature word of the current media resource, y_j = 0.
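Building Y amounts to projecting the current resource's feature-word weights onto the library vocabulary C. A minimal sketch, assuming the current resource's TF-IDF values are already available as a word-to-weight mapping; `build_current_vector` is a hypothetical name. Because Y has exactly M rows, feature words of the current resource that do not occur in C have no row and are dropped.

```python
from typing import Dict, List

import numpy as np


def build_current_vector(vocab: List[str], current_weights: Dict[str, float]) -> np.ndarray:
    """M x 1 vector Y: y_j is the TF-IDF of c_j in the current resource, else 0."""
    Y = np.zeros((len(vocab), 1))
    for j, word in enumerate(vocab):
        Y[j, 0] = current_weights.get(word, 0.0)
    return Y
```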

The current media resource feature word weight calculation module 340 is further configured to:

The media resource recommendation list generation module 330 further comprises:

a background media resource set generation sub-module 331, configured to define the feature words of the current media resource as specific feature words, and to form the background media resource set from the media resources in the background media resource library whose weights on all of the specific feature words are non-zero;

a similarity calculation sub-module 332, configured to cluster the background media resource set using the K-means algorithm, where K is set to the number of specific feature words, so that the background media resource set is divided into K classes;

where u_jk is the element in row j, column k of U^T corresponding to D_j, and y_k is the element in column k of Y1 corresponding to D'.
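The candidate-set and clustering steps of sub-modules 331 and 332 can be sketched as below. Assumptions, all labeled: resources are clustered on some vector representation (for example, their columns of the reduced U^T) that this section does not fix; K-means is plain Lloyd iteration initialised with the first K points, a choice made here only for determinism; how the resulting clusters feed the similarity ranking is not described in this section and is not shown. `candidate_set` and `kmeans` are hypothetical names.

```python
from typing import List, Tuple

import numpy as np


def candidate_set(T: np.ndarray, specific_rows: List[int]) -> np.ndarray:
    """Indices of resources whose weight is non-zero on every specific feature word."""
    return np.flatnonzero(np.all(T[specific_rows, :] != 0, axis=0))


def kmeans(X: np.ndarray, k: int, iters: int = 100) -> Tuple[np.ndarray, np.ndarray]:
    """Plain Lloyd's K-means on the rows of X (deterministic init: first k rows)."""
    k = min(k, len(X))
    if k == 0:
        return np.zeros(len(X), dtype=int), np.zeros((0,) + X.shape[1:])
    centers = X[:k].astype(float)  # astype returns a copy, so X is not modified
    labels = np.full(len(X), -1)
    for _ in range(iters):
        # squared Euclidean distance from every row to every center
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments unchanged: converged
        labels = new_labels
        for c in range(k):
            members = X[labels == c]
            if len(members) > 0:
                centers[c] = members.mean(axis=0)  # move center to its cluster mean
    return labels, centers
```

Following the text, K would be `len(specific_rows)`; with the reduced factors from an SVD, one possible call is `kmeans(Ut_r[:, cand].T, len(specific_rows))` where `cand = candidate_set(T, specific_rows)`.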

The device further includes a weight learning module 340, configured to adjust the weights in the feature word weight matrix T of the background media resource library according to the click order and click count of the user's clicks on media resources in the media resource recommendation list, where the weight learning module 340 further includes:

a media resource score calculation module 341, configured to compute, according to the single-user scoring formula, a single user's score for media resource R_l, where R_l is the media resource in the media resource recommendation list that the user is currently clicking and watching, rank(R_l) is the user's click order for R_l, with 1 ≤ rank(R_l) ≤ L, and Score_max is a constant defining the maximum score a single user can give a media resource;

a media resource total score calculation module 342, configured to compute the current total score of media resource R_l according to the total-score formula, where P is the number of users who have currently clicked media resource R_l;

a weight adjustment module 343, configured to, if the current total score of media resource R_l is not greater than the score threshold, adjust the weight of each feature word of R_l according to the formula f(t_j) = t_j × (1 + Score(R_l)/(α + 1));

if the current total score of media resource R_l is greater than the score threshold, add all the feature words of R_l to the high-frequency feature word set, and adjust the weight of each feature word of R_l according to the formula f(t_j) = t_j × (1 + Score(R_l)/(β + 1));

where t_j is the weight of the j-th feature word of media resource R_l, i.e., the element of the feature word weight matrix T corresponding to R_l; f(t_j) is the adjusted weight of that feature word of R_l; α is a weight adjustment parameter; the feature words in the high-frequency feature word set are mutually distinct; β is a weight adjustment parameter; and x is the number of feature words contained in the high-frequency feature word set.
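The two branches of the weight adjustment module 343 can be sketched as a single function. Assumptions: Score(R_l) in the formula is taken to be the same current total score that is compared against the threshold; the score threshold, α and β are passed in as plain numbers because their defining expressions are not given in this section; `adjust_weights` and its parameter names are hypothetical. The high-frequency feature word set is a Python `set`, which keeps its members distinct as the text requires.

```python
from typing import Dict, Set


def adjust_weights(weights: Dict[str, float], total_score: float, threshold: float,
                   alpha: float, beta: float, high_freq: Set[str]) -> Dict[str, float]:
    """Return f(t_j) = t_j * (1 + Score / (param + 1)) for every feature word of R_l.

    Uses alpha when the total score is <= threshold; otherwise adds all of R_l's
    feature words to high_freq (mutated in place) and uses beta.
    """
    if total_score <= threshold:
        factor = 1 + total_score / (alpha + 1)
    else:
        high_freq.update(weights)  # iterating a dict yields its keys (the feature words)
        factor = 1 + total_score / (beta + 1)
    return {word: t * factor for word, t in weights.items()}
```

For example, with α = 1 and a total score of 2 below the threshold, every weight of R_l is doubled; above the threshold with β = 3 and a score of 8, every weight is tripled and R_l's feature words join the high-frequency set.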

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that are within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims

1. A semantic-link-based resource recommendation method on a television platform, characterized by comprising the following steps:

if the current media resource watched by the user is a media resource in the background media resource library, calculating, by a clustering method using the feature word weight matrix T, the clustering similarity between each media resource in the background media resource library and the current media resource, and selecting the L media resources with the highest clustering similarity to generate a media resource recommendation list, where L is an integer greater than 0;

wherein the steps of extracting the candidate feature words of each media resource according to its text information, calculating the weights of the candidate feature words, filtering the candidate feature words to obtain the feature words, and generating the feature word weight matrix T of the background media resource library further comprise:

for each media resource in the background media resource library, segmenting its text information into a word segmentation sequence according to part of speech, using a lexical analysis tool;

matching the word segmentation sequence of each media resource against a hot word dictionary, merging multiple word segments that are contained in a hot word dictionary entry into the longest matching word string, and taking the merged word segments as the candidate feature words of the media resource;

calculating the weight of each candidate feature word, the weight being the term frequency-inverse document frequency (TF-IDF) value of the candidate feature word; filtering the candidate feature words whose weights are not less than a weight threshold through a stop-word list; and taking the candidate feature words remaining after filtering as the feature words of the media resource;

constructing the feature words of the background media resource library from the feature words of all media resources in the library, represented by the vector C = [c₁, …, c_j, …, c_M], where M is the number of feature words of the background media resource library; the feature words of the library comprise the feature words of every media resource in it, and any two feature words of the library are distinct;

2. The method of claim 1, further comprising:

3. The method of claim 1, wherein if the current media resource viewed by the user is not a media resource in the background media resource library, before the clustering similarity between each media resource in the background media resource library and the current media resource is calculated by the clustering method, the method further comprises:

acquiring the text information of the current media resource watched by the user, extracting the feature words of the current media resource from that text information, calculating the weight of each feature word, and constructing a weight vector Y of the current media resource, where Y is an M × 1 matrix whose element y_j (1 ≤ j ≤ M) is the weight of feature word c_j in the current media resource: when c_j is a feature word of the current media resource, y_j is the TF-IDF value of c_j in the current media resource; when c_j is not a feature word of the current media resource, y_j = 0.

4. The method of claim 2, further comprising:

5. The method according to claim 4, wherein calculating, by the clustering method using the feature word weight matrix T, the clustering similarity between each media resource in the background media resource library and the current media resource further comprises:

defining the feature words of the current media resource as specific feature words, and forming a background media resource set from the media resources in the background media resource library whose weights on all of the specific feature words are non-zero;

$$\mathrm{Sim}(D_j, D') = \frac{\sum_{k=1} \left(u_{jk} \times y_k\right)}{\sqrt{\sum_{k=1} u_{jk}^{2}}\,\sqrt{\sum_{k=1} y_k^{2}}};$$

where u_jk is the element in row j, column k of U^T corresponding to D_j, and y_k is the element in column k of Y1 corresponding to D'.
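The cosine similarity Sim(D_j, D') and the selection of the L most similar resources (claim 1) can be sketched as below. Assumptions: each background resource D_j is given as a row vector (u_j1, u_j2, …) in the reduced space, and the current resource's vector Y1 is assumed to be already expressed in the same space, since its construction is not described in this section; returning 0 for a zero vector is a guard added here, not something the text specifies. `cosine_sim` and `top_l` are hypothetical names.

```python
import numpy as np


def cosine_sim(u_j: np.ndarray, y1: np.ndarray) -> float:
    """Sim(D_j, D') = sum_k u_jk*y_k / (sqrt(sum_k u_jk^2) * sqrt(sum_k y_k^2))."""
    denom = float(np.linalg.norm(u_j) * np.linalg.norm(y1))
    return float(np.dot(u_j, y1)) / denom if denom > 0 else 0.0


def top_l(doc_vecs: np.ndarray, y1: np.ndarray, L: int):
    """Rank resources (rows of doc_vecs) by similarity to D' and keep the best L."""
    sims = np.array([cosine_sim(v, y1) for v in doc_vecs])
    order = np.argsort(-sims, kind="stable")[:L]  # descending; ties keep index order
    return [(int(j), float(sims[j])) for j in order]
```

For instance, with resource vectors (1, 0), (0, 1), (1, 1) and y1 = (1, 0), the two most similar resources are indices 0 (similarity 1) and 2 (similarity 1/√2).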

6. The method of claim 1, further comprising:

adjusting the weights in the feature word weight matrix T of the background media resource library according to the click order and click count of the user's clicks on media resources in the media resource recommendation list, specifically comprising the following steps:

computing, according to the single-user scoring formula, a single user's score for media resource R_l, where R_l is the media resource in the media resource recommendation list that the user is currently clicking and watching, rank(R_l) is the user's click order for R_l, with 1 ≤ rank(R_l) ≤ L, and Score_max is a constant defining the maximum score a single user can give a media resource;

computing, according to the total-score formula, the current total score of media resource R_l, where P is the number of users who have currently clicked media resource R_l;

if the current total score of media resource R_l is not greater than the score threshold, adjusting the weight of each feature word of R_l according to the formula f(t_j) = t_j × (1 + Score(R_l)/(α + 1));

7. A resource recommendation device based on semantic links on a television platform, the device comprising:

a media resource recommendation list generation module, configured to, if the current media resource watched by the user is a media resource in the background media resource library, calculate, by a clustering method using the feature word weight matrix T, the clustering similarity between each media resource in the background media resource library and the current media resource, and select the L media resources with the highest clustering similarity to generate a media resource recommendation list, where L is an integer greater than 0;

wherein,

the word segmentation sequence sub-module is configured to, for each media resource in the background media resource library, segment the text information of that media resource into a word segmentation sequence according to part of speech, using a lexical analysis tool;

the candidate feature word extraction sub-module is configured to match the word segmentation sequence of each media resource against the hot word dictionary, merge multiple word segments that are contained in a hot word dictionary entry into the longest matching word string, and use the merged word segments as the candidate feature words of the media resource;

the feature word weight matrix generation sub-module is configured to calculate the weight of each candidate feature word, the weight being the term frequency-inverse document frequency (TF-IDF) value of the candidate feature word; the candidate feature words whose weights are not less than a weight threshold are filtered through a stop-word list, and the candidate feature words remaining after filtering are the feature words of the media resource;

setting up an M × N feature word weight matrix T, where the M rows correspond to the feature words c_j of the background media resource library and the N columns correspond to the media resources D_i of the library; element t_ji of T is the weight of feature word c_j in media resource D_i: when c_j is a feature word of D_i, t_ji is the TF-IDF value of c_j in D_i; when c_j is not a feature word of D_i, t_ji = 0.

8. The apparatus of claim 7, wherein the feature word weight matrix generation sub-module is further configured to:

9. The apparatus as claimed in claim 7, wherein if the current media resource viewed by the user is not a media resource in the background media resource library, the apparatus further comprises:

a current media resource feature word weight calculation module, configured to obtain the text information of the current media resource watched by the user, extract the feature words of the current media resource from that text information, calculate the weight of each feature word, and construct a weight vector Y of the current media resource, where Y is an M × 1 matrix whose element y_j (1 ≤ j ≤ M) is the weight of feature word c_j in the current media resource: when c_j is a feature word of the current media resource, y_j is the TF-IDF value of c_j in the current media resource; when c_j is not a feature word of the current media resource, y_j = 0.

10. The apparatus according to claim 8, wherein the current media resource feature word weight calculation module is further configured to:

11. The apparatus of claim 10, wherein the media resource recommendation list generation module further comprises:

a background media resource set generation sub-module, configured to define the feature words of the current media resource as specific feature words, and to form the background media resource set from the media resources in the background media resource library whose weights on all of the specific feature words are non-zero;

a similarity calculation sub-module, configured to cluster the background media resource set using the K-means algorithm, where K is set to the number of specific feature words, so that the background media resource set is divided into K classes;

wherein the similarity Sim(D_j, D') between background media resource D_j and the current media resource D' is calculated by cosine similarity:

12. The apparatus according to claim 7, further comprising a weight learning module, configured to adjust the weights in the feature word weight matrix T of the background media resource library according to the click order and click count of the user's clicks on media resources in the media resource recommendation list, where the weight learning module further includes:

a media resource score calculation module, configured to compute, according to the single-user scoring formula, a single user's score for media resource R_l, where R_l is the media resource in the media resource recommendation list that the user is currently clicking and watching, rank(R_l) is the user's click order for R_l, with 1 ≤ rank(R_l) ≤ L, and Score_max is a constant defining the maximum score a single user can give a media resource;

a media resource total score calculation module, configured to compute the current total score of media resource R_l according to the total-score formula, where P is the number of users who have currently clicked media resource R_l;

a weight adjustment module, configured to, if the current total score of media resource R_l is not greater than the score threshold, adjust the weight of each feature word of R_l according to the formula f(t_j) = t_j × (1 + Score(R_l)/(α + 1));