WO2011036704A1 - Content recommendation device - Google Patents

Content recommendation device

Info

Publication number
WO2011036704A1
Authority
WO
WIPO (PCT)
Prior art keywords
vod
content
learning data
broadcast
importance
Prior art date
Application number
PCT/JP2009/004812
Other languages
French (fr)
Japanese (ja)
Inventor
中田康太
村上知子
Original Assignee
株式会社 東芝
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 株式会社 東芝 filed Critical 株式会社 東芝
Priority to PCT/JP2009/004812 priority Critical patent/WO2011036704A1/en
Publication of WO2011036704A1 publication Critical patent/WO2011036704A1/en

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/16 Analogue secrecy systems; Analogue subscription systems
    • H04N7/173 Analogue secrecy systems; Analogue subscription systems with two-way working, e.g. subscriber sending a programme selection signal
    • H04N7/17309 Transmission or handling of upstream communications
    • H04N7/17318 Direct or substantially direct transmission and handling of requests
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40 Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/25 Management operations performed by the server for facilitating the content distribution or administrating data related to end-users or client devices, e.g. end-user or client device authentication, learning user preferences for recommending movies
    • H04N21/251 Learning process for intelligent management, e.g. learning user preferences for recommending movies
    • H04N21/252 Processing of multiple end-users' preferences to derive collaborative data
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47 End-user applications
    • H04N21/472 End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • H04N21/47202 End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content for requesting content on demand, e.g. video on demand
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47 End-user applications
    • H04N21/482 End-user interface for program selection
    • H04N21/4826 End-user interface for program selection using recommendation lists, e.g. of programs or channels sorted out according to their score

Definitions

  • the present invention relates to a content recommendation device for recommending video content to a user in a video on demand service.
  • VoD: Video on Demand
  • high-quality contents such as movies and sports are distributed in large quantities, and the user can view the contents according to his / her situation.
  • As video content increases, the user must select the video he/she wants to see from a huge number of videos, and finding a video he/she really wants to see becomes increasingly difficult.
  • video distribution sites provide information by expanding the search system based on genre information and presenting popular videos.
  • Development of broadcast program recommendation systems is progressing as a method for presenting video that is better tailored to the individual user.
  • the broadcast program recommendation system is a system that recommends a program that matches a user's preference from among an increasing number of broadcast programs as the number of broadcast channels increases. In such a recommendation system, it is expected to present a broadcast program that suits the user's preference by recommending the program using the program genre, metadata such as performers, and the user's viewing history.
  • With a search system, the target video can be presented when the content that the user wants to view is clear, but when the content that the user wants to view is not clear, it is difficult to present video that suits the user's taste.
  • With a system that presents popular videos, it is possible to present content that is generally viewed, but since interests differ from user to user, it is not always possible to present content that suits the user's taste.
  • Japanese Patent Application Laid-Open No. 2006-127145 discloses a technique in which a viewer's preference at the time of viewing a program is appropriately reflected, by taking into account a preference that changes with changes in program content, and programs are presented to the viewer accordingly (Patent Document 1).
  • a method of using a relatively large amount of viewing history of a broadcast program can be considered as a method for supplementing the viewing history of VoD.
  • However, if VoD content recommendation is performed using the viewing history of broadcast programs, the recommendation reflects the user's preference for broadcast programs, and appropriate VoD content cannot be recommended.
  • The present invention has been made to solve the above-described problems, and an object of the present invention is to provide a content recommendation device capable of presenting content that matches a user's preference from a large amount of VoD content.
  • The content recommendation device of the present invention includes a VoD positive example complementing unit that complements the positive examples with data similar to a positive example taken from the negative examples of the VoD learning data.
  • It further includes a selection processing unit that calculates the feature amount importance of each feature amount using the complemented VoD learning data and, using the feature amount importance, selects data suitable for VoD content recommendation from the broadcast learning data.
  • It further includes a recommendation model learning unit that learns a recommendation model using the complemented VoD learning data, the selected broadcast learning data, and the calculated feature amount importance; a determination unit that determines, based on the learned recommendation model, whether to recommend VoD target content; and an output unit that performs control to present the recommendation determination result to the user.
  • A block diagram showing an example of the overall configuration of the content recommendation device according to the embodiment of the present invention.
  • A diagram showing an example of the metadata assigned to VoD content according to the embodiment.
  • A flowchart showing a specific example of the processing in the VoD positive example complementing unit according to the embodiment.
  • A diagram showing an example of the process of inverting a “negative example” of the VoD learning data to a “positive example”, using four contents, according to the embodiment.
  • A flowchart showing a specific example of the processing in the selection processing unit according to the embodiment.
  • A diagram showing an example of the importance assigned to the performer feature amounts according to the embodiment.
  • A diagram showing an example of the importance assigned to the keyword feature amounts according to the embodiment.
  • A diagram showing an example of the positive examples of the broadcast learning data according to the embodiment.
  • A flowchart showing a specific example of the processing in the recommendation model learning unit according to the embodiment.
  • An example in which the performer feature amounts and keyword feature amounts of the learning data of FIG. 7 are converted into binary values.
  • A flowchart showing a specific example of the processing in the determination unit according to the embodiment.
  • A diagram showing an example of the VoD target data according to the embodiment.
  • FIG. 1 is a block diagram showing an example of the overall configuration of a content recommendation device according to an embodiment of the present invention.
  • The content recommendation device 1 includes a VoD learning data storage unit 11, a broadcast learning data storage unit 12, a VoD positive example complementing unit 13, a selection processing unit 14, a recommendation model learning unit 15, a VoD target data storage unit 16, a determination unit 17, and an output unit 18.
  • the VoD learning data storage unit 11 is a storage device for storing VoD learning data.
  • the VoD learning data includes VoD metadata and VoD viewing data.
  • VoD metadata is data obtained from metadata attached to content.
  • FIG. 2 shows an example of metadata assigned to the VoD content.
  • the metadata includes “content ID, genre, title, performer, time, summary” and the like.
  • the “summary” is expressed as text data.
  • the keywords assigned to the VoD metadata are extracted from the “summary”.
  • A keyword is a word that has a high frequency and a high TF-IDF value in the text data of the “summary” fields of all contents.
  • the word frequency and the TFIDF value can be easily calculated by using a conventional morphological analysis technique.
  • FIG. 3 is an example of VoD metadata generated from the metadata of FIG.
  • the keyword extracted from the “summary” is given instead of the “summary” information.
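  • The keyword extraction described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes the summaries have already been split into words (for example by a morphological analyzer), and the function name `extract_keywords`, the `top_k` cutoff, and the sample tokens are all hypothetical.

```python
import math
from collections import Counter


def extract_keywords(tokenized_summaries, top_k=3):
    """Return, for each summary, its top_k tokens ranked by TF-IDF."""
    n_docs = len(tokenized_summaries)
    # Document frequency: number of summaries that contain each token.
    doc_freq = Counter()
    for tokens in tokenized_summaries:
        doc_freq.update(set(tokens))

    keywords = []
    for tokens in tokenized_summaries:
        term_counts = Counter(tokens)
        scores = {
            tok: (count / len(tokens)) * math.log(n_docs / doc_freq[tok])
            for tok, count in term_counts.items()
        }
        # Sort by descending score; break ties alphabetically for determinism.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        keywords.append(ranked[:top_k])
    return keywords


# Hypothetical, already tokenized summaries.
summaries = [
    ["time", "machine", "past", "time"],
    ["robot", "future", "machine"],
    ["legend", "past", "sword"],
]
print(extract_keywords(summaries, top_k=2))
```

  • Tokens that occur in every summary receive a score of zero under this weighting, so they are never preferred over more distinctive words.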
  • the VoD viewing data used in the present embodiment is information regarding viewing of the user's VoD content. Here, it is assumed that information related to viewing is obtained from the device for all VoD contents.
  • FIG. 4 shows an example of VoD viewing data.
  • the content ID 41 of the VoD viewing data is an individual ID assigned to each VoD content.
  • The viewing information 42 of the VoD viewing data represents viewing-related information. For example, if the user has viewed “back to the xxx”, which corresponds to content ID-1 in FIG. 2, the viewing information 42 is “viewed”. On the other hand, when the user has not viewed content ID-2 and ID-3, the viewing information 42 is “unviewed”.
  • FIG. 5 shows an example of VoD learning data.
  • the VoD learning data is obtained by adding VoD viewing data to the VoD metadata of each content.
  • learning is performed by generating “positive example” and “negative example” based on the VoD viewing data of FIG. 4 as a teacher signal.
  • A “positive example” is content that the user wants to watch, and a “negative example” is content that the user does not want to watch. Therefore, content marked “viewed” or “favorite” in the VoD learning data is set as a “positive example”, and content marked “unviewed” is set as a “negative example”.
  • each value of the performer, keyword, and genre of VoD metadata is set as a “feature amount” representing the content.
  • The feature amounts of content ID-1 are “Robert ⁇ ⁇ ⁇ ⁇”, “Michael J. ⁇ ⁇ ⁇ ⁇ ⁇”, “Time”, “Machine”, “Past”, and “Movie-foreign film”.
  • a set of performers such as [Robert ⁇ ⁇ ⁇ ⁇ ⁇ , Michael J. ⁇ ⁇ ⁇ ⁇ ] in the VoD metadata is called a performer feature amount of the content ID-1.
  • a set of keywords such as [time, machine, past] is called a keyword feature amount of content ID-1.
  • a set of genres such as [movie-foreign film] is called a genre feature amount of content ID-1.
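  • One possible way to represent the learning data described above is sketched below. The class name `LearningExample`, the function `build_learning_data`, the dictionary layout, and the performer names are hypothetical; only the rule that “viewed” and “favorite” become positive examples and “unviewed” becomes a negative example comes from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LearningExample:
    content_id: str
    performers: frozenset  # performer feature amount
    keywords: frozenset    # keyword feature amount
    genres: frozenset      # genre feature amount
    positive: bool         # True = "positive example", False = "negative example"


# Viewing information values that are treated as positive examples.
POSITIVE_VIEWING_INFO = {"viewed", "favorite"}


def build_learning_data(metadata, viewing_data):
    """Join metadata with viewing data and derive the teacher label."""
    examples = []
    for content_id, meta in metadata.items():
        info = viewing_data.get(content_id, "unviewed")
        examples.append(LearningExample(
            content_id=content_id,
            performers=frozenset(meta["performers"]),
            keywords=frozenset(meta["keywords"]),
            genres=frozenset(meta["genres"]),
            positive=info in POSITIVE_VIEWING_INFO,
        ))
    return examples


# Hypothetical metadata; performer names are placeholders.
metadata = {
    "ID-1": {"performers": ["Performer A", "Performer B"],
             "keywords": ["time", "machine", "past"],
             "genres": ["movie-foreign film"]},
    "ID-2": {"performers": ["Performer A"],
             "keywords": ["time", "future"],
             "genres": ["movie-foreign film"]},
}
viewing_data = {"ID-1": "viewed", "ID-2": "unviewed"}
data = build_learning_data(metadata, viewing_data)
print([(e.content_id, e.positive) for e in data])
```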
  • the broadcast learning data storage unit 12 is a storage device for storing broadcast learning data.
  • Broadcast learning data includes broadcast metadata and broadcast viewing data. Broadcast learning data is stored in the same format as VoD learning data.
  • For broadcast metadata, the genre system and genre expressions may, for example, differ between VoD content and broadcast content, but this can be resolved by associating the genres of broadcast content with the genres of VoD content. Further, it is assumed that the same words used as keywords in the VoD metadata are also used in the broadcast metadata. Through the above processing, broadcast metadata in the same format as the VoD metadata can be obtained for broadcast content.
  • Broadcast viewing data is information related to viewing of the broadcast content of the user.
  • information viewed by the user is obtained directly or indirectly from the television receiver for all broadcast contents of all channels.
  • Data in the same format as the VoD viewing data can be obtained by setting “viewed” for broadcast content of which 20% or more of the whole program was viewed, and “unviewed” for all other content.
  • Broadcast learning data in the same format as VoD learning data can be obtained by using broadcast metadata and broadcast viewing data.
  • the definitions of “positive example” and “negative example” and the definition of the feature amount are the same as those of the VoD learning data.
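  • The 20% rule for converting broadcast viewing into the VoD viewing format can be sketched as below. The function name, the use of seconds as the unit, and the choice of the labels “viewed” and “unviewed” to match the VoD viewing data are assumptions made for illustration.

```python
def broadcast_viewing_info(watched_seconds, program_seconds, min_ratio=0.2):
    """Label a broadcast program as viewed when at least min_ratio of it was watched."""
    if program_seconds <= 0:
        raise ValueError("program_seconds must be positive")
    ratio = watched_seconds / program_seconds
    return "viewed" if ratio >= min_ratio else "unviewed"


# A one-hour program watched for 30 minutes and for 5 minutes.
print(broadcast_viewing_info(1800, 3600))
print(broadcast_viewing_info(300, 3600))
```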
  • The VoD positive example complementing unit 13 performs a positive example complementing process that consists of a similar title selection process, a similar metadata selection process, and a positive example inversion process.
  • the similar title selection process is a process of selecting a “negative example” similar to the “positive example” title from the “negative examples” of the VoD learning data acquired from the VoD learning data storage unit 11.
  • the similar metadata selection process is a process of selecting a “negative example” in which performers and keywords overlap from the selected “negative example”.
  • the positive example inversion process is a process of converting the selected “negative example” into the “positive example”.
  • the VoD positive example complementing unit 13 is realized by executing a program for the positive example complementing process by the processor.
  • FIG. 6 is a flowchart showing a specific example of the processing operation of the VoD positive example complementing unit 13.
  • The VoD positive example complementing unit 13 determines whether any “positive example” of the VoD learning data has not yet been selected (step S131). If an unselected “positive example” remains in the VoD learning data, that “positive example” is selected (step S132). On the other hand, if every “positive example” of the VoD learning data has already been selected, the process ends (No in step S131).
  • the VoD positive example complementing unit 13 determines whether there is a “negative example” that is not selected in the “negative example” of the VoD learning data (step S133). If the “negative example” that has not been selected remains in the VoD learning data, the “negative example” that has not yet been selected is selected (step S134). On the other hand, if “negative example” of all VoD learning data has been selected, the process returns to step S131.
  • The VoD positive example complementing unit 13 calculates a title score representing the similarity between the title of the selected “negative example” and the title of the selected “positive example” of the VoD learning data (step S135). It then determines whether the calculated title score is equal to or greater than a threshold set by the user (step S136). If the calculated title score is equal to or greater than the threshold, the process proceeds to step S137; if it is less than the threshold, the process returns to step S133.
  • The VoD positive example complementing unit 13 calculates a metadata score representing the degree to which the performers and keywords of the selected “negative example” and the selected “positive example” of the VoD learning data are shared (step S137). It then determines whether the calculated metadata score is equal to or greater than a threshold set by the user (step S138). If it is equal to or greater than the threshold, the process proceeds to step S139; if it is less than the threshold, the process returns to step S133.
  • The VoD positive example complementing unit 13 inverts the label of a “negative example” of the VoD learning data whose title score and metadata score are both equal to or greater than their thresholds to “positive example”, and stores the content (step S139).
  • the VoD positive example complementing unit 13 can extract the content that should actually be handled as the “positive example” from the “negative example” of the VoD learning data.
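  • The loop of steps S131 to S139 can be sketched as follows. This is an illustrative outline only: the two score functions are passed in as parameters because their exact formulas are not reproduced here, and the dictionary layout, the sample titles, and the helper functions `shares_title_prefix` and `shares_performer` are hypothetical.

```python
def complement_positive_examples(examples, title_score, metadata_score,
                                 title_threshold=1.0, metadata_threshold=1.0):
    """Return IDs of negative examples to relabel as pseudo positive examples."""
    positives = [e for e in examples if e["positive"]]
    negatives = [e for e in examples if not e["positive"]]
    pseudo_positive_ids = []
    for pos in positives:                                        # S131-S132
        for neg in negatives:                                    # S133-S134
            if neg["id"] in pseudo_positive_ids:
                continue                                         # already inverted
            if title_score(pos, neg) < title_threshold:          # S135-S136
                continue
            if metadata_score(pos, neg) < metadata_threshold:    # S137-S138
                continue
            pseudo_positive_ids.append(neg["id"])                # S139
    return pseudo_positive_ids


# Hypothetical learning data; titles and performers are placeholders.
examples = [
    {"id": "A-1", "title": "Sample Series 1", "performers": {"P1", "P2"}, "positive": True},
    {"id": "A-2", "title": "Sample Series 2", "performers": {"P1"}, "positive": False},
    {"id": "B-1", "title": "Sample Series Remake", "performers": {"P9"}, "positive": False},
    {"id": "C-1", "title": "Other Title", "performers": {"P1"}, "positive": False},
]


def shares_title_prefix(pos, neg):
    # Stand-in title score: 1 if the first six characters of the positive title recur.
    return 1.0 if pos["title"][:6] in neg["title"] else 0.0


def shares_performer(pos, neg):
    # Stand-in metadata score: 1 if at least one performer is shared.
    return 1.0 if pos["performers"] & neg["performers"] else 0.0


print(complement_positive_examples(examples, shares_title_prefix, shares_performer))
```

  • In this sketch only “A-2” passes both checks: “B-1” has a similar title but no shared performer, and “C-1” shares a performer but not the title.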
  • A specific operation of the VoD positive example complementing unit 13 will be described using the four contents of FIG.
  • Formula (1) is a title score calculation example executed in the process of step S135 of FIG.
  • sub_N(title) in Formula (1) represents an arbitrary N-character substring of the title. Note that the notation used in this text may differ in part from the notation in the formula itself. The same applies to the explanations of the other formulas.
  • the title score is calculated using the selected title “title_p” of “positive example” and the selected title “title_n” of “negative example”.
  • The title score is “1” when any N-character substring of title_p is included in title_n, and “0” otherwise.
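  • The title score rule can be sketched directly from this description. The function name, the default value of N, and the sample titles are hypothetical; the rule itself (score 1 if any N-character substring of title_p occurs in title_n, else 0) follows the text.

```python
def title_score(title_p, title_n, n=4):
    """Return 1 if any n-character substring of title_p occurs in title_n, else 0."""
    if n <= 0:
        raise ValueError("n must be positive")
    # Slide an n-character window over the positive example's title.
    for start in range(len(title_p) - n + 1):
        if title_p[start:start + n] in title_n:
            return 1
    return 0


# Hypothetical titles.
print(title_score("Space Saga Episode 1", "Space Saga Episode 2", n=5))
print(title_score("abcd", "wxyz", n=2))
```

  • A title shorter than N characters has no N-character substring, so the score is 0 in that case.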
  • If the selected “positive example” is content ID-1 and the selected “negative example” is content ID-4, content ID-4 is not a “positive example” candidate, and the process returns to step S133.
  • Formula (2) is a calculation example of the metadata score executed in the process of step S137 in FIG. [guest]_{p, n} in Formula (2) represents the sets of performers of the “positive example” and the “negative example”.
  • the metadata score is determined according to the performers of the selected “positive example” and “negative example” and the degree of overlapping of keywords.
  • When the selected “positive example” is content ID-1, the selected “negative example” is content ID-2, and the threshold is set to “1”, content ID-2 is given a pseudo “positive example” label in the process of step S139.
  • Content ID-2 is then treated as a pseudo “positive example” in all subsequent processes.
  • When the selected “positive example” is content ID-3 and the selected “negative example” is content ID-4, the performers are not the same, so the metadata score is “0”.
  • Content ID-4 is therefore treated as a “negative example”, as it is, in all subsequent processes.
  • With the VoD positive example complementing unit 13, highly accurate learning becomes possible because a “negative example” that should be handled as a positive example is inverted to a “positive example”; at the same time, the small number of “positive examples” can be increased.
  • VoD content often includes complete sets of content, such as every episode of a movie or drama series or every video of a sports tournament.
  • If, as in conventional approaches, viewed content is treated as a “positive example” and unviewed content as a “negative example”, then when, for example, only the first episode of a 10-episode series has been viewed, all the remaining episodes are treated as “negative examples”, and highly accurate recommendation is considered impossible.
  • Therefore, content whose title is similar, indicating the same series or the like, is processed as a pseudo “positive example” candidate using the title score.
  • However, as with content ID-3 and ID-4, there is also much content whose titles are similar but whose substance is completely different. Therefore, the metadata score is further used to determine whether a pseudo “positive example” candidate should actually be treated as a pseudo “positive example”.
  • When the performers overlap, the metadata score is high and the content is treated as a pseudo “positive example”.
  • When the performers do not overlap, as with content ID-3 and ID-4, the content is not treated as a pseudo “positive example” because no relationship such as a series exists. Therefore, the VoD positive example complementing unit 13 is expected to allow a “negative example” that should actually be handled as a positive example to be handled as a “positive example”, so that a highly accurate recommendation model can be learned.
  • Because VoD content is often distributed through paid services, the number of “positive examples” is considered to be small.
  • the selection processing unit 14 performs selection processing including feature amount importance calculation processing and case selection processing.
  • The feature amount importance calculation process calculates the importance of each feature amount using the VoD learning data acquired from the VoD positive example complementing unit 13.
  • the case selection process is a process of calculating the reliability of broadcast learning data using the calculated feature value importance and holding the broadcast learning data having a high value as learning data.
  • the selection processing unit 14 is realized by executing a program for selection processing by the processor.
  • FIG. 8 is a flowchart showing a specific example of the processing operation of the selection processing unit 14.
  • the selection processing unit 14 determines whether there is a feature amount that is not selected among the feature amounts of the VoD learning data (step S141). If there is a feature amount that has not yet been selected from the feature amounts of the VoD learning data, the selection processing unit 14 selects it (step S142). If all the feature values have been selected, the process proceeds to step S145.
  • the selection processing unit 14 calculates the importance of the selected feature amount (step S143).
  • the selection processing unit 14 gives the calculated importance to the feature amount (step S144), and returns to step S141.
  • the processes in steps S142 to S144 are repeatedly executed until there is no unselected feature quantity among the feature quantities of the VoD learning data.
  • The selection processing unit 14 determines whether any “positive example” of the broadcast learning data has not yet been selected (step S145). If an unselected “positive example” remains in the broadcast learning data, the selection processing unit 14 selects it (step S146). If all broadcast learning data has been selected, the process ends. The selection processing unit 14 calculates the reliability of the selected broadcast learning data (step S147). The selection processing unit 14 determines whether the calculated reliability is equal to or greater than a threshold (step S148); if it is less than the threshold, the process returns to step S145.
  • the selection processing unit 14 assigns the calculated reliability to the broadcast learning data (step S149), and returns to step S145.
  • the processes of steps S146 to S149 are repeatedly executed until there is no “positive example” not selected in the broadcast learning data.
  • In this way, the selection processing unit 14 can select, from the “positive examples” of the broadcast learning data, data that can be handled in the same way as the VoD learning data, and this data can be used as learning data together with the VoD learning data.
  • the feature amount of the VoD learning data uses performers, keywords, and genres, and the selection processing unit 14 can assign importance to all feature amounts.
  • the importance of the feature amount indicates the strength of the correlation between the feature amount and the “positive example”. That is, the feature quantity with high importance strongly reflects the user's preference.
  • importance can be given using different criteria for performer feature values, keyword feature values, and genre feature values.
  • Formula (3) is a calculation example of the importance CR of the performer feature amount guest_i.
  • P(guest_i) is the number of VoD learning data in which guest_i appears.
  • P(guest_i | viewed) is the number of positive examples of the VoD learning data in which guest_i appears. This score takes a value from 0 to 1, and is high when much of the content in which guest_i appears has been viewed.
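  • A minimal sketch of a performer importance with these properties follows. Formula (3) itself is not reproduced in this text, so the ratio used here (viewed appearances divided by all appearances) is an assumption chosen to match the stated 0 to 1 range; the function name and performer names are hypothetical.

```python
from collections import Counter


def performer_importance(examples):
    """Compute, per performer, the share of their contents that were viewed.

    examples: iterable of (performers, positive) pairs, where performers is a
    collection of names and positive is True for a positive example.
    """
    appearances = Counter()         # number of learning data containing the performer
    viewed_appearances = Counter()  # number of positive examples containing the performer
    for performers, positive in examples:
        for guest in set(performers):
            appearances[guest] += 1
            if positive:
                viewed_appearances[guest] += 1
    # Assumed form of CR: viewed appearances / all appearances.
    return {g: viewed_appearances[g] / appearances[g] for g in appearances}


# Hypothetical learning data.
vod_examples = [
    (["Performer A", "Performer B"], True),
    (["Performer A"], True),
    (["Performer A"], True),
    (["Performer A", "Performer C"], False),
    (["Performer C"], False),
]
print(performer_importance(vod_examples))
```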
  • Formula (4) is a calculation example of the importance GS of the keyword feature amount keyword_j.
  • P(keyword_j | viewed) is the number of data including keyword_j among the positive examples of the VoD learning data.
  • P(keyword_j | not viewed) is the number of data including keyword_j among the negative examples of the VoD learning data.
  • coef is a correction coefficient. GS is a value from 0 to 1 based on the Graham score, and is high when much of the content that includes keyword_j has been viewed.
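  • A Graham-style score built from these three quantities might look like the sketch below. The exact form of Formula (4) is not reproduced in this text, so the expression used here, including where coef is applied, is an assumption; the function name is hypothetical.

```python
def keyword_importance(n_viewed, n_not_viewed, coef=1.0):
    """Assumed Graham-style score: viewed / (viewed + coef * not viewed).

    n_viewed: number of positive examples containing the keyword.
    n_not_viewed: number of negative examples containing the keyword.
    coef: correction coefficient weighting the negative examples.
    """
    denominator = n_viewed + coef * n_not_viewed
    if denominator == 0:
        return 0.0  # keyword never observed; no evidence of preference
    return n_viewed / denominator


print(keyword_importance(3, 1))            # mostly viewed
print(keyword_importance(0, 5))            # never viewed
print(keyword_importance(2, 2, coef=0.5))  # negatives down-weighted
```

  • With non-negative counts and a non-negative coef, the result stays within the 0 to 1 range described above.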
  • Formula (5) is a calculation example of the importance GI of the genre feature quantity genre_l.
  • [genre_viewed] represents the genre set of the content viewed.
  • the importance value is “1” if the genre_l has been viewed, and “0” otherwise.
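  • The genre importance rule is simple enough to sketch directly. The function name and the way the genre lists are passed in are assumptions; the rule (1 for a viewed genre, 0 otherwise) follows the text.

```python
def genre_importance(genres, viewed_genres):
    """Return GI for each genre: 1 if the genre has been viewed, else 0."""
    viewed = set(viewed_genres)
    return {genre: 1 if genre in viewed else 0 for genre in genres}


all_genres = ["movie-foreign film", "animation-domestic animation"]
viewed_genres = ["movie-foreign film"]
print(genre_importance(all_genres, viewed_genres))
```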
  • FIG. 9 shows an example of the importance CR assigned to the feature amount of the performer.
  • For a performer whose content is viewed often, the importance CR is high.
  • Since content in which “Antonio ⁇ ⁇ ⁇” appears is rarely viewed, the importance CR is low.
  • FIG. 10 shows an example of the importance GS assigned to the keyword feature.
  • For a keyword that is included in content viewed often, the importance GS is high.
  • Since content including “Legend” is not viewed much, the importance GS is low.
  • FIG. 11 shows an example of the importance GI assigned to the genre feature amounts.
  • For a genre that has been viewed, the importance GI is “1”.
  • Since content of “animation-domestic animation” has not been viewed, the importance GI is “0”.
  • the selection processing unit 14 calculates the reliability of the broadcast content by the process of step S147 of FIG. 8 using the importance of the feature amount calculated by the above method.
  • different reliability calculation methods are provided for performers, keywords, and genres.
  • Formula (6) is a calculation example of the performer reliability calculated by the importance CR of the performer's feature amount.
  • T_J represents the selected broadcast learning data.
  • The performer reliability of the broadcast learning data T_J is obtained by adding up the importance of the performers who appear in the broadcast learning data T_J.
  • Formula (7) is a calculation example of the keyword reliability calculated from the importance GS of the keyword feature amounts.
  • the reliability of the keyword of the broadcast learning data T_J becomes higher as T_J includes more important keywords.
  • The importance of a keyword is expressed by Formula (4), and the importance of a keyword that is included in much of the viewed VoD content is high. Therefore, the more keywords the broadcast learning data T_J shares with the viewed VoD content, the higher the keyword reliability of T_J.
  • Formula (8) is a calculation example of the genre reliability calculated from the importance GI of the genre feature amounts.
  • The genre reliability of the broadcast learning data T_J is calculated as the sum of the importance of the genres included in the broadcast learning data T_J. Broadcast learning data may have only one genre, but some broadcast learning data is assigned multiple genres at the same time, such as “Information / Wide Show-Entertainment / Wide Show” and “News / Report-Other”. Therefore, the sum of the importance values is adopted.
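  • The performer and genre reliabilities, both described as sums of importance values, can be sketched as follows. The function name, the dictionary-based importance tables, and the sample values are hypothetical. The keyword reliability is deliberately left out because its formula is not reproduced in this text.

```python
def summed_reliability(features, importance):
    """Sum the importance of every feature present in one broadcast learning datum.

    Features that have no entry in the importance table contribute 0.
    """
    return sum(importance.get(feature, 0.0) for feature in features)


# Hypothetical importance tables (CR for performers, GI for genres).
performer_cr = {"Performer A": 0.75, "Performer B": 0.25}
genre_gi = {"movie-foreign film": 1, "animation-domestic animation": 0}

# One hypothetical broadcast learning datum T_J.
t_j = {
    "performers": ["Performer A", "Performer B", "Performer Z"],
    "genres": ["movie-foreign film", "animation-domestic animation"],
}
print(summed_reliability(t_j["performers"], performer_cr))
print(summed_reliability(t_j["genres"], genre_gi))
```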
  • the performer reliability, keyword reliability, and genre reliability of the selected broadcast learning data are calculated.
  • In the process of step S148 of FIG. 8, the selection processing unit 14 determines whether to add the selected broadcast learning data T_J to the “positive examples” according to whether each reliability of T_J is equal to or greater than its threshold. In the present embodiment, the broadcast learning data is added to the “positive examples” if any one of the reliabilities is equal to or greater than its threshold.
  • FIG. 12 shows an example of a positive example of broadcast learning data.
  • The process by which the selection processing unit 14 holds “positive examples” of the broadcast learning data (step S149 in FIG. 8) will be described specifically.
  • the performer reliability threshold is 0.7
  • the keyword reliability threshold is 0.99
  • the genre reliability threshold is “1”.
  • When the selection processing unit 14 selects content ID-T1, the performer reliability exceeds the performer reliability threshold of 0.7. That is, since the performers of content ID-T1 include “Michael J. ⁇ ⁇ ⁇ ⁇ ⁇”, the performer importance of 0.8 in FIG. 9 exceeds the threshold of 0.7. For content ID-T1, the genre reliability and the keyword reliability are less than their thresholds, but the performer reliability is equal to or greater than its threshold. Therefore, the selection processing unit 14 adds content ID-T1 to the broadcast learning data as a “positive example”.
  • When the selection processing unit 14 selects content ID-T2, the genre reliability is equal to the genre reliability threshold of “1”. That is, since the genre of content ID-T2 is “movie-foreign film”, the genre importance of “1” in FIG. 11 has the same value as the threshold of “1”.
  • For content ID-T2, the performer reliability and the keyword reliability are less than their thresholds.
  • Because the genre reliability is equal to or greater than its threshold, the selection processing unit 14 adds content ID-T2 to the broadcast learning data as a “positive example” in the process of step S149.
  • When the selection processing unit 14 selects content ID-T3, the performer reliability, the genre reliability, and the keyword reliability of content ID-T3 are all below their thresholds. Therefore, the selection processing unit 14 proceeds from step S148 to step S145 and does not add content ID-T3 to the broadcast learning data as a “positive example”. Similarly, content ID-T4 is not added as a “positive example”.
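  • The threshold check of step S148 for the four broadcast contents can be sketched as below. The thresholds 0.7, 0.99, and 1 come from the text, as do the performer reliability of 0.8 for ID-T1 and the genre reliability of 1 for ID-T2; every other reliability value is a hypothetical placeholder chosen only to fall below its threshold, and the function name is also hypothetical.

```python
THRESHOLDS = {"performer": 0.7, "keyword": 0.99, "genre": 1.0}


def select_broadcast_positives(reliabilities, thresholds=THRESHOLDS):
    """Keep a content if any one of its reliabilities reaches its threshold."""
    selected = []
    for content_id, scores in reliabilities.items():
        if any(scores[kind] >= limit for kind, limit in thresholds.items()):
            selected.append(content_id)
    return selected


# Only 0.8 (ID-T1 performer) and 1.0 (ID-T2 genre) are taken from the text;
# the remaining numbers are placeholders below the thresholds.
reliabilities = {
    "ID-T1": {"performer": 0.8, "keyword": 0.5, "genre": 0.0},
    "ID-T2": {"performer": 0.1, "keyword": 0.5, "genre": 1.0},
    "ID-T3": {"performer": 0.2, "keyword": 0.3, "genre": 0.0},
    "ID-T4": {"performer": 0.0, "keyword": 0.1, "genre": 0.0},
}
print(select_broadcast_positives(reliabilities))
```

  • Using `>=` means that a reliability exactly equal to its threshold, as with the genre of ID-T2, counts as passing.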
  • With this selection processing unit 14, only those “positive examples” of the broadcast learning data that are effective for a recommendation model of the VoD learning data are used.
  • The “positive examples” of the broadcast learning data include both “positive examples” that show the same tendency as the VoD learning data and “positive examples” that show a tendency unique to broadcast content.
  • The selection processing unit 14 can therefore increase the number of “positive examples” used for learning the recommendation model while excluding the “positive examples” peculiar to the broadcast learning data, so that accurate learning can be expected.
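As a non-limiting illustration, the selection rule walked through above (keep a broadcast “positive example” when at least one of its reliabilities reaches the corresponding threshold) could be sketched in Python as follows. The function and variable names are assumptions introduced here, and every reliability value other than 0.8 for the performer of ID-T1 and 1 for the genre of ID-T2 is a made-up placeholder.

```python
# Thresholds taken from the worked example in the text.
THRESHOLDS = {"performer": 0.7, "keyword": 0.99, "genre": 1.0}


def is_effective_positive(reliability, thresholds=THRESHOLDS):
    """Return True if any feature reliability reaches its threshold."""
    return any(reliability[name] >= limit for name, limit in thresholds.items())


# Only T1's performer value (0.8) and T2's genre value (1.0) come from the
# text; all other numbers are hypothetical placeholders.
broadcast_positives = {
    "ID-T1": {"performer": 0.8, "keyword": 0.2, "genre": 0.5},
    "ID-T2": {"performer": 0.1, "keyword": 0.3, "genre": 1.0},
    "ID-T3": {"performer": 0.1, "keyword": 0.2, "genre": 0.5},
    "ID-T4": {"performer": 0.3, "keyword": 0.4, "genre": 0.2},
}

kept = [cid for cid, rel in broadcast_positives.items() if is_effective_positive(rel)]
print(kept)  # ['ID-T1', 'ID-T2']
```

The comparison uses "greater than or equal to" because content ID-T2 is retained when its genre reliability merely equals the threshold.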
  • The recommendation model learning unit 15 learns the user's VoD content recommendation model using learning data composed of the VoD learning data and the retained “positive examples” of the broadcast learning data.
  • A Bayesian network is used for learning the recommendation model.
  • FIG. 13 shows an example of a Bayesian network for recommendation model learning.
  • In the structure of this Bayesian network, the “genre”, “performer”, and “keyword” nodes are connected to the viewing node.
  • The recommendation model is learned by calculating the conditional probability table (CPT) of each node from the learning data.
  • FIG. 14 is a flowchart showing a specific example of processing in the recommendation model learning unit 15.
  • The recommendation model learning unit 15 determines whether any learning data has not yet been selected (step S151). If unselected learning data remains, the recommendation model learning unit 15 selects it (step S152). If all the learning data has been selected, the process proceeds to step S156.
  • The recommendation model learning unit 15 determines whether the selected learning data has a node that has not been selected (step S153). If such a node exists, the recommendation model learning unit 15 selects it (step S154). If all nodes have been selected, the process ends.
  • The recommendation model learning unit 15 converts the feature amount of the selected node into a discrete value (step S155). While unselected nodes remain in the selected learning data, steps S153 to S155 are repeated. The recommendation model is then learned as a Bayesian network using the learning data in which each node has been converted into a discrete value (step S156).
  • The nodes are the performer feature amount (see, for example, FIG. 9) and the keyword feature amount (see, for example, FIG. 10).
  • The recommendation model learning unit 15 converts the performer feature amount and the keyword feature amount of the learning data into binary discrete values, “preference” and “normal”.
  • The values of equations (6) and (7) used in the reliability calculation are used as scores.
  • If the sum of the performer scores or, respectively, the sum of the keyword scores of the learning data is equal to or greater than the threshold value, the corresponding feature amount is converted to “preference”;
  • otherwise it is converted to “normal”, which yields a binary discrete value.
  • Because the recommendation model learning unit 15 receives the scores based on equations (6) and (7) already calculated by the selection processing unit 14, it can omit the process of calculating the weight of each performer and keyword.
  • FIG. 15 shows an example of converting the feature quantity of the performer in FIG. 9 and the feature quantity of the keyword in FIG. 10 into binary for the VoD content in FIG.
  • The score of the performer “Michael J. 〇△□〇□” in the VoD learning data of content ID-1 is 0.8. If the threshold value is 0.7, as for the broadcast content, the score is equal to or greater than the threshold value, so the performer feature amount is converted to “preference”. In contrast, the performer score of the VoD learning data of content ID-4 is 0.1, which is less than the threshold value, so the performer feature amount is converted to “normal”.
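The binary conversion in this example could be sketched as follows. The function name and the default threshold argument are assumptions made for illustration; the scores 0.8 and 0.1 and the threshold 0.7 are the values given in the text.

```python
def discretize(score, threshold=0.7):
    """Map a feature score to one of the two discrete values."""
    return "preference" if score >= threshold else "normal"


print(discretize(0.8))  # preference (content ID-1)
print(discretize(0.1))  # normal (content ID-4)
```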
  • In this way, the learning data is converted using the importance of the feature amounts calculated in advance, and the conditional probability table (CPT) is then created from the converted learning data in the same manner as in conventional Bayesian network learning.
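A minimal sketch of estimating one conditional probability table by relative frequency is given below. It assumes, purely for illustration, that the performer node is conditioned on the viewing node; the text does not state the edge direction, and the four learning rows are hypothetical. No smoothing is applied.

```python
from collections import Counter, defaultdict


def conditional_probability_table(rows, child, parent):
    """Estimate P(child | parent) by relative frequency over the rows."""
    joint = Counter((row[parent], row[child]) for row in rows)
    parent_totals = Counter(row[parent] for row in rows)
    cpt = defaultdict(dict)
    for (parent_value, child_value), count in joint.items():
        cpt[parent_value][child_value] = count / parent_totals[parent_value]
    return dict(cpt)


# Hypothetical discretized learning data.
rows = [
    {"viewing": "positive", "performer": "preference"},
    {"viewing": "positive", "performer": "preference"},
    {"viewing": "positive", "performer": "normal"},
    {"viewing": "negative", "performer": "normal"},
]

cpt = conditional_probability_table(rows, child="performer", parent="viewing")
print(cpt["negative"])  # {'normal': 1.0}
```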
  • The VoD target data storage unit 16 is a storage device that stores the VoD target data, that is, the VoD data that is subject to recommendation.
  • The VoD target data is stored in the same format as the VoD metadata.
  • The determination unit 17 selects unselected VoD target data from the VoD target data stored in the VoD target data storage unit 16, calculates a filter value for the selected VoD target data, evaluates the VoD target data whose filter value is equal to or greater than the threshold value using the recommendation model learned by the recommendation model learning unit 15, and assigns the determination result to the VoD target data.
  • The determination unit 17 is realized by a processor executing a program for this series of processes.
  • FIG. 16 is a flowchart illustrating a specific example of processing in the determination unit 17.
  • The determination unit 17 determines whether any VoD target data has not yet been selected (step S171). If unselected VoD target data remains, the determination unit 17 selects it (step S172). If all the VoD target data has been selected, the process ends. Next, the determination unit 17 calculates a filter value for the selected VoD target data (step S173) and determines whether the calculated filter value is equal to or greater than a threshold value (step S174). If the calculated filter value is less than the threshold value, the process proceeds to step S176.
  • If the calculated filter value is equal to or greater than the threshold value, the determination unit 17 evaluates the VoD target data using the recommendation model learned by the recommendation model learning unit 15 (step S175). The determination unit 17 then assigns the determination result to the VoD target data (step S176) and returns to step S171.
  • The determination unit 17 thus calculates the filter value in step S173 and, in step S175, evaluates with the recommendation model only the VoD target data whose filter value is equal to or greater than the threshold value. Because evaluation with a model learned as a Bayesian network generally takes time, applying beforehand a filter value that is quick to compute eliminates VoD target data that are obvious “negative examples” and reduces the overall calculation cost. In the present embodiment, the importance of the feature amounts calculated by the selection processing unit 14 can be used to calculate the filter value in step S173.
  • Equation (9) shows an example of the filter value.
  • Here, id_obj, guest_obj, keyword_obj, and genre_obj are, respectively, the content ID and any performer, keyword, and genre of the selected VoD target data, and [guest]^5_1, [keyword]^5_1, and [genre]^5_1 are the sets of the five performers, keywords, and genres with the highest importance calculated by the selection processing unit 14.
  • This filter value is “1” when the VoD target data includes any of the five performers with the highest performer importance, the five keywords with the highest keyword importance, or the five genres with the highest genre importance, and “0” otherwise.
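The filter value described above could be sketched as a set-membership test, as follows. This is an illustration under stated assumptions, not a reproduction of equation (9): the function name, the dictionary layout, the performer names, and the keywords are hypothetical, and the top-importance sets are shortened for brevity. Only the genre “movie-foreign film” comes from the text.

```python
def filter_value(item, top_performers, top_keywords, top_genres):
    """Return 1 if the item shares any element with a top-importance set, else 0."""
    hit = (
        not top_performers.isdisjoint(item["performers"])
        or not top_keywords.isdisjoint(item["keywords"])
        or not top_genres.isdisjoint(item["genres"])
    )
    return 1 if hit else 0


# Hypothetical top-importance sets (the text uses the top five of each kind).
top_performers = {"Performer A", "Performer B"}
top_keywords = {"time", "machine"}
top_genres = {"movie-foreign film"}

item_r1 = {"performers": ["Performer X"], "keywords": ["city"], "genres": ["movie-foreign film"]}
item_r2 = {"performers": ["Performer Y"], "keywords": ["quiz"], "genres": ["variety"]}

print(filter_value(item_r1, top_performers, top_keywords, top_genres))  # 1
print(filter_value(item_r2, top_performers, top_keywords, top_genres))  # 0
```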
  • Accordingly, VoD target data that includes at least one performer, keyword, or genre of high importance is evaluated in more detail using the recommendation model in step S175. VoD target data that includes none of them is regarded as clearly not worth recommending and is not evaluated by the recommendation model.
  • The determination unit 17 assigns a determination result to the VoD target data.
  • When a result is obtained from the recommendation model, the numerical value of that result is assigned; when no result is obtained from the recommendation model, 0.0 is assigned as an obvious “negative example”.
  • FIG. 17 is an example of VoD target data.
  • When the determination unit 17 selects content ID-R1, the filter value is “1” because content ID-R1 includes “movie-foreign film”, a genre of high importance. The determination unit 17 therefore performs detailed determination using the recommendation model in step S175. If the viewing probability calculated by this determination is, for example, 0.45, that value is assigned to the VoD target data.
  • When the filter value is “0”, the determination unit 17 assigns a value of 0.0 to the VoD target data in step S176.
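The determination loop of FIG. 16, as described in the text, could be sketched as follows. The stub model that always returns 0.45 stands in for Bayesian-network inference, the simplified genre-only filter is an assumption, and content ID-R2 with its genre is a hypothetical item added for contrast.

```python
def determine(targets, filter_value, model, threshold=1):
    """Assign a determination result to every VoD target item (steps S171 to S176)."""
    results = {}
    for content_id, item in targets.items():      # S171-S172: pick the next item
        if filter_value(item) >= threshold:       # S173-S174: cheap filter first
            results[content_id] = model(item)     # S175: detailed determination
        else:
            results[content_id] = 0.0             # obvious negative example
    return results                                # values assigned in S176


TOP_GENRES = {"movie-foreign film"}  # assumed set of high-importance genres
model_calls = []


def simple_filter(item):
    return 1 if item["genre"] in TOP_GENRES else 0


def stub_model(item):
    model_calls.append(item["genre"])
    return 0.45  # placeholder for the viewing probability from the model


targets = {
    "ID-R1": {"genre": "movie-foreign film"},
    "ID-R2": {"genre": "variety"},  # hypothetical item
}

results = determine(targets, simple_filter, stub_model)
print(results)           # {'ID-R1': 0.45, 'ID-R2': 0.0}
print(len(model_calls))  # 1
```

The model is invoked once for two items, which is the cost saving the filter is meant to provide.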
  • The determination unit 17 thus uses the filter value to remove in advance the VoD target data that would clearly be determined to be “negative examples”, which reduces the number of evaluations by the recommendation model and the calculation cost.
  • Because the importance of the feature amounts calculated by the selection processing unit 14 is used for the filter value, no separate calculation is needed for the filter value, so no new calculation cost arises. Recommendation can therefore be performed at low calculation cost even for VoD target data, which generally exists in large quantities.
  • The output unit 18 selects, from the VoD target data to which determination results have been assigned by the determination unit 17, the VoD target data with a high viewing probability, selects as the reason a feature amount of high importance included in the selected VoD target data, and performs control to present the selected VoD target data and the reason to the user.
  • The output unit 18 is realized by a processor executing a program for this series of processes.
  • FIG. 18 is an example of output by the output unit 18.
  • The title, genre, performers, and summary of the data with the highest viewing probability among the VoD target data are displayed as a “new content recommendation”.
  • If the VoD target data includes a feature amount to which the selection processing unit 14 has assigned a high importance, that feature amount is considered an important reason for recommending the data, and the most important feature amount can be displayed with emphasis (for example, by coloring, underlining, bold text, or shading). With such a display, the user can find content that suits their taste without effort such as searching.
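The choice of the item to present and of the feature amount to emphasize could be sketched as follows. The function name, the data layout, and the importance value for “time” are assumptions for illustration; the viewing probability 0.45 and the genre importance of 1 follow the examples in the text.

```python
def pick_recommendation(results, targets, importance):
    """Return the item with the highest viewing probability and its top feature."""
    best_id = max(results, key=results.get)
    features = targets[best_id]["features"]
    reason = max(features, key=lambda name: importance.get(name, 0.0))
    return best_id, reason


results = {"ID-R1": 0.45, "ID-R2": 0.0}
targets = {
    "ID-R1": {"features": ["movie-foreign film", "time"]},
    "ID-R2": {"features": ["variety"]},          # hypothetical item
}
importance = {"movie-foreign film": 1.0, "time": 0.3}  # 0.3 is a placeholder

best_id, reason = pick_recommendation(results, targets, importance)
print(best_id, reason)  # ID-R1 movie-foreign film
```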
  • The VoD learning data that has been turned into “pseudo positive examples” by the VoD positive example complementing unit 13 has not been viewed by the user. By displaying the “pseudo positive examples” at the same time, the user can find not only new content but also content in a series that they have missed.
  • Such a display method is possible because the importance of the feature amounts and the pseudo positive examples are calculated in advance by the VoD positive example complementing unit 13 and the selection processing unit 14.
  • The content recommendation device pseudo-increases the VoD content that the user is likely to want to view based on the VoD content already viewed, and uses the increased VoD content to select, from the viewing history of broadcast content, cases that are effective for recommending VoD content. In this way, an effective recommendation model can be learned.
  • The content recommendation device may be realized by installing a program in a computer device in advance. Alternatively, it may be realized by storing the program on a storage medium such as a CD-ROM, or distributing the program through a network, and installing the program in a computer device as appropriate.
  • The VoD learning data storage unit 11, the broadcast learning data storage unit 12, and the VoD target data storage unit 16 can be realized by appropriately using a memory, a hard disk, or a storage medium such as a CD-R, a CD-RW, a DVD-RAM, or a DVD-R.
  • The present invention is not limited to the embodiments exactly as described; at the implementation stage, the components can be modified.
  • Various inventions can be formed by appropriately combining a plurality of constituent elements disclosed in the above embodiments.

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Signal Processing (AREA)
  • Human Computer Interaction (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

Provided is a content recommendation device capable of presenting content that meets a user's preferences from among a large amount of VoD content. Based on the VoD content that the user has viewed, the content recommendation device pseudo-increases the VoD content that the user is considered likely to want to view, and uses the increased VoD content to select, from the viewing histories of broadcast content, cases that are effective for recommending VoD content, thereby learning an effective recommendation model.

Description

Content recommendation device
The present invention relates to a content recommendation device that recommends video content to a user in a video-on-demand service.
With the digitization of content and the spread of the Internet, the amount of video available on the Internet has increased dramatically. Because users can freely choose what to watch on the Internet and when to watch it, they can enjoy content that matches their own interests and preferences. For example, a typical Video On Demand (hereinafter, VoD) service distributes a large amount of high-quality content such as movies and sports, and users can view content according to their own circumstances. On the other hand, as video content increases, users must choose what they want to watch from an enormous number of videos, and it is becoming increasingly difficult to find the videos they really want to see.
To address this problem, video distribution sites provide information by enhancing search systems based on genre information and the like and by presenting popular videos. These methods make it possible to find a target video quickly when the user knows clearly what they want to watch, or when the user wants to look for videos by a fixed criterion such as popularity. Broadcast program recommendation systems are also being developed as a way of presenting video that is better matched to the individual user. A broadcast program recommendation system recommends programs that match a user's preferences from among the broadcast programs, whose number grows as broadcasting becomes multi-channel. Such a system recommends programs using metadata such as program genre and performers together with the user's viewing history, and is expected to present broadcast programs that suit the user's preferences.
However, although a conventional search system can present the target video when the content the user wants to view is clear, it has difficulty presenting video that suits the user's taste when the desired content is not clear. A system that presents popular videos can present content that is widely viewed, but because each user's interests differ, it cannot always present content that matches the user's preferences. For example, JP 2006-127145 A (Patent Document 1) discloses a technique that appropriately reflects a viewer's degree of preference at the time of viewing a program, and that reflects the degree of preference as it changes with the program content, when presenting to the viewer.
In such cases, using conventional broadcast program recommendation technology is considered effective for presenting content suited to the user. However, in a typical VoD service, most content must be paid for, and users have fewer opportunities to view VoD content than broadcast programs. The viewing history of VoD content is therefore far smaller than that of broadcast programs. In addition, because the amount of content is large, a user is unlikely to look through all content and decide whether to view each item, so an imbalance problem arises in which unviewed content vastly outnumbers viewed content. Consequently, it is difficult to make accurate recommendations from the viewing history of VoD content, particularly in the initial stage of service introduction.
When recommendation requires a large amount of viewing history, one way to supplement the VoD viewing history is to use the viewing history of broadcast programs, which can be obtained in relatively large quantities. However, viewing styles for broadcast programs and VoD content generally differ for many people; for example, a user may watch mainly variety shows among broadcast programs but mainly movies among VoD content. If VoD content is recommended using the viewing history of broadcast programs, the recommendation reflects the user's preferences for broadcast programs, and appropriate VoD content cannot be recommended.
Patent Document 1: JP 2006-127145 A
The present invention has been made to solve the problems described above, and its object is to provide a content recommendation device capable of presenting content that matches a user's preferences from among a large amount of VoD content.
To achieve the above object, the content recommendation device of the present invention comprises: a VoD positive example complementing unit that complements, as positive examples, data among the negative examples of VoD learning data that are similar to positive examples; a selection processing unit that calculates the importance of each feature amount using the complemented VoD learning data and, using that importance, selects data suitable for VoD content recommendation from broadcast learning data; a recommendation model learning unit that learns a recommendation model using the complemented VoD learning data, the selected broadcast learning data, and the calculated importance of the feature amounts; a determination unit that determines, using the learned recommendation model, whether to recommend VoD target content; and an output unit that performs control to present the recommendation determination result to the user.
According to the present invention, content that matches a user's preferences can be presented from among a large amount of VoD content.
A block diagram showing an example of the overall configuration of a content recommendation device according to an embodiment of the present invention.
A diagram showing an example of metadata assigned to VoD content according to the embodiment.
A diagram showing an example of VoD metadata generated from the metadata of FIG. 2.
A diagram showing an example of VoD viewing data according to the embodiment.
A diagram showing an example of VoD learning data according to the embodiment.
A flowchart showing a specific example of processing in the VoD positive example complementing unit according to the embodiment.
A diagram showing an example of processing that inverts “negative examples” of the VoD learning data into “positive examples” using four content items according to the embodiment.
A flowchart showing a specific example of processing in the selection processing unit according to the embodiment.
A diagram showing an example of the importance assigned to performer feature amounts according to the embodiment.
A diagram showing an example of the importance assigned to genre feature amounts according to the embodiment.
A diagram showing an example of the importance assigned to keyword feature amounts according to the embodiment.
A diagram showing an example of positive examples of broadcast learning data according to the embodiment.
A diagram showing an example of a Bayesian network for recommendation model learning according to the embodiment.
A flowchart showing a specific example of processing in the recommendation model learning unit according to the embodiment.
An example in which the performer feature amounts and keyword feature amounts of the learning data of FIG. 7 are converted into binary values.
A flowchart showing a specific example of processing in the determination unit according to the embodiment.
A diagram showing an example of VoD target data according to the embodiment.
A diagram showing an example of output by the output unit according to the embodiment.
Embodiments of the present invention are described below with reference to the drawings.
FIG. 1 is a block diagram showing an example of the overall configuration of a content recommendation device according to an embodiment of the present invention. The content recommendation device 1 according to this embodiment includes a VoD learning data storage unit 11, a broadcast learning data storage unit 12, a VoD positive example complementing unit 13, a selection processing unit 14, a recommendation model learning unit 15, a VoD target data storage unit 16, a determination unit 17, and an output unit 18.
The specific configuration and operation of each unit of the content recommendation device 1 are described below.
The VoD learning data storage unit 11 is a storage device for storing VoD learning data. The VoD learning data includes VoD metadata and VoD viewing data. The VoD metadata is data obtained from the metadata assigned to content.
FIG. 2 shows an example of metadata assigned to VoD content. The metadata consists of items such as content ID, genre, title, performers, duration, and summary. The summary is expressed as text data. In this embodiment, the keywords assigned to the VoD metadata are extracted from the summary. A keyword is a word that has a high frequency or a high TF-IDF value in the summary text of all content. Word frequencies and TF-IDF values can be calculated easily using conventional morphological analysis techniques.
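As a rough illustration of keyword extraction by TF-IDF, the following Python sketch ranks the words of each summary. It is a simplification under stated assumptions: whitespace splitting stands in for the morphological analysis mentioned in the text, the summaries are hypothetical English sentences, and the particular TF-IDF variant (relative term frequency multiplied by the logarithm of the inverse document frequency) is one common choice rather than the one used in the embodiment.

```python
import math
from collections import Counter


def tfidf_keywords(summaries, top_n=3):
    """Return the top_n words of each summary ranked by TF-IDF."""
    docs = {cid: text.lower().split() for cid, text in summaries.items()}
    n_docs = len(docs)
    # Number of summaries in which each word appears.
    doc_freq = Counter(word for words in docs.values() for word in set(words))
    keywords = {}
    for cid, words in docs.items():
        tf = Counter(words)
        scores = {
            word: (count / len(words)) * math.log(n_docs / doc_freq[word])
            for word, count in tf.items()
        }
        # Sort by descending score, breaking ties alphabetically.
        ranked = sorted(scores, key=lambda word: (-scores[word], word))
        keywords[cid] = ranked[:top_n]
    return keywords


# Hypothetical summaries.
summaries = {
    "ID-1": "a time machine sends a boy to the past",
    "ID-2": "a detective chases a thief across the city",
}

keywords = tfidf_keywords(summaries)
print(keywords["ID-1"])  # ['boy', 'machine', 'past']
```

Words that occur in every summary, such as "a" and "the" here, receive a score of zero and are ranked last.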
FIG. 3 is an example of the VoD metadata generated from the metadata of FIG. 2. Through the processing described above, the keywords extracted from the summary are assigned in place of the summary itself.
The VoD viewing data used in this embodiment is information about the user's viewing of VoD content. Here, it is assumed that viewing information has been obtained from the device for all VoD content.
FIG. 4 shows an example of VoD viewing data. The content ID 41 of the VoD viewing data is the individual ID assigned to each VoD content item. The viewing information 42 of the VoD viewing data represents information about viewing. For example, if the user has viewed “Back to the ○○○○○○”, which corresponds to content ID-1 in FIG. 2, the viewing information 42 is “viewed”. If the user has not viewed content ID-2 and ID-3, the viewing information 42 is “unviewed”. Some devices allow the user to attach “favorite” information to VoD content that the user wants to watch in the future. For example, if the user marks content ID-4 as content to be viewed in the future, the viewing information 42 is “favorite”.
FIG. 5 shows an example of VoD learning data. The VoD learning data is the VoD metadata of each content item with the VoD viewing data attached. In the recommendation model learning of this embodiment, learning is performed by generating “positive examples” and “negative examples” as teacher signals based on the VoD viewing data of FIG. 4. Here, a “positive example” is content that the user wanted to view, and a “negative example” is content that the user did not want to view. Accordingly, VoD learning data marked “viewed” or “favorite” is treated as a “positive example”, and data marked “unviewed” is treated as a “negative example”.
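The mapping from viewing information to teacher labels could be sketched as follows, using the four content items described for FIG. 4. The English state names and the function name are assumptions made for illustration.

```python
POSITIVE_STATES = {"viewed", "favorite"}


def to_label(viewing_info):
    """Map viewing information to a teacher label."""
    return "positive" if viewing_info in POSITIVE_STATES else "negative"


# Viewing information as described for FIG. 4.
vod_viewing_data = {
    "ID-1": "viewed",
    "ID-2": "unviewed",
    "ID-3": "unviewed",
    "ID-4": "favorite",
}

labels = {cid: to_label(info) for cid, info in vod_viewing_data.items()}
print(labels)
# {'ID-1': 'positive', 'ID-2': 'negative', 'ID-3': 'negative', 'ID-4': 'positive'}
```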
In this embodiment, each value of the performers, keywords, and genre in the VoD metadata is treated as a “feature amount” representing the content. In the example of FIG. 5, the feature amounts of content ID-1 are “Robert 〇△〇△”, “Michael J. 〇△□〇□”, “time”, “machine”, “past”, and “movie-foreign film”. A set of performers in the VoD metadata such as [Robert 〇△〇△, Michael J. 〇△□〇□] is called the performer feature amount of content ID-1. A set of keywords such as [time, machine, past] is called the keyword feature amount of content ID-1. A set of genres such as [movie-foreign film] is called the genre feature amount of content ID-1.
Next, the broadcast learning data storage unit 12 is a storage device for storing broadcast learning data. The broadcast learning data includes broadcast metadata and broadcast viewing data, and is stored in the same format as the VoD learning data.
Because information such as title, genre, performers, and summary can be obtained from both VoD content and broadcast content, data similar to the VoD learning data can be generated for broadcast content. As for broadcast metadata, the genre scheme or wording may differ between VoD content and broadcast content, but this can be resolved by mapping the genres of broadcast content to the genres of VoD content. The same words used as keywords in the VoD metadata are also used in the broadcast metadata. Through this processing, broadcast metadata in the same format as the VoD metadata can be obtained for broadcast content.
Broadcast viewing data is information about the user's viewing of broadcast content. Here, it is assumed that information on what the user viewed has been obtained, directly or indirectly, from the television receiver for all broadcast content on all channels. For example, by marking broadcast content of which the user viewed 20% or more of the whole program as “viewed” and all other content as “unviewed”, data can be obtained in the same format as the VoD viewing data.
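The 20% rule given as an example above could be sketched as follows. The function name, the use of seconds as the unit, and the sample durations are assumptions for illustration.

```python
def broadcast_viewing_info(watched_seconds, program_seconds, min_ratio=0.2):
    """Classify a broadcast program as viewed when enough of it was watched."""
    ratio = watched_seconds / program_seconds
    return "viewed" if ratio >= min_ratio else "unviewed"


print(broadcast_viewing_info(1800, 3600))  # viewed (50% watched)
print(broadcast_viewing_info(360, 3600))   # unviewed (10% watched)
```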
 放送メタデータと放送視聴データを用いることで、VoD学習データと同じ形式の放送学習データを得ることができる。なお「正例」,「負例」の定義および特徴量の定義も、VoD学習データと同様である。 Broadcast learning data in the same format as VoD learning data can be obtained by using broadcast metadata and broadcast viewing data. The definitions of “positive example” and “negative example” and the definition of the feature amount are the same as those of the VoD learning data.
 VoD正例補完部13は、類似タイトル選択処理、類似メタデータ選択処理、及び、正例反転処理を含む正例補完処理を行う。類似タイトル選択処理は、VoD学習データ格納部11から取得したVoD学習データの「負例」の中から、「正例」のタイトルと類似する「負例」を選択する処理である。類似メタデータ選択処理は、選択された「負例」の中から出演者やキーワードが重複する「負例」を選択する処理である。正例反転処理は、選択された「負例」を「正例」に変換する処理である。プロセッサで正例補完処理のためのプログラムが実行されることにより、VoD正例補完部13は実現される。 The VoD correct example complementing unit 13 performs a correct example complement process including a similar title selection process, a similar metadata selection process, and a correct example inversion process. The similar title selection process is a process of selecting a “negative example” similar to the “positive example” title from the “negative examples” of the VoD learning data acquired from the VoD learning data storage unit 11. The similar metadata selection process is a process of selecting a “negative example” in which performers and keywords overlap from the selected “negative example”. The positive example inversion process is a process of converting the selected “negative example” into the “positive example”. The VoD positive example complementing unit 13 is realized by executing a program for the positive example complementing process by the processor.
FIG. 6 is a flowchart showing a specific example of the processing operation of the VoD positive example complementing unit 13. First, the VoD positive example complementing unit 13 determines whether any “positive example” in the VoD learning data has not yet been selected (step S131). If an unselected “positive example” remains, it selects one (step S132). If all “positive examples” of the VoD learning data have already been selected, the processing ends (No in step S131).
Next, the VoD positive example complementing unit 13 determines whether any “negative example” in the VoD learning data has not yet been selected (step S133). If an unselected “negative example” remains, it selects one (step S134). If all “negative examples” of the VoD learning data have already been selected, the processing returns to step S131.
The VoD positive example complementing unit 13 then calculates a title score representing the similarity between the title of the selected “negative example” and the title of the selected “positive example” (step S135). It determines whether the calculated title score is equal to or greater than a threshold set by the user (step S136); if so, the processing proceeds to step S137, and otherwise it returns to step S133.
Similarly, the VoD positive example complementing unit 13 calculates a metadata score representing the degree to which performers, keywords, and the like are shared between the selected “negative example” and the selected “positive example” (step S137). It determines whether the calculated metadata score is equal to or greater than a threshold set by the user (step S138); if so, the processing proceeds to step S139, and otherwise it returns to step S133. The VoD positive example complementing unit 13 relabels as “positive example” each “negative example” whose title score and metadata score are both equal to or greater than their thresholds, and stores the result (step S139).
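The nested loop of FIG. 6 can be sketched roughly as below. This is an illustrative sketch under stated assumptions, not the patent's implementation: the function and parameter names are hypothetical, the two scoring functions are passed in as callables, and skipping a negative example that has already been relabeled is an added convenience not stated in the text.

```python
def complement_positives(positives, negatives, title_score, metadata_score,
                         title_threshold=1, metadata_threshold=1):
    """Return the negative examples that should be relabeled as pseudo positives.

    title_score(pos, neg) and metadata_score(pos, neg) are caller-supplied
    callables; all names here are hypothetical.
    """
    flipped = []
    for pos in positives:                        # steps S131 / S132
        for neg in negatives:                    # steps S133 / S134
            if neg in flipped:                   # assumption: relabel each item once
                continue
            if title_score(pos, neg) < title_threshold:        # steps S135 / S136
                continue
            if metadata_score(pos, neg) < metadata_threshold:  # steps S137 / S138
                continue
            flipped.append(neg)                  # step S139: record the inversion
    return flipped
```

Passing the scores in as callables keeps the loop independent of how the title score and metadata score are actually defined.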
Through the above processing, the VoD positive example complementing unit 13 can extract, from the “negative examples” of the VoD learning data, content that should actually be treated as “positive examples”. Next, a specific operation of the VoD positive example complementing unit 13 is described using the four content items shown in FIG. 7.
Figure JPOXMLDOC01-appb-M000001
Equation (1) is an example of the title score calculation performed in step S135 of FIG. 6. In Equation (1), sub_N(title) denotes an arbitrary N-character substring of title. Note that the notation of Equation (1) differs in part from its plain-text rendering; the same applies to the descriptions of the other equations.
In this embodiment, the title score is calculated from the title title_p of the selected “positive example” and the title title_n of the selected “negative example”. The title score of Equation (1) takes the value “1” when any N-character substring of title_p is contained in title_n, and “0” otherwise. In this embodiment, N = 4 when the title is longer than four characters, and N = 2 otherwise.
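Equation (1) appears only as an image in this text. Based on the description above, it can be sketched as follows, reading sub_N(title_p) as the set of all N-character substrings of title_p; the name TitleScore is an assumption introduced for illustration.

```latex
\mathrm{TitleScore}(title_p, title_n) =
\begin{cases}
1 & \text{if some } s \in \mathrm{sub}_N(title_p) \text{ is contained in } title_n \\
0 & \text{otherwise}
\end{cases}
\qquad
N =
\begin{cases}
4 & \text{if the title is longer than 4 characters} \\
2 & \text{otherwise}
\end{cases}
```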
For example, suppose that content ID-1 in FIG. 7 is selected as the “positive example” and content ID-2 as the “negative example”. One of the N = 4 substrings of the positive example's title, 「バック・」, appears at the beginning of the title of content ID-2, so the title score is “1”. As another example, suppose that content ID-1 is selected as the “positive example” and content ID-4 as the “negative example”. None of the N = 4 substrings of the positive example's title is contained in the negative example's title, so the title score is “0”. With the threshold in step S136 of FIG. 6 set to “1”, when the selected “positive example” is content ID-1 and the “negative example” is content ID-2, content ID-2 is extracted as a “positive example” candidate and the processing proceeds to step S137. However, when the selected “positive example” is content ID-1 and the “negative example” is content ID-4, content ID-4 does not become a “positive example” candidate and the processing returns to step S133.
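The title score calculation can be sketched in a few lines. The sketch assumes that N is chosen from the length of the positive example's title (the text only says "the title"), and the function name is hypothetical.

```python
def title_score(title_p, title_n):
    """Return 1 if any N-character substring of title_p occurs in title_n, else 0.

    Assumption: N is chosen from the length of the positive example's title
    (N = 4 if it is longer than four characters, otherwise N = 2).
    """
    n = 4 if len(title_p) > 4 else 2
    # All contiguous N-character substrings of the positive example's title.
    substrings = {title_p[i:i + n] for i in range(len(title_p) - n + 1)}
    return 1 if any(s in title_n for s in substrings) else 0
```

For instance, with the made-up titles "Back Alley 1" and "Back Alley 2", the shared substring "Back" makes the score 1, whereas an unrelated title yields 0. The actual titles of FIG. 7 are not reproduced in this text.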
Figure JPOXMLDOC01-appb-M000002
Equation (2) is an example of the metadata score calculation performed in step S137 of FIG. 6. In Equation (2), [guest]_{p,n} denote the sets of performers of the “positive example” and the “negative example”, respectively.
In this embodiment, the value of the metadata score is determined by the degree of overlap in performers and keywords between the selected “positive example” and “negative example”. The metadata score of Equation (2) takes the value “1” when the number of performers common to the selected “positive example” and “negative example” is greater than k, and “0” otherwise. In this embodiment, k = 1.
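Equation (2) is likewise available only as an image. From the description, a form consistent with the text is the following; the name MetaScore is an assumption, and only the performer sets are used because the text defines no keyword term for this equation.

```latex
\mathrm{MetaScore}(p, n) =
\begin{cases}
1 & \text{if } \bigl|\,[guest_p] \cap [guest_n]\,\bigr| > k \\
0 & \text{otherwise}
\end{cases}
\qquad k = 1
```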
Here, suppose that content ID-1 in FIG. 7 is selected as the “positive example” and content ID-2 as the “negative example”. Since the title score in step S136 is “1” as described above, the metadata score of the pair is calculated in step S137. Both performers are common to content ID-1 and ID-2, so |[guest_p] ∩ [guest_n]| = 2 and the metadata score is “1”.
On the other hand, suppose that content ID-3 is selected as the “positive example” and content ID-4 as the “negative example”. In this case, the title score in step S136 is “1”, but in step S137 content ID-3 and ID-4 have no performer in common, so |[guest_p] ∩ [guest_n]| = 0 and the metadata score is “0”.
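The two worked cases above can be reproduced with a small sketch of the metadata score. The function name and the performer names used in testing are hypothetical; k = 1 follows the embodiment.

```python
def metadata_score(guests_p, guests_n, k=1):
    """Return 1 if more than k performers are shared between the two items, else 0."""
    overlap = len(set(guests_p) & set(guests_n))
    return 1 if overlap > k else 0
```

With k = 1, two shared performers give a score of 1 (as for content ID-1 and ID-2), while zero or one shared performer gives 0 (as for content ID-3 and ID-4).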
In the following step S138, with the threshold set to “1”, when the selected “positive example” is content ID-1 and the “negative example” is content ID-2, content ID-2 is given a pseudo “positive example” label in step S139. As a result, content ID-2 is treated as a pseudo “positive example” in all subsequent processing. In contrast, when the selected “positive example” is content ID-3 and the “negative example” is content ID-4, content ID-4 continues to be treated as a “negative example” in all processing.
The VoD positive example complementing unit 13 has two effects: inverting “negative examples” that should actually be treated as positive examples into “positive examples” enables more accurate learning, and at the same time the otherwise small number of “positive examples” can be increased.
As described above, VoD content often includes complete sets of content, such as every episode of a movie or drama series or all the footage of a sports tournament. If, as in the conventional approach, viewed content is treated as “positive examples” and unviewed content as “negative examples”, then, for example, when the user has watched only the first episode of a ten-episode series, the remaining nine episodes are all treated as “negative examples”, and accurate recommendation is considered difficult. In this embodiment, therefore, the title score is used to treat content with similar titles, such as episodes of the same series, as pseudo “positive example” candidates. On the other hand, there are many cases, such as content ID-3 and ID-4, in which the titles are similar but the content is in fact completely different. The metadata score is therefore additionally used to judge whether a pseudo “positive example” candidate should actually be treated as a pseudo “positive example”.
In this embodiment, when performers overlap, as with content ID-1 and ID-2, the metadata score is high and the content is treated as a pseudo “positive example”. In contrast, when performers do not overlap, as with content ID-3 and ID-4, the items are regarded as having no relationship such as belonging to the same series and are not treated as pseudo “positive examples”. The VoD positive example complementing unit 13 thus allows “negative examples” that should actually be treated as positive examples to be handled as “positive examples”, and a highly accurate recommendation model can be expected to be learned.
In addition, because VoD content is often distributed through paid services, the number of “positive examples” is likely to be small. In this embodiment, more “positive examples” can be obtained by, for example, treating the remaining nine episodes of a series as “positive examples”.
The selection processing unit 14 performs selection processing comprising feature importance calculation and case selection. Feature importance calculation computes the importance of each feature using the VoD learning data acquired from the VoD positive example complementing unit 13. Case selection computes the reliability of the broadcast learning data using the calculated feature importances and retains, as learning data, the broadcast learning data whose reliability is high. The selection processing unit 14 is realized by a processor executing a program for the selection processing.
FIG. 8 is a flowchart showing a specific example of the processing operation of the selection processing unit 14. First, the selection processing unit 14 determines whether any feature of the VoD learning data has not yet been selected (step S141). If an unselected feature remains, the selection processing unit 14 selects it (step S142). If all features have already been selected, the processing proceeds to step S145.
Next, the selection processing unit 14 calculates the importance of the selected feature (step S143). The selection processing unit 14 assigns the calculated importance to the feature (step S144) and returns to step S141. Steps S142 to S144 are repeated until no unselected feature remains in the VoD learning data.
Once all features have been selected, the selection processing unit 14 determines whether any “positive example” in the broadcast learning data has not yet been selected (step S145). If an unselected “positive example” remains, the selection processing unit 14 selects it (step S146). If all of the broadcast learning data has already been selected, the processing ends. The selection processing unit 14 calculates the reliability of the selected broadcast learning data (step S147) and determines whether the calculated reliability is equal to or greater than a threshold (step S148); if it is less than the threshold, the processing returns to step S145. If the reliability is equal to or greater than the threshold, the selection processing unit 14 attaches the calculated reliability to the broadcast learning data (step S149) and returns to step S145. Steps S146 to S149 are repeated until no unselected “positive example” remains in the broadcast learning data.
Through the above processing, the selection processing unit 14 can select, from the “positive examples” of the broadcast learning data, data that can be handled in the same way as the VoD learning data, and use it as learning data together with the VoD learning data.
In this embodiment, performers, keywords, and genres are used as the features of the VoD learning data, and the selection processing unit 14 can assign an importance to every feature. The importance of a feature indicates the strength of the correlation between the feature and the “positive examples”; that is, a feature with high importance strongly reflects the user's preferences. In this embodiment, importance can be assigned using different criteria for performer features, keyword features, and genre features.
Figure JPOXMLDOC01-appb-M000003
Equation (3) is an example of calculating the importance CR of the performer feature guest_i. In Equation (3), P(guest_i) is the number of VoD learning data items in which guest_i appears, and P(guest_i|viewed) is the number of viewed VoD learning data items in which guest_i appears. This score takes a value from 0 to 1 and is high when the user has viewed much of the content in which guest_i appears.
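Equation (3) is not reproduced in this text. The description gives two counts and a range of 0 to 1 but not the formula itself. One form consistent with that description is a simple ratio, shown here purely as an assumption: it presumes that P(guest_i | viewed) counts only the viewed (positive) items, and the actual equation may differ.

```latex
CR(guest_i) = \frac{P(guest_i \mid viewed)}{P(guest_i)}, \qquad 0 \le CR(guest_i) \le 1
```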
Figure JPOXMLDOC01-appb-M000004
Equation (4) is an example of the importance GS of the keyword feature keyword_j. Here, P(keyword_j|viewed) is the number of positive examples of the VoD learning data that contain keyword_j, P(keyword_j|not viewed) is the number of negative examples of the VoD learning data that contain keyword_j, and coef is a correction coefficient. This value is based on the Graham score, takes a value from 0 to 1, and is high when the user has viewed much of the content containing keyword_j.
Figure JPOXMLDOC01-appb-M000005
Equation (5) is an example of calculating the importance GI of the genre feature genre_l. Here, [genre_viewed] denotes the set of genres of the content that has been viewed. The importance is “1” if genre_l is a genre the user has viewed, and “0” otherwise.
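Equation (5) is available only as an image. From the description above, it can be written as an indicator of set membership; this is a sketch reconstructed from the text, not a copy of the original equation.

```latex
GI(genre_l) =
\begin{cases}
1 & \text{if } genre_l \in [genre_{viewed}] \\
0 & \text{otherwise}
\end{cases}
```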
FIG. 9 shows an example of the importance CR assigned to performer features. In this example, the user often views content in which “Michael J. 〇△□〇□” appears, so the importance CR is high. In contrast, the user rarely views content in which “Antonio 〇〇△△□” appears, so the importance CR is low.
FIG. 10 shows an example of the importance GS assigned to keyword features. In this example, the user often views content containing “future”, so the importance GS is high. In contrast, the user rarely views content containing “legend”, so the importance GS is low.
FIG. 11 shows an example of the importance GI assigned to genre features. In this example, content in the genres “movie - foreign film” and “movie - Japanese film” has been viewed, so the importance GI is “1”. In contrast, content in the genre “anime - domestic anime” has not been viewed, so the importance GI is “0”.
Using the feature importances calculated by the methods described above, the selection processing unit 14 calculates the reliability of broadcast content in step S147 of FIG. 8. In this embodiment, different reliability calculation methods are provided for performers, keywords, and genres.
Figure JPOXMLDOC01-appb-M000006
Equation (6) is an example of calculating the performer reliability from the importance CR of the performer features. In Equation (6), T_J denotes the selected broadcast learning data. The performer reliability of the broadcast learning data T_J is obtained by adding up the importances of the performers in T_J.
Figure JPOXMLDOC01-appb-M000007
Equation (7) is an example of calculating the keyword reliability from the importance GS of the keyword features. In Equation (7), the keyword reliability of the broadcast learning data T_J becomes higher the more high-importance keywords T_J contains. In this embodiment, keyword importance is given by Equation (4), so keywords that occur frequently in viewed VoD content have high importance. Consequently, the more keywords the viewed VoD content and the broadcast learning data T_J have in common, the larger the keyword reliability of T_J.
Figure JPOXMLDOC01-appb-M000008
Equation (8) is an example of calculating the genre reliability from the importance GI of the genre features. In Equation (8), the genre reliability of the broadcast learning data T_J is calculated as the sum of the importances of the genres included in T_J. Broadcast learning data may be assigned only one genre, but some broadcast learning data is assigned several genres at once, such as “information/wide show - entertainment/wide show” and “news/reports - other”. For that reason, the sum of importances is used.
Through the above processing, the performer reliability, keyword reliability, and genre reliability of the selected broadcast learning data are calculated.
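Equations (6) to (8) are available only as images. The text describes each reliability as an accumulation of the corresponding feature importances over the features of T_J, so they can be sketched as plain sums. The names Rel_guest, Rel_keyword, and Rel_genre are assumptions introduced for illustration, and the original equations may include terms that the description does not mention.

```latex
\begin{aligned}
\mathrm{Rel}_{guest}(T_J)   &= \sum_{guest_i \in T_J} CR(guest_i) \\
\mathrm{Rel}_{keyword}(T_J) &= \sum_{keyword_j \in T_J} GS(keyword_j) \\
\mathrm{Rel}_{genre}(T_J)   &= \sum_{genre_l \in T_J} GI(genre_l)
\end{aligned}
```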
In step S148 of FIG. 8, the selection processing unit 14 judges whether to add the selected broadcast learning data T_J to the “positive examples” according to whether each of its reliabilities is equal to or greater than the corresponding threshold. In this embodiment, the broadcast learning data is added to the “positive examples” if at least one of the reliabilities is equal to or greater than its threshold.
FIG. 12 shows an example of positive examples of broadcast learning data. Using the example of FIG. 12, the processing by which the selection processing unit 14 retains “positive examples” of broadcast learning data (step S149 of FIG. 8) is described concretely. In this embodiment, the threshold for performer reliability is assumed to be 0.7, the threshold for keyword reliability 0.99, and the threshold for genre reliability “1”.
If the selection processing unit 14 has selected content ID-T1, the performer reliability exceeds the performer reliability threshold of 0.7. That is, the performers of content ID-T1 include “Michael J. 〇△□〇□”, whose importance in FIG. 9 is 0.8, which exceeds the threshold of 0.7. For content ID-T1, the genre reliability and the keyword reliability are below their thresholds, but because the performer reliability is equal to or greater than its threshold, the selection processing unit 14 adds content ID-T1 to the broadcast learning data as a “positive example” in step S149.
If the selection processing unit 14 has selected content ID-T2, the genre reliability equals the genre reliability threshold of “1”. That is, the genre of content ID-T2 is “movie - foreign film”, whose importance in FIG. 11 is “1”, the same value as the threshold “1”. For content ID-T2, the performer reliability and the keyword reliability are below their thresholds, but because the genre reliability is equal to or greater than its threshold, the selection processing unit 14 adds content ID-T2 to the broadcast learning data as a “positive example” in step S149.
However, if the selection processing unit 14 has selected content ID-T3, the performer reliability, genre reliability, and keyword reliability of content ID-T3 are all below their thresholds. The selection processing unit 14 therefore proceeds from step S148 to step S145 and does not add content ID-T3 to the broadcast learning data as a “positive example”. Similarly, content ID-T4 is not added to the “positive examples”.
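The selection step walked through above can be sketched as follows. This is an illustrative sketch only: the function names, the dictionary layout of an item, and the treatment of each reliability as a plain sum of importances are assumptions. The thresholds 0.7, 0.99, and 1 follow the embodiment.

```python
# Thresholds from the embodiment: performer 0.7, keyword 0.99, genre 1.
THRESHOLDS = {"guest": 0.7, "keyword": 0.99, "genre": 1}

def reliabilities(item, cr, gs, gi):
    """Sum the importances of an item's performers, keywords, and genres.

    cr, gs, gi map feature values to importances; unknown features count as 0.
    Treating each reliability as a plain sum is an assumption.
    """
    return {
        "guest": sum(cr.get(g, 0.0) for g in item["guests"]),
        "keyword": sum(gs.get(k, 0.0) for k in item["keywords"]),
        "genre": sum(gi.get(g, 0) for g in item["genres"]),
    }

def select_broadcast_positives(items, cr, gs, gi, thresholds=THRESHOLDS):
    """Keep the items for which at least one reliability reaches its threshold."""
    kept = []
    for item in items:                                   # steps S145 / S146
        scores = reliabilities(item, cr, gs, gi)         # step S147
        if any(scores[name] >= thresholds[name] for name in thresholds):  # S148
            kept.append((item["id"], scores))            # step S149
    return kept
```

Because a single qualifying reliability is enough, an item with a well-liked performer is kept even if its keywords and genre carry no signal, which matches the handling of content ID-T1 in the text.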
As an effect of the selection processing unit 14, only those “positive examples” in the broadcast learning data that are useful for the recommendation model of the VoD learning data can be used.
In general, the content a user tends to view often differs between broadcast content and VoD content. The “positive examples” of the broadcast learning data therefore include “positive examples” that show the same tendency as the VoD learning data and “positive examples” that show a tendency peculiar to broadcast content.
Among the broadcast content shown in FIG. 12, “〇〇〇〇 Market” and “〇〇〇-san” are viewed as broadcast content, but the user would not necessarily want to view them if they were offered as VoD content. In contrast, “〇〇〇〇 Roman‥” and “〇〇〇〇〇 of Caribbean” are viewed for reasons similar to those behind the “positive examples” of the VoD learning data, and are considered appropriate as “positive examples” for learning the recommendation model of the VoD learning data. The selection processing unit 14 can therefore increase the number of “positive examples” used for learning the recommendation model while excluding “positive examples” peculiar to the broadcast learning data, so accurate learning can be expected.
Next, the recommendation model learning unit 15 learns the user's VoD content recommendation model using learning data composed of the VoD learning data and the retained “positive examples” of the broadcast learning data. In this embodiment, a Bayesian network is used to learn the recommendation model.
FIG. 13 shows an example of a Bayesian network for learning the recommendation model. In the structure of the Bayesian network, the “genre”, “performer”, and “keyword” nodes are connected to the viewing node. As in conventional learning methods, the recommendation model can be learned by calculating the conditional probability table (CPT) of each node from the learning data.
FIG. 14 is a flowchart showing a specific example of the processing in the recommendation model learning unit 15. First, the recommendation model learning unit 15 determines whether any learning data has not yet been selected (step S151). If unselected learning data remains, the recommendation model learning unit 15 selects it (step S152). If all learning data has already been selected, the processing proceeds to step S156. The recommendation model learning unit 15 determines whether any node in the selected learning data has not yet been selected (step S153). If an unselected node remains in the selected learning data, the recommendation model learning unit 15 selects that node (step S154). If all nodes have already been selected, the processing ends. Next, the recommendation model learning unit 15 converts the feature of the selected node into a discrete value (step S155). While unselected nodes remain in the selected learning data, steps S153 to S155 are repeated. The recommendation model based on the Bayesian network is then learned using the learning data in which each node has been converted into a discrete value (step S156).
In the learning method using the Bayesian network of this embodiment, the nodes are the performer features (see, for example, FIG. 9) and the keyword features (see, for example, FIG. 10).
In this embodiment, the recommendation model learning unit 15 converts the performer features and keyword features of the learning data into one of two discrete values, “preferred” or “normal”, as shown in FIG. 13. That is, by using the values of Equations (6) and (7) used for the reliability calculation as scores, each feature can be converted into a binary value: “preferred” when the sum of the performer scores or the sum of the keyword scores of the learning data is equal to or greater than the corresponding threshold, and “normal” when it is less than the threshold.
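The binary conversion described above amounts to a single threshold comparison, sketched below. The function name is hypothetical, and the labels are English renderings of 「好み」 and 「通常」.

```python
def discretize(score_sum, threshold):
    """Map a summed importance score to "preferred" or "normal".

    A score equal to or greater than the threshold counts as "preferred".
    """
    return "preferred" if score_sum >= threshold else "normal"
```

The same function serves both the performer node and the keyword node; only the score and the threshold passed in differ.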
Such a conversion would normally require first selecting all the learning data to calculate the score of each performer and keyword, and then deciding between "preference" and "normal" based on the sum of the performer or keyword scores of each learning datum. In this embodiment, however, the recommendation model learning unit 15 receives the scores of Equations (6) and (7) already calculated by the selection processing unit 14, so the process of calculating the weight of each performer and keyword can be omitted.
FIG. 15 shows an example in which the performer feature amounts of FIG. 9 and the keyword feature amounts of FIG. 10 are converted into binary values for the VoD content of FIG. 7. For example, by Equation (6), the score of the performer "Michael J. ○△□○□" in the VoD learning data of content ID-1 is 0.8. With the threshold set to 0.7, as for the broadcast content, this score is at or above the threshold, so the performer feature amount is converted to "preference". The performer score of the VoD learning data of content ID-4, by contrast, is 0.1, which is below the threshold, so its performer feature amount is converted to "normal". By converting the learning data using the previously calculated feature amount importance, and then creating a conditional probability table (CPT) from that data in the same way as in conventional Bayesian network learning, the recommendation model learning unit 15 can learn the recommendation model in step S156 of FIG. 14 without recalculating the weight of each feature amount.
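As an illustration of how a CPT might be built from discretized learning data, the sketch below estimates conditional probabilities by relative frequency. The network structure (a "viewed" node as the parent of the performer node), the function name `learn_cpt`, and the sample rows are all assumptions made for this example; the patent text does not specify them.

```python
from collections import Counter
from typing import Dict, List, Tuple


def learn_cpt(rows: List[Dict[str, str]], node: str, parent: str) -> Dict[Tuple[str, str], float]:
    """Estimate P(node = v | parent = p) by relative frequency over the rows."""
    pair_counts = Counter((row[parent], row[node]) for row in rows)
    parent_counts = Counter(row[parent] for row in rows)
    return {
        (p, v): count / parent_counts[p]
        for (p, v), count in pair_counts.items()
    }


# Hypothetical discretized learning data (labels as produced by the binary conversion).
rows = [
    {"viewed": "yes", "performer": "preference", "keyword": "preference"},
    {"viewed": "yes", "performer": "preference", "keyword": "normal"},
    {"viewed": "no", "performer": "normal", "keyword": "normal"},
    {"viewed": "no", "performer": "preference", "keyword": "normal"},
]

cpt = learn_cpt(rows, node="performer", parent="viewed")
```

Plain frequency counting is used here for brevity; a practical learner would typically add smoothing so that unseen combinations do not receive zero probability.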
Next, the VoD target data storage unit 16 is a storage device that stores the VoD target data, that is, the VoD data that are candidates for recommendation. The VoD target data is stored in the same format as the VoD metadata.
Next, the determination unit 17 selects unselected VoD target data from the VoD target data stored in the VoD target data storage unit 16, calculates a filter value for the selected data, evaluates VoD target data whose filter value is at or above a threshold using the recommendation model learned by the recommendation model learning unit 15, and attaches the determination result to the VoD target data. The determination unit 17 is realized by a processor executing a program for this series of processes.
FIG. 16 is a flowchart showing a specific example of the processing in the determination unit 17. The determination unit 17 determines whether any VoD target data has not yet been selected (step S171). If unselected VoD target data remains, the determination unit 17 selects it (step S172). If all the VoD target data has been selected, the process ends. Next, the determination unit 17 calculates the filter value of the selected VoD target data (step S173) and determines whether the calculated filter value is at or above a threshold (step S174). If the filter value is below the threshold, the process proceeds to step S176. If the filter value is at or above the threshold, the determination unit 17 evaluates the VoD target data using the recommendation model learned by the recommendation model learning unit 15 (step S175). The determination unit 17 attaches the determination result to the VoD target data (step S176) and returns to step S171.
The determination unit 17 calculates the filter value in step S173 and evaluates with the recommendation model in step S175 only the VoD target data whose filter value is at or above the threshold. Determination with a model learned by a Bayesian network generally takes time, so using a filter value that is quick to compute to exclude clearly "negative example" VoD target data in advance reduces the overall calculation cost. In this embodiment, the feature amount importance calculated by the selection processing unit 14 can be used to calculate the filter value in step S173.
[Equation (9)]
Equation (9) shows an example of the filter value. Here, id_obj, guest_obj, keyword_obj, and genre_obj are the content ID and an arbitrary performer, keyword, and genre of the selected VoD target data, and [guest]^5_1, [keyword]^5_1, and [genre]^5_1 are the sets of the five performers, keywords, and genres, respectively, with the highest importance calculated by the selection processing unit 14.
Such a filter value is "1" if the VoD target data contains any of the five most important performers, the five most important keywords, or the five most important genres, and "0" otherwise.
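The formula image for Equation (9) is not reproduced in this text. One form consistent with the verbal description above is sketched below; the function name filter and the exact notation are assumptions, and only the symbols id_obj, guest_obj, keyword_obj, genre_obj and the three top-five sets come from the description.

```latex
\mathrm{filter}(id_{obj}) =
\begin{cases}
1 & \text{if } guest_{obj} \in [guest]^{5}_{1}
    \ \lor\ keyword_{obj} \in [keyword]^{5}_{1}
    \ \lor\ genre_{obj} \in [genre]^{5}_{1}
    \text{ for some element of } id_{obj}, \\
0 & \text{otherwise.}
\end{cases}
```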
If the threshold in step S174 of FIG. 16 is set to "1", VoD target data containing at least one of the highly important performers, keywords, or genres undergoes a more detailed determination using the recommendation model in step S175. VoD target data containing none of them is regarded as clearly not recommendable and is not evaluated by the recommendation model.
In the following step S176, the determination unit 17 attaches the determination result to the VoD target data. If a result was obtained from the recommendation model, that numerical value is attached; if not, 0.0 is attached as a clear "negative example".
FIG. 17 is an example of VoD target data. When the determination unit 17 selects content ID-R1, the filter value computed in step S173 of FIG. 16 is "1", because content ID-R1 contains the genre "Movie - Foreign film", whose genre importance ranks high. The determination unit 17 therefore performs a detailed determination using the recommendation model in step S175. If the viewing probability calculated by this determination is, for example, 0.45, that value is attached to the VoD target data.
When the determination unit 17 selects content ID-R2, which contains no highly ranked performer, keyword, or genre, the process proceeds from step S174 to step S176, where the determination unit 17 attaches the value 0.0 to the VoD target data.
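The filter-then-evaluate flow of FIG. 16 can be sketched as below, using the ID-R1 and ID-R2 cases as sample input. The function names, the dictionary layout of the target data, the genre "Hobby" for ID-R2, and the stub model that always returns 0.45 are assumptions for illustration; a real model would be the learned Bayesian network.

```python
from typing import Callable, Dict, List, Set

FIELDS = ("performers", "keywords", "genres")


def filter_value(item: dict, top: Dict[str, Set[str]]) -> int:
    """Return 1 if the item shares any element with the top-importance sets, else 0."""
    for field in FIELDS:
        if set(item.get(field, [])) & top.get(field, set()):
            return 1
    return 0


def judge(items: List[dict], top: Dict[str, Set[str]],
          model: Callable[[dict], float], threshold: int = 1) -> Dict[str, float]:
    """Attach a result to every item, calling the costly model only when the filter passes."""
    results = {}
    for item in items:                                # steps S171 and S172
        if filter_value(item, top) >= threshold:      # steps S173 and S174
            results[item["id"]] = model(item)         # step S175
        else:
            results[item["id"]] = 0.0                 # clear negative example
    return results                                    # results attached as in step S176


top = {"performers": set(), "keywords": set(), "genres": {"Movie - Foreign film"}}
items = [
    {"id": "ID-R1", "genres": ["Movie - Foreign film"]},
    {"id": "ID-R2", "genres": ["Hobby"]},  # hypothetical genre outside the top set
]

calls = []


def stub_model(item: dict) -> float:
    """Stand-in for the learned recommendation model; records which items reach it."""
    calls.append(item["id"])
    return 0.45


results = judge(items, top, stub_model)
```

Recording the calls makes the cost saving visible: only items that pass the filter reach the model.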
Through the above processing, the determination unit 17 uses the filter value to exclude in advance VoD target data that is clearly a "negative example", which reduces the number of determinations by the recommendation model and hence the calculation cost. Because the filter value reuses the feature amount importance already calculated by the selection processing unit 14, no additional calculation is needed to obtain it, so no new calculation cost arises. As a result, recommendations can be made at low calculation cost even for VoD target data, which generally exists in large quantities.
Next, the output unit 18 selects, from the VoD target data to which the determination unit 17 has attached determination results, the VoD target data with the highest viewing probabilities, selects the highly important feature amounts contained in the selected data as the reasons for recommendation, and performs control to present the selected VoD target data and the reasons to the user. The output unit 18 is realized by a processor executing a program for this series of processes.
FIG. 18 is an example of the output produced by the output unit 18. Here, the VoD target data with the highest viewing probabilities are shown as "New content recommendations", with the title, genre, performers, and summary displayed. If the VoD target data contains feature amounts whose importance, as calculated by the selection processing unit 14, ranks high, these are considered important reasons for recommending that VoD target data. The most important element of each feature amount can therefore be highlighted (for example, by coloring, underlining, bold type, or shading), as in FIG. 18. With such a display, the user can find content that matches his or her preferences without searching or other effort.
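The selection of top results and their highlighted reasons could look like the following sketch. The function names `highlight` and `recommend`, the data layout, and all sample values are assumptions; the importance values stand in for those calculated by the selection processing unit 14.

```python
from typing import Dict, List


def highlight(item: dict, importance: Dict[str, Dict[str, float]]) -> Dict[str, str]:
    """For each feature type, pick the item's element with the highest importance, if any."""
    reasons = {}
    for field, scores in importance.items():
        candidates = [value for value in item.get(field, []) if value in scores]
        if candidates:
            reasons[field] = max(candidates, key=lambda value: scores[value])
    return reasons


def recommend(results: Dict[str, float], items_by_id: Dict[str, dict],
              importance: Dict[str, Dict[str, float]], n: int = 3) -> List[dict]:
    """Return the n items with the highest viewing probability, each with its reasons."""
    ranked = sorted(results.items(), key=lambda pair: pair[1], reverse=True)[:n]
    return [
        {"id": cid, "probability": prob, "reasons": highlight(items_by_id[cid], importance)}
        for cid, prob in ranked
    ]


# Hypothetical importance values and target data.
importance = {
    "performers": {"performer_a": 0.8, "performer_b": 0.3},
    "keywords": {},
    "genres": {"Movie - Foreign film": 0.9},
}
items_by_id = {
    "ID-R1": {"performers": ["performer_b", "performer_a"], "genres": ["Movie - Foreign film"]},
    "ID-R2": {"genres": ["Hobby"]},
}
results = {"ID-R1": 0.45, "ID-R2": 0.0}

top_list = recommend(results, items_by_id, importance, n=1)
```

The returned reasons are the values a display layer would emphasize, for example in bold or with underlining.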
In addition, the user has not yet viewed the VoD learning data that the VoD positive example complementing unit 13 turned into "pseudo positive examples". By displaying these "pseudo positive examples" as well, the user can find not only new content but also content from a series that he or she has missed.
Such a display method is possible because the feature amount importance and the pseudo positive examples have been calculated in advance by the VoD positive example complementing unit 13 and the selection processing unit 14.
The content recommendation device of this embodiment artificially increases, based on the VoD content the user has viewed, the VoD content that the user is likely to want to view. It then uses this data to select, from the viewing history of broadcast content, cases that are useful for recommending VoD content, and can thereby learn an effective recommendation model.
The content recommendation device according to this embodiment may be realized by installing a program in a computer device in advance. It may also be realized by storing the program on a storage medium such as a CD-ROM, or distributing it over a network, and installing it on a computer device as appropriate. The VoD learning data storage unit 11, the broadcast learning data storage unit 12, and the VoD target data storage unit 16 can be realized using, as appropriate, a memory or hard disk built into or externally attached to the computer device, or a storage medium such as a CD-R, CD-RW, DVD-RAM, or DVD-R.
The above embodiment is not limited to the form described; at the implementation stage, the constituent elements can be modified and embodied without departing from the gist of the invention. Various inventions can also be formed by appropriately combining the constituent elements disclosed in the above embodiment.
11‥VoD learning data storage unit
12‥Broadcast learning data storage unit
13‥VoD positive example complementing unit
14‥Selection processing unit
15‥Recommendation model learning unit
16‥VoD target data storage unit
17‥Determination unit
18‥Output unit

Claims (6)

  1.  A content recommendation device comprising:
     a VoD learning data storage unit that stores a plurality of VoD learning data having metadata of a plurality of VoD contents and data relating to viewing of the plurality of VoD contents;
     a broadcast learning data storage unit that stores a plurality of broadcast learning data having metadata of a plurality of broadcast contents and data relating to viewing of the plurality of broadcast contents;
     a VoD positive example complementing unit that converts, among the negative examples of the plurality of VoD learning data stored in the VoD learning data storage unit, data similar to a positive example into positive examples;
     a selection processing unit that calculates, for each piece of metadata of the VoD learning data complemented by the VoD positive example complementing unit, a feature amount importance representing the strength of correlation with the positive examples, and that uses the calculated feature amount importance to select, from the broadcast learning data, broadcast learning data suitable for recommending VoD content;
     a recommendation model learning unit that learns a recommendation model for VoD content using the selected broadcast learning data and the calculated feature amount importance;
     a VoD target data storage unit that stores metadata of a plurality of VoD contents to be recommended;
     a determination unit that makes a recommendation determination for the plurality of VoD contents to be recommended, based on the recommendation model for VoD content; and
     an output unit that outputs the result of the recommendation determination by the determination unit.
  2.  The content recommendation device according to claim 1, wherein the VoD positive example complementing unit performs:
     a similar title selection process of selecting, from the negative examples of the VoD learning data, negative examples whose titles are similar to the title of a positive example;
     a similar metadata selection process of selecting, from the negative examples selected in the similar title selection process, negative examples whose performers or keywords overlap; and
     a positive example inversion process of converting the negative examples selected in the similar metadata selection process into positive examples.
  3.  The content recommendation device according to claim 1, wherein the selection processing unit performs:
     a feature amount importance calculation process of calculating the importance of each feature amount using the VoD learning data; and
     a case selection process of calculating the reliability of the broadcast learning data using the feature amount importance calculated in the feature amount importance calculation process, and selecting, based on that value, broadcast learning data suitable for recommending the VoD content.
  4.  The content recommendation device according to claim 1, wherein the determination unit performs:
     a filter process of selecting, using the calculated feature amount importance, only VoD target data that contains a feature amount of high importance; and
     a determination process of performing a detailed determination on the selected VoD target data using the recommendation model and attaching the result.
  5.  The content recommendation device according to claim 1, wherein the output unit presents the top-ranked recommendation results together with the reasons for recommendation, using the determination results and the feature amount importance.
  6.  The content recommendation device according to claim 1, wherein the output unit presents VoD learning data with similar titles together with the reasons for recommendation, using the complemented VoD learning data.
PCT/JP2009/004812 2009-09-24 2009-09-24 Content recommendation device WO2011036704A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP2009/004812 WO2011036704A1 (en) 2009-09-24 2009-09-24 Content recommendation device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2009/004812 WO2011036704A1 (en) 2009-09-24 2009-09-24 Content recommendation device

Publications (1)

Publication Number Publication Date
WO2011036704A1 true WO2011036704A1 (en) 2011-03-31

Family

ID=43795479

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2009/004812 WO2011036704A1 (en) 2009-09-24 2009-09-24 Content recommendation device

Country Status (1)

Country Link
WO (1) WO2011036704A1 (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004287776A (en) * 2003-03-20 2004-10-14 Fujitsu Ltd Document classification method, document classification device, and document classification program
JP2006127145A (en) * 2004-10-28 2006-05-18 Sharp Corp Content recommendation apparatus, content recommendation method, content recommendation program, and computer-readable recording medium with program recorded thereon
JP2007060398A (en) * 2005-08-25 2007-03-08 Toshiba Corp Program information providing device, program information providing method and its program
JP2007142761A (en) * 2005-11-17 2007-06-07 Sharp Corp Program recommending apparatus, program information providing system, information processing apparatus, program information providing method, program information providing program, and computer-readable recording medium recording this program
JP2007282042A (en) * 2006-04-10 2007-10-25 Toshiba Corp Apparatus and method for recommending program
JP2008117222A (en) * 2006-11-06 2008-05-22 Sony Corp Information processor, information processing method, and program
JP2009110064A (en) * 2007-10-26 2009-05-21 Toshiba Corp Sorting model learning apparatus and sorting model learning method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
KOTA NAKATA ET AL.: "Shitsu no Kotonaru Kyoshi Data o Mochiita Bunrui Shuho", DAI 80 KAI SPECIAL INTERNET GROUP ON KNOWLEDGE-BASED SOFTWARE SHIRYO (SIG-KBS-A703), 25 December 2007 (2007-12-25), pages 61 - 66 *
TSUYOSHI MURATA: "Seirei to Furei kara no Web Community Hakken", IEICE TECHNICAL REPORT ARTIFICIAL INTELLIGENCE AND KNOWLEDGE-BASED PROCESSING, vol. 103, no. 243, 24 July 2003 (2003-07-24), pages 37 - 42 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2019046058A (en) * 2017-08-31 2019-03-22 キヤノン株式会社 Information processing device, and information processing method and program
JP7027070B2 (en) 2017-08-31 2022-03-01 キヤノン株式会社 Information processing equipment, information processing methods, and programs
JP2019179372A (en) * 2018-03-30 2019-10-17 パナソニック インテレクチュアル プロパティ コーポレーション オブ アメリカPanasonic Intellectual Property Corporation of America Learning data creation method, learning method, risk prediction method, learning data creation device, learning device, risk prediction device, and program
JP2022104310A (en) * 2020-12-28 2022-07-08 楽天グループ株式会社 Learning device, machine learning model and learning method
JP7190479B2 (en) 2020-12-28 2022-12-15 楽天グループ株式会社 LEARNING APPARATUS, MACHINE LEARNING MODEL AND LEARNING METHOD

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 09849736

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 09849736

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: JP