WO2022200815A1 - Video content item selection - Google Patents

Video content item selection

Info

Publication number
WO2022200815A1
Authority
WO
WIPO (PCT)
Prior art keywords
video content
emotional state
content item
emotion
user
Prior art date
Application number
PCT/GB2022/050768
Other languages
French (fr)
Inventor
Ben POLKINGHORNE
Sean WHITTAKER
Seth SHENBANJO
Original Assignee
Witwit Holdings Enterprise Global Super Limited
Priority date
Filing date
Publication date
Application filed by Witwit Holdings Enterprise Global Super Limited
Priority to GB2319926.8A (published as GB2622525A)
Publication of WO2022200815A1

Classifications

    • H04N21/4826: End-user interface for program selection using recommendation lists, e.g. of programs or channels sorted according to their score
    • H04N21/439: Processing of audio elementary streams
    • H04N21/4398: Processing of audio elementary streams involving reformatting operations of audio signals
    • H04N21/44218: Detecting physical presence or behaviour of the user, e.g. using sensors to detect if the user is leaving the room or changes his facial expression during a TV program
    • G06F40/30: Semantic analysis (handling natural language data)
    • G10L15/00: Speech recognition
    • G10L25/63: Speech or voice analysis techniques specially adapted for estimating an emotional state

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Social Psychology (AREA)
  • Software Systems (AREA)
  • Human Computer Interaction (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

A computer implemented method for selecting at least one recommended video content item for a user, the method comprising: processing a plurality of video content items to derive an emotion affect score for each of the video content items; receiving a desired emotional state of the user; and selecting at least one recommended video content item from the plurality of video content items based on the desired emotional state of the user and the respective emotion affect scores of the video content items.

Description

VIDEO CONTENT ITEM SELECTION
This invention relates to a computer implemented method for selecting at least one recommended video content item for a user.
There are a number of providers of video content items. Some of these providers permit users to select physical copies of video content items and have them delivered to the user, such as via the postal service. Some of these providers permit users to select video content items that are delivered to the user via a computer network, such as the internet, directly to a device associated with the user. The video content items can be films, TV shows, documentaries, or other moving picture-based content items. The video content items will usually include moving picture content and audio content.
Most of these providers provide some way in which video content items can be suggested to a user based on the previous video content items that the user has consumed. For instance, video content items may be suggested to the user based on other users having rated both video content items the user has rated and other video content items that the user has not rated. Video content items may also be suggested based on a calculated similarity between video content items the user has rated and other video content items that the user has not rated. Video content items may also be suggested based on what a similar type of user has rated. The algorithms used for these suggestions tend to point the user towards video content items similar to those that the user has watched previously. The unsophisticated nature of the selection methods means that users tend to be offered similar content to that which they have watched previously. This may be acceptable on some occasions; however, the user may prefer to watch video content items that are not so intrinsically linked to previous video content item selections. Such a large number of video content items are available that it is very difficult for a user to select a new video content item to view without reliance on the previous viewing habits of the user. The volume of video content items available makes it impossible for a user to process all of the video content items themselves and form a view on which one might be most suitable to watch at a given moment. It would therefore be desirable for there to be an improved method for the selection of at least one recommended video content item for a user.
According to a first aspect of the present invention there is provided a computer implemented method for selecting at least one recommended video content item for a user, the method comprising: processing a plurality of video content items to derive an emotion affect score for each of the video content items; receiving a desired emotional state of the user; and selecting at least one recommended video content item from the plurality of video content items based on the desired emotional state of the user and the respective emotion affect scores of the video content items.
The method may comprise receiving emotional state data of the user; and deriving a current emotional state of the user from the emotional state data; and wherein selecting at least one recommended video content item from the plurality of video content items may be based on the current emotional state of the user, the desired emotional state of the user and the respective emotion affect scores of the video content items.
According to a second aspect of the present invention there is provided a computer implemented method for selecting at least one recommended video content item for a user, the method comprising: processing a plurality of video content items to derive an emotion affect score for each of the video content items; receiving emotional state data of the user; deriving a current emotional state of the user from the emotional state data; and selecting at least one recommended video content item from the plurality of video content items based on the current emotional state of the user, and the respective emotion affect scores of the video content items.
The method may comprise receiving a desired emotional state of the user; and wherein selecting at least one recommended video content item from the plurality of video content items may be based on the current emotional state of the user, the desired emotional state of the user and the respective emotion affect scores of the video content items. The emotion affect score may be an indication of the emotional state that the video content item generates in a user upon watching the video content item. The emotion affect score may be a category of emotional state. The emotion affect score may comprise a plurality of sub-scores. The plurality of sub-scores may each be associated with a category of emotional state.
At least one of the plurality of video content items may be associated with a respective written component, and processing a plurality of video content items to derive an emotion affect score for each of the video content items may comprise processing the written component to derive the emotion affect score for the video content item. At least one of the plurality of video content items may be associated with a respective video script, and processing a plurality of video content items to derive an emotion affect score for each of the video content items may comprise processing the video script to derive the emotion affect score for the video content item. The video script may be part of the video content item. The video script may be caption data and/or subtitle data. The video content item may comprise audio data, and processing a plurality of video content items to derive an emotion affect score for each of the video content items may comprise processing audio data to transcribe the respective video script. The video content item may comprise audio data, and processing a plurality of video content items to derive an emotion affect score for each of the video content items may comprise processing the audio data to derive the emotion affect score for the video content item. Processing a plurality of video content items to derive an emotion affect score for each of the video content items may comprise processing associated audio data that is related to the video content item to derive the emotion affect score for the video content item.
The video content item may comprise video data, and processing a plurality of video content items to derive an emotion affect score for each of the video content items may comprise processing the video data to derive the emotion affect score for the video content item. Processing a plurality of video content items to derive an emotion affect score for each of the video content items may comprise processing associated video data that is related to the video content item to derive the emotion affect score for the video content item. Processing the video data to derive the emotion affect score for the video content item may comprise applying image recognition to the video data to detect known features in the video content and deriving the emotion affect score based on those known features. The video content items may comprise a plurality of segments, and processing the video data to derive the emotion affect score for the video content item may comprise generating an emotion affect score for each segment of the video content items. Processing a plurality of video content items to derive an emotion affect score for each of the video content items may comprise training a model to process the video content items to derive the emotion affect scores by loading training data into the model, the training data may comprise at least one of (i) respective emotional state samples for the duration of a plurality of video content items for a plurality of sample users, (ii) feedback from the user after watching a selected video content item, (iii) metadata that has been attached to various points along the training data. The emotional state samples may be one or more physiological parameters of the plurality of sample users. The model may simulate one or more physiological parameters of a user over the duration of a video content item.
The emotional state data may comprise at least one user emotion judgement. Deriving the current emotional state of the user may comprise combining the plurality of user emotion judgements to form the current emotional state of the user. The emotional state data may comprise a recording of the user’s voice. Deriving the current emotional state of the user may comprise processing the recording to derive one or more characteristics of the user’s voice, and processing those characteristics to derive the current emotional state of the user.
Selecting at least one recommended video content item from the plurality of video content items may comprise selecting at least one video content item that has a respective emotion affect score that matches the desired emotional state. Video content items may have emotion affect scores for segments of the video content item, and selecting at least one recommended video content item from the plurality of video content items may comprise selecting at least one video content item that has a respective emotion affect score for at least one segment that matches the desired emotional state. Selecting at least one recommended video content item from the plurality of video content items may comprise selecting at least one video content item that has a respective emotion affect score that matches the current emotional state. Selecting at least one recommended video content item from the plurality of video content items may comprise selecting at least one video content item that has a respective emotion affect score that does not match the current emotional state. Video content items may have emotion affect scores for segments of the video content item, and selecting at least one recommended video content item from the plurality of video content items may comprise selecting at least one video content item that has a respective emotion affect score for at least one segment that does not match the current emotional state. Video content items may have emotion affect scores for segments of the video content item, and selecting at least one recommended video content item from the plurality of video content items may comprise selecting at least one video content item that has a respective emotion affect score for a segment within a defined time period of the start of the video content item.
The method may comprise simulating one or more physiological parameters for the user based on the emotional state data over a plurality of video content items, and selecting at least one recommended video content item from the plurality of video content items may comprise selecting the at least one recommended video content item in response to the simulated physiological parameters indicating a match for the current and/or desired emotional state.
Determining that an emotional state matches an emotion affect score may comprise determining that the emotional state is within an emotional state threshold of an emotion affect score. Determining that an emotional state does not match an emotion affect score may comprise determining that the emotional state is outside an emotional state threshold of an emotion affect score.
The method may comprise sending video content item information for the at least one recommended video content item to the user. The video content information may (i) identify the video content item(s), (ii) show which supplier the video content item is available from, (iii) provide a reference to enable an end user device to access the content item and/or (iv) use images and other video components that convey particular emotions within the video content item.
The emotional state data may be generated based on data gathered by an end user device. The emotional state data may be gathered by capturing a recording of the user’s voice. The user may be a first user and the desired emotional state may be a first desired emotional state, the method further comprising: receiving a second desired emotional state of a second user; and selecting at least one recommended video content item from the plurality of video content items based on the respective emotion affect scores of the video content items and a third desired emotional state that is calculated based on the first and second desired emotional states.
The user may be a first user, the emotional state data may be first emotional state data and the current emotional state may be a first current emotional state, the method further comprising: deriving a second current emotional state of a second user from received emotional state data of the second user; and selecting at least one recommended video content item from the plurality of video content items based on the respective emotion affect scores of the video content items and a third current emotional state that is calculated based on the first and second current emotional states.
The one or more video content items may be interstitial content items, and the interstitial content items may be selected for insertion into a pre-existing stream of content items.
The present invention will now be described by way of example with reference to the accompanying drawings. In the drawings:
Figure 1 shows an example system that implements the methods described herein.
Figure 2 shows a flow diagram showing the methods described herein.
The following description is presented to enable any person skilled in the art to make and use the invention, and is provided in the context of a particular application. Various modifications to the disclosed embodiments will be readily apparent to those skilled in the art. The general principles defined herein may be applied to other embodiments and applications without departing from the spirit and scope of the present invention. Thus, the present invention is not intended to be limited to the embodiments shown, but is to be accorded the widest scope consistent with the principles and features disclosed herein.
The present invention relates to a computer implemented method for selecting at least one recommended video content item for a user, the method comprises processing a plurality of video content items to derive an emotion affect score for each of the video content items. The emotion affect score may have a plurality of sub-scores. The emotion affect score may be an aggregate of a plurality of sub-scores. Further details of the emotion affect score are included herein. The method may further comprise receiving emotional state data of the user, deriving a current emotional state of the user from the emotional state data and receiving a desired emotional state of the user. The method may further comprise selecting at least one recommended video content item from the plurality of video content items based on the current emotional state of the user, and the respective emotion affect scores of the video content items. The method may further comprise selecting at least one recommended video content item from the plurality of video content items based on the desired emotional state of the user and the respective emotion affect scores of the video content items. The method may further comprise selecting at least one recommended video content item from the plurality of video content items based on the current emotional state of the user, the desired emotional state of the user and the respective emotion affect scores of the video content items.
Figure 1 shows an example system that can permit selection of recommended video content items. The system comprises a server 1. It will be appreciated that whilst reference may be made to a single server 1, this server could be part of a cluster of servers or may be a virtual server running in a cloud-based, virtual environment. The server comprises a processing section 2 and a storage section 3. The server 1 is configured to implement methods described herein for processing video content items and selecting recommended video content items. These methods can be implemented and controlled by the processing section 2. The processing section 2 could perform its methods using dedicated hardware, using a general purpose processor executing software code, or using a combination of the two. A processor 4 executes software code stored in a non-transient way in software memory 5 in order to perform its methods. The processing section can read/write data from/to storage location 3. The storage location 3 may be in the form of a memory. Storage location 3 may comprise non-volatile memory and may be in the form of an array of discrete banks of memory, such as hard disks. Whilst shown in Figure 1 as schematically being part of server 1, the storage location 3 may be separate from server 1 and connected to it. The above-described content items may be stored in storage location 3. The methods undertaken by server 1 may be split across multiple servers. For instance, a first server may process the video content items, a second server may process user emotional state information and a third server may store the video content items and supply them to the end-user device. The servers may be under the control of different entities.
The server 1 may be connected to a computer network 6 to permit communication with other devices to enable performance of the methods described herein. Computer network 6 may be made up of many network segments that are connected together and so may be a large distributed network such as the internet or another public network.
Also connected to computer network 6 are a plurality of user terminals. User terminals may be a computer, such as a desktop or laptop computer; a portable device, such as a laptop, tablet computer or smartphone; or a smart TV or other device that can connect to remote servers using computer network 6 to access remote content located on servers such as server 1 to permit a user to send and receive information over computer network 6. In the example given in figure 1, two user terminals are shown, the first user device 7 being a smartphone and the second user device 8 being a laptop. Typically, the user device 7, 8 will be located remote from the server 1 and may well be located in a different country or even on a different continent from the server 1.
As shown in figure 1, the first and second devices 7, 8 may comprise a housing 9. The first and second devices 7, 8 comprise a processing section 10 and a memory 11. The first and second devices comprise a user interface constituted by a display 12 and, shown in the case of second device 8, a series of user-actuable switches 13. The display and switches could be combined into a touchscreen device as shown by first device 7. The first and second devices 7, 8 may comprise a wireless communication interface 19 for communicating with computer network 6. The first and second devices 7, 8 may comprise a wired communication interface 19 for communicating with computer network 6.
The device 7, 8 may be configured to display video content items on the screen. The first and second devices 7, 8 may be capable of implementing methods described herein to supply emotional state data and receive details of recommended video content items and display those video content items on display screen 12. These methods may be implemented and controlled by the processing section 10. The processing section 10 could perform its methods using dedicated hardware, using a general purpose processor executing software code, or using a combination of the two. A processor 14 executes software code stored in a non-transient way in software memory 15 in order to perform its methods. The processing section can read/write data from/to memory 11. The memory 11 may be a storage location for data. Memory 11 may comprise non-volatile memory and may be in the form of an array of discrete banks of memory, such as hard disks. Whilst shown in Figure 1 as schematically being part of first and second devices 7, 8, the memory 11 may be separate from the first and second devices 7, 8 and respectively connected to them by some means.
The device 7, 8 may be configured to run an application that enables the collection of user emotional state data from the user. The application may also display details of the at least one recommended video content item. The application may also be capable of displaying the at least one recommended video content item on display screen 12. The application may also be capable of transmitting the at least one recommended video content item to another device, such as a smart TV, for display on that device. The application(s) may be implemented and controlled by the processing section 10. In figure 1, user device 8 is shown as displaying the application 16. First user device 7, analogously to second user device 8, is shown displaying the application. In this case, the application 16 fills the display screen 12 of the first user device 7. The application may be a discrete application that runs on the device. Alternatively, the application may be in the form of a website which is loaded and run in an internet browser, or be presented in a virtual or non-virtual space, such as a metaverse.
The improved method of selecting at least one recommended video content item for a user will now be described with reference to the figures.
Figure 2 shows a flow diagram of the method by which the server 1 selects at least one recommended video content item for a user. The method may be implemented by the server 1 executing software code to implement the method shown in figure 2.
As shown at 21, a plurality of video content items are processed to derive an emotion affect score for each of the video content items. Each video content item of the plurality of video content items may be processed in order to derive a respective emotion affect score for that item. The plurality of video content items may be categorised within different channels of entertainment. Each channel of entertainment may be associated with a different emotional state or range of emotions. For example, one channel of entertainment may comprise video content items associated with the emotion of fear. Another channel of entertainment may comprise video content items associated with the emotion of romance. A channel of entertainment may be a channel that is specific to the user, and to which video content items are added in dependence on the emotional state of the user.
The emotion affect score may be an indication of the type of emotional state that the video content item causes a user to be in upon watching the video content item. The emotion affect score may be a category of emotional state. The categories of emotional state describe aspects of the emotional state of the user. For instance, the categories may include at least some of happy, sad, tired, energised, romantic and/or include emotional states such as mood, or take into account arousal and valence. Thus, the emotion affect score may state which of these emotions the video content item is going to generate in the user upon watching the video content item. For example, the emotion affect score for a video content item may be a descriptive name that is associated with that item, such as ‘heart-wrenching’. The emotion affect score may have a plurality of sub-scores. The sub-scores may numerically rate the ability of the video content item to generate a particular emotion in a user upon watching the video content item. There may be a sub-score per category of emotional state. The sub-score has a minimum and maximum value for each category. The emotion affect score may be an aggregate of the plurality of sub-scores. In some examples, each sub-score may have multiple data points. For example, a three-minute scene forming part of a video content item may have its own sub-score that is distinct from the sub-scores of the preceding and subsequent scenes in that video content item. The sub-score may comprise data points for a variety of emotions for every frame within a scene forming part of a video content item.
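By way of illustration only, the Python sketch below shows one possible representation of an emotion affect score made up of per-category sub-scores with separate data points per segment and an aggregate over the whole video content item. The category names and the 0.0 to 1.0 sub-score range are assumptions made for the example rather than requirements of the method.

```python
# A minimal sketch of an emotion affect score with per-category sub-scores,
# per-segment data points, and an aggregate over the whole item (illustrative only).
from dataclasses import dataclass, field
from typing import Dict, List

EMOTION_CATEGORIES = ["happy", "sad", "tired", "energised", "romantic"]  # assumed categories

@dataclass
class SegmentScore:
    start_s: float                  # segment start time in seconds
    end_s: float                    # segment end time in seconds
    sub_scores: Dict[str, float]    # per-category sub-score, bounded 0.0..1.0 here

@dataclass
class EmotionAffectScore:
    segments: List[SegmentScore] = field(default_factory=list)

    def aggregate(self) -> Dict[str, float]:
        """Aggregate per-segment sub-scores into a whole-item score (mean per category)."""
        totals = {c: 0.0 for c in EMOTION_CATEGORIES}
        for seg in self.segments:
            for c in EMOTION_CATEGORIES:
                totals[c] += seg.sub_scores.get(c, 0.0)
        n = max(len(self.segments), 1)
        return {c: totals[c] / n for c in EMOTION_CATEGORIES}

# Example: a three-minute scene scored separately from the following scene.
score = EmotionAffectScore(segments=[
    SegmentScore(0, 180, {"happy": 0.2, "sad": 0.7, "tired": 0.1, "energised": 0.1, "romantic": 0.3}),
    SegmentScore(180, 360, {"happy": 0.8, "sad": 0.1, "tired": 0.0, "energised": 0.6, "romantic": 0.4}),
])
print(score.aggregate())
```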
There are various methods which may be used individually or jointly to process a video content item to derive an emotion affect score for the video content item. It will be appreciated that the list provided herein is exemplary and other methods that are similar or different may be used to process a video content item to derive an emotion affect score for the video content item.
An emotional affect score may be derived from the processing of semantic information from multimedia data sources that form part of the video content items. Semantic information that forms part of a video content item may be comprised within a written component, an audio component or a video component forming part of the video content item. Examples of types of semantic information that may be processed to derive an emotional affect score are provided below. In some examples, processing may be performed on the content of the video content items (e.g., semantic information comprised within the video content item) in order to derive an emotion affect score. In other examples, processing may be performed on information associated with the video item (e.g., semantic information used to identify the video content item).
Emotion may be extracted from analysis of text that forms part of the content of the video file using methods such as natural language processing. The text may be at least part of what is shown to the user when the video file is played, and may also come from associated content. Datasets may include written components such as a synopsis, script, subtitles or captions, plot summaries or reviews. The text may also come from audio analysis undertaken as part of the methods described herein. For example, dialogue from the film may be transcribed and then natural language processing or other text analysis may be run on that transcription.
Natural language processing or other text analysis may be used on the content of the video file to derive the meaning of the content included in the video content item and use that meaning to calculate the emotion affect score. The natural language processing or other text analysis may be used on at least one of a variety of elements of the video content item.
The video content item may have at least one written component associated with it. The video content item may be processed to derive an emotion affect score by processing the written component(s) to derive the emotion affect score for the video content item. The written component(s) may be analysed to derive the meaning of the written component(s) which is then used to derive an emotion affect score for the video content item.
The video content item may have a video script associated with it. The video script may be part of the video content item. For instance, the video content item may comprise caption data. The caption data may be synchronised to the content of the video content item so that the relevant part of the video script is presented to the user alongside the content of the video content item. The caption data may include a written description of the words that are spoken in the video content item. In this way, the caption data may be subtitle data. The caption data may also include description of what is shown in the video content item. The video content item may be processed to derive an emotion affect score by processing the video script to derive an emotion affect score for the video content item. The video script may be analysed to derive the meaning of the video script which is then used to derive an emotion affect score for the video content item.
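As an illustrative sketch only, the snippet below scores subtitle or script text against a small hand-written emotion lexicon. A practical system would more likely use a trained natural language processing model; the lexicon, categories and length normalisation used here are assumptions made for the example.

```python
# A toy lexicon-based scorer for script or subtitle text (assumptions, not the
# disclosed implementation); a real system would likely use a trained NLP model.
import re
from collections import Counter

EMOTION_LEXICON = {
    "happy": {"laugh", "joy", "smile", "wonderful"},
    "sad": {"cry", "tears", "funeral", "goodbye"},
    "fear": {"scream", "dark", "monster", "run"},
}

def score_script(script_text: str) -> dict:
    """Derive per-category sub-scores from the words of a script or subtitle track."""
    words = re.findall(r"[a-z']+", script_text.lower())
    counts = Counter(words)
    totals = {}
    for category, lexicon in EMOTION_LEXICON.items():
        hits = sum(counts[w] for w in lexicon)
        totals[category] = hits / max(len(words), 1)   # normalise by script length
    return totals

print(score_script("They laugh and smile, then a scream in the dark and everyone must run."))
```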
The video content item contains audio data. The audio data comprises any spoken words throughout the length of the video content item. The audio data contains background sounds throughout the length of the video content item. The background sounds may include music, sound effects, and/or environmental noises. Other audio data may be included in the analysis that is related to the video content item. For instance, audio data may include a recording of an audience’s reaction to watching the video content item and/or may include audio from a trailer for the video content item.
The video content item may be processed to derive an emotion affect score by processing the audio data to derive an emotion affect score for the video content item. The analysis of audio data may make use of other analysis methods described herein. For instance, the audio data may be processed in dependence on the script, subtitles or captions of the video content item. The processing of the audio data may include an analysis of voice/speech content, sound effects content and/or music content of the video content item.
The audio data may be analysed to separate out the spoken words in the audio data and analyse those spoken words to determine the emotion with which they are being said. The audio data as a whole may be processed to analyse the speech present in the audio data file. In this way, speech analysis may be undertaken on the audio data file. The analysis of the speech may include an analysis of speech characteristics such as tone, cadence, pitch, pace, tempo and other variables that give insight into emotion and sentiment. The analysis may process the audio data to separate out the speech characteristics information of the spoken words. The speech characteristics information can indicate the emotion with which the words are being said. As an example, the processing of the audio data may include processing the subtitle data and/or captions to detect the spoken words in the video content item.
The audio data may be analysed to separate out background sounds. The background sounds may be identified by separating out the frequency components of the background sounds and matching those frequency components to known background sounds. The known background sounds may have a particular emotion affect associated with them. The determined background sounds are used to generate the emotion affect score by using the particular emotion affect associated with the detected known background sound.
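The snippet below is a simplified sketch of this background-sound matching: it extracts dominant frequency components with a Fourier transform and compares them against assumed profiles of known background sounds, each carrying an illustrative emotion affect. The profiles, tolerance and weights are assumptions rather than part of the disclosure.

```python
# A minimal sketch of matching background sound to known profiles by dominant
# frequency components (profiles and affect weights are illustrative assumptions).
import numpy as np

KNOWN_BACKGROUND_SOUNDS = {
    "low_drone": {"freqs": [55.0, 110.0], "affect": {"fear": 0.8}},
    "birdsong": {"freqs": [3000.0, 4500.0], "affect": {"happy": 0.7}},
}

def dominant_frequencies(samples: np.ndarray, sample_rate: int, top_n: int = 3) -> np.ndarray:
    """Return the top-N dominant frequencies (Hz) in an audio buffer."""
    spectrum = np.abs(np.fft.rfft(samples))
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / sample_rate)
    return freqs[np.argsort(spectrum)[-top_n:]]

def match_background(samples: np.ndarray, sample_rate: int, tolerance_hz: float = 50.0) -> dict:
    """Accumulate the emotion affects of known sounds whose profile frequencies appear in the audio."""
    found = dominant_frequencies(samples, sample_rate)
    affect = {}
    for profile in KNOWN_BACKGROUND_SOUNDS.values():
        if any(abs(f - p) < tolerance_hz for f in found for p in profile["freqs"]):
            for emotion, weight in profile["affect"].items():
                affect[emotion] = affect.get(emotion, 0.0) + weight
    return affect

# Example: one second of a synthetic 110 Hz drone matches the "low_drone" profile.
sr = 16000
t = np.arange(sr) / sr
print(match_background(np.sin(2 * np.pi * 110.0 * t), sr))
```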
The video content item contains video data. The video data comprises the moving images that form the video of the video content item. Other video data may be included in the analysis that is related to the video content item. For instance, the video data may comprise a trailer for the video content item. The video data may be analysed to derive an emotion affect score for the video content item. The video data may be analysed using one or more method to derive this information. For instance, the analysis may involve one or more of:
- facial expression or facial movement analysis,
- body movement and gestures, posture, body language, behaviour or gait analysis,
- gleaning physiological signals or combinations thereof.
In addition, the surroundings of the agents present in the video data, inter-agent proximities and interactions, or the absence of other agents may be analysed. Any combination of these may be used to derive the perceived emotional state of the agent(s) present in the video content item.
It will be appreciated that the described video data processing techniques could be combined with other analysis, such as the audio analysis of what an agent present in the video content item is saying. This can be used to provide a more robust estimation of the subject’s emotional state.
The video data may be processed using computer vision and/or image recognition. The processing techniques such as computer vision and/or image recognition may be configured to detect known features such as faces, known objects and/or other images. The processing techniques may recognise series of faces, known objects and/or other images to detect when particular events occur in the video data. The image recognition may be configured to detect faces in the video data. The faces that are detected may be processed to derive the emotion of the faces over the length of the video content item. The image recognition may be configured to detect objects within the video data. The image recognition may associate known objects or images with particular emotion affects. For instance, the image recognition may be configured to detect weapons within the video data and ascribe a particular emotion affect due to weapons being detected in the video data. In this way the video content item may be processed to derive an emotion affect score by processing the video data to derive an emotion affect score for the video content item. The emotion affect score may be derived using the particular emotion affects for detected known objects within the video data.
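The following sketch illustrates only the mapping step from detected features to emotion affects. The detect_labels function is a hypothetical placeholder for whatever computer vision or image recognition model is used, and the label-to-affect table is an assumption made for the example.

```python
# A minimal sketch: detected labels per frame (faces, weapons, ...) are mapped to
# emotion affects and averaged over a segment. `detect_labels` is a hypothetical
# stand-in for a real image recognition model.
from typing import Dict, List

LABEL_AFFECTS: Dict[str, Dict[str, float]] = {
    "weapon": {"fear": 0.9},            # illustrative mapping of known objects to affects
    "smiling_face": {"happy": 0.8},
    "crying_face": {"sad": 0.8},
}

def detect_labels(frame) -> List[str]:
    """Hypothetical placeholder: in this sketch a 'frame' is already a list of labels."""
    return frame

def affect_for_frames(frames: List[List[str]]) -> Dict[str, float]:
    """Average the emotion affects of known labels detected across the frames of a segment."""
    totals: Dict[str, float] = {}
    for frame in frames:
        for label in detect_labels(frame):
            for emotion, weight in LABEL_AFFECTS.get(label, {}).items():
                totals[emotion] = totals.get(emotion, 0.0) + weight
    n = max(len(frames), 1)
    return {emotion: value / n for emotion, value in totals.items()}

print(affect_for_frames([["smiling_face"], ["smiling_face", "weapon"], []]))
```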
The emotion affect score may be generated for the video content item as a whole.
The video content item may comprise more than one segment. Each segment may run contiguously one after the other to form the overall length of the video content item. An emotion affect score may also be generated for each segment of the video content item. In this way, emotional states can be defined for segments, scenes, and various other stages of the film before evaluating the video content item in its entirety. The emotion affect scores for each segment may be combined to generate an emotion affect score for the whole video content item.
A segment may be a predefined time period of the video content item. A segment may be related to the content of the video content item. For instance, a segment may be a scene of the video content item. A segment may be the beginning, middle and/or end of the content. Each of the methods described herein for the generation of the emotion affect score may be used to generate an emotion affect score for a segment of the video content item. It will be appreciated that other methods that are similar or different may be used to generate an emotion affect score for a segment of the video content item.
The emotion affect score for segments of the video content item may be used to select the recommended video content item as described herein. For instance, the emotion affect scores for the first and last segments may be used to select the recommended video content item as these segments may best influence the emotional state of the user.
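A minimal sketch of combining per-segment scores is shown below, under the assumption that the first and last segments are given extra weight because, as noted above, they may best influence the emotional state of the user. The weighting factor is illustrative.

```python
# A minimal sketch: weighted combination of per-segment sub-scores into a
# whole-item score, with the first and last segments weighted more heavily.
from typing import Dict, List

def combine_segment_scores(segment_scores: List[Dict[str, float]],
                           end_weight: float = 2.0) -> Dict[str, float]:
    """Weighted average of per-segment sub-scores; opening and closing segments count extra."""
    combined: Dict[str, float] = {}
    total_weight = 0.0
    for i, scores in enumerate(segment_scores):
        weight = end_weight if i in (0, len(segment_scores) - 1) else 1.0
        total_weight += weight
        for emotion, value in scores.items():
            combined[emotion] = combined.get(emotion, 0.0) + weight * value
    return {e: v / total_weight for e, v in combined.items()}

print(combine_segment_scores([
    {"happy": 0.1, "sad": 0.8},   # opening scene
    {"happy": 0.4, "sad": 0.4},   # middle
    {"happy": 0.9, "sad": 0.1},   # ending
]))
```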
A neural network and/or deep learning model may be trained to process the video content items to derive the emotion affect score for each video content item. The model may be trained to process the audio data and/or video data to derive the emotional affect score. The model may be set up to derive the emotion affect score using any or all of the methods described herein. It will be appreciated that other methods that are similar or different may also be used to derive the emotion affect score. The model may then be fed training data.
A set of users watch a set of video content items whilst having their emotional state logged throughout the watching of each video content item. The logged emotional state throughout each video content item is an emotional state sample. The training data therefore comprises a plurality of emotional state samples for respective video content items for a particular user. The training data comprises emotional state samples for a plurality of users. The training data also comprises the video content items. In this way, the model can process the video content items against the emotional state samples to derive the emotional state generated by the video content item at each moment in the video content item. The emotional state samples may each have a time stamp.
The emotional state samples may be generated by monitoring one or more physiological parameters of the users. As examples, the physiological parameters may be heart rate, breathing rate, pupil dilation, blink rate, and/or eye closure time. The variation of these parameter(s) over time can be used to derive the emotional state of the user at a given time. For instance, a high heart rate could mean the user is scared or excited, and the pupils being dilated could mean the user is scared.
The training data may be fed into the model to give the model a set of data points which can be used to derive the emotional state generated by video content items that were not part of the training data. In this way, the model is able to derive an emotion affect score for a video content item.
The model may simulate one or more physiological parameters of a user. As examples, the physiological parameters may be heart rate, breathing rate, pupil dilation, blink rate, and/or eye closure time. The model thereby derives one or more simulated physiological parameters of a user over the duration of a video content item. The derivation of the simulated physiological parameters may be based on the set of data points gathered from the training data. The simulated physiological parameters generated for the video content item can then be used to derive the emotional state generated by the video content item. Advantageously, the model may simulate the heart rate of a user. The model thereby derives a simulated heart rate of the user over the duration of a video content item.
Training data may also comprise feedback from the user after watching a selected video content item. This feedback may comprise an emotion affect score for the video content item. The feedback may indicate whether the user agrees with the recommendation of the video content item. The feedback may indicate whether the user agrees with mood altering categorisation of the video content item.
Training data may comprise metadata that has been attached to various points in the timeline of the video content item. The metadata may be attached manually by an operative watching the video content. The metadata may indicate the emotion generated by the video content item at a given point. The metadata may comprise an arousal/valence scale and/or a categorical emotion. These may be based on the universal emotions from Ekman, for example. Only a sub-set of the training data may comprise metadata as described.
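The sketch below illustrates, in a deliberately simplified form, how a model might simulate a physiological parameter such as heart rate over the segments of a video content item: a linear least-squares fit is trained on assumed per-segment content features and logged sample-user heart rates, then applied to an unseen item. A real system would more likely use a neural network or deep learning model as described above; the features and figures here are invented for the example.

```python
# A minimal sketch of simulating heart rate per segment from training data
# (features, values and the linear model are illustrative assumptions).
import numpy as np

# Training data (illustrative): rows are segments, columns are content features.
features = np.array([[0.2, 0.1],    # [audio loudness, cuts per minute], normalised
                     [0.8, 0.9],
                     [0.5, 0.4],
                     [0.9, 0.7]])
heart_rates = np.array([62.0, 95.0, 74.0, 98.0])   # sample users' mean heart rate per segment

# Fit a least-squares linear model: heart_rate is approximated by features @ w plus a bias.
design = np.hstack([features, np.ones((len(features), 1))])
w, *_ = np.linalg.lstsq(design, heart_rates, rcond=None)

def simulate_heart_rate(segment_features: np.ndarray) -> np.ndarray:
    """Simulated heart rate for each segment of a new video content item."""
    design_new = np.hstack([segment_features, np.ones((len(segment_features), 1))])
    return design_new @ w

new_item = np.array([[0.3, 0.2], [0.95, 0.9]])     # a calm scene followed by an intense one
print(simulate_heart_rate(new_item))
```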
As shown at 22, in some examples, emotional state data of a user is received by the server 1. The emotional state data may be gathered by an end user device 7, 8 and transmitted to server 1. The emotional state data may encompass emotions, moods, feelings and/or attitudes.
The emotional state data may be gathered by asking the user one or more questions about their current emotional state. The emotional state data may be gathered by asking the user to undertake one or more surveys or tests which are used to gather information about their current emotional state. A chat bot may be used to provide the surveys or tests. The user may be asked to select one or more emoticons that the user judges best describes their current emotional state. The user may be shown multiple emoticons and asked to select one or more of them. The user may be shown multiple emoticons and asked to select only one. The user may be shown multiple sets of emoticons one after the other and asked to select one or more of them from each set. The selections made by the user may be sent to the server as the emotional state data. The user may be shown multiple visual elements and asked to select at least one of the visual elements. The visual elements may be pictures or videos. These visual elements may not have a specific emotion attached to them but can be used to derive an emotional state. For example, the user may be presented with a series of colours and/or an assortment of pictures. The user may be asked one or more questions from which a current emotional state can be derived. These questions may be predetermined or generated by an algorithm in response to previous questions. In this way, the emotional state data comprises user emotion judgement(s).
The user emotion judgement(s) may be gathered by asking a user to quantify the user’s current emotional state against a plurality of categories. For instance, the end user device may present a series of sliders which the user drags to quantify their current emotional state in a plurality of categories.
The emotional state data may be generated by the end user device based on data gathered by the end user device. An end user device that is used to generate and gather emotional state data may be a device that is wearable by the user. Examples of devices that are wearable by the user are smart devices such as smart jewellery, smart watches and smart clothing. Smart devices are electronic devices that may contain one or more sensors and a means for connecting to a network comprising other devices so as to transfer data to and from those other devices. Another example of an end user device that is used to generate and gather emotional state data is a device located in the environment of the user, such as a television comprising a camera or a mobile device such as a telephone. In the example where the end user device is a television comprising a camera, the camera may analyse a user’s face so as to read their emotions. The user may have more than one end user device which is used to generate and gather the emotional state data. The data gathered may be related to the physiological state of the user. For instance, the duration of sleep, wake time, bedtime, heart rate, exercise time and/or step count. The end user device may combine this data to form emotional state data using one or more predefined rules.
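A minimal sketch of such predefined rules is given below; the thresholds, parameters and output categories are assumptions made for illustration rather than part of the disclosure.

```python
# A minimal sketch of combining wearable-device readings into emotional state data
# using simple predefined rules (thresholds and categories are illustrative).
def emotional_state_from_wearable(sleep_hours: float, resting_hr: float, step_count: int) -> dict:
    """Apply predefined rules to physiological data gathered by an end user device."""
    state = {"tired": 0.0, "energised": 0.0, "stressed": 0.0}
    if sleep_hours < 6:
        state["tired"] += 0.7
    if step_count > 10000:
        state["energised"] += 0.6
    if resting_hr > 80:
        state["stressed"] += 0.5
    return state

print(emotional_state_from_wearable(sleep_hours=5.0, resting_hr=85, step_count=3000))
```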
The emotional state data may be gathered by capturing a recording of the user’s voice. The user may be prompted to say certain phrases out loud whilst the end user device 7, 8 records the audio generated. The phrase may be a selected phrase to start the recommendation method. The emotional state data may be generated by the end user device based on conditions in the surrounding environment of the user. For example, the emotional state data may be generated based on the day of the week on which the user is using their device, or an application running on the device. In another example, the emotional state data may be generated based on the weather at the time at which the user is using their device or application. Data indicating the conditions in the surrounding environment of the user may be obtained from an application running on the device (e.g., a weather app). In a further example, the emotional state data may be based on the emotional state of other users in the vicinity of the user that is accessing their device or application. In this example, the emotional state of other users may be gathered from a network accessible to the user’s device. Other examples of conditions in the surrounding environment of the user may also be used to generate the emotional state data. The emotional state data may be based on previous emotional state data obtained for the user, such as from previous inputs provided by the user to their end device. In each of the above examples, emotional state data may be derived without a user having to input data actively indicating their emotional state.
The emotional state data may combine one or more of the options described herein. It will be appreciated that similar or different options to those described herein may additionally be used.
As shown at 23, in some examples, the current emotional state of the user is derived by server 1. The current emotional state of the user is derived from the emotional state data received from the end user device 7, 8. The current emotional state data may encompass emotions, moods, feelings and/or attitudes.
Where the emotional state data comprises user emotion judgements, the current emotional state of the user may be derived by combining the received user emotion judgements. The user may have chosen the same emotion judgement multiple times, which then weights the emotional state data towards a particular emotional state. The derivation of the current emotional state may compare the emotional state data to emotional state criteria which define which current emotional state should be selected based on the received emotional state data. The emotional state criteria may be a look-up table that links current emotional states to particular emotional state data. Where only one user emotion judgement is received then the current emotional state may be set to the user emotion judgement.
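As a simple illustration, the snippet below combines emoticon-style user emotion judgements into a current emotional state, with repeated selections weighting the result towards that emotion and a single judgement being used directly.

```python
# A minimal sketch of deriving a current emotional state from emotion judgements:
# repeated selections weight the result towards that emotion.
from collections import Counter

def derive_current_state(judgements: list) -> str:
    """Return the most frequently selected emotion; a single judgement is used directly."""
    if len(judgements) == 1:
        return judgements[0]
    counts = Counter(judgements)
    return counts.most_common(1)[0][0]

print(derive_current_state(["sad", "tired", "sad"]))   # weighted towards 'sad'
```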
Where the emotional state data comprises a recording of the user’s voice, then the recording is processed to derive one or more characteristics of the user’s voice. Those characteristics are then used to derive the current emotional state of the user. For instance, the characteristics may be selected from tone, inclination, diction and/or the level of slurring in a person’s speech.
As shown at 24, in some examples, a desired emotional state of the user is received by server 1. The desired emotional state of the user may be gathered by end user device 7, 8 and transmitted to server 1. The desired emotional state data may encompass emotions, moods, feelings and/or attitudes.
The desired emotional state data may be gathered by asking the user one or more questions about their desired emotional state. The desired emotional state data may be gathered by asking the user to undertake one or more surveys or tests which are used to gather information about their desired emotional state. For instance, the user may be asked to select one or more emoticons that the user judges best describes their desired emotional state. The user may be shown multiple emoticons and asked to select one or more of them. The user may be shown multiple emoticons and asked to select only one. The user may be shown multiple sets of emoticons one after the other and asked to select one or more of them from each set. The selections made by the user may be sent to the server as the desired emotional state data. The user may be shown multiple visual elements and asked to select at least one of the visual elements. The visual elements may be pictures or videos. These visual elements may not have a specific emotion attached to them but can be used to derive an emotional state. For example, the user may be presented with a series of colours and/or an assortment of pictures. The user may be asked one or more questions from which a desired emotional state can be derived. These questions may be predetermined or generated by an algorithm in response to previous questions. The desired emotional state data may be gathered by asking a user to quantify the user’s desired emotional state against a plurality of categories. For instance, the end user device may present a series of sliders which the user drags to quantify their desired emotional state in a plurality of categories. The desired emotional state data may alternatively be gathered by derivation based on conditions in the surrounding environment of the user, as described with respect to the generation of emotional state data above. This desired emotional state data may be derived without a user having to input data actively indicating their desired emotional state.
In an example, the step of deriving the current emotional state or receiving the desired emotional state of a user may comprise deriving/receiving the current/desired emotional states of multiple users. For example, the step of deriving the current emotional state of a user may comprise deriving a first current emotional state of a first user from emotional state data of the first user, and a second current emotional state of a second user from received emotional state data of the second user. In this example, a third current emotional state may be derived by the system, where the third current emotional state is calculated based on the first and second current emotional states. For example, the third current emotional state may be calculated by merging the first and second current emotional states to determine an average current emotional state. Similarly, the step of receiving the desired emotional state of a user may comprise receiving a first desired emotional state from a first user and a second desired emotional state from a second user. In this example, a third desired emotional state may be calculated by the system, where the third desired emotional state is calculated based on the first and second desired emotional states. For example, the third desired emotional state may be calculated by merging the first and second desired emotional states to determine an average desired emotional state. In alternative examples, the step of deriving the current emotional state or receiving the desired emotional state of a user may comprise deriving/receiving the current/desired emotional states of more than two users. In these alternative examples, the third current/desired emotional state may be calculated based on the emotional states of those more than two users.
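A minimal sketch of the merging step, assuming each user’s emotional state is held as a per-category vector (an assumed representation), could average the vectors as follows.

    def merge_emotional_states(*state_vectors):
        """Average the per-category emotional state vectors of two or more users."""
        categories = set().union(*state_vectors)
        return {
            category: sum(vector.get(category, 0.0) for vector in state_vectors) / len(state_vectors)
            for category in categories
        }

    # A third emotional state derived from the first and second users' states.
    third_state = merge_emotional_states({"happy": 0.8, "calm": 0.2}, {"happy": 0.4, "calm": 0.6})
    print(third_state)  # approximately {"happy": 0.6, "calm": 0.4}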
Once the end user device 7, 8 has gathered the desired emotional state data, it is transmitted to server 1. As shown at 25, at least one recommended video content item is selected from the plurality of video content items. The at least one recommended video content item to be selected may consist of a single video content item, or may comprise a plurality of video content items. Where the at least one recommended video content item comprises a plurality of video content items, these items may be recommended via the recommendation of a channel of entertainment comprising those video content items. For example, a channel comprising video content items designed to convey an emotion of fear may be recommended to a user in dependence on their emotional state. Alternatively, the video content items may form part of a channel that is personal to the user, and that is adapted by adding and/or removing video content items from the channel in dependence on the emotional state of the user.
The server 1 undertakes the selection of the at least one recommended video content item. The selection may be based on the current emotional state of the user. The selection may be based on the desired emotional state of the user. The selection is based on the respective emotion affect scores of the plurality of video content items. The selection may additionally be based on video content categorisations received from the user. These video content categorisations may comprise one or more of a desired genre, a desired release time period (such as a desired release year) for the video content items, a desired age rating, and a desired video content item length or range of lengths. The video content categorisations may be used to further refine the recommendation.
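By way of illustration only, a selection step combining the emotion affect scores with the user’s video content categorisations might look like the following; the item fields and the use of a single category label per score are assumptions.

    def select_recommended_items(items, desired_emotional_state, categorisations=None):
        """Select video content items whose emotion affect score matches the desired emotional state.

        `items` is a list of dicts with assumed keys "title", "emotion_affect_score" (a category
        label, for simplicity), "genre" and "release_year".
        """
        categorisations = categorisations or {}
        recommended = []
        for item in items:
            if item["emotion_affect_score"] != desired_emotional_state:
                continue  # the emotion affect score does not match the desired emotional state
            if "genre" in categorisations and item["genre"] != categorisations["genre"]:
                continue  # refine the recommendation using the user's categorisations
            if "release_year" in categorisations and item["release_year"] != categorisations["release_year"]:
                continue
            recommended.append(item)
        return recommended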
The purpose of the at least one recommended video content item is to attempt to complement, influence, or transition the emotional state of the user who views the at least one recommended video content item. The methods described herein look to extract semantic information from multimedia data sources that form part of video content items. These data sources may include directly perceivable media such as audio, image and video; indirectly perceivable sources such as text, semantic descriptions and bio-signals; and non-perceivable sources.
A video content item may be selected as a recommended video content item due to the desired emotional state of the user matching the emotion affect score of the video content item. In the case that the video content item has emotion affect scores for segments of the video content item, the video content item may be selected as a recommended video content item due to the desired emotional state of the user matching one or more emotion affect scores for segments of the video content item. The video content item may be selected if the emotion affect scores of segments near the end of the video content item match the desired emotional state of the user. It may be the last 1, 2, 3, 4, 5, or 10 segments of the video content. The video content item may be selected if the emotion affect scores of segments near the start of the video content item match the desired emotional state of the user. It may be the first 1, 2, 3, 4, 5, or 10 segments of the video content. The video content item may be selected if the emotion affect scores in a particular group of segments match the desired emotional state of the user.
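A sketch of the segment-based matching for the opening or closing segments is given below, assuming each segment carries a single emotion affect score label; the default of five segments simply mirrors one of the example values above.

    def matches_by_closing_segments(segment_scores, desired_emotional_state, last_n=5):
        """Check whether the emotion affect scores of the last `last_n` segments match the desired state."""
        return all(score == desired_emotional_state for score in segment_scores[-last_n:])

    def matches_by_opening_segments(segment_scores, desired_emotional_state, first_n=5):
        """The same check applied to the first `first_n` segments of the video content item."""
        return all(score == desired_emotional_state for score in segment_scores[:first_n])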
A video content item may be selected as a recommended video content item due to the current emotional state of the user matching the emotion affect score of the video content item. In the case that the video content item has emotion affect scores for segments of the video content item, the video content item may be selected as a recommended video content item due to the current emotional state of the user matching one or more emotion affect scores for segments of the video content item. The video content item may be selected if the emotion affect scores of segments near the end of the video content item match the current emotional state of the user. It may be the last 1, 2, 3, 4, 5, or 10 segments of the video content. The video content item may be selected if the emotion affect scores of segments near the start of the video content item match the current emotional state of the user. It may be the first 1, 2, 3, 4, 5, or 10 segments of the video content. The video content item may be selected if the emotion affect scores in a particular group of segments match the current emotional state of the user.
A video content item may be selected as a recommended video content item due to the current emotional state of the user not matching the emotion affect score of the video content item. In the case that the video content item has emotion affect scores for segments of the video content item, the video content item may be selected as a recommended video content item due to the current emotional state of the user not matching one or more emotion affect scores for segments of the video content item. The video content item may be selected if the emotion affect scores of segments near the end of the video content item do not match the current emotional state of the user. It may be the last 1, 2, 3, 4, 5, or 10 segments of the video content. The video content item may be selected if the emotion affect scores of segments near the start of the video content item do not match the current emotional state of the user. It may be the first 1, 2, 3, 4, 5, or 10 segments of the video content. The video content item may be selected if the emotion affect scores in a particular group of segments do not match the current emotional state of the user.
The method may determine that an emotional state matches an emotion affect score by determining that the emotional state is within an emotional state threshold of an emotion affect score. The emotional state may be greater than or less than an emotion affect score by the emotional state threshold and be determined to match the emotion affect score. The method may determine that an emotional state does not match an emotion affect score by determining that the emotional state is outside an emotional state threshold of an emotion affect score. The emotional state may be greater than or less than an emotion affect score by more than the emotional state threshold and be determined to not match the emotion affect score. In this way, an emotion affect score may be determined to be suitable or not suitable for an emotional state.
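Where emotional states and emotion affect scores are represented numerically (an assumption for this sketch), the threshold test reduces to a simple comparison.

    def state_matches_score(emotional_state, emotion_affect_score, threshold=0.15):
        """An emotional state matches an emotion affect score if it lies within the threshold of it."""
        return abs(emotional_state - emotion_affect_score) <= threshold

    def state_does_not_match_score(emotional_state, emotion_affect_score, threshold=0.15):
        """Conversely, a state outside the threshold is determined not to match the score."""
        return abs(emotional_state - emotion_affect_score) > threshold

    print(state_matches_score(0.7, 0.8))         # True: within the 0.15 threshold
    print(state_does_not_match_score(0.2, 0.8))  # True: outside the threshold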
A set of selection rules may be stored by the server 1. These selection rules may associate particular current emotional states and/or desired emotional states with particular emotion affect scores. Video content items can then be selected based on those video content items having the particular emotion affect scores that are associated with combinations of current emotional state and/or desired emotional state. The set of selection rules may be a look-up table that associates current emotional states and/or desired emotional states with emotion affect scores. In this way, the emotional states may be determined to match emotion affect scores.
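The selection rules could be held as a look-up table along the lines of the sketch below; the state labels and target scores are invented for illustration.

    # Hypothetical selection rules mapping (current emotional state, desired emotional state)
    # to the emotion affect score that recommended items should carry.
    SELECTION_RULES = {
        ("sad", "happy"): "uplifting",
        ("anxious", "calm"): "soothing",
        ("bored", "excited"): "thrilling",
    }

    def select_by_rules(items, current_state, desired_state):
        """Select items whose emotion affect score is associated with this combination of states."""
        target_score = SELECTION_RULES.get((current_state, desired_state))
        return [item for item in items if item["emotion_affect_score"] == target_score]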
A video content item may be selected as a recommended video content item based on the desired emotional state of the user matching the emotion affect score of the video content item within a defined time period of the start of the video content item. The defined time period may be calculated based on attributes of the user. The attributes of the user may include age and/or gender. One or more physiological parameters may be simulated for the user based on the emotional state data of the user. As examples, the physiological parameters may be heart rate, breathing rate, pupil dilation, blink rate, and/or eye closure time. The one or more physiological parameters may be simulated for the user over the duration of a plurality of video content items based on the emotion affect scores of the video content items. The simulation may also be based on other information associated with the user. Advantageously, the model may simulate the heart rate of the user. Therefore, a simulated heart rate for the user may be generated over the duration of a plurality of video content items. Video content item(s) may be selected in response to the simulated physiological parameters indicating a match for the current and/or desired emotional state.
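The specification leaves the simulation model open; a deliberately simple sketch of a simulated heart-rate trace driven by per-segment arousal scores might look like this, with the baseline, the age adjustment and the response factor all being assumptions rather than a physiological model.

    def simulate_heart_rate(segment_arousal_scores, age=35):
        """Simulate a heart-rate trace over a video content item from per-segment arousal scores (0-1)."""
        resting_rate = 70.0 - (age - 35) * 0.2   # crude, assumed age adjustment of the baseline
        trace = []
        rate = resting_rate
        for arousal in segment_arousal_scores:
            target = resting_rate + 40.0 * arousal   # higher arousal pushes the simulated rate up
            rate += 0.5 * (target - rate)            # smooth approach towards the target rate
            trace.append(round(rate, 1))
        return trace

    print(simulate_heart_rate([0.1, 0.2, 0.8, 0.9, 0.3]))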
A video content item may be selected as a recommended video content item additionally based on video content categorisations received from the user. The video content categorisations may comprise one or more of a desired genre, a desired release time period (such as a desired release year) for the video content items, a desired age rating, and a desired video content item length or range of lengths. The video content categorisations may be used to further refine the recommendation.
As shown at 26, video content item information for the at least one recommended video content item is sent to the user. The information is sent to the user by the server 1 transmitting the information to the end user device 7, 8. The video content item information may identify the video content item. The video content item information may show which supplier the video content item is available from. The video content item information may provide a reference to enable the end user device 7, 8 to access the video content item. The video content item information may use images and other video components that convey particular emotions within the video content item, the emotions being identified by the methods described herein. For instance, the images may visually represent emotional high or low points of the video content item.
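The shape of the video content item information is not fixed by the specification; one possible payload, with every field name and value invented for illustration, is shown below.

    video_content_item_information = {
        "item_id": "vc-000123",                        # identifies the video content item
        "supplier": "ExampleStreamingService",         # which supplier the item is available from
        "access_reference": "https://example.com/watch/vc-000123",
        "emotion_images": [                            # images conveying emotional high/low points
            {"timestamp_s": 312, "emotion": "joy", "image_url": "https://example.com/thumb/312.jpg"},
            {"timestamp_s": 1480, "emotion": "sadness", "image_url": "https://example.com/thumb/1480.jpg"},
        ],
    }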
It has previously been mentioned that video content items can be films, TV shows, documentaries, or other moving picture-based content items. In one example, the video content items may be interstitial content items, such as advertisements. The interstitial content items may be selected for insertion into a pre-existing stream of content items. The pre-existing stream of content items may comprise a single video content item, such as a film, or a plurality of video content items such as different episodes of a television show. The pre-existing stream of content items may be parsed to determine one or more optimal positions in which to insert one or more interstitial content items. The optimal positions may be determined based on the processing of the content items in the stream to derive an emotion affect score, or sub-scores, for the stream. The interstitial content items may also be processed to derive an emotion affect score for each content item. The interstitial content items to be inserted into the content stream may be selected in dependence on the emotion affect scores, or sub-scores, of the content stream and the emotion affect scores of the interstitial content items. The determination of emotion affect sub-scores is described in more detail above. The interstitial content items to be inserted into the content stream may be selected based on the current or desired emotional state of the user. The current and desired emotional states of the user may be received or derived using any of the techniques described above. The interstitial content items may be inserted into the content stream in dependence on all of the emotion affect sub-scores derived for the content stream, the emotion affect scores of the interstitial content items and the current or desired emotional state of the user.
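A sketch of how insertion positions for interstitial content items could be chosen from the stream’s per-segment sub-scores is given below; the data shapes and the single-label scores are assumptions made for illustration.

    def choose_insertion_points(stream_segment_scores, interstitial_items, desired_state, max_insertions=3):
        """Pick segment boundaries whose emotion affect sub-score suits a matching interstitial item."""
        insertions = []
        for boundary, sub_score in enumerate(stream_segment_scores):
            for interstitial in interstitial_items:
                # Insert where the stream's sub-score and the interstitial's emotion affect score
                # both match the desired emotional state of the user.
                if sub_score == desired_state and interstitial["emotion_affect_score"] == desired_state:
                    insertions.append({"after_segment": boundary, "interstitial": interstitial})
                    break
            if len(insertions) >= max_insertions:
                break
        return insertions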
The methods described herein are advantageous because they enable a computer system to operate in a new way to process a potentially large number of video content items and provide recommendations of video content items that are based on the emotional state of a user.
The applicant hereby discloses in isolation each individual feature described herein and any combination of two or more such features, to the extent that such features or combinations are capable of being carried out based on the present specification as a whole in the light of the common general knowledge of a person skilled in the art, irrespective of whether such features or combinations of features solve any problems disclosed herein, and without limitation to the scope of the claims. The applicant indicates that aspects of the present invention may consist of any such individual feature or combination of features. In view of the foregoing description it will be evident to a person skilled in the art that various modifications may be made within the scope of the invention.

Claims

1. A computer implemented method for selecting at least one recommended video content item for a user, the method comprising: processing a plurality of video content items to derive an emotion affect score for each of the video content items; receiving a desired emotional state of the user; and selecting at least one recommended video content item from the plurality of video content items based on the desired emotional state of the user and the respective emotion affect scores of the video content items.
2. A method according to claim 1, wherein the method comprises receiving emotional state data of the user; and deriving a current emotional state of the user from the emotional state data; and wherein selecting at least one recommended video content item from the plurality of video content items is based on the current emotional state of the user, the desired emotional state of the user and the respective emotion affect scores of the video content items.
3. A computer implemented method for selecting at least one recommended video content item for a user, the method comprising: processing a plurality of video content items to derive an emotion affect score for each of the video content items; receiving emotional state data of the user; deriving a current emotional state of the user from the emotional state data; and selecting at least one recommended video content item from the plurality of video content items based on the current emotional state of the user, and the respective emotion affect scores of the video content items.
4. A method according to claim 3, wherein the method comprises receiving a desired emotional state of the user; and wherein selecting at least one recommended video content item from the plurality of video content items is based on the current emotional state of the user, the desired emotional state of the user and the respective emotion affect scores of the video content items.
5. A method according to any preceding claim, wherein the emotion affect score is an indication of the emotional state the video content item generates upon watching the video content item.
6. A method according to any preceding claim, wherein the emotion affect score is a category of emotional state.
7. A method according to any preceding claim, wherein the emotion affect score comprises a plurality of sub-scores.
8. A method according to claim 7, wherein the plurality of sub-scores are each associated with a category of emotional state.
9. A method according to any preceding claim, wherein at least one of the plurality of video content items is associated with a respective written component, and processing a plurality of video content items to derive an emotion affect score for each of the video content items comprises processing the written component to derive the emotion affect score for the video content item.
10. A method according to any preceding claim, wherein at least one of the plurality of video content items is associated with a respective video script, and processing a plurality of video content items to derive an emotion affect score for each of the video content items comprises processing the video script to derive the emotion affect score for the video content item.
11. A method according to claim 10, wherein the video script is part of the video content item.
12. A method according to claim 10 or 11, wherein the video script is caption data and/or subtitle data.
13. A method according to any of claims 10 to 12, wherein the video content item comprises audio data, and processing a plurality of video content items to derive an emotion affect score for each of the video content items comprises processing audio data to transcribe the respective video script.
14. A method according to any preceding claim, wherein the video content item comprises audio data, and processing a plurality of video content items to derive an emotion affect score for each of the video content items comprises processing the audio data to derive the emotion affect score for the video content item.
15. A method according to any preceding claim, wherein processing a plurality of video content items to derive an emotion affect score for each of the video content items comprises processing associated audio data that is related to the video content item to derive the emotion affect score for the video content item.
16. A method according to any preceding claim, wherein the video content item comprises video data, and processing a plurality of video content items to derive an emotion affect score for each of the video content items comprises processing the video data to derive the emotion affect score for the video content item.
17. A method according to any preceding claim, wherein processing a plurality of video content items to derive an emotion affect score for each of the video content items comprises processing associated video data that is related to the video content item to derive the emotion affect score for the video content item.
18. A method according to claim 16 or 17, wherein processing the video data to derive the emotion affect score for the video content item comprises applying image recognition to the video data to detect known features in the video content and deriving the emotion affect score based on those known features.
19. A method according to any preceding claim, wherein the video content items comprise a plurality of segments, and processing the video data to derive the emotion affect score for the video content item comprises generating an emotion affect score for each segment of the video content items.
20. A method according to any preceding claim, wherein processing a plurality of video content items to derive an emotion affect score for each of the video content items comprises training a model to process the video content items to derive the emotion affect scores by loading training data into the model, the training data comprising at least one of (i) respective emotional state samples for the duration of a plurality of video content items for a plurality of sample users, (ii) feedback from the user after watching a selected video content item, (iii) metadata that has been attached to various points along the training data.
22. A method according to claim 20 or 21, wherein the model simulates one or more physiological parameters of a user over the duration of a video content item.
22. A method according to claim 20 or 21 , wherein the model simulates one or more physiological parameters of a user over the duration of a video content item.
23. A method according to any preceding claim, wherein the emotional state data comprises at least one user emotion judgement.
24. A method according to claim 23, wherein deriving the current emotional state of the user comprises combining the plurality of user emotion judgements to form the current emotional state of the user.
25. A method according to any of claims 2 to 24, wherein the emotional state data comprises a recording of the user’s voice.
26. A method according to claim 25, wherein deriving the current emotional state of the user comprises processing the recording to derive one or more characteristics of the user’s voice, and processing those characteristics to derive the current emotional state of the user.
27. A method according to any of claims 1, 2 and 4 to 26, wherein selecting at least one recommended video content item from the plurality of video content items comprises selecting at least one video content item that has a respective emotion affect score that matches the desired emotional state.
28. A method according to any of claims 1, 2 and 4 to 27, wherein video content items have emotion affect scores for segments of the video content item, and selecting at least one recommended video content item from the plurality of video content items comprises selecting at least one video content item that has a respective emotion affect score for at least one segment that matches the desired emotional state.
29. A method according to any of claims 2 to 28, wherein selecting at least one recommended video content item from the plurality of video content items comprises selecting at least one video content item that has a respective emotion affect score that matches the current emotional state.
30. A method according to any of claims 2 to 29, wherein selecting at least one recommended video content item from the plurality of video content items comprises selecting at least one video content item that has a respective emotion affect score that does not match the current emotional state.
31. A method according to any of claims 2 to 30, wherein video content items have emotion affect scores for segments of the video content item, and selecting at least one recommended video content item from the plurality of video content items comprises selecting at least one video content item that has a respective emotion affect score for at least one segment that does not match the current emotional state.
32. A method according to any of claims 2 to 31, wherein video content items have emotion affect scores for segments of the video content item, and selecting at least one recommended video content item from the plurality of video content items comprises selecting at least one video content item that has a respective emotion affect score for a segment within a defined time period of the video content item starting.
33. A method according to any of claims 2 to 32, the method comprising simulating one or more physiological parameters for the user based on the emotional state data over a plurality of video content items, and selecting at least one recommended video content item from the plurality of video content items comprises selecting the at least one recommended video content item in response to the simulated physiological parameters indicating a match for the current and/or desired emotional state.
34. A method according to any of claims 27 to 33, wherein determining that an emotional state matches an emotion affect score comprises determining that the emotional state is within an emotional state threshold of an emotion affect score.
35. A method according to any of claims 27 to 34, wherein determining that an emotional state does not match an emotion affect score comprises determining that the emotional state is outside an emotional state threshold of an emotion affect score.
36. A method according to any preceding claim, the method comprising sending video content item information for the at least one recommended video content item to the user.
37. A method according to claim 36, wherein the video content information (i) identifies the video content item(s), (ii) shows which supplier the video content item is available from, (iii) provides a reference to enable an end user device to access the content item and/or (iv) uses images and other video components that convey particular emotions within the video content item.
38. A method according to any preceding claim as dependent on claim 2 or claim 3, wherein the emotional state data is generated based on data gathered by an end user device.
39. A method according to any preceding claim as dependent on claim 2 or claim 3, wherein the emotional state data is gathered by capturing a recording of the user’s voice.
40. A method according to any preceding claim as dependent on claim 1, wherein the user is a first user and the desired emotional state is a first desired emotional state, the method further comprising: receiving a second desired emotional state of a second user; and selecting at least one recommended video content item from the plurality of video content items based on the respective emotion affect scores of the video content items and a third desired emotional state that is calculated based on the first and second desired emotional states.
41. A method according to any preceding claim as dependent on claim 2 or claim 3, wherein the user is a first user, the emotional state data is first emotional state data and the current emotional state is a first current emotional state, the method further comprising: deriving a second current emotional state of a second user from received emotional state data of the second user; and selecting at least one recommended video content item from the plurality of video content items based on the respective emotion affect scores of the video content items and a third current emotional state that is calculated based on the first and second current emotional states.
42. The method of any preceding claim, wherein the one or more video content items are interstitial content items, and the interstitial content items are selected for insertion into a pre-existing stream of content items.