WO2007113580A1 - Intelligent media content playing device with a user attention detection function, and associated method and recording medium - Google Patents
Intelligent media content playing device with a user attention detection function, and associated method and recording medium
- Publication number
- WO2007113580A1 (PCT application PCT/GB2007/001288)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- user
- face
- media content
- detecting
- playing
- Prior art date
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N5/00—Details of television systems
- H04N5/76—Television signal recording
- H04N5/765—Interface circuits between an apparatus for recording and another apparatus
- H04N5/775—Interface circuits between an apparatus for recording and another apparatus between a recording apparatus and a television receiver
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/21—Server components or server architectures
- H04N21/222—Secondary servers, e.g. proxy server, cable television Head-end
- H04N21/2225—Local VOD servers
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/234—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
- H04N21/2343—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
- H04N21/23439—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements for generating different versions
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/25—Management operations performed by the server for facilitating the content distribution or administrating data related to end-users or client devices, e.g. end-user or client device authentication, learning user preferences for recommending movies
- H04N21/258—Client or end-user data management, e.g. managing client capabilities, user preferences or demographics, processing of multiple end-users preferences to derive collaborative data
- H04N21/25866—Management of end-user data
- H04N21/25891—Management of end-user data being end-user preferences
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/25—Management operations performed by the server for facilitating the content distribution or administrating data related to end-users or client devices, e.g. end-user or client device authentication, learning user preferences for recommending movies
- H04N21/266—Channel or content management, e.g. generation and management of keys and entitlement messages in a conditional access system, merging a VOD unicast channel into a multicast channel
- H04N21/2662—Controlling the complexity of the video stream, e.g. by scaling the resolution or bitrate of the video stream based on the client capabilities
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/41—Structure of client; Structure of client peripherals
- H04N21/422—Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
- H04N21/42204—User interfaces specially adapted for controlling a client device through a remote control device; Remote control devices therefor
- H04N21/42206—User interfaces specially adapted for controlling a client device through a remote control device; Remote control devices therefor characterized by hardware details
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/41—Structure of client; Structure of client peripherals
- H04N21/422—Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
- H04N21/4223—Cameras
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/433—Content storage operation, e.g. storage operation in response to a pause request, caching operations
- H04N21/4334—Recording operations
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/441—Acquiring end-user identification, e.g. using personal code sent by the remote control or by inserting a card
- H04N21/4415—Acquiring end-user identification, e.g. using personal code sent by the remote control or by inserting a card using biometric characteristics of the user, e.g. by voice recognition or fingerprint scanning
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/442—Monitoring of processes or resources, e.g. detecting the failure of a recording device, monitoring the downstream bandwidth, the number of times a movie has been viewed, the storage space available from the internal hard disk
- H04N21/44213—Monitoring of end-user related data
- H04N21/44218—Detecting physical presence or behaviour of the user, e.g. using sensors to detect if the user is leaving the room or changes his face expression during a TV program
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/60—Network structure or processes for video distribution between server and client or between remote clients; Control signalling between clients, server and network components; Transmission of management data between server and client, e.g. sending from server to client commands for recording incoming content stream; Communication details between server and client
- H04N21/63—Control signaling related to video distribution between client, server and network components; Network processes for video distribution between server and clients or between remote clients, e.g. transmitting basic layer and enhancement layers over different transmission paths, setting up a peer-to-peer communication via Internet between remote STB's; Communication protocols; Addressing
- H04N21/64—Addressing
- H04N21/6405—Multicasting
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N5/00—Details of television systems
- H04N5/76—Television signal recording
- H04N5/84—Television signal recording using optical recording
- H04N5/85—Television signal recording using optical recording on discs or drums
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N9/00—Details of colour television systems
- H04N9/79—Processing of colour television signals in connection with recording
- H04N9/80—Transformation of the television signal for recording, e.g. modulation, frequency changing; Inverse transformation for playback
- H04N9/804—Transformation of the television signal for recording, e.g. modulation, frequency changing; Inverse transformation for playback involving pulse code modulation of the colour picture signal components
- H04N9/8042—Transformation of the television signal for recording, e.g. modulation, frequency changing; Inverse transformation for playback involving pulse code modulation of the colour picture signal components involving data reduction
Definitions
- The present invention relates to providing intelligent user control options for the consumption of media content, such as video images, by one or more viewers.
- Equipment for providing media content such as video images is well known and includes video cassette recorders (VCRs), digital video disc (DVD) players, hard disc drive (HDD) based recorders, and terrestrial, cable and satellite TV set-top boxes coupled to a viewing device such as a television (TV) set.
- Personal computers are also used to provide media content, and may include DVD disc readers, TV tuner capture cards, and Internet file download and streaming facilities.
- Streaming media is increasingly being used for fast and convenient access to media content such as Video on Demand (VoD) and is typically implemented using a remote storage server transmitting over a network to a receiving apparatus such as a computer or mobile phone.
- The quality of the streamed media content will depend on the bandwidth of the network over which it is transmitted: over a broadband Internet connection a high quality level transmission may be made, whereas over a wireless network the same media content may need to be transmitted at a lower quality level given the constraints of the lower-bandwidth connection.
- The media content quality level can be adjusted by using different compression technologies (data rates), resulting in fewer packets sent per unit time, a reduced number of frames for video, and lower-resolution images.
- The quality of media content can also be adjusted dynamically. For example, British Telecom's Fastnet video streaming technology provides media content in a number of different quality formats and switches between them depending on network congestion: if the network becomes congested, the media server switches to streaming lower quality level media content to the consuming device, for example a mobile phone. This is described in Walker, M. D., Nilsson, M., Jebb, T. and Turnbull, R., "Mobile Video-Streaming", BT Technology Journal, Vol. 21, No. 3 (September 2003), pp. 192-202.
- A distraction such as a phone call or the door-bell ringing during playback of a DVD may force the viewer to manually rewind the media content to the point of interruption in order to continue viewing without missing any of the content.
- HDD devices such as TiVo™ allow a user to press a "pause" button during a live broadcast in order to accommodate an interruption. The HDD records the missed content so that, when the viewer returns, the user can resume watching the video content from the interruption time by pressing a "continue" button, for example.
- WO2004084054 describes an attempt to automate such facilities by utilising an eye gaze attention recognition system which detects when a viewer's pupils are directed at a viewing screen. Detection of attention or non-attention can then be used to pause playback of media content.
- However, this requires relatively sophisticated eye gaze detection equipment, which also requires a light source to illuminate the pupils of the viewer.
- PCT published patent application No. WO 2006/061770 and US 2002/144259 describe intelligent pause buttons in which user attentiveness is monitored by some mechanism (several examples are given, including using a video camera to monitor the user's gait or movements), and if user inattentiveness is detected the display of the media is paused until user attentiveness is again detected.
- PCT published patent application WO 2003/026250 describes a system in which user attentiveness is again measured and in the event that user inattentiveness is detected during playback of a stream of media, the quality of the media is reduced. Again a number of different ways of detecting user attentiveness/inattentiveness are described including techniques involving cameras.
- The present invention provides user media content control options automatically based on a face detection system or other means of detecting user attention states. It has been found that detection of a face directed towards a viewing screen is surprisingly highly indicative of user attention. Furthermore, face detection systems can be much simpler if the face to be detected is always oriented in the same direction (and in the same direction as the training data). Although people do look out of the corners of their eyes, this is typically followed shortly afterwards by reorienting the head in the same direction. Thus using face detection as a mechanism for detecting user attentiveness is especially efficient, because the face detection system works best when the face is oriented in the same way as in the training data.
- Face detection is simpler than tracking eye gaze and does not require expensive pupil tracking mechanisms.
- The detection equipment does not need to be as sensitive and can detect its targets at a greater distance than eye tracking. This is especially useful in informal viewing situations, such as a number of viewers in a lounge watching a TV screen from a distance, perhaps up to 5 metres, rather than a single viewer up close to a computer monitor for example.
- Face detection can be implemented using cheap off-the-shelf equipment such as a web-cam and readily available computer software.
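As an illustration of this "cheap off-the-shelf equipment" point, the following minimal sketch checks for roughly face-on faces using a web-cam and OpenCV's bundled frontal-face Haar cascade. The specification does not mandate any particular detector; the cascade here merely stands in for the face detection software described below, and its frontal-face bias happens to match the "face-on" orientation treated as indicating attention.

```python
# A minimal face-presence check using a webcam and an off-the-shelf
# frontal-face detector (OpenCV's bundled Haar cascade). Illustrative only.
import cv2

detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
cap = cv2.VideoCapture(0)  # default webcam

ok, frame = cap.read()
if ok:
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # The frontal cascade only fires on roughly face-on faces, which is
    # the orientation treated here as indicating user attention.
    faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    print(f"{len(faces)} attentive face(s) detected")
cap.release()
```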
- When a user attention state changes to non-attention, the system notes or stores a play or distraction index or time (e.g. the media content duration so far) for the media content being played, and continues to play the media content, most preferably at a reduced quality level.
- When a face is again detected, the system offers the viewer various viewing control options based on the noted play index. This can be implemented as a user interface, for example a graphical user interface superimposed over the playing media content in the form of a user message and/or a user-actuable soft screen button.
- The user may then choose to continue playing the media content by ignoring the generated user interface, or to "rewind" the media content by actuating or operating the user interface so that the media content re-plays from the noted play index.
- Various other user control options are contemplated including viewing a "summary" of the missed content.
- The system provides a more user-friendly control interface than other known approaches: there is no unnecessary pausing, and instead a user interface or control option is provided.
- A user may be happy for the media content to continue playing rather than pausing even if they are interrupted, for example when momentarily discussing the content with a fellow viewer or during ad breaks.
- A high number of (unnecessary) pauses can be annoying for the viewer, and this is exacerbated where these are based on false detection of a user non-attention state.
- There are typically fewer false positives (i.e. detections that the user has looked away when in fact they haven't) with face detection than with pupil tracking, for example.
- By continuing to play the media, but at a reduced quality level, the user is able to keep half an eye or ear on the media and can therefore reach a good conclusion as to whether or not the "missed" content contains something that they would like to "see" (whilst paying full attention).
- By reducing the quality level, the bandwidth occupied by the streaming media is greatly reduced, which is beneficial to the operator.
- Since the system has detected user inattentiveness, it is likely that the user will not greatly notice or mind the fact that the quality level of the media is reduced.
- In an embodiment, a web cam or other digital camera apparatus and face detection software are used in conjunction with computer equipment arranged to play media content.
- The face detection software used in an embodiment is based on an algorithm disclosed in Wu J., Rehg J. M., Mullin M. D. (2004) Learning a Rare Event Detection Cascade by Direct Feature Selection, Advances in Neural Information Processing Systems vol. 16 - see also http://www.cc.gatech.edu/~wuix/research.htm.
- Other face detection software could alternatively be used, for example Mobile-I available from Neven Vision.
- The web cam and face detection software detect a face, and the computer plays the media content, such as a movie.
- When the user looks or moves away, the absence of a face is detected and used by the computer to store the current play index of the movie, as well as to reduce the quality at which the media is being streamed (in the case where the media is being streamed over a connection having a limited amount of bandwidth for which there is competing demand from other users).
- When the user returns, a face is detected, which triggers the computer to display a "rewind" button on the screen and to increase the quality level of the streamed media back to the "normal" level appropriate for when a user is viewing the media with attention.
- If the user actuates the rewind button, the computer stops the current playing of the media and restarts playing from the stored play index (or shortly before this); otherwise the rewind button disappears from the display after a brief period and the media continues to play uninterrupted.
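A minimal sketch of the control loop just described, under the assumption of hypothetical `player`, `server` and `detect_faces` interfaces (none of these names come from the specification):

```python
# Sketch of the attention-driven control loop: store the play index and
# degrade the stream on inattention; restore quality and offer a smart
# rewind on the user's return. All interfaces here are stand-ins.
import time

REWIND_OFFER_SECONDS = 5  # how long the "rewind" button stays on screen

def attention_loop(player, server, detect_faces):
    distraction_index = None
    was_attentive = True
    while player.is_playing():
        attentive = detect_faces() > 0
        if was_attentive and not attentive:
            # User looked away: remember where, and degrade the stream.
            distraction_index = player.current_index()
            server.set_quality("low")
        elif attentive and not was_attentive:
            # User returned: restore quality and offer a smart rewind.
            server.set_quality("normal")
            if player.offer_rewind_button(timeout=REWIND_OFFER_SECONDS):
                # Restart a couple of seconds before the stored index
                # to preserve continuity.
                player.seek(max(0, distraction_index - 2))
        was_attentive = attentive
        time.sleep(0.5)  # poll the detector a couple of times a second
```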
- The face detection software and/or computer are arranged to identify detected faces, for example by associating an identifier with each detected face.
- When a previously identified face returns, the system detects the new face, recognises it as a previously identified face, and displays an appropriate or personalised control option for that user.
- The returned user may be offered a summary of the content missed, either on the entire screen or in a corner of the screen with the main content continuing to play in the background.
- Alternatively, the summary may be offered on a separate personal device, for example a PDA or phone.
- The system can be configured to receive multiple face detection outputs or readings before basing a decision on those inputs, in order to minimise the likelihood of false positives or false negatives - that is, determinations that the user is looking away when in fact they are attentive.
- In one aspect there is provided apparatus for playing media content which is arranged to offer smart media content consumption (for example viewing or listening) options, including rewinding to a point where consumption was interrupted.
- The apparatus includes means for detecting a face, such as a camera and a face detection system or software executed on a processing platform such as a computer or set-top box.
- The apparatus also comprises means for playing the media content and associating the media content being played with a play index; examples include a DVD player which includes a play index, or a streaming media client.
- The apparatus (for example the processing platform) is arranged to store the play index of the media content being played in response to detecting the absence of a previously detected face.
- The apparatus further comprises means for generating a user interface in response to detecting a face following the detection of the absence of a face. This may be a displayed message with a user control input facility, such as a remote control device or a soft button on the display showing the played media.
- If no user input is received, the user interface is deactivated, for example the displayed message is deleted. If however a user input is received through the user interface, the playing means re-plays at least a part of the media content depending on the stored play index. For example, the missed content may be replayed from the stored play index, or from a play index dependent on this, such as a few seconds before.
- The user interface may be implemented by generating a user alert, such as a displayed message or audible sound, and facilitating the generation of a control signal in response to the user input, wherein the apparatus is responsive to the control signal to instruct the playing means to re-play the at least a part of the media content.
- Facilitation of the control signal may be achieved by monitoring incoming remote control signals for a predetermined signal, for example.
- In an embodiment, the apparatus also comprises a face recogniser, which may be implemented using face recognition software executed on the processing platform.
- The face recogniser maintains a database of faces so that it can either recognise and identify a newly detected face, or add this to the database together with an identifier which can be used with the rest of the system.
- The face recogniser can be used to recognise a particular recently absent face in order to offer a personal service, such as offering to provide a summary of content that user has missed when there are a number of users or viewers using the system.
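A sketch of how such a face database might be kept. The `FaceDatabase` class and the embedding-vector representation are illustrative assumptions; the specification leaves the recognition software unspecified, requiring only that a detected face is either matched to a stored identity or enrolled with a fresh identifier.

```python
# Sketch of the face recogniser's database. `embed()` is a placeholder
# for any face-embedding function (e.g. one provided by a face
# recognition SDK); it is assumed to return a numeric vector per face.
import numpy as np

class FaceDatabase:
    def __init__(self, match_threshold=0.6):
        self.embeddings = []    # known face vectors
        self.identifiers = []   # parallel list of assigned ids
        self.threshold = match_threshold

    def identify(self, face_vector):
        """Return an existing id for a recognised face, or enrol a new one."""
        for known, ident in zip(self.embeddings, self.identifiers):
            if np.linalg.norm(known - face_vector) < self.threshold:
                return ident          # previously seen face
        new_id = len(self.identifiers)
        self.embeddings.append(face_vector)
        self.identifiers.append(new_id)
        return new_id                 # newly enrolled face
```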
- In another aspect there is provided a method of playing media content, such as a movie, having a play index, the method comprising playing the media content in response to detecting a face, and storing the current play index in response to detecting the absence of a previously detected face.
- The method further comprises generating a user interface associated with the stored play index in response to detecting a face. Again there is no requirement to recognise this face, merely that a face has again been detected.
- The method further comprises re-playing at least part of the media content depending on the stored play index in response to receiving a user input from the generated user interface. If no user input is received, the user interface is no longer generated after a short time. In an embodiment the method further comprises identifying each detected face, wherein a personalised user interface is generated each time a previously absent identified face is again recognised.
- In another aspect there is provided a method of playing media comprising: playing the media content in response to detecting a face; pausing the media content in response to detecting the absence of a detected face; and resuming playing the media content in response to again detecting a face.
- Different measures or indicators of user attentiveness may alternatively be used, for example infra-red or other proximity detectors, eye or pupil tracking devices, or even monitoring of interaction with the system such as button presses.
- The term "user attention state" is used to indicate whether or not a user is attentive or paying attention to the media content being played.
- This can be determined by detecting the presence (attentive) or absence (non-attentive) of a face, for example.
- One or a combination of attentiveness or user attention state detectors may be used, and this may be configured to depend on the content being consumed. For example, audio-only content does not require the user to be directly in front of the speaker, but merely within the room say, whereas a 3D movie may require a user to be positioned within a certain location space for optimum viewing.
- In an embodiment, a media server which is supplying the media content is instructed by a media client of a user device to play the media content at a quality level dependent on the detection of a face.
- When the user device detects a face notionally viewing a screen onto which the media content is played, the user device instructs the media server to transmit higher or the highest quality media content.
- When no face is detected, the user device or apparatus instructs the media server to reduce or degrade the media content quality, hence reducing the data rate.
- This may be implemented by the server switching to transmit a lower bit-rate stream to the user device, for example one coded using a higher-compression-ratio algorithm, or by switching off one of a number of parallel streams or sessions each supporting a different layer in hierarchically encoded data.
- While the user remains inattentive, the user device continues to instruct a reduction in media content quality level so that the media content is degraded over time, perhaps to no content at all. Then when user attention (e.g. a face) is again detected, the media content quality is increased, perhaps initially to the highest quality, so that the user does not notice any reduction in quality of the content that is actually viewed, as well as providing a smart rewind function at that time.
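The progressive degradation and immediate restoration described here could be sketched as follows. The quality level names and the `server.request_quality()` call are hypothetical stand-ins for whatever control protocol (e.g. HTTP or RPC messages) the media server actually exposes:

```python
# Sketch of client-side session control: while no face is seen the
# requested quality steps down per control interval, eventually to
# nothing; on the user's return it jumps straight back to the top level.
QUALITY_LEVELS = ["high", "medium", "low", "audio_only", "off"]

def update_quality(server, face_present, current_level):
    if face_present:
        # Restore the best quality immediately so the user notices
        # no degradation in the content actually watched.
        new_level = 0
    else:
        # Degrade one step per control interval.
        new_level = min(current_level + 1, len(QUALITY_LEVELS) - 1)
    if new_level != current_level:
        server.request_quality(QUALITY_LEVELS[new_level])
    return new_level
```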
- User inattention may occur for a number of reasons, for example because the user or viewer has left the room to get a drink, answer the telephone or door, or looked away to talk with someone else in the room.
- In an alternative embodiment, the media server multicasts or transmits multiple streams of the same media content at different quality levels or bit rates, and the user device switches between the streams depending on the determined user attention state.
- Alternatively, the user device may switch off one of a number of parallel streams each supporting a different layer in hierarchically encoded content.
- The media server may be integral with the attention detection apparatus, for example all contained as part of a computer system, rather than having the media server at a remote location.
- For example, the embodiment may be implemented on a personal computer with a DVD player as the media server. In this case, no network is required in order to couple these two parts of the overall system.
- Degrading the quality level of the content in this embodiment can also be used to reduce the power consumption of the computer, by lowering the brightness of the display (or turning it off altogether when it is not required to show content at all).
- Figure 1 is a schematic showing a system for detecting user attentiveness and offering user controls based thereon;
- Figure 2 is a flow chart illustrating a method of responding to detected user inattentiveness
- Figure 3 is a flow chart illustrating another method of responding to detected user inattentiveness
- Figure 4 is a schematic showing a system for responding to user inattentiveness when watching streamed media
- Figure 5 is a graph illustrating degradation of video on demand content in response to viewer non-attentiveness
- Figure 6 is a flow chart illustrating a method of operating a user device or apparatus for playing the media content;
- Figure 7 is a flow chart illustrating a method of operating a media server for playing the media content
- Figure 8 illustrates a layered approach to streaming media content at different quality levels or bit rates
- Figure 9 illustrates a media server architecture for layered streaming media content
- Figure 10 illustrates a streaming client at the user device for receiving layered streamed media content.
- Figure 1 illustrates a system according to an embodiment, the system 100 comprising a user's face 105, a camera 110 or viewing equipment such as a web cam or other camera, a face detector 115, optionally a face recogniser 120, an attention interpreter 125, a media store 130 such as a DVD player containing video content such as a movie, a viewing screen 135, a user interface message display 140 (and optionally a user actuable soft button), and a user input device 145 such as a remote control unit.
- The face detector 115, face recogniser 120, and attention interpreter 125 are typically implemented as software executed on a computing platform such as a personal computer or dedicated video content equipment such as a DVD player or set-top box.
- The viewing screen 135 may be part of the computing platform, or a stand-alone device such as a plasma-TV screen for example.
- The media store 130 may be a buffer for streamed content, or may exist remotely over a network - e.g. Video on Demand (VoD).
- The camera 110 is arranged to capture images pointing away from the viewing screen 135 in order to "see" user faces 105 viewing the viewing screen.
- Typically, the camera 110 will be positioned at right angles to the plane of the viewing screen 135.
- The camera can be arranged such that the viewing angle of the screen and the camera's field of view are largely coincident, and this may be accomplished with additional lenses if necessary.
- Video or still images from the camera 110 are directed to the face detector 115, which may be implemented using various well known software packages on a computing or processor platform.
- Examples of face detection algorithms include: Wu et al (2004); modules of Neven Vision's Mobile-I™ face recognition software developer's kit (SDK); "Detecting Faces in Images: A Survey", Ming-Hsuan Yang, David J. Kriegman and Narendra Ahuja - http://vision.ai.uiuc.edu/rnhyang/papers/pami02a.pdf; and C-VIS Computer Vision and Automation GmbH Face Snap or similar technologies. These and similar packages may be configured either simply for face detection (115) or additionally for face recognition (120). The face detector function 115 simply reports whether a face has been detected, or how many faces are currently detected within the field of view of the camera 110.
- The face recogniser function 120 either adds a new detected face to a database of recognised faces, or notes a match between a detected face and a face recorded in the database.
- The database may be temporary and used only for each media content playing session - such as each movie viewing - because a different set of faces may watch a different movie.
- Alternatively, the database may collect face identities from multiple media content playing sessions, to offer personalised services to regular viewers. Examples might include providing summaries or rewinds for individual viewers depending on content that they personally have missed.
- The Mobile-I™ and some of the other packages attempt to identify faces from any angle, and the system when using these packages can benefit from further identifying "looking ahead" faces, that is only faces that are looking directly (0 degrees) at the screen 135 or within a certain range of angles, for example up to say 20 degrees.
- Various algorithms are available to supplement the basic face recognition packages, including for example that discussed in Wu J., Rehg J. M., Mullin M. D. (2004) Learning a Rare Event Detection Cascade by Direct Feature Selection, Advances in Neural Information Processing Systems vol.16 - see also http://www.cc.gatech.edu/ ⁇ wuix/research.htm.
- The Wu et al algorithm can be configured to detect only "face-on" faces by training it only on face-on examples. From these examples it learns its own rules, and these pre-learnt rules can then be used to detect new faces only when they are face-on.
- This algorithm then gives the location of a rectangle within the camera image that contains the face. In order to integrate this with the Neven Mobile-I system, this result is fed directly into the Mobile-I Pre-selector stage (see figure, page 2, Steffen et al).
- This software package is therefore configured to detect faces that are, or filter out faces that are not, "face-on" or attentive, and then to feed these results on to the face recogniser 120.
- The camera 110 is preferably located very near to the screen 135 so that users viewing the screen will be "face-on" to the camera.
- The attention interpreter 125 interprets the outputs of the face detector 115 and face recogniser 120 as user (105) attention states, and controls operation of the media store and a user interface (140, 145) accordingly.
- The attention interpreter 125 is typically implemented as a routine in a computing or processor based device, and two embodiments of this function (125) are illustrated in figures 2 and 3 and described in greater detail below.
- The attention interpreter 125 controls playing of the media store 130, for example the playing and rewinding of a movie played on a DVD player.
- The attention interpreter 125 generates a user interface with the user 105 in order to allow the user to control operation of the media store dependent on the user's attention status as interpreted by the attention interpreter 125.
- The attention interpreter 125 generates a user message display or user alert 140 on the viewing screen 135, superimposed on the content displayed on the screen by the media store 130. This could be provided in a new pop-up window on a Microsoft Windows™ based computer for example, but typically with an overlaid graphic.
- The user display or alert 140 could be provided with a soft button or icon actuable by a computer mouse for example, or simply as a message to operate a remote control device 145 in a particular way.
- Alternatively, the user alert 140 could simply be indicated using a suitable LED, a sound, or some other user alerting mechanism, on the screen apparatus 135 or even on another device. The user alert is only shown (or sounded) for a short period, allowing it to be ignored.
- A user control input is also enabled, allowing the user to activate a predetermined control, on the remote control device 145 for example, which is interpreted by the attention interpreter 125.
- The user control input may be enabled using a dedicated button or control on the remote control device 145 which is only used for the "smart rewind" or other smart functions provided by the system 100.
- Alternatively, a standard function control or button such as the rewind control may be disabled for the media store 130 and instead enabled for the attention interpreter 125 in response to display of the displayed user control input 140.
- Alternatively, coloured buttons may be used together with the user alert "press red button to replay what I missed".
- The user control input may be a mouse click whilst the cursor is positioned over the soft button 140.
- The user control input mechanism need not be a remote control device 145 as typically used with a TV set or other audio-visual provisioning equipment (e.g. DVD players), but could be integral with this equipment. Either way, the user control input is used to record a user input in response to the user alert 140, which generates a control signal (button triggered) or message which causes the attention interpreter 125 to control the media store in predetermined ways.
- The camera could also be used to facilitate gesture recognition as a means of user input control.
- A current example of this technology is provided by the Sony EyeToy.
- The attention interpreter 125 is arranged to provide certain "smart" control options to the user 105 in response to detecting a change in the user's attention status - for example that the user has returned to an attentive state following a period of non-attention.
- This particular example may correspond to the user watching a movie played by the media store 130 on the viewing screen 135, the user being interrupted for example by a phone call and moving or looking away from the screen 135, then looking back to the screen following completion of the phone call. In the meantime the movie has continued playing so that the user has missed some of the content.
- The attention interpreter 125 then offers the user the opportunity to rewind to the content playing when the user first looked away from the screen (a non-attention status) and to replay the content from this point.
- It may be desirable to configure the system to rewind slightly prior to this time in order to ensure continuity - say one or two seconds before the stored play index. Other configurations are also possible in which the re-play starting point is dependent on the stored play index - for example a selectable number of seconds before or after the stored play or distraction index.
- This is implemented by generating the user control input interface, comprising in this embodiment a user alert message 140 on the viewing screen 135, and receiving an appropriate control input from the user via the remote control device 145.
- The media store 130 is then commanded by the attention interpreter 125 to rewind to this point (or slightly before, for example) and start replaying.
- Alternative or additional "smart" user control options may be provided, for example a summary of the missed content. This may be implemented simply as a faster playing of the missed content until this catches up with the current played content, or as playing only some of the frames of the content - for example every 10th frame. This may be provided on the full size of the viewing screen, or in a smaller window with the currently playing content in the background. Alternatively, the summary content may be provided by a third party content provider such as a broadcaster.
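A sketch of the frame-skipping summary option, assuming a hypothetical `player` interface with frame-level access:

```python
# Sketch of the simple summarisation option: replay the missed span at
# increased effective speed by showing only every Nth frame.
def play_summary(player, distraction_index, current_index, every_nth=10):
    """Show every Nth frame between the stored and current play indices."""
    frame = distraction_index
    while frame < current_index:
        player.show_frame(frame)   # e.g. in a reduced-size window
        frame += every_nth         # skipping frames gives ~Nx speed-up
```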
- Figure 2 illustrates a method of operating the attention interpreter 125 for this situation in more detail.
- A distraction or play index is maintained which corresponds to the position or time in the playing of the content when the user became non-attentive - for example looking or moving away from the viewing screen 135.
- Typically a media store 130 or the content itself, such as a DVD movie, will incorporate its own play index, for example to indicate that hour:1, minute:12 and second:45 of a movie is currently being played.
- However, the media store 130 and/or attention interpreter 125 may be configured to associate an independent play index with the content, if one is not already integrated, in order to perform the method.
- The play step (205) indicates that the media store 130 is playing the content.
- The method (200) determines whether the rewind button 140 has been shown on the screen 135 and activated by the user on the remote control device 145 (210). If this is the case (210Y), then the media store 130 is instructed to rewind the content to a previously stored distraction or play index (215) before continuing to play the content (205). Otherwise (210N), the method determines whether a summary (or other smart content control function) button has been shown and activated (220). If this is the case (220Y), a summary routine is invoked (225) before continuing to play the content (205).
- The summary function or routine may simply involve playing the missed content (between the stored distraction or play index and the current play index) at an increased speed within a reduced-size window of the screen 135. Otherwise (220N), the method receives the output from the face detector 115 (230).
- The face detector output will typically just be the number of faces detected within the camera's field of view - thus if there is only one user the output will be 1 for an attentive state or 0 for a non-attentive (looking away or absent) state.
- The number of faces detected corresponds to a face count parameter for each unit time or camera image.
- The method then determines whether the current face count is less than, equal to, or greater than the previous face count (235). If the face count is less than the previous face count (235<), this corresponds to a user looking away from the screen (becoming non-attentive); the distraction or play index is set or stored (240), and the content then continues to play (205).
- If the face count is greater than the previous face count (235>), the method determines whether a distraction index has been set (245). If the distraction index is not set (245N), this corresponds to a new user watching the content in addition to an existing user. In this case the content continues to play (205) without offering user content control options.
- Alternatively, an option to rewind to the beginning of the content might be offered if the current position of the playing content is close to the beginning (e.g. the current play index is below a threshold). This situation might correspond to a friend arriving to join the initial user shortly after the start of the content.
- Alternatively, a summary option might be offered. If the distraction index is set (245Y), this corresponds to a user who has already watched some of the content returning to watch it after a period of non-attentiveness.
- In this case the method determines whether the last face count was greater than zero (250). If this is not the case (250N), this means there is only one user to consider, and the rewind button is shown (255). The method then returns to continue playing the content (205) and determines any user response to the rewind button at step (210) as previously described. If the last face count was greater than zero (250Y), this means that an additional user has returned to view the screen 135, so that for example there are now two or more users watching the screen. Upon detecting this new attentive face, the method shows the summary button (260) before returning to continue playing the content (205). Determining whether the summary button has been actuated is then carried out at step (220) as previously described.
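The face-count logic of figure 2 (steps 235 to 260) can be condensed into a short sketch. The function and parameter names are illustrative, but the step numbering in the comments follows the text:

```python
# Condensed sketch of the figure 2 decision logic. State consists of the
# previous face count and an optional stored distraction index.
def on_face_count(count, prev_count, distraction_index, play_index, ui):
    if count < prev_count:                      # 235<: a viewer looked away
        distraction_index = play_index          # 240: store the index
    elif count > prev_count:                    # 235>: a viewer (re)appeared
        if distraction_index is not None:       # 245Y: a returning viewer
            if prev_count > 0:                  # 250Y: others kept watching
                ui.show_summary_button()        # 260
            else:                               # 250N: sole viewer returned
                ui.show_rewind_button()         # 255
        # 245N: a brand-new viewer joined; keep playing, no options offered
    return distraction_index
```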
- The following example situation further illustrates the effect of this method.
- Andrew sits down to watch a live football match on television; he watches for five minutes when he is interrupted by a telephone call from his friend Ben.
- The conversation lasts for two minutes before Andrew starts watching again. The television offers to rewind the match to where he left off; he opts not to rewind as he can see there is no score.
- Ben comes over and watches the match with him.
- The system operates to provide the rewind button option.
- The face detector counts two faces in the scene instead of one, and the attention interpreter has a new rule for seeing more than one face at once, which triggers the automatic summarisation function.
- Figure 3 illustrates a method of operating the attention interpreter 125 according to another embodiment which utilises the face recogniser function 120.
- This method recognises the faces, rather than just counting them and making certain assumptions about their identity as in the method of figure 2.
- The play step (305) indicates that the media store 130 is playing the content.
- The method (300) of figure 3 then proceeds in the same manner as the method (200) of figure 2, rewinding (315) in response to detecting activation of a rewind button (310) and activating the summary routine (325) in response to detecting activation of the summary button (320).
- In this method the attention interpreter 125 also receives an identifier with each face detected from the face recogniser 120. This may be simply a number assigned to each recognised face and stored in a database. Each identifier or number has its own associated stored play or distraction index corresponding to the respective user.
- When an additional face is detected, the method determines whether it has been seen before - i.e. whether it is a previously recognised face (345).
- This may be implemented by using a new database of users or faces for each media content playing session (eg movie viewing). If the newly detected face has not been seen before (345N), this corresponds to a completely new or unrecognised face joining the content viewing, for example a friend starting to watch a movie part way through. In this case the content continues to play (305) without offering user content control options.
- An alternative is to offer an option to rewind to the beginning of the content if the current position of the playing content is close to the beginning, or to offer a summary option.
- If the newly detected face has been seen before (345Y), the method (300) gets the distraction index for the newly re-recognised face (350). The method then determines whether the last face count was greater than zero (355). If this is not the case (355N), this means there is only one user to consider, and the rewind button is shown (360). The method then returns to continue playing the content (305) and determines any user response to the rewind button at step (310). Note however that the distraction index used at step (315) will be the one obtained at step (350) for the respective recognised user.
- If the last face count was greater than zero (355Y), the method shows the summary button (365) before returning to continue playing the content (305). Determining whether the summary button has been actuated is then carried out at step (320) as previously described.
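A sketch of the per-user state that distinguishes the figure 3 method from that of figure 2: the recogniser's identifier keys each viewer's own distraction index, so a returning viewer gets offers based on their own missed content. The function names are illustrative:

```python
# Sketch of the per-identity state used by the figure 3 method.
distraction_indices = {}   # face identifier -> stored play index

def on_face_left(face_id, play_index):
    # A recognised face disappeared: record where that viewer stopped.
    distraction_indices[face_id] = play_index

def on_face_returned(face_id, others_watching, ui):
    index = distraction_indices.get(face_id)
    if index is None:
        return   # never seen before (345N): no options offered
    if others_watching:
        ui.show_summary_button(for_index=index)   # step 365
    else:
        ui.show_rewind_button(for_index=index)    # step 360
```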
- The system described in this scenario or second embodiment provides more complex behaviour than the first embodiment: it is able to distinguish between the faces of Andrew and Ben using the face recogniser, and to offer a personalised service.
- The camera 110 used may be provided with a light source (not shown) to illuminate the faces for detection and recognition.
- Alternatively, the ambient lighting may be relied upon, as this will be contributed to by the content displayed on the screen 135.
- This may be complemented by using a long exposure time for each frame; in this regard it is helpful that viewers tend to stay relatively motionless whilst watching video content.
- Various night mode and "camera-shake" facilities are already "built-in" to many digital cameras and can be used in this situation to improve the facial images provided to the face detector 115 and recogniser 120.
- Alternatively, a sequence of images or frames can be summed such that each pixel in the resultant image is the sum of the luminance values at corresponding positions in the other frames. The summed images can then be used for face detection/recognition.
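A sketch of this frame-summation idea using NumPy; the rescaling back to 8-bit range at the end is an added practical detail, not something the specification prescribes:

```python
# Sketch of frame summation: accumulate luminance over a short sequence
# of greyscale frames to brighten a dimly lit scene before detection.
import numpy as np

def sum_frames(frames):
    """Sum greyscale frames pixel-wise, then rescale for display/detection."""
    acc = np.zeros(frames[0].shape, dtype=np.float64)
    for f in frames:
        acc += f                      # sum luminance at each pixel
    acc = acc / acc.max() * 255.0     # rescale to 8-bit range (added step)
    return acc.astype(np.uint8)       # feed this to the face detector
```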
- The face detector 115 or the attention interpreter 125 can be configured to only recognise a change of attention status after this has been confirmed on a number of subsequent face detections - for example after 10 image frames from the camera all indicate that there is one less or one more face than previously detected.
- Alternatively, a statistical mode of the observations could be used, to give an integer rather than the floating point number that a mean calculation would give.
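A sketch of such debouncing using the statistical mode over a sliding window of detector readings (the 10-frame window follows the example above):

```python
# Sketch of debounced face counting: act only on the mode of the last N
# readings, which yields an integer (unlike a mean) and suppresses
# single-frame detector glitches.
from collections import deque
from statistics import mode

WINDOW = 10   # number of recent frames consulted before acting

class DebouncedFaceCount:
    def __init__(self):
        self.readings = deque(maxlen=WINDOW)
        self.stable_count = 0

    def update(self, raw_count):
        self.readings.append(raw_count)
        if len(self.readings) == WINDOW:
            self.stable_count = mode(self.readings)
        return self.stable_count
```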
- Some face detection software packages can be configured to provide a confidence measure related to how confident the software module is that it has detected a face.
- This confidence measure or output can be used in an embodiment by the attention interpreter to decide when a face has become absent, for example by monitoring the confidence measure over time and detecting the absence of a face when the confidence measure drops below a predetermined threshold.
- Detection of the absence of a face may be required to follow a characteristic temporal pattern, such as a sharp drop-off in the confidence measure, rather than, say, a gradual decline which may be due to environmental changes.
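A sketch of absence detection from such a confidence signal; the threshold and drop values are illustrative assumptions, since the specification leaves them to the implementer:

```python
# Sketch: declare a face absent only when confidence both falls below a
# threshold and falls sharply, so gradual drifts (e.g. lighting changes)
# are ignored.
ABSENCE_THRESHOLD = 0.3   # assumed confidence scale of 0.0-1.0
SHARP_DROP = 0.4          # minimum fall between readings to count

def face_absent(prev_confidence, confidence):
    below = confidence < ABSENCE_THRESHOLD
    sharp = (prev_confidence - confidence) > SHARP_DROP
    return below and sharp
```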
- Figure 4 illustrates a system according to an embodiment, the system 1100 comprising a user's face 1105, a camera 1110 or viewing equipment such as a web cam or other camera, a face detector 1115, a session controller 1120, a display screen 1125, a network 1130, and a media content server 1135 such as a Video on Demand (VoD) server.
- The face detector 1115 and session controller 1120 are typically implemented as software executed on a computing platform such as a personal computer or dedicated video content equipment such as a set-top box.
- The viewing screen 1125 may be part of the computing platform, or a stand-alone device such as a plasma-TV screen for example.
- The network 1130 may be a broadband Internet connection or a wireless network.
- A user device or apparatus 1160 implements the face detection, session control, and media content stream receiving and playing on the screen.
- This may be provided on a mobile phone or other wireless device, home entertainment equipment, or a computer, for example.
- Media content servers 1135 other than a VoD server may alternatively be used as will be understood by those skilled in the art.
- The camera 1110 is arranged to capture images pointing away from the viewing screen 1125 in order to "see" user faces 1105 viewing the viewing screen.
- Typically, the camera 1110 will be positioned at right angles to the plane of the viewing screen 1125.
- The camera can be arranged such that the viewing angle of the screen and the camera's field of view are largely coincident, and this may be accomplished with additional lenses if necessary.
- Video or still images from the camera 1110 are directed to the face detector 1115, which may be implemented using various well known software packages on a computing or processor platform. Examples of face detection algorithms include: Wu et al (2004); Neven Vision's Mobile-I™ face recognition software developer's kit (SDK); and "Detecting Faces in Images: A Survey", Ming-Hsuan Yang, David J. Kriegman and Narendra Ahuja.
- The face detector function 1115 simply reports whether a face has been detected, or how many faces are currently detected within the field of view of the camera 1110.
- The Wu et al algorithm can be configured to detect only "face-on" faces by training it only on face-on examples. From these examples it learns its own rules, and these pre-learnt rules can then be used to detect new faces only when they are face-on. Pre-learnt rules may be distributed with the service, so that no learning is required in situ. This algorithm then gives the location of a rectangle within the camera image that contains the face. In an alternative embodiment there may be no restriction on the orientation of detected faces, merely whether faces are detected or not.
- The session controller or control module 1120 interprets the output of the face detector 1115 as a user (1105) attention state, and controls operation of the media server 1135 dependent on the user attention state.
- The session control module 1120 is typically implemented as a routine in a computing or processor based device, and an embodiment of this function (1120) is illustrated in figure 3 and described in greater detail below.
- The session controller 1120 instructs the media server 1135 over the network using known control packets and protocols 1140, for example HTTP and RPC.
- The media server 1135 comprises one or more items of media content, such as movies, in a plurality of quality formats or quality levels, and can switch between these formats according to instructions 1140 from the session control module 1120.
- Various techniques used for implementing the switching will be known to those skilled in the art, for example server switching between streams at different bit rates, server provision of multiple bit rate streams which the user device or client can switch between, or parallel streams supporting different layers of hierarchically encoded content which can be switched on or off by either the server or client.
- a duplex control link (incorporating return path 1145) may be used for more robust signalling, for example for server 1135 to return acknowledgements following receipt of an instruction from the session module 1120.
- the quality formats range from high quality (large files or high bit-rate streams) to low quality (smaller files or low bit-rate streams), which may be implemented using lower frame rates or image resolution as is known.
- different compression technologies may be used to provide smaller file sizes or lower bit-rate streams, and hence a reduced data rate over the network connection 1150.
- a single media content file or bit stream may be crafted to reduce its bit-rate to the user device 1160, for example by sending only intra-coded frames (i-frames) and not the predicted or bi-predictive frames (p-frames and b-frames) as is known in some compression algorithms.
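- A hedged sketch of this i-frame-only reduction follows; the per-frame dictionary representation with a `"type"` tag is an assumption for illustration only:

```python
def intra_only(frames):
    """Forward only intra-coded frames; dropping the predicted (P) and
    bi-predictive (B) frames reduces the bit-rate of the crafted stream."""
    return [f for f in frames if f["type"] == "I"]   # assumed per-frame tag
```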
- the media content 1150 provided to the user device 1160 over the network 1130 has a quality level controlled by the session control module 1120 which in turn is dependent on whether a user face 1105 has been detected or not (user attention or non-attention).
- This media content is received by the user device using a streaming client 1165 such as RealPlayer, Windows Media or Apple QuickTime for example, which has established a network connection 1150 to the media server 1135.
- the media content is then played by the streaming client 1165 on the screen 1125 and/or an audio transducer such as loud speakers (not shown) at the quality level at which it is received from the media server 1135.
- Figure 5 illustrates a graph of video quality over time.
- t0 is the time after which the video quality is lowered once zero faces are seen (user attention state is non-attentive). This will be largely dependent on the length of pre-buffered video available.
- the dotted line indicates the desired video quality if the VoD Server is able to deliver a gradual decline in quality.
- the solid line indicates how this may be approximated by switching between discrete quality layers or levels using different compression algorithms for example.
- This can be implemented using variable rate coding as is known. Whilst the detailed implementation of variable rate coding is beyond the scope of this specification, the interested reader is directed to A. Ortega, "Variable Bit-rate Video Coding", in Compressed Video over Networks, M.-T. Sun and A. R. Reibman, Eds.
- the gradient of the decline may be controlled by the demands of the network (a congested network will benefit from a rapid descent) or by the usage patterns of the television (related to the probability that the viewer will return; this may be learnt by the system or defined by the user or programme maker). t1 is the time taken for the system to restore full video quality on the return of the viewer (user attention state is attentive); as such it should be minimised to avoid any disruption to the viewer's experience.
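- The decline of figure 5 might be approximated as in the sketch below; the parameter names (t0 as the grace period, `gradient` in quality levels per second, tunable to network congestion or viewing habits) and the level range are assumptions chosen to mirror the description, not part of this specification:

```python
def target_quality(seconds_since_last_face, t0=5.0, gradient=0.5,
                   max_level=4, min_level=0):
    """Map time since the last detected face to a discrete quality level,
    approximating the gradual (dotted) decline of figure 5 with the
    discrete (solid) steps."""
    if seconds_since_last_face <= t0:
        return max_level                  # attentive, or still inside t0
    steps = int(gradient * (seconds_since_last_face - t0))
    return max(min_level, max_level - steps)
```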
- the data-rate of the streamed video is successively decreased. This results in bandwidth savings across the network 1130.
- this may be accomplished by the server crafting an individual stream for the device or switching it to a prepared lower-rate source; or in a further alternative implementing the switching at the streaming client 1165 by switching between multicast streams at different bit rates - various other methods of implementing the quality level change will also be known to those skilled in the art.
- the low-rate version may have a slower frame-rate, lower resolution or be otherwise degraded. As audio can be heard over a much wider range of positions than the video can be viewed from, the audio will typically not be degraded (or at least not as quickly as the video) in a television based implementation for example.
- the degradation is such that the perception of quality decreases gradually, for example using the same approach detailed in Hands, D., Bourret, A., Bayart, D. (2005) Video QoS enhancement using perceptual quality metrics, BT Technology Journal, Vol. 23, No. 2. (April 2005), pp. 208-216.
- Either the VoD Server or Session Control module may therefore be configured to measure the current network traffic and use this to determine whether the data-rate is decreased or not.
- the return control path 1145 may be used to refuse an instruction from the session control.
- Face detection may sometimes be in error.
- the face detection function or session controller may be configured to aggregate results, for instance taking the statistical mode of the faces counted over a time window.
- Figure 6 illustrates a method of operating the session controller 1120 in more detail.
- the session controller monitors the output from the face detector 1115, and issues instructions 1140 to the media server 1135 accordingly.
- the play step (605) indicates that the media content is being received over the network 1130 from the media server 1135 and is played on the screen 1125, for example using a streaming client 1165.
- the method (600) determines whether or not faces have been detected by the face detection package 1115 (610). If the output or count of the face detector 1115 is greater than zero (610N) corresponding to a user attention state of "attentive", viewing or using, the session controller 1120 instructs the media server to transmit or stream the media content at the highest quality level (615).
- the session control 1120 may instruct the streaming client 1165 to switch between content streams at different bit rates.
- the server 1135 is arranged to transmit or multicast multiple streams of the same content, but at different bit rates. In this case there is no need to instruct the server 1135.
- the client can be configured to switch on or off parallel streams or sessions associated with the different layers in order to change the content quality. The method then returns to playing the media content (605), which will now or shortly be at a higher quality level (e.g. higher data rate).
- the increased quality level may be implemented by rapidly increasing the quality level over a series of intermediate quality levels, in order to allow time for the higher quality levels to be buffered before playing. Again this is described in Walker et al (2003), especially Section 4.
- the server switches between different bit rate content streams so that the user device continues to receive the same stream or maintains the same session but at different bit rates according to the server switching. This avoids having any delay introduced whilst the high quality media content is buffered at the user device 1160 before it can be played.
- An intermediate quality level media content stream may be played with little or no buffering immediately whilst the high quality media content is buffered then played so that there is no interruption.
- the method determines whether a predetermined length of time (t2) has elapsed since the last face was detected or the last quality level reduction was instructed (620). This avoids the quality being reduced too quickly, for example if the viewer has only momentarily looked away from the screen 1125. If there has not been a sufficient duration (t2) since the last quality reduction (620N), then the method returns to playing the media content (605). If however there has been a predetermined period of time (t2) since the last face was detected or the last quality reduction instructed (620), then the method instructs the media server 1135 to reduce the quality level by one quality level (625). This may continue so that over time, with no faces detected, the quality level of the media content eventually falls to the lowest level - which may be zero data rate or a black screen for example. Again the reduce quality instruction to the media server 1135 may be in any suitable format.
- the length of the time based parameters t0 and t2 can be configured according to system design.
- t0 may be derived from the storage/network constraints, as it reflects the size of the video buffer at the client - assuming that frames which have already been received are not to be discarded.
- t2 may be set according to the speed at which the video is intended to decline.
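- A minimal sketch of this control loop follows, assuming a `face_count()` poll of the face detector 1115 and an `instruct_server(level)` message corresponding to the instructions 1140 (both callables, and the default timings, are assumptions for illustration):

```python
import time

def session_control_loop(face_count, instruct_server,
                         max_level=4, min_level=0, t2=10.0, poll=0.5):
    """Poll the face detector and issue quality instructions (figure 6).
    For brevity the initial grace period t0 is folded into the first t2
    wait; a fuller implementation would track the two timers separately."""
    level = max_level
    last_reduction = time.monotonic()
    while True:
        if face_count() > 0:                     # step 610: a face is detected
            if level != max_level:
                level = max_level
                instruct_server(level)           # step 615: highest quality
            last_reduction = time.monotonic()    # any face resets the timer
        elif (time.monotonic() - last_reduction) >= t2 and level > min_level:
            level -= 1                           # step 625: drop one level
            instruct_server(level)
            last_reduction = time.monotonic()
        time.sleep(poll)                         # step 605: playback continues
```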
- Figure 7 illustrates a method of operating the media server 1135 in more detail.
- the media server, having set up a connection with the streaming client 1165 on the user device 1160, transmits or streams packets of media content to the streaming client (405).
- the media content streamed is at a particular quality level or data rate, and the same media content is also stored in different formats having different assigned quality levels. For example lower quality media content may be highly compressed, have a low frame rate or image resolution, or a combination of these.
- the method then "listens" for quality control instructions from the user device's session controller 1120 (410). If no new quality level instruction is received (410N), then the media server continues to play the media content at the same quality level (405).
- the method sets the streaming quality level to the instructed level by switching to a different content stream (415). This may be implemented simply by switching to transmitting a different content file, matching the play index of the previous stream to the new stream so that streaming of the new file starts at the right location. Again the mechanism for switching between quality levels is described in Walker et al (2003) and WO03/009581 as noted previously. The method then transmits or streams the lower quality content (405) until instructed again to change this by the user device's session controller 1120.
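- The play-index matching described above might look like the following sketch; the class and attribute names are assumptions, and each quality level is modelled as an indexable sequence of packets whose indices are aligned across variants:

```python
class MediaServer:
    """Holds the same content at several quality levels; a shared play index
    means a switch resumes the new stream at the right location (step 415)."""

    def __init__(self, streams_by_level):
        self.streams = streams_by_level   # e.g. {0: low_pkts, ..., 4: high_pkts}
        self.level = max(streams_by_level)
        self.play_index = 0               # common index into every variant

    def next_packet(self):
        packet = self.streams[self.level][self.play_index]   # step 405
        self.play_index += 1
        return packet

    def set_quality(self, level):
        self.level = level                # index carries over across the switch
```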
- the camera 1110 used may be provided with a light source (not shown) to illuminate the faces for detection and recognition.
- the ambient lighting may be relied upon, as this will be contributed to by the content displayed on the screen 1125.
- This may be complemented by using a long exposure time for each frame; in this regard it is helpful that viewers tend to stay relatively motionless whilst watching video content.
- various night mode and "camera-shake" facilities are already "built-in" to many digital cameras and can be used in this situation to improve the facial images provided to the face detector 1115.
- a sequence of images or frames can be summed such that each pixel in the resultant image is the sum of the luminance values in corresponding positions in the other frames. The summed images can then be used for face detection/recognition.
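- A sketch of this summation follows, assuming NumPy arrays of greyscale camera frames; the rescaling back to the 8-bit range is an added assumption so that the result remains a valid image for the detector:

```python
import numpy as np

def summed_luminance(frames):
    """Each output pixel is the sum of the luminance values at the
    corresponding position across the frames, rescaled to 0-255 so the
    result remains a valid input for face detection/recognition."""
    acc = np.sum(np.stack(frames).astype(np.float64), axis=0)
    return (255.0 * acc / max(acc.max(), 1.0)).astype(np.uint8)
```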
- the face detector 1115 or the session controller 1120 can be configured to only recognise a change of attention status after this has been confirmed on a number of subsequent face detections. For example after 10 image frames from the camera all indicate that there is one less or one more face than previously detected.
- This might be implemented in the method of figure 6, for example at step 610, by implementing an additional routine which holds the "face count" parameter noted at time x, then compares the current "face count" parameter 10 times (once for each camera image, say) at times x+1, x+2, ... x+10, and then determines whether the average "face count" parameter is less than, greater than, or equal to the "last face count" parameter.
- a statistical mode of the observations could be used to give an integer rather than a floating point number from the mean calculation.
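- For example, such aggregation over a sliding window might be sketched as below; the window length of 10 mirrors the 10-frame confirmation above, and the class and method names are assumptions:

```python
from collections import Counter, deque

class FaceCountSmoother:
    def __init__(self, window=10):
        self.history = deque(maxlen=window)    # last N per-frame face counts

    def update(self, raw_count):
        self.history.append(raw_count)
        # The statistical mode gives an integer directly, unlike the mean,
        # and suppresses single-frame detector errors.
        return Counter(self.history).most_common(1)[0][0]
```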
- some face detection software packages can be configured to provide a confidence measure related to how confident the software module is that it has detected a face.
- This confidence measure or output can be used in an embodiment by the attention interpreter to decide when a face has become absent, for example by monitoring the confidence measure over time and detecting the absence of a face when the confidence measure drops below a predetermined threshold.
- detection of the absence of a face may only follow a characteristic temporal pattern such as a sharp drop-off in the confidence measure, rather than a gradual decline say which may be due to environmental changes.
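- A sketch of such a test follows; the threshold and drop values are assumptions, and a real deployment would tune them to the particular detector:

```python
def face_absent(confidences, threshold=0.3, drop=0.4):
    """Declare the face absent only when the confidence measure is below
    the threshold AND arrived there via a sharp one-step fall, ignoring
    gradual declines that may merely reflect environmental changes."""
    if len(confidences) < 2 or confidences[-1] >= threshold:
        return False
    return (confidences[-2] - confidences[-1]) >= drop
```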
- the streaming client 1165 can set up multiple RTP sessions each associated with a different quality level of the content, for example the content in different compression formats or at lower frame rates. Then as the required quality level changes, either the media server 1135 starts transmission at the different quality level on one session and stops transmission on the current quality level on another session, such that the bits received by the receiving device 1160 change bit rate. Alternatively the same stream or session is maintained but the bit stream used by the server is switched, where each bit stream has a different bit rate. The received bits will be buffered and the rate at which the buffered bits are taken and decoded from the buffer by the client 1165 will also be changed to correspond with the new bit rate.
- a hierarchical coding approach may alternatively be used, in which the original content (for example video data) is encoded into a number of discrete streams called layers. The first layer consists of basic data of relatively poor quality, and successive layers represent more detailed information, so that layers can be added to increase the image quality or taken away to degrade the image (or other content) quality level. This effectively means that the bit rate is decreased when layers are removed, or increased when layers are added, providing the required changes in quality level.
- Layered video compression is known from the 1998 version of H.263, but may equally be any other codec, such as MPEG4.
- Each layer in the hierarchy is coded in such a way as to allow the quality of individual pictures to be enhanced and their resolution to be increased, and additional pictures to be included to increase the overall picture rate.
- Figure 8 shows a typical dependency between pictures in an H.263 scalable layered codec, with boxes representing frames for each layer and arrows showing dependency between frames.
- the lowest row shows original, uncoded frames.
- the next row shows the lowest layer (Layer 0) of the hierarchy, which is coded at half the frame rate of Layer 1.
- Frames in Layer 0 are predicted from the previously encoded frame, as in conventional video compression.
- Frames in Layer 1 may be predicted from the previously encoded frame in Layer 1 and, if present, from the temporally simultaneous Layer 0 encoded frame.
- Frames in Layer 2 may be predicted from the previously encoded frame in Layer 2 and, if present, from the temporally simultaneous Layer 1 and Layer 0 encoded frames.
- the H.263 specification allows for 15 layers, though a smaller number can be used in practical embodiments.
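- The bit-rate consequence of adding or dropping layers can be sketched as follows; the per-layer rates and the greedy selection are assumptions for illustration, and the selection stops at the first unaffordable layer because each layer depends on all of the layers beneath it:

```python
def layers_for_budget(layer_rates_kbps, budget_kbps):
    """Choose which hierarchically coded layers to transmit for a target
    bit rate; layers must be contiguous from Layer 0."""
    chosen, total = [0], layer_rates_kbps[0]   # the base layer is always sent
    for i in range(1, len(layer_rates_kbps)):
        if total + layer_rates_kbps[i] > budget_kbps:
            break              # higher layers depend on lower ones: stop here
        chosen.append(i)
        total += layer_rates_kbps[i]
    return chosen

# e.g. layers_for_budget([128, 128, 256, 512], 600) -> [0, 1, 2]
```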
- FIG. 9 illustrates a media server 1135 which uses the layered approach to content delivery.
- the media content in this case is stored in a data-store 905 already compressed, although it could be received from a live feed for example.
- the content is packetised by an RTP packetiser 910 according to the Real-time Transport Protocol (RTP), although other protocols could alternatively be used.
- the packetiser 910 attaches an RTP header to the packets, as well as an H.263 Payload Header as is known.
- the payload header contains video specific information such as motion vector predictors.
- the packets are numbered in order by a packet numbering function 915, to allow the receiving client to recover the correct order of the content packets.
- the layered encoding process uses a control strategy together with an output buffer for each layer in order to provide each layer's constant bit-rate output.
- Each layer is transmitted as an independent RTP session on a separate IP address by a corresponding session handler 925.
- the rate at which data is transmitted is controlled by the Transfer Rate Control module 920 which counts Layer 0 bytes to ensure that the correct number are transmitted in a given period of time.
- the transmission rate of the outer layers is smoothed and locked to the rate of Layer 0 using First-In-First-Out buffer elements 930.
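- A sketch of such byte-counting rate control follows; the one-second accounting period and the method names are assumptions for illustration:

```python
import time

class RateController:
    """Counts the bytes sent in the current one-second period and defers
    packets once the layer's constant-bit-rate allowance is used up."""

    def __init__(self, bytes_per_second):
        self.allowance = bytes_per_second
        self.sent = 0
        self.window_start = time.monotonic()

    def may_send(self, packet_len):
        now = time.monotonic()
        if now - self.window_start >= 1.0:   # start a new accounting period
            self.window_start, self.sent = now, 0
        if self.sent + packet_len > self.allowance:
            return False                     # hold the packet for the next period
        self.sent += packet_len
        return True
```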
- FIG 10 illustrates a streaming client 1165 which uses the layered approach to content delivery and reception.
- Each RTP/RTCP session associated with each layer of encoded data has a session handler 705 at the client which is responsible for receiving RTP packets from the network. These packets are forwarded to a blender module 710 which receives the packets in the order in which they were received from the network. This may not be the order required for decoding because of packet inter-arrival jitter or packet loss.
- the blender 710 uses the packet numbers in the RTP headers to arrange the packets from each layer in the right order, and then combines the packets from all the received layers.
- the output from the blender 710 is a single stream of packets in the correct order for decoding.
- the packets are then sent to a decoder 715 for decoding into video samples or pictures.
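- The reordering performed by the blender can be sketched with a min-heap keyed on the packet number; the class and method names are assumptions:

```python
import heapq

class Blender:
    """Restores decode order across layer sessions using the global packet
    numbers carried in the RTP headers (module 710)."""

    def __init__(self, first_expected=0):
        self._heap = []
        self._next = first_expected

    def push(self, packet_number, payload):
        heapq.heappush(self._heap, (packet_number, payload))

    def pop_ready(self):
        """Yield payloads while the next-in-sequence packet has arrived."""
        while self._heap and self._heap[0][0] == self._next:
            yield heapq.heappop(self._heap)[1]
            self._next += 1
```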
- the client 1165 also comprises a controller 720 which controls operation of the RTP session handlers 705 and blender/buffer 710.
- the session control 1120 instructs the server 1135 to drop or add a layer.
- This may be implemented in the present embodiment by the server instructing a corresponding RTP handler (for example 925₃ for Layer 3) to close a current session with the corresponding handler (705₃) at the client 1165, or to open a new session in order to transfer the contents of its output buffer 930₃.
- the session control 1120 may directly instruct the controller 720 in the client 1165 to open or close an RTP session using a local session handler 705, depending on which layer is to be added/dropped.
- the various layer RTP sessions may be maintained open, but corresponding layer encoded packets may not be sent depending on the quality level currently required. This may be implemented at the FIFO buffers 930 for example, with packets being dropped after a certain time. Then when a higher quality level is requested, the packets are routed to the corresponding RTP handler 925 where the RTP session is already open so that there is no delay in increasing the quality level of the content provided.
- the low bit-rate RTP session provides low bit rate packets to the blender which starts filling up its newly enlarged or lengthened buffer. Meanwhile the packets from the higher layers start arriving and can be combined with the low bit-rate packets waiting in the buffer in order to form the higher quality content. Initially the low rate packets can be sent to the decoder in order to maintain the content at an initial low rate, then increasingly enlarged batches of packets are sent from the buffer to the decoder to provide the higher quality level content.
- a final embodiment (not shown) combines certain of the above separately described features. As well as reducing the quality of streamed media in response to detecting inattentiveness, this embodiment responds to detecting renewed attentiveness by increasing the quality back to the higher level and also offering the user a smart option, such as a smart rewind option or a summary option.
- real-time video streaming, for example from a camera or other source of unencoded or otherwise high quality video data, could alternatively be provided.
- the embodiments may be implemented on a range of apparatus for example set-top boxes coupled to a VoD system over the Internet, or wireless mobile devices such as mobile phones where the network is a wireless network such as GPRS for example.
- the embodiments may be implemented as processor control code, for example provided on a carrier medium such as a disk, CD- or DVD-ROM, in programmed memory such as read only memory (firmware), or on a data carrier such as an optical or electrical signal carrier.
- embodiments may also be implemented on a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC) or a Field Programmable Gate Array (FPGA).
- the code may comprise conventional programme code or microcode or, for example code for setting up or controlling an ASIC or FPGA.
- the code may also comprise code for dynamically configuring re-configurable apparatus such as re-programmable logic gate arrays.
- the code may comprise code for a hardware description language such as Verilog™ or VHDL (Very high speed integrated circuit Hardware Description Language).
- the code may be distributed between a plurality of coupled components in communication with one another.
- the embodiments may also be implemented using code running on a field- (re)programmable analogue array or similar device in order to configure analogue hardware.
- a method of operating an electronic device to play media content comprising: playing the media content at a first quality level; determining an attention state of a user of the electronic device; playing the media content at a second quality level in response to detecting a change in the user attention state.
- determining the user attention state comprises detecting the presence or absence of a face within a predetermined area.
- the media content is received over a network from a media server, and wherein the method further comprises switching from a first media content stream at a first bit rate corresponding to the first quality level to a second media content stream at a second bit rate corresponding to the second quality level in response to determining a change in the user attention state.
- An electronic device for playing media content comprising: means for determining an attention state of a user of the electronic device; means for playing the media content at a quality level dependent on the user attention state.
- the means for determining the user attention state comprises means for detecting faces.
- the playing means comprises: a session control module; means for receiving and playing media content transmitted from the media server; the receiving means arranged to switch from a first media content stream at a first bit rate corresponding to the first quality level to a second media content stream at a second bit rate corresponding to the second quality level in response to the session control determining a change in the user attention state.
- the playing means comprises: a session control module for communicating with a media server; means for receiving and playing media content transmitted from the media server; the session control module arranged to instruct the media server to transmit the media content at a quality level dependent on the user attention state.
- a device wherein the media content is played at a further reduced quality level for each consecutive predetermined period in which the absence of a face is detected.
- a system for playing media content comprising: an electronic device for playing the media content according to any one of clauses 8 to 13; a network coupled to the device and a media server for transmitting the media content over the network to the device.
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Databases & Information Systems (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Human Computer Interaction (AREA)
- Social Psychology (AREA)
- Biomedical Technology (AREA)
- Theoretical Computer Science (AREA)
- Computer Graphics (AREA)
- Computer Networks & Wireless Communication (AREA)
- Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
- Television Signal Processing For Recording (AREA)
Abstract
The invention relates to a method of providing one or more users with intelligent user-control options for the consumption of media content such as video. The invention also concerns an apparatus for playing media content, comprising: means for detecting a face (110, 115); means for playing the media content and associating the media content being played with a play index (130); means for storing the play index of the media content being played in response to detecting the absence of a previously detected face (240); and means for generating a user interface (140, 145) in response to detecting a face following the detection of the absence of a face. The playing means is arranged to replay at least part of the media content according to the stored play index, in response to receiving a user input from the user interface.
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP06251933.5 | 2006-04-05 | ||
EP06251932A EP1843591A1 (fr) | 2006-04-05 | 2006-04-05 | Appareil intelligent de reproduction de contenu multimedia avec détection de l'attention de l'utilisateur, méthode et support d'enregistrement correspondants |
EP06251933A EP1843592A1 (fr) | 2006-04-05 | 2006-04-05 | Contrôle de la qualité d'un contenu média |
EP06251932.7 | 2006-04-05 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2007113580A1 (fr) | 2007-10-11 |
Family
ID=38229812
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/GB2007/001288 WO2007113580A1 (fr) | 2006-04-05 | 2007-04-05 | Dispositif intelligent de lecture de contenu multimédia doté d'une fonction de détection d'attention de l'utilisateur, procédé et support d'enregistrement associés |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2007113580A1 (fr) |
- 2007-04-05: WO PCT/GB2007/001288 patent/WO2007113580A1/fr active Application Filing
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20020144259A1 (en) * | 2001-03-29 | 2002-10-03 | Philips Electronics North America Corp. | Method and apparatus for controlling a media player based on user activity |
US20030052911A1 (en) * | 2001-09-20 | 2003-03-20 | Koninklijke Philips Electronics N.V. | User attention-based adaptation of quality level to improve the management of real-time multi-media content delivery and distribution |
WO2003026250A1 (fr) * | 2001-09-20 | 2003-03-27 | Koninklijke Philips Electronics N.V. | Adaptation de qualite permettant la distribution de contenu multimedia en temps reel fondee sur l'attention des utilisateurs |
US20050281531A1 (en) * | 2004-06-16 | 2005-12-22 | Unmehopa Musa R | Television viewing apparatus |
WO2006061770A1 (fr) * | 2004-12-07 | 2006-06-15 | Koninklijke Philips Electronics N.V. | Bouton pause intelligent |
Cited By (55)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB2459705A (en) * | 2008-05-01 | 2009-11-04 | Sony Computer Entertainment Inc | Media reproducing device with user detecting means |
US8774592B2 (en) | 2008-05-01 | 2014-07-08 | Sony Computer Entertainment Inc. | Media reproduction for audio visual entertainment |
GB2459705B (en) * | 2008-05-01 | 2010-05-12 | Sony Computer Entertainment Inc | Media reproducing device, audio visual entertainment system and method |
US8407749B2 (en) | 2008-09-05 | 2013-03-26 | Skype | Communication system and method |
US9654726B2 (en) | 2008-09-05 | 2017-05-16 | Skype | Peripheral device for communication over a communications system |
US9128592B2 (en) | 2008-09-05 | 2015-09-08 | Skype | Displaying graphical representations of contacts |
US8413199B2 (en) | 2008-09-05 | 2013-04-02 | Skype | Communication system and method |
US8866628B2 (en) | 2008-09-05 | 2014-10-21 | Skype | Communication system and method |
WO2010026187A1 (fr) * | 2008-09-05 | 2010-03-11 | Skype Limited | Système de communication |
US8421839B2 (en) | 2008-09-05 | 2013-04-16 | Skype | Peripheral device for communication over a communications system |
US8473994B2 (en) | 2008-09-05 | 2013-06-25 | Skype | Communication system and method |
US8489691B2 (en) | 2008-09-05 | 2013-07-16 | Microsoft Corporation | Communication system and method |
US8520050B2 (en) | 2008-09-05 | 2013-08-27 | Skype | Communication system and method |
EP2404411B1 (fr) * | 2009-03-06 | 2018-05-02 | Alcatel Lucent | Gestion de bandes passantes de diffusion multimédia en continu en temps réel |
WO2010142869A1 (fr) | 2009-06-08 | 2010-12-16 | Weballwin | Procede pour le controle de l'attention d'un utilisateur regardant un flux multimedia sur un ecran d'un appareil multimedia connecte a un reseau et systemes le mettant en œuvre |
EP2466771A1 (fr) * | 2010-12-16 | 2012-06-20 | Gérard Olivier | Dispositif d'audimétrie intelligent |
WO2013052887A1 (fr) * | 2011-10-05 | 2013-04-11 | Qualcomm Incorporated | Mode cognitif minimal pour dispositifs d'affichage sans fil |
KR101604296B1 (ko) * | 2011-10-05 | 2016-03-25 | 퀄컴 인코포레이티드 | 무선 디스플레이 디바이스들에 대한 최소 인식 모드 |
CN104041064A (zh) * | 2011-10-05 | 2014-09-10 | 高通股份有限公司 | 无线显示设备的最小认知模式 |
US20130089006A1 (en) * | 2011-10-05 | 2013-04-11 | Qualcomm Incorporated | Minimal cognitive mode for wireless display devices |
CN104737099B (zh) * | 2012-08-31 | 2018-05-08 | 谷歌有限责任公司 | 视频质量的动态调整 |
EP2891039A4 (fr) * | 2012-08-31 | 2016-04-27 | Google Inc | Ajustement dynamique de la qualité vidéo |
CN108347648A (zh) * | 2012-08-31 | 2018-07-31 | 谷歌有限责任公司 | 视频质量的动态调整 |
CN104737099A (zh) * | 2012-08-31 | 2015-06-24 | 谷歌公司 | 视频质量的动态调整 |
US9652112B2 (en) | 2012-08-31 | 2017-05-16 | Google Inc. | Dynamic adjustment of video quality |
US20140081748A1 (en) * | 2012-09-14 | 2014-03-20 | International Business Machines Corporation | Customized television commercials |
US20140081749A1 (en) * | 2012-09-14 | 2014-03-20 | International Business Machines Corporation | Customized television commercials |
JP2016504836A (ja) * | 2012-11-29 | 2016-02-12 | クゥアルコム・インコーポレイテッドQualcomm Incorporated | コンテンツ提示を提供するためにユーザエンゲージメントを使用するための方法および装置 |
US9398335B2 (en) | 2012-11-29 | 2016-07-19 | Qualcomm Incorporated | Methods and apparatus for using user engagement to provide content presentation |
WO2014085145A3 (fr) * | 2012-11-29 | 2014-07-24 | Qualcomm Incorporated | Procédés et appareil pour utiliser un engagement d'utilisateur afin de fournir une présentation de contenu |
CN104813678A (zh) * | 2012-11-29 | 2015-07-29 | 高通股份有限公司 | 用于使用用户参与度来提供内容呈现的方法和装置 |
WO2014108194A1 (fr) * | 2013-01-10 | 2014-07-17 | Telefonaktiebolaget L M Ericsson (Publ) | Appareil et procédé pour commander une diffusion en continu adaptative de contenu multimédia |
CN105359479A (zh) * | 2013-01-10 | 2016-02-24 | 瑞典爱立信有限公司 | 控制自适应流播媒体的装置和方法 |
US9930386B2 (en) | 2013-06-05 | 2018-03-27 | Thomson Licensing | Method and apparatus for content distribution multiscreen viewing |
US10212474B2 (en) | 2013-06-05 | 2019-02-19 | Interdigital Ce Patent Holdings | Method and apparatus for content distribution for multi-screen viewing |
EP3261354A1 (fr) * | 2013-06-05 | 2017-12-27 | Thomson Licensing | Procédé et appareil de distribution de contenu pour un affichage multi-écran |
US10416853B2 (en) | 2014-04-17 | 2019-09-17 | Google Llc | Methods, systems, and media for providing media guidance based on detected user events |
US9690455B1 (en) * | 2014-04-17 | 2017-06-27 | Google Inc. | Methods, systems, and media for providing media guidance based on detected user events |
US9888126B2 (en) | 2015-01-16 | 2018-02-06 | Boe Technology Group Co., Ltd. | Multipurpose conferencing terminal and multipurpose conference system |
CN104767962B (zh) * | 2015-01-16 | 2019-02-15 | 京东方科技集团股份有限公司 | 多用途会议终端和多用途会议系统 |
EP3070936A1 (fr) * | 2015-01-16 | 2016-09-21 | BOE Technology Group Co., Ltd. | Terminal multifonction pour conférence et système multifonction pour conférence |
CN104767962A (zh) * | 2015-01-16 | 2015-07-08 | 京东方科技集团股份有限公司 | 多用途会议终端和多用途会议系统 |
EP3070936A4 (fr) * | 2015-01-16 | 2017-03-29 | BOE Technology Group Co., Ltd. | Terminal multifonction pour conférence et système multifonction pour conférence |
US10228759B2 (en) * | 2015-08-12 | 2019-03-12 | Boe Technology Group Co., Ltd. | Distance sensing substrate, display device, display system and resolution adjustment method |
US20170364142A1 (en) * | 2015-08-12 | 2017-12-21 | Boe Technology Group Co., Ltd. | Distance sensing substrate, display device, display system and resolution adjustment method |
JP2018530277A (ja) * | 2015-09-01 | 2018-10-11 | トムソン ライセンシングThomson Licensing | 注目検出に基づくメディア・コンテンツ制御のための方法、システムおよび装置 |
US9872199B2 (en) | 2015-09-22 | 2018-01-16 | Qualcomm Incorporated | Assigning a variable QCI for a call among a plurality of user devices |
WO2018108284A1 (fr) * | 2016-12-15 | 2018-06-21 | Telefonaktiebolaget Lm Ericsson (Publ) | Dispositif d'enregistrement audio pour présenter un discours audio manqué en raison de l'absence d'attention de l'utilisateur et procédé associé |
WO2019026360A1 (fr) * | 2017-07-31 | 2019-02-07 | ソニー株式会社 | Dispositif de traitement d'informations et procédé de traitement d'informations |
JPWO2019026360A1 (ja) * | 2017-07-31 | 2020-05-28 | ソニー株式会社 | 情報処理装置および情報処理方法 |
US11250873B2 (en) | 2017-07-31 | 2022-02-15 | Sony Corporation | Information processing device and information processing method |
US11438642B2 (en) * | 2018-08-23 | 2022-09-06 | Rovi Guides, Inc. | Systems and methods for displaying multiple media assets for a plurality of users |
US11812087B2 (en) | 2018-08-23 | 2023-11-07 | Rovi Guides, Inc. | Systems and methods for displaying multiple media assets for a plurality of users |
US12081820B2 (en) | 2018-08-23 | 2024-09-03 | Rovi Guides, Inc. | Systems and methods for displaying multiple media assets for a plurality of users |
US11064264B2 (en) | 2018-09-20 | 2021-07-13 | International Business Machines Corporation | Intelligent rewind function when playing media content |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2007113580A1 (fr) | Dispositif intelligent de lecture de contenu multimédia doté d'une fonction de détection d'attention de l'utilisateur, procédé et support d'enregistrement associés | |
US11366632B2 (en) | User interface for screencast applications | |
US11651794B2 (en) | Variable speed playback | |
EP1843591A1 (fr) | Appareil intelligent de reproduction de contenu multimedia avec détection de l'attention de l'utilisateur, méthode et support d'enregistrement correspondants | |
EP1843592A1 (fr) | Contrôle de la qualité d'un contenu média | |
CN113141514B (zh) | 媒体流传输方法、系统、装置、设备及存储介质 | |
US20220174357A1 (en) | Simulating audience feedback in remote broadcast events | |
US9167312B2 (en) | Pause-based advertising methods and systems | |
US11930250B2 (en) | Video assets having associated graphical descriptor data | |
US20100122277A1 (en) | device and a method for playing audio-video content | |
WO2006041996A2 (fr) | Procede pour reduire au minimum les effets de retard de la memoire tampon lors de la diffusion en continu d'un contenu numerique | |
JP7155164B2 (ja) | リバッファリングイベントの時間的配置 | |
WO2003058965A1 (fr) | Service de conference avec presentation synchrone de programmes media | |
JP2002077820A (ja) | 蓄積再生装置およびデジタル放送送信装置 | |
JP2009224818A (ja) | コンテンツ再生装置およびコンテンツ再生方法 | |
KR20230074544A (ko) | 실시간 및 파일 기반 오디오 데이터 프로세싱 | |
JP4994942B2 (ja) | 情報処理装置、情報処理方法及び情報処理システム | |
JP6034113B2 (ja) | 映像コンテンツ配信装置 | |
WO2020128625A1 (fr) | Procédé pour commander un dispositif électronique pendant la lecture d'un contenu audiovisuel | |
WO2013178500A1 (fr) | Système de diffusion audio/vidéo interactive, procédé de fonctionnement associé et dispositif d'utilisateur pour le fonctionnement dans le système de diffusion audio/vidéo interactive | |
US11949948B2 (en) | Playback control based on image capture | |
JP2008054150A (ja) | 複数チャンネル画像転送装置 | |
KR20050074667A (ko) | 다채널, 원격 카메라 제어 기능을 가진 개인 휴대용 단말기용 무선 인터넷 실시간 멀티미디어 전송 서버/클라이언트 | |
CN114827715A (zh) | 显示设备和媒资播放方法 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 07732332 Country of ref document: EP Kind code of ref document: A1 |
NENP | Non-entry into the national phase |
Ref country code: DE |
122 | Ep: pct application non-entry in european phase |
Ref document number: 07732332 Country of ref document: EP Kind code of ref document: A1 |