US12316921B2 - Content classifiers for automatic picture and sound modes - Google Patents
- Publication number
- US12316921B2 (application US17/743,898)
- Authority
- US
- United States
- Prior art keywords
- settings
- audio
- media content
- display
- machine learning
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active, expires
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/25—Management operations performed by the server for facilitating the content distribution or administrating data related to end-users or client devices, e.g. end-user or client device authentication, learning user preferences for recommending movies
- H04N21/251—Learning process for intelligent management, e.g. learning user preferences for recommending movies
- H04N21/252—Processing of multiple end-users' preferences to derive collaborative data
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/439—Processing of audio elementary streams
- H04N21/4394—Processing of audio elementary streams involving operations for analysing the audio stream, e.g. detecting features or characteristics in audio streams
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
- H04N21/44008—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/45—Management operations performed by the client for facilitating the reception of or the interaction with the content or administrating data related to the end-user or to the client device itself, e.g. learning user preferences for recommending movies, resolving scheduling conflicts
- H04N21/4508—Management of client data or end-user data
- H04N21/4532—Management of client data or end-user data involving end-user characteristics, e.g. viewer profile, preferences
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/45—Management operations performed by the client for facilitating the reception of or the interaction with the content or administrating data related to the end-user or to the client device itself, e.g. learning user preferences for recommending movies, resolving scheduling conflicts
- H04N21/466—Learning process for intelligent management, e.g. learning user preferences for recommending movies
- H04N21/4662—Learning process for intelligent management, e.g. learning user preferences for recommending movies characterized by learning algorithms
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/45—Management operations performed by the client for facilitating the reception of or the interaction with the content or administrating data related to the end-user or to the client device itself, e.g. learning user preferences for recommending movies, resolving scheduling conflicts
- H04N21/466—Learning process for intelligent management, e.g. learning user preferences for recommending movies
- H04N21/4662—Learning process for intelligent management, e.g. learning user preferences for recommending movies characterized by learning algorithms
- H04N21/4666—Learning process for intelligent management, e.g. learning user preferences for recommending movies characterized by learning algorithms using neural networks, e.g. processing the feedback provided by the user
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/47—End-user applications
- H04N21/485—End-user interface for client configuration
- H04N21/4852—End-user interface for client configuration for modifying audio parameters, e.g. switching between mono and stereo
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/47—End-user applications
- H04N21/485—End-user interface for client configuration
- H04N21/4854—End-user interface for client configuration for modifying image parameters, e.g. image brightness, contrast
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/60—Network structure or processes for video distribution between server and client or between remote clients; Control signalling between clients, server and network components; Transmission of management data between server and client, e.g. sending from server to client commands for recording incoming content stream; Communication details between server and client
- H04N21/65—Transmission of management data between client and server
- H04N21/654—Transmission by server directed to the client
- H04N21/6547—Transmission by server directed to the client comprising parameters, e.g. for client setup
Definitions
- This disclosure is generally directed to display screen technology, and more particularly to automatic picture and sound selection for data streaming.
- TV picture and audio settings can have a dramatic impact on the quality of the TV viewing experience.
- For example, sports content and movie content may each benefit from differently optimized picture and sound settings.
- Modern smart TVs attempt to adjust picture and audio settings automatically using automatic content recognition (ACR), AI-based scene detection, and other tools. However, today's methods require sampling playback in order to determine the corrections that should be applied, which limits how invasive the corrections can be without creating a jarring experience for the viewer. It can therefore be very complicated to create a single solution that handles a plurality of parameters and finds an optimum combination of picture and sound parameters.
- An optimal picture and sound selection may be driven by many streaming parameters, such as television parameters, media constructs, available bitrates, encoding techniques, network connection type (wired, WiFi, mobile), WiFi performance, processor performance, etc.
- the technology as described herein, in some embodiments, may be configured to improve picture and audio settings before the streamed data is played back on a display device, such as a television (TV).
- the technology may be applied broadly to any configurable aspect of streaming video and/or audio.
- FIG. 1 illustrates a block diagram of a multimedia environment, according to some embodiments.
- FIG. 2 illustrates a block diagram of a streaming media device, according to some embodiments.
- FIG. 3 illustrates a block diagram of audio/display settings selection, according to some embodiments.
- FIG. 4 illustrates a diagram of crowdsourcing audio/display settings, according to some embodiments.
- FIG. 5 illustrates a block diagram of a machine learning audio/display settings selection, according to some embodiments.
- FIG. 6 illustrates a table diagram of media content sound and picture modes, according to some embodiments.
- FIG. 7 illustrates a table diagram of display device picture modes and display settings, according to some embodiments.
- FIG. 8 illustrates a table diagram of display device sound modes and audio settings, according to some embodiments.
- FIG. 9 illustrates a process diagram of audio/display settings selection, according to some embodiments.
- FIG. 10 illustrates a machine learning process diagram of audio/display settings selection, according to some embodiments.
- FIG. 11 illustrates a crowdsource machine learning process diagram of an audio/display settings selection, according to some embodiments.
- FIG. 12 illustrates a process diagram of crowdsourcing audio/display settings, according to some embodiments.
- FIG. 13 illustrates an example computer system useful for implementing various embodiments.
- the technology described herein may implement modified content streams to include genre information or optimized audio/display settings in metadata that allow TV platforms to apply Picture Quality/Audio Quality (PQ/AQ) adjustments automatically before content playback starts.
- Optimized refers to making the best or most effective use of viewing/sound conditions for a selected media content. As will be described, these optimizations may be pre-selected by media content providers or distributors, automatically by a crowdsourced data analysis, by machine learning models, or by a hybrid approach of machine learning models trained by crowdsourced data.
- Display setting refers to any display parameter for a display device, including any displayable attribute of media content, such as, but not limited to, brightness, contrast, backlight, color temperature, sharpness, chromaticity, resolution, or any other known or future display parameter.
- the technology described herein may implement the audio/display settings selection with a machine learning model.
- The machine learning model, based on supervised training, may determine future audio/display settings using a random forest (decision-tree-based) classifier that predicts that a selected sound or picture mode (and the associated display and audio settings) will occur given the selected media type for streaming.
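The idea above can be sketched in a few lines. This is an illustrative toy, not the patent's implementation: a deterministic miniature "forest" of one decision stump per feature stands in for a trained random forest, and the feature names, training samples, and mode labels are all hypothetical.

```python
from collections import Counter

# Hypothetical training data: media-type features -> a picture/sound mode.
TRAIN = [
    ({"genre": "sports", "live": 1}, "sports"),
    ({"genre": "sports", "live": 1}, "sports"),
    ({"genre": "movie",  "live": 0}, "movie"),
    ({"genre": "movie",  "live": 0}, "movie"),
    ({"genre": "news",   "live": 1}, "standard"),
]

def train_stump(samples, feature):
    """One-level decision tree: majority mode per observed feature value."""
    by_value = {}
    for feats, mode in samples:
        by_value.setdefault(feats[feature], []).append(mode)
    rule = {v: Counter(ms).most_common(1)[0][0] for v, ms in by_value.items()}
    default = Counter(m for _, m in samples).most_common(1)[0][0]
    return feature, rule, default

def train_forest(samples, features=("genre", "live")):
    return [train_stump(samples, f) for f in features]

def predict_mode(forest, feats):
    """Majority vote across the stumps, usable before playback starts."""
    votes = [rule.get(feats.get(f), default) for f, rule, default in forest]
    return Counter(votes).most_common(1)[0][0]
```

Because the prediction needs only the media type (not sampled playback), the mode can be chosen before the first frame is rendered, which is the point of the disclosure.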
- Various embodiments of this disclosure may be implemented using and/or may be part of a multimedia environment 102 shown in FIG. 1. It is noted, however, that multimedia environment 102 is provided solely for illustrative purposes, and is not limiting. Embodiments of this disclosure may be implemented using and/or may be part of environments different from and/or in addition to the multimedia environment 102, as will be appreciated by persons skilled in the relevant art(s) based on the teachings contained herein. An example of the multimedia environment 102 shall now be described.
- FIG. 1 illustrates a block diagram of a multimedia environment 102 , according to some embodiments.
- multimedia environment 102 may be directed to streaming media.
- this disclosure is applicable to any type of media (instead of or in addition to streaming media), as well as any mechanism, means, protocol, method and/or process for distributing media.
- the multimedia environment 102 may include one or more media systems 104 .
- a media system 104 could represent a family room, a kitchen, a backyard, a home theater, a school classroom, a library, a car, a boat, a bus, a plane, a movie theater, a stadium, an auditorium, a park, a bar, a restaurant, or any other location or space where it is desired to receive and play streaming content.
- User(s) 132 may operate with the media system 104 to select and consume content.
- Each media system 104 may include one or more media devices 106 each coupled to one or more display devices 108 . It is noted that terms such as “coupled,” “connected to,” “attached,” “linked,” “combined” and similar terms may refer to physical, electrical, magnetic, logical, etc., connections, unless otherwise specified herein.
- Media device 106 may be a streaming media device, DVD or BLU-RAY device, audio/video playback device, cable box, and/or digital video recording device, to name just a few examples.
- Display device 108 may be a monitor, television (TV), smart TV, computer, touch screen, smart phone, tablet, wearable (such as a watch or glasses), virtual reality (VR) headset, appliance, internet of things (IoT) device, automotive display, gaming display, heads-up display (HUD), and/or projector, to name just a few examples.
- media device 106 can be a part of, integrated with, operatively coupled to, and/or connected to its respective display device 108 .
- Each media device 106 may be configured to communicate with network 118 via a communication device 114 .
- the communication device 114 may include, for example, a cable modem or satellite TV transceiver.
- the media device 106 may communicate with the communication device 114 over a link 116 , wherein the link 116 may include wireless (such as WiFi) and/or wired connections.
- the network 118 can include, without limitation, wired and/or wireless intranet, extranet, Internet, cellular, Bluetooth, infrared, and/or any other short range, long range, local, regional, global communications mechanism, means, approach, protocol and/or network, as well as any combination(s) thereof.
- Media system 104 may include a remote control 110 .
- the remote control 110 can be any component, part, apparatus and/or method for controlling the media device 106 and/or display device 108 , such as a remote control, a tablet, laptop computer, smartphone, wearable, on-screen controls, integrated control buttons, audio controls, or any combination thereof, to name just a few examples.
- the remote control 110 wirelessly communicates with the media device 106 and/or display device 108 using cellular, Bluetooth, infrared, etc., or any combination thereof.
- the remote control 110 may include a microphone 112 , which is further described below.
- the multimedia environment 102 may include a plurality of content servers 120 (also called content providers or sources 120 ). Although only one content server 120 is shown in FIG. 1 , in practice the multimedia environment 102 may include any number of content servers 120 . Each content server 120 may be configured to communicate with network 118 .
- Each content server 120 may store content 122 and metadata 124 .
- Content 122 may include any combination of music, videos, movies, TV programs, multimedia, images, still pictures, text, graphics, gaming applications, advertisements, programming content, public service content, government content, local community content, software, and/or any other content or data objects in electronic form.
- Metadata 124 comprises data about content 122 .
- metadata 124 may include associated or ancillary information indicating or related to writer, director, producer, composer, artist, actor, summary, chapters, production, history, year, trailers, alternate versions, related content, applications, and/or any other information pertaining or relating to the content 122 .
- Metadata 124 may also or alternatively include links to any such information pertaining or relating to the content 122 .
- Metadata 124 may also or alternatively include one or more indexes of content 122 , such as, but not limited to a trick mode index.
- Metadata 124 may also or alternatively include production information (audio formats, video formats, quality of content stream (e.g., low/high bitrate connection), resolution, etc.).
- the multimedia environment 102 may include one or more system servers 126 .
- the system servers 126 may operate to support the media devices 106 from the cloud. It is noted that the structural and functional aspects of the system servers 126 may wholly or partially exist in the same or different ones of the system servers 126 .
- the media devices 106 may exist in thousands or millions of media systems 104 . Accordingly, the media devices 106 may lend themselves to crowdsourcing embodiments and, thus, the system servers 126 may include one or more crowdsource servers 128 .
- The crowdsource server(s) 128 may identify similarities and overlaps between closed captioning requests issued by different users 132 watching a particular movie. Based on such information, the crowdsource server(s) 128 may determine that turning closed captioning on may enhance users' viewing experience at particular portions of the movie (for example, when the soundtrack of the movie is difficult to hear), and turning closed captioning off may enhance users' viewing experience at other portions of the movie (for example, when displaying closed captioning obstructs critical visual aspects of the movie). Accordingly, the crowdsource server(s) 128 may operate to cause closed captioning to be automatically turned on and/or off during future streamings of the movie.
- the system servers 126 may also include an audio command processing module 130 .
- the remote control 110 may include a microphone 112 .
- the microphone 112 may receive audio data from users 132 (as well as other sources, such as the display device 108 ).
- the media device 106 may be audio responsive, and the audio data may represent verbal commands from the user 132 to control the media device 106 as well as other components in the media system 104 , such as the display device 108 .
- The audio data received by the microphone 112 in the remote control 110 is transferred to the media device 106, and is then forwarded to the audio command processing module 130 in the system servers 126.
- the audio command processing module 130 may operate to process and analyze the received audio data to recognize the user 132 's verbal command. The audio command processing module 130 may then forward the verbal command back to the media device 106 for processing.
- the audio data may be alternatively or additionally processed and analyzed by an audio command processing module 216 in the media device 106 (see FIG. 2 ).
- the media device 106 and the system servers 126 may then cooperate to pick one of the verbal commands to process (either the verbal command recognized by the audio command processing module 130 in the system servers 126 , or the verbal command recognized by the audio command processing module 216 in the media device 106 ).
- FIG. 2 illustrates a block diagram of an example media device 106 , according to some embodiments.
- Media device 106 may include a streaming module 202 , processing module 204 , storage/buffers 208 , and user interface module 206 .
- the user interface module 206 may include the audio command processing module 216 .
- the media device 106 may also include one or more audio decoders 212 and one or more video decoders 214 .
- Each audio decoder 212 may be configured to decode audio of one or more audio formats, such as but not limited to AAC, HE-AAC, AC3 (Dolby Digital), EAC3 (Dolby Digital Plus), WMA, WAV, PCM, MP3, OGG, GSM, FLAC, AU, AIFF, and/or VOX, to name just some examples.
- Each video decoder 214 may be configured to decode video of one or more video formats, such as but not limited to MP4 (mp4, m4a, m4v, f4v, f4a, m4b, m4r, f4b, mov), 3GP (3gp, 3gp2, 3g2, 3gpp, 3gpp2), OGG (ogg, oga, ogv, ogx), WMV (wmv, wma, asf), WEBM, FLV, AVI, QuickTime, HDV, MXF (OP1a, OP-Atom), MPEG-TS, MPEG-2 PS, MPEG-2 TS, WAV, Broadcast WAV, LXF, GXF, and/or VOB, to name just some examples.
- Each video decoder 214 may include one or more video codecs, such as but not limited to H.263, H.264, H.265, H.266, AV1, VP9, HEVC, MPEG1, MPEG2, MPEG-TS, MPEG-4, Theora, 3GP, DV, DVCPRO, DVCProHD, IMX, XDCAM HD, XDCAM HD422, and/or XDCAM EX, to name just some examples.
- the user 132 may interact with the media device 106 via, for example, the remote control 110 .
- the user 132 may use the remote control 110 to interact with the user interface module 206 of the media device 106 to select content, such as a movie, TV show, music, book, application, game, etc.
- the streaming module 202 of the media device 106 may request the selected content from the content server(s) 120 over the network 118 .
- the content server(s) 120 may transmit the requested content to the streaming module 202 .
- the media device 106 may transmit the received content to the display device 108 for playback to the user 132 .
- the streaming module 202 may transmit the content to the display device 108 in real time or near real time as it receives such content from the content server(s) 120 .
- the media device 106 may store the content received from content server(s) 120 in storage/buffers 208 for later playback on display device 108 .
- the technology as described herein may be configured to improve picture and audio settings for streaming video and/or audio.
- the technology may be applied broadly to any configurable aspect of an adaptive display or audio settings selection process.
- Audio and display settings may be controlled in most media devices while playing back streaming audio/video content.
- A home network may be used to provide streaming content 122 and services to an on-premises smart TV (e.g., display device 108).
- the smart TV typically buffers the streamed content and plays the buffered stream back on a display screen and speakers of the smart TV.
- the playback system may include ancillary video and audio components, such as handheld displays, sound bars, external speakers, external displays, etc.
- FIG. 3 illustrates a block diagram of a media content metadata modifier system 300 , according to some embodiments.
- Metadata modifier system 300 may be implemented by processing logic that can comprise hardware (e.g., circuitry, dedicated logic, programmable logic, microcode, etc.), software (e.g., instructions executing on a processing device), or a combination thereof. It is to be appreciated that not all components or steps described may be needed to perform the disclosure provided herein. Further, some of the components or steps described may be performed simultaneously, or in a different order than described for FIG. 3, as will be understood by a person of ordinary skill in the art. Metadata modifier system 300 may be part of system server(s) 126, be part of a media system, be part of a client streaming device, be a standalone system, or be distributed across one or more systems (e.g., cloud system(s)).
- streamed media content may be downloaded and stored on the media device 106 or display device 108 .
- The audio or display settings may be changed manually by the user, or automatically by an internal smart TV automatic content recognition (ACR) system, AI-based scene detection, or other internal tools.
- manual adjustments require the user to test various settings during playback until they find a combination that they prefer.
- automatic adjustments require the smart TV to sample the playback and make adjustments. These choices are less than optimum as they require an interruption in the playback experience while adjustments are being made.
- the technology described herein will provide optimized sound and picture settings in advance of the playback process.
- Metadata 124 associated with media content 122 is modified to include the optimized sound and picture settings, which are applied before playback, thus improving the user experience. This technical improvement to a display device's operation, or to a data streaming and playback operation, improves the quality of playback on the display device in a way not provided by previous methods.
- the sound and picture adjustments provided in the metadata 124 may be generated by the metadata modifier 312 in a variety of ways.
- The optimized sound and picture settings in metadata 124 may be provided by the media content provider 304 or media content distributor 306.
- the media content provider would modify the metadata to include specific audio and display settings optimized for sporting events.
- In this example, the metadata 124 is modified to include a sound mode of "standard" and a picture mode of "sports." As will be described in greater detail hereafter, these specific modes may be further mapped to device-specific settings, such as brightness, contrast, color, bass, treble, and speech, to name a few.
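A minimal sketch of such a mode-to-settings mapping follows. The mode names, setting names, and numeric values are hypothetical illustrations, not the values in the patent's FIG. 7/FIG. 8 tables.

```python
# Hypothetical mapping of named modes (carried in content metadata) to
# device-specific settings; numeric values are illustrative only.
PICTURE_MODES = {
    "sports":   {"brightness": 55, "contrast": 80, "color": 60, "sharpness": 70},
    "movie":    {"brightness": 45, "contrast": 70, "color": 50, "sharpness": 40},
    "standard": {"brightness": 50, "contrast": 75, "color": 50, "sharpness": 50},
}
SOUND_MODES = {
    "sports":   {"bass": 60, "treble": 55, "speech": 70},
    "standard": {"bass": 50, "treble": 50, "speech": 50},
}

def settings_from_metadata(metadata):
    """Resolve picture/sound modes named in metadata to concrete settings,
    falling back to "standard" when a mode is missing or unrecognized."""
    picture = PICTURE_MODES.get(metadata.get("picture_mode"), PICTURE_MODES["standard"])
    sound = SOUND_MODES.get(metadata.get("sound_mode"), SOUND_MODES["standard"])
    return {"display": picture, "audio": sound}
```

Keeping the metadata at the level of named modes, and resolving them on the device, lets each TV model map the same mode to its own hardware-specific values.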
- the metadata may, in some embodiments, be modified to include optimized audio/display settings by the content originator.
- Option 1: Add a content ID to the content manifest.
- Option 2: Tag all content with a system-supplied content classifier and pass the content classifier with the content manifest.
- The participating media content provider may provide a method to question misclassified content and a further process to update or reclassify the media content.
- The display device makes a network call to a lookup table for the content ID and retrieves a content classifier (e.g., this may run in parallel to other network requests).
- the content classifier may include sub-classifications, multiple classifications or have one or more different classifications distributed within the media content.
- a movie may simultaneously have different classifications.
- The movie may include any number of classifications, such as night scenes, scary scenes, and quiet scenes all in the same movie; the television may therefore adjust its audio or display settings to match or optimize the different scenes.
- individual ones of multiple metadata may refer to different corresponding segments of the streamed media content.
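The per-segment idea above can be sketched as follows. The segment structure, field names, and classifier labels are assumed for illustration; the patent does not specify a concrete schema.

```python
# Hypothetical per-segment metadata: time ranges of the content mapped to
# different classifications, so settings can change per scene.
SEGMENTS = [
    {"start": 0,    "end": 600,  "classifier": "quiet_scene"},
    {"start": 600,  "end": 1800, "classifier": "night_scene"},
    {"start": 1800, "end": 5400, "classifier": "scary_scene"},
]

def classifier_at(segments, position_s):
    """Return the content classifier covering a playback position (seconds),
    or None if the position falls outside all segments."""
    for seg in segments:
        if seg["start"] <= position_s < seg["end"]:
            return seg["classifier"]
    return None
```

Because the whole segment map arrives with the metadata before playback, the device can schedule each settings change ahead of the corresponding scene instead of reacting to sampled playback.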
- In some embodiments, metadata modification is not enabled for previews. For example, someone watching a preview may not benefit from changing their television audio/display settings for such a short time period. Therefore, a participating media content provider may identify content as full playback or as a preview.
- the media content source provider passes content classifier information included as part of the content manifest. Regardless of how the content classifier is obtained, it is passed with the media content metadata.
- the display device 108 subsequently applies PQ/AQ changes based on the metadata content classifier received.
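The Option 1 flow (look up a classifier by content ID, then apply PQ/AQ changes before playback) might be sketched as below. The lookup table, content IDs, classifier names, and `apply_*` callbacks are all hypothetical stand-ins; in practice the lookup would be a network call.

```python
# Hypothetical content-ID -> classifier lookup table (stands in for a
# network-backed service the display device would query).
LOOKUP_TABLE = {"content-123": "sports", "content-456": "movie"}

def prepare_playback(content_id, apply_picture, apply_sound):
    """Resolve a content classifier and apply PQ/AQ settings before
    playback starts; returns the classifier (or None if unknown)."""
    classifier = LOOKUP_TABLE.get(content_id)
    if classifier is not None:
        apply_picture(classifier)  # e.g., switch the TV's picture mode
        apply_sound(classifier)    # e.g., switch the TV's sound mode
    return classifier
```

Passing the two `apply_*` callbacks keeps the lookup logic independent of any particular television's settings API.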
- An adaptive picture/audio model may reside in a client device (e.g., media device 106).
- the metadata modification processing may be performed by television circuitry, a media device 106 , a content server 120 , a system server 126 , a video card, a gaming processor card, ancillary computing devices of user 132 or a combination of any of these processing elements.
- The media content is modified by an automated system that automatically adds the relevant audio and display settings metadata to the data content and either passes it back to the media content provider/distributor for storage or processes it in real time and delivers it over network 118 to the media system 104 for storage (e.g., buffering) and playback on display device 108.
- the automated system will be described in greater detail hereafter.
- a crowdsourced database 308 includes the optimized audio and display settings and modifies the metadata to reflect these optimized audio and display settings for selected media content.
- manual changes made by a user during playback are recorded, aggregated for multiple users and stored in a crowdsourced database (DB) 308 .
- the crowdsource servers 128 analyze the aggregated audio/display setting changes to statistically determine what users determine to be the optimal settings for various types of programming (e.g., most frequent combination of setting changes). While described herein as using the crowdsourced data in a machine learning environment ( FIG. 11 ), the crowdsource data may be analyzed by a standalone analysis system. For example, the system detects that many users increase the brightness, volume and bass while watching sports. These setting changes are recorded and aggregated in crowdsourced settings database (DB) 308 stored within crowdsource server(s) 128 .
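A standalone analysis of the aggregated changes could be as simple as counting the most frequent combination of setting changes per content type. The event structure and field names below are assumptions for illustration.

```python
from collections import Counter

# Hypothetical aggregated events: each records the combination of setting
# changes a user settled on while watching a given genre.
events = [
    {"genre": "sports", "changes": (("brightness", 10), ("bass", 5))},
    {"genre": "sports", "changes": (("brightness", 10), ("bass", 5))},
    {"genre": "sports", "changes": (("contrast", -5),)},
    {"genre": "movie",  "changes": (("brightness", -10),)},
]

def optimal_changes(events, genre):
    """Most frequent combination of setting changes observed for a genre,
    as a dict of setting -> adjustment (empty if no data)."""
    combos = Counter(e["changes"] for e in events if e["genre"] == genre)
    best = combos.most_common(1)
    return dict(best[0][0]) if best else {}
```

Storing each user's settled changes as a tuple makes whole combinations hashable, so `Counter` can rank complete setting profiles rather than individual knobs.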
- a trained machine learning model 310 predicts the optimized picture and sound settings and modifies the metadata to reflect updated display and audio settings.
- the trained machine learning model 310 predicts the optimized picture and sound settings by ingesting, as training data, the crowdsourced audio and display settings data.
- the media content metadata may be stored locally in computer storage, distributed to media content storage nodes or streamed directly to the media device 106 or the display device 108 .
- FIG. 4 illustrates a block diagram of a crowdsourcing system 400 , according to some embodiments.
- Crowdsourcing system 400 may be implemented by processing logic that can comprise hardware (e.g., circuitry, dedicated logic, programmable logic, microcode, etc.), software (e.g., instructions executing on a processing device), or a combination thereof. It is to be appreciated that not all components or steps may be needed to perform the disclosure provided herein. Further, some of the components or steps described may be performed simultaneously, or in a different order than described for FIG. 4 , as will be understood by a person of ordinary skill in the art.
- the crowdsourcing system may be wholly contained within the display device 108 (e.g., smart TV), be part of an add-in system (e.g., as part of a plugin stick or card), be a hybrid system of internal and external components or be part of a cloud environment.
- the specific components may vary, but in any configuration, they collectively derive one or more user setting adjustments and forward these selections to crowdsource DB 308 .
- the technology described in FIG. 4 may collect future audio/display settings with a crowdsourced model, one or more machine learning predictive models ( FIG. 5 ), or a hybrid arrangement with machine learning predictive models trained by crowdsourced data.
- settings selector 402 may be configured with an audio/display (A/D) settings history component 404 , selector component 406 and monitor component 408 .
- Each of these components may be implemented by processing logic that can comprise hardware (e.g., circuitry, dedicated logic, programmable logic, microcode, etc.), software (e.g., instructions executing on a processing device), or a combination thereof.
- the audio/display settings history component 404 retains a history of previous manual audio/display adjustments as metadata: an identification of the media content for which the selection was made, the timestamp in the content where the adjustment was made, who made the adjustment, and the environmental conditions when the adjustment was made (was it loud, was it dark, etc.). For example, continuing with the earlier example, a user has selected a sporting event to stream. As the event plays back, the user may be dissatisfied with the current audio or display settings, or both. As they change the settings, a monitor 408 records these changes as temporary changes. For example, a user may try many changes before they are happy with the settings.
- the selector 406 communicates the settings to metadata component 410 .
- the system may, in some embodiments, communicate the selected A/D setting changes to the metadata component after a predetermined time passes after a last setting change (e.g., the monitor assumes that the user is satisfied if no additional changes have been made for X minutes).
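The quiet-period heuristic above can be sketched as follows; the class name, settle time, and setting names are illustrative assumptions, not the monitor 408 implementation itself:

```python
import time

class SettingsMonitor:
    """Records temporary A/D setting changes and commits them only after a
    quiet period with no further changes (the user appears satisfied)."""
    def __init__(self, settle_secs=300.0, clock=time.monotonic):
        self.settle_secs = settle_secs
        self.clock = clock
        self.pending = {}
        self.last_change = None

    def record(self, setting, value):
        self.pending[setting] = value
        self.last_change = self.clock()

    def committed(self):
        """Return the final settings once the quiet period elapses, else None."""
        if self.last_change is None:
            return None
        if self.clock() - self.last_change >= self.settle_secs:
            return dict(self.pending)
        return None

# Demo with a controllable clock and hypothetical setting values.
_now = [0.0]
monitor = SettingsMonitor(settle_secs=10.0, clock=lambda: _now[0])
monitor.record("volume", 40)
monitor.record("bass", 70)
_now[0] = 5.0
early = monitor.committed()   # still within the quiet period: nothing committed
_now[0] = 16.0
final = monitor.committed()   # 11 s of silence: settings are committed
```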
- the system may, in some embodiments, communicate all selected A/D setting adjustments to the metadata component to identify preferences, trends or behavioral aspects of the user for training machine learning systems. For example, continuing from the earlier example, the system may note that the user immediately increases the volume for a sporting event, and then changes the bass and then changes the brightness.
- the ordering therefore may provide some intelligence to the machine learning system as to a prioritization of setting changes.
- the system may, in some embodiments, communicate the selected A/D setting adjustments to the metadata component when additional setting changes are made throughout the play back.
- the time placement of user setting adjustments may provide some intelligence to the machine learning system as to a pattern of setting changes. For example, as the user plays a music video on the television, they change the volume at particular points in the music. The system may therefore adapt multiple metadata modifications throughout a media content playback.
- the selector 406 may, in some embodiments, collect environmental inputs, such as, but not limited to ambient light.
- the selector 406 may, in some embodiments, consider user profile element inputs, such as, but not limited to, a preference for loud bass during sports.
- the history of display or audio setting adjustments may be an indicator for the machine learning models to predict the next display or audio settings for a same or similar streamed media content in the future.
- any number of past display or audio settings may be chosen.
- an ‘auto’ mode shall be added to the picture mode menu for supported platforms.
- an ‘auto’ mode shall be added to the sound mode menu for any platform supporting Digital Signal Processing (DSP), for any TV with peripherals attached, Soundbars, or Streambars (e.g., Soundbar with built-in streaming).
- User-facing ‘Auto’ settings for picture and sound modes may be independently enabled/disabled by configuration service flags, per platform, allowing functionality to be tested and the A/D setting changes to be disabled if they produce unforeseen results (e.g., blasting sound or dimly backlit scenes). If disabled, picture mode defaults to ‘normal’ and sound mode defaults to ‘standard’.
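A minimal sketch of per-platform flag gating with the stated fallbacks (‘normal’ picture, ‘standard’ sound); the flag names and platform identifiers are hypothetical:

```python
# Hypothetical configuration-service flags, keyed by platform.
FLAGS = {
    "tv_gen9": {"auto_picture": True, "auto_sound": True},
    "tv_gen7": {"auto_picture": False, "auto_sound": True},
}

def effective_modes(platform, classified_picture_mode, classified_sound_mode):
    """Apply the classifier's modes only where 'auto' is enabled; otherwise
    fall back to the documented defaults ('normal' picture, 'standard' sound)."""
    flags = FLAGS.get(platform, {})
    picture = classified_picture_mode if flags.get("auto_picture") else "normal"
    sound = classified_sound_mode if flags.get("auto_sound") else "standard"
    return picture, sound

modes = effective_modes("tv_gen7", "sports", "movie")
```

An unknown platform has no flags set and so falls back to both defaults, which matches the disabled behavior described above.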
- the media system shall use the content classifier to trigger corresponding picture plus sound modes.
- the system node may, in some embodiments, make a network call to request sound or picture mode information from node maps (e.g., genre information to content classifier). The node shall use the content classifier to trigger the corresponding picture and sound modes when ‘auto’ mode is enabled. At the conclusion of playback, picture mode reverts to ‘normal’ and sound mode reverts to ‘standard’. In some embodiments, auto mode only applies to full-screen playback (not in-channel previews). When in auto mode, in some embodiments, the system may not alter picture quality/audio quality (PQ/AQ), for example, during livestreams or live content playback.
- FIG. 5 is a block diagram of a machine learning system, according to some embodiments.
- a machine learning system 500 may include a machine learning engine 502 (e.g., one or more servers (cloud or local)) processing streaming parameters, such as, but not limited to, audio settings or display settings.
- the machine learning engine 502 processes the various streaming parameters to recognize relationships of these parameters as they relate to media content.
- machine learning engine 502 may be used to predict audio/display settings. While described in stages, the sequence may include more or fewer stages or be performed in a different order.
- Machine learning involves computers discovering how they can perform tasks without being explicitly programmed to do so.
- Machine learning includes, but is not limited to, artificial intelligence, deep learning, fuzzy learning, supervised learning, unsupervised learning, etc.
- Machine learning algorithms build a model based on sample data, known as “training data”, in order to make predictions or decisions without being explicitly programmed to do so.
- the computer is presented with example inputs and their desired outputs and the goal is to learn a general rule that maps inputs to outputs.
- no labels are given to the learning algorithm, leaving it on its own to find structure in its input.
- Unsupervised learning can be a goal in itself (discovering hidden patterns in data) or a means towards an end (feature learning).
- Machine learning engine 502 may use various classifiers to map concepts associated with a specific streaming structure to capture relationships between streaming parameters and audio/display settings.
- the predictive models 506 are implemented with a random forest decision-tree-based regressor. Random forests are an ensemble learning method for classification and regression that operate by constructing a multitude of decision trees at training time. For classification tasks, the output of the random forest is the class selected by most trees. For regression tasks, the mean prediction of the individual trees is returned.
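The ensemble principle can be illustrated without an ML library; each "tree" below is a toy regressor or classifier standing in for a trained decision tree:

```python
from statistics import mean, mode

def forest_regress(trees, x):
    """Regression: return the mean prediction of the individual trees."""
    return mean(t(x) for t in trees)

def forest_classify(trees, x):
    """Classification: return the class selected by most trees (majority vote)."""
    return mode(t(x) for t in trees)

# Toy 'trees' mapping a hypothetical ambient-light reading to a backlight level.
trees = [lambda x: x * 0.5, lambda x: x * 0.5 + 2, lambda x: x * 0.5 + 4]
backlight = forest_regress(trees, 10)   # mean of 5.0, 7.0 and 9.0

# Toy classification trees voting on a content class.
vote_trees = [lambda x: "sports", lambda x: "sports", lambda x: "drama"]
genre = forest_classify(vote_trees, None)
```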
- Regression analysis is a set of statistical processes for estimating the relationships between a dependent variable (often called the ‘outcome’ or ‘response’ variable) and one or more independent variables (often called ‘predictors’, ‘covariates’, ‘explanatory variables’ or ‘features’). Regression analysis may be used for prediction or forecasting of the audio/display settings.
- Machine learning may involve computers learning from data provided so that they carry out certain tasks. For more advanced tasks, it can be challenging for a human to manually create the needed algorithms. This may be especially true of teaching approaches to correctly identify optimized audio/display settings for streaming of video and/or audio.
- the discipline of machine learning therefore employs various approaches to teach computers to accomplish tasks where no fully satisfactory algorithm is available.
- in one approach, supervised learning, some of the correct answers are labeled as valid. These labels may then be used as training data for the computer to improve the algorithm(s) it uses to determine correct answers.
- a dataset of streaming parameters and corresponding known media classifications with associated audio and display settings may be used.
- the data set may include hundreds or thousands of media content selections and corresponding known media classifications with associated audio and display settings.
- training data set 504 may be ingested to train various predictive models 506 .
- a mode is defined as a collection of specific settings.
- the machine learning engine may predict or infer specific display and audio settings separately or predict a specific set of settings or a subset of settings for a specific mode.
- a display settings predictive model 522 may be trained based on machine learning engine 502 processing training data set 504 to predict one or more next streaming display settings. Training a model means learning (determining) values for weights as well as inherent bias from any labeled examples.
- a display settings predictive model 522 predicts (regresses) a next display setting for at least a part of a next media content. For example, after being trained on thousands of instances of media content 510 , retaining media classifications 512 of action, comedy and drama for movies and their associated display/audio settings 514 , the machine learning engine may predict the display settings for a new media content 515 and adjust the current display settings to the optimized settings.
- the training data may implement one or more of environmental 516 (e.g., ambient light) or user profile 518 data (who is watching, where are they watching, what are the conditions in the room when they are watching, what are they watching with (e.g., Audio/Video Receiver (AVR), connected speakers, etc.)).
- an audio settings predictive model 524 may be trained based on machine learning engine 502 processing training data set 504 to predict or infer a potential next audio setting for at least a part of a new media content. Training a model means learning (determining) values for weights as well as inherent bias from any labeled examples. An audio settings predictive model 524 assigns a probability of an audio setting being selected for the media content.
- the training data display/audio settings 514 may be derived from predetermined settings based on genre, media content type, time of day (e.g., day vs. night), power considerations, geography, to name a few.
- the display/audio settings are reflected by corresponding metadata associated with the media content stream.
- each media content may have one or more of a display setting, an audio setting, or both.
- a media content may have a plurality of display or audio settings distributed throughout the media content, changing the settings at one or more points in the streaming sequence during playback.
- each media content may have one or more of a display setting, an audio setting, or both, derived by crowdsourced data ( FIG. 4 ).
- the crowdsourced data may aggregate individual user setting changes made for a specific media content, for a specific segment of a specific media content, for a specific genre of media content, for a specific time or range of times of a day, etc.
- the training data may source one or more of the media content, the media classification and the display/audio settings from the crowdsourced database.
- the training data may reflect a hybrid approach of predetermined data and crowdsourced data. For example, for a sporting event, the predetermined display/audio settings may be known. If, however, only a specific sound setting is changed, then only this data will need to be sourced from the crowdsourced database.
- the training cycle continuously looks at audio/display prediction results and provides a feedback loop 505 to improve capabilities of the various predictive models 506 .
- Feedback refers to the process by which a machine learning model's predicted outputs are reused to train new versions of the model ( 522 or 524 ).
- as various predictive models (algorithms) 506 are created, they are stored in a database (not shown). For example, as the training sets are processed through the machine learning engine 502 , the models 522 and 524 may change (tuning/fine-tuning) and therefore may be recorded in the database.
- Future new data 508 , for example new media content 515 , may be subsequently evaluated with the trained predictive models 506 . While media content is included as a new input in one or more example embodiments, any known or future streaming parameter may be substituted without departing from the scope of the technology described herein.
- geo location of a streaming device may be used to train the machine learning models to learn characteristics of a device's location, such as a region.
- the geo location may be implemented using clustering, based on geo locations, and then applying the machine learning models to each cluster. For example, the geolocation may be compared to a known “day vs. night” period and the display device's display/audio settings adjusted accordingly. For example, detecting an east coast location at 9:30 PM ET may precipitate a change to a nighttime setting of a less bright display and lower volume.
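A sketch of the day-vs.-night adjustment under the stated example; the night boundary hours and setting values are illustrative assumptions:

```python
from datetime import datetime, timedelta

# Hypothetical nighttime and daytime display/audio profiles.
NIGHT = {"backlight": "dim", "volume": "low"}
DAY = {"backlight": "normal", "volume": "normal"}

def settings_for_local_time(utc_now, utc_offset_hours, night_start=21, night_end=6):
    """Derive the device's local hour from its geo-based UTC offset and
    compare it against a known day-vs-night boundary."""
    local = utc_now + timedelta(hours=utc_offset_hours)
    is_night = local.hour >= night_start or local.hour < night_end
    return NIGHT if is_night else DAY

# 9:30 PM ET (UTC-5) is 02:30 UTC the next day: nighttime profile selected.
et_evening = settings_for_local_time(datetime(2024, 1, 16, 2, 30), -5)
# Noon ET selects the daytime profile.
et_noon = settings_for_local_time(datetime(2024, 1, 15, 17, 0), -5)
```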
- the technology described herein may implement machine learning model training based on a local hour. For example, streaming data may vary during different hours of the day. In some embodiments, each local hour may be an input feature to the ML models, or the system may cluster the 24 hours into peak hours (e.g., 6 PM-9 PM), non-peak hours, etc.
- a power selection of a streaming device may be used to train the machine learning models to learn characteristics of a device's power preferences.
- display/audio settings may be modified for a device's settings of any of, but not limited to, low power, normal power, or high power (e.g., vivid settings).
- the training data may be adjusted to include one or more of the power settings as a streaming parameter of the media classification 512 and/or display/audio settings 514 , or be a separate training data set.
- one or more of the training features or future features may be sampled at different frequencies to balance an outsized effect one feature data set may have on a machine learning model's output.
- when implementing the audio settings predictive model 524 , far more occurrences of a standard sound prediction would be expected than of an increased bass prediction. To reduce this high imbalance in predictions, a smaller sample size of previous standard sound predictions may be used in the training datasets, allowing the audio settings predictive model 524 to extract additional information and therefore balance its predictions.
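The downsampling described above might look like the following sketch; the labels and keep fraction are illustrative assumptions:

```python
import random

def downsample_majority(examples, label_of, majority_label, keep_fraction, seed=0):
    """Reduce the sample size of the over-represented class (e.g., 'standard'
    sound predictions) so rarer classes (e.g., 'bass+') carry more weight in
    the training set."""
    rng = random.Random(seed)
    kept = []
    for ex in examples:
        if label_of(ex) == majority_label:
            # Keep only a fraction of the majority-class examples.
            if rng.random() < keep_fraction:
                kept.append(ex)
        else:
            kept.append(ex)
    return kept

# Hypothetical training set: 90 'standard' predictions vs. 10 'bass+'.
data = [("clip%d" % i, "standard") for i in range(90)] + \
       [("clip%d" % i, "bass+") for i in range(90, 100)]
balanced = downsample_majority(data, lambda ex: ex[1], "standard", 0.2)
minority = sum(1 for ex in balanced if ex[1] == "bass+")
```

All minority-class examples survive; only the majority class is thinned.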
- the technology described herein may implement machine learning model training based on a specific streaming platform or streaming device.
- different products may have different hardware capabilities such as Wi-Fi, chips, drives, CPU performance, etc. All of these factors may influence display/audio selections. Therefore, the machine learning models may be trained and deployed per platform or per device.
- the ML system may collect the specifics for each platform or device, or may cluster based on hardware capabilities, and then apply the machine learning models. For example, the system may be able to customize the display/audio settings for a specific television that may have surround sound capabilities vs. a tablet computer that may not have the same surround sound capabilities.
- each product generation from an Original Equipment Manufacturer (OEM) may introduce new capabilities and features that may be accounted for in the machine learning training set.
- the ML models may be trained offline in supervised fashion using data download logs. These models may be fed a number of input parameters.
- the ML models may be deployed on a client device, such as media devices 106 or display devices 108 (e.g., televisions), or in the cloud.
- FIG. 6 illustrates a chart of example audio/display settings, according to some embodiments.
- Audio/display selections may be indexed into predetermined media classifications such as genres, picture type, etc.
- example content classifiers may be a genre, such as action, sports, comedy, drama, music, news podcasts, to name a few.
- a predetermined display and audio setting is provided.
- a predetermined setting may be derived by a content provider or a content distributor, among others.
- Another classifier may be “picture” or snapshot, with a content classifier of animated, black and white, color (not shown), etc.
- the modes correspond to a series of specific display/audio settings as shown in greater detail in FIGS. 7 and 8 .
- Sound modes may include, but are not limited to, movie, standard, music and dialog, to name a few.
- Picture modes may include, but are not limited to, movie, sports, low power, vivid, normal, to name a few.
- Each of the modes may be further mapped to a series of specific display device settings. It is understood that each display device may have all, more, or fewer of the specific settings that make up a specific mode.
- the display and audio settings may be optimized by a machine learning system based on a collection of different content providers' designations.
- the training data may include the same media content from different media content providers that have varying predetermined settings for a specific genre or media content type.
- FIG. 7 illustrates a chart of example picture modes as previously described in FIG. 6 , according to some embodiments.
- the picture modes correspond to a series of specific display settings shown in an indexed LookUp Table (LUT).
- the table reflects various setting capabilities of a display device such as, but not limited to, backlight, brightness, color temperature, contrast, color, sharpness, panel chromaticity and advanced PQ (picture quality). While the indexed table reflects specific combinations of display device display settings for a specific mode, the specific combinations of settings, per mode, may be modified without changing the scope of the technology described herein. Also, as previously mentioned, a display device's capabilities may include or exclude one or more of these settings or add new settings as they become commercially available.
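Such an indexed LUT can be represented as a simple mapping; the numeric values below are hypothetical placeholders, not the figures from FIG. 7, and unsupported settings are simply omitted for a given display:

```python
# Hypothetical indexed lookup table: picture mode -> display settings.
PICTURE_MODE_LUT = {
    "movie":     {"backlight": 60, "brightness": 45, "contrast": 80, "sharpness": 20},
    "sports":    {"backlight": 90, "brightness": 55, "contrast": 85, "sharpness": 40},
    "low_power": {"backlight": 30, "brightness": 40, "contrast": 70, "sharpness": 20},
    "normal":    {"backlight": 70, "brightness": 50, "contrast": 75, "sharpness": 30},
}

def apply_picture_mode(device_capabilities, mode):
    """Look up a mode's settings, keeping only those this particular display
    actually supports (a device may include or exclude settings)."""
    settings = PICTURE_MODE_LUT.get(mode, PICTURE_MODE_LUT["normal"])
    return {k: v for k, v in settings.items() if k in device_capabilities}

# A device lacking a sharpness control simply ignores that entry.
applied = apply_picture_mode({"backlight", "brightness", "contrast"}, "sports")
```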
- FIG. 8 illustrates a chart of example sound modes as previously described in FIG. 6 , according to some embodiments.
- the sound modes correspond to a series of specific audio settings shown in an indexed LookUp Table (LUT).
- the table reflects various audio setting capabilities of a display device or ancillary sound device (e.g., Soundbar, Streambar or separate speakers, etc.) such as, but not limited to, bass, treble, surround level, speech clarity, volume mode, expanded stereo, virtual surround sound, etc.
- While the indexed table reflects specific combinations of display device/ancillary device audio settings for a specific mode, the specific combinations of settings per mode may be modified without changing the scope of the technology described herein.
- a display or ancillary device's sound capabilities may include or exclude one or more of these settings or add new settings as they become commercially available (e.g., a new sound standard).
- FIG. 9 illustrates a flow diagram of an audio/display settings selection, according to some embodiments.
- Audio/display settings selection may be implemented by processing logic that can comprise hardware (e.g., circuitry, dedicated logic, programmable logic, microcode, etc.), software (e.g., instructions executing on a processing device), or a combination thereof. It is to be appreciated that not all steps may be needed to perform the disclosure provided herein. Further, some of the steps may be performed simultaneously, or in a different order than described for FIG. 9 , as will be understood by a person of ordinary skill in the art.
- a class may include genre, picture, or other media content types that collimate specific display/audio settings for that content type.
- a media content classifier is determined.
- media content classifiers may include action, sports, comedy, drama, music, news, podcast, animation, or pictures (color or B/W), or any other classifier that collimates specific display/audio settings for a specific class.
- a picture mode of a media content classifier is determined.
- a picture mode may be a movie, sports, low power, normal or vivid mode or any other picture mode that collimates specific display settings for a specific media content classifier.
- a sound mode of a media content classifier is determined.
- a sound mode may be a movie, standard, music, dialog or any other sound mode that collimates specific audio settings for a specific media content classifier.
- metadata for a specific media content is updated to reflect picture and sound settings for subsequently streamed media content.
- media content is streamed to a media device with the updated picture and sound settings.
- the media device is a television that receives the streamed content and adjusts one or more of the television's individual picture and sound settings to match the settings identified in the updated metadata.
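The flow of FIG. 9 can be sketched end to end; the classifier-to-mode mappings below are hypothetical stand-ins for the FIG. 6 chart:

```python
# Hypothetical FIG. 6-style mappings from content classifier to modes.
PICTURE_MODE_FOR = {"sports": "sports", "action": "movie", "music": "normal"}
SOUND_MODE_FOR = {"sports": "standard", "action": "movie", "music": "music"}

def update_metadata(metadata, classifier):
    """Determine the picture and sound modes for a content classifier and
    record them in the content's metadata for subsequent streaming."""
    updated = dict(metadata)
    updated["picture_mode"] = PICTURE_MODE_FOR.get(classifier, "normal")
    updated["sound_mode"] = SOUND_MODE_FOR.get(classifier, "standard")
    return updated

meta = update_metadata({"title": "Championship Final"}, "sports")
```

A receiving television would then match its individual picture and sound settings to the modes named in the updated metadata.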
- any known or future streaming parameter may be substituted without departing from the scope of the technology described herein.
- the technology described herein may implement one or both of the predictive models as part of firmware loaded onto a streaming device.
- the models may be located remotely (e.g., in the cloud) and configured to connect to streaming devices.
- FIG. 10 illustrates a flow diagram of an audio/display settings selection, according to some embodiments.
- Audio/display settings selection may be implemented by processing logic that can comprise hardware (e.g., circuitry, dedicated logic, programmable logic, microcode, etc.), software (e.g., instructions executing on a processing device), or a combination thereof.
- the flow diagram can be implemented using the ML system 500 as shown and described in FIG. 5 . It is to be appreciated that not all steps may be needed to perform the disclosure provided herein. Further, some of the steps may be performed simultaneously, or in a different order than described for FIG. 10 , as will be understood by a person of ordinary skill in the art.
- a machine learning system is configured to train display settings and audio settings predictive models.
- the training data may include the media content and its associated media classification and display/audio settings, such as shown in FIGS. 6 - 8 .
- new media content is received by the ML system.
- the new media content is ingested and processed by classifiers of the trained display and audio predictive models (e.g., algorithms) to predict optimized display and audio settings for the new media content.
- Optimal display and audio settings may be determined based on machine learning over hundreds or thousands of previously classified media content.
- these data results will be fed back into the training set.
- the sound mode of the media content is determined.
- the sound mode may be a single audio setting for a playback device or include a plurality of audio settings.
- the picture mode of the media content is determined.
- the picture mode may be a single display setting for a playback device or include a plurality of display settings.
- Predictions may be required to exceed a threshold to tighten or relax the process of selecting an audio/display setting. For example, a probability of over 90% may be required, such that an optimal selection is predicted only when the classifier is 90% sure of the selection.
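The thresholding step might be sketched as follows, assuming the classifier emits a probability per sound mode and falls back to the ‘standard’ default when unsure:

```python
def select_setting(class_probabilities, threshold=0.9):
    """Accept the classifier's top prediction only when its probability clears
    the threshold; otherwise fall back to the default 'standard' mode."""
    best, p = max(class_probabilities.items(), key=lambda kv: kv[1])
    return best if p >= threshold else "standard"

confident = select_setting({"movie": 0.95, "music": 0.05})   # above threshold
uncertain = select_setting({"movie": 0.60, "music": 0.40})   # below threshold
```

Raising or lowering `threshold` correspondingly tightens or relaxes the selection process.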
- the selected media content and updated or modified metadata is streamed (download data) to a client device, such as media devices 106 .
- For example, a sports program is streamed with sports-associated or optimized display and audio settings.
- any known or future streaming parameter may be substituted without departing from the scope of the technology described herein.
- the technology described herein may implement one or both of the predictive models as part of firmware loaded onto a streaming device.
- the models may be located remotely (e.g., in the cloud) and configured to connect to streaming devices.
- FIG. 11 illustrates a flow diagram of an audio/display settings selection, according to some embodiments.
- Audio/display settings selection may be implemented by processing logic that can comprise hardware (e.g., circuitry, dedicated logic, programmable logic, microcode, etc.), software (e.g., instructions executing on a processing device), or a combination thereof.
- the flow diagram can be implemented using the crowdsource server 128 in FIG. 1 , the ML system 500 of FIG. 5 and the crowdsource collection system of FIG. 4 , et al. It is to be appreciated that not all steps may be needed to perform the disclosure provided herein. Further, some of the steps may be performed simultaneously, or in a different order than described for FIG. 11 , as will be understood by a person of ordinary skill in the art.
- a machine learning system is configured to train display and audio settings predictive models.
- the training data may include the media content and its associated media classification and crowdsourced display/audio settings, such as shown in FIG. 4 .
- specific picture and sound modes and one or more of their specific display and audio settings are sourced from crowdsource database 308 ( FIG. 3 ).
- a user modifies sound settings for a specific movie (i.e., media content) or genre of movie (e.g., action).
- This setting change and the specific setting, such as bass increased 10 dB to 80 dB, may be collected and aggregated into the crowdsource database 308 .
- This data may then be associated with the specific movie for future training of the audio and display settings predictive models.
- the training data may include all other settings that were not changed and that collectively may represent a new or customized sound mode for a class or classification.
- predefined picture and sound mode data (e.g., display and audio settings) may be mixed with crowdsourced customer data to train the models, in a hybrid approach.
- new media content is received by the ML system.
- the new media content is ingested and processed by classifiers of the trained display and audio predictive models (e.g., algorithms) to predict optimized display and audio settings for the new media content.
- Optimal display and audio settings may be determined based on analytical learning over hundreds or thousands of previously classified media content.
- these data results will be fed back into the training set.
- the sound mode of the media content is determined.
- the sound mode may be a single audio setting for a playback device or include a plurality of audio settings.
- the picture mode of the media content is determined.
- the picture mode may be a single display setting for a playback device or include a plurality of display settings.
- the selected media content and updated or modified metadata is streamed (download data) to a client device, such as media devices 106 .
- For example, a sports program is streamed with sports-associated or optimized display and audio settings.
- any known or future streaming parameter may be substituted without departing from the scope of the technology described herein.
- the technology described herein may implement one or both of the predictive models as part of firmware loaded onto a streaming device.
- the models may be located remotely (e.g., in the cloud) and configured to connect to streaming devices.
- FIG. 12 illustrates a flow diagram of crowdsourcing audio/display settings selection, according to some embodiments.
- Audio/display settings selection may be implemented by processing logic that can comprise hardware (e.g., circuitry, dedicated logic, programmable logic, microcode, etc.), software (e.g., instructions executing on a processing device), or a combination thereof.
- the flow diagram can be implemented in conjunction with the crowdsource server 128 in FIG. 1 , the ML system 500 of FIG. 5 and the crowdsource collection system of FIG. 4 , et al. It is to be appreciated that not all steps may be needed to perform the disclosure provided herein. Further, some of the steps may be performed simultaneously, or in a different order than described for FIG. 12 , as will be understood by a person of ordinary skill in the art.
- media system receives media content 122 from media content provider 120 .
- For example, a user requests a specific movie and the media system initiates a download from content server 120 , where it is buffered locally for playback on a display screen and/or sound system.
- the media system initiates playback of the streamed media content.
- the media device communicates one or more portions from the buffer at a playback rate, based at least in part on the quality of the presentation desired by the user (e.g., HD format).
- the media system detects a user's manual adjustment of one or more picture or sound settings and records these as shown in FIG. 4 , element 408 .
- when the adjustments are final (e.g., after a set time with no additional adjustments), the media system accumulates the history of adjustments, recognizing the final adjustments or recognizing a plurality of adjustments made throughout the playback of the streamed media content.
- the media system communicates the picture/sound adjustments to the crowdsource server 128 , where they are stored and aggregated in crowdsource database 308 .
- the aggregation may be by media content title or ID, or by class, classifier, mode, geography, time of day, display device capabilities, or any other parameter associated with the playback device or media content.
- any known or future streaming parameter may be substituted without departing from the scope of the technology described herein.
- the technology described herein may implement one or both of the predictive models as part of firmware loaded onto a streaming device.
- the models may be located remotely (e.g., in the cloud) and configured to connect to streaming devices.
- Various embodiments may be implemented, for example, using one or more well-known computer systems, such as computer system 1300 shown in FIG. 13 .
- the media device 106 may be implemented using combinations or sub-combinations of computer system 1300 .
- one or more computer systems 1300 may be used, for example, to implement any of the embodiments discussed herein, as well as combinations and sub-combinations thereof.
- Computer system 1300 may include one or more processors (also called central processing units, or CPUs), such as a processor 1304 .
- Processor 1304 may be connected to a communication infrastructure or bus 1306 .
- Computer system 1300 may also include user input/output device(s) 1302 , such as monitors, keyboards, pointing devices, etc., which may communicate with communication infrastructure 1306 through user input/output interface(s) 1302 .
- one or more of processors 1304 may be a graphics processing unit (GPU).
- a GPU may be a processor that is a specialized electronic circuit designed to process mathematically intensive applications.
- the GPU may have a parallel structure that is efficient for parallel processing of large blocks of data, such as mathematically intensive data common to computer graphics applications, images, etc.
- Computer system 1300 may also include a main or primary memory 1308 , such as random access memory (RAM).
- Main memory 1308 may include one or more levels of cache.
- Main memory 1308 may have stored therein control logic (i.e., computer software) and/or data.
- Computer system 1300 may also include one or more secondary storage devices or memory 1310 .
- Secondary memory 1310 may include, for example, a hard disk drive 1312 and/or a removable storage device or drive 1314 .
- Removable storage drive 1314 may be a floppy disk drive, a magnetic tape drive, a compact disk drive, an optical storage device, tape backup device, and/or any other storage device/drive.
- Removable storage drive 1314 may interact with a removable storage unit 1318 .
- Removable storage unit 1318 may include a computer usable or readable storage device having stored thereon computer software (control logic) and/or data.
- Removable storage unit 1318 may be a floppy disk, magnetic tape, compact disk, DVD, optical storage disk, and/or any other computer data storage device.
- Removable storage drive 1314 may read from and/or write to removable storage unit 1318 .
- Secondary memory 1310 may include other means, devices, components, instrumentalities or other approaches for allowing computer programs and/or other instructions and/or data to be accessed by computer system 1300 .
- Such means, devices, components, instrumentalities or other approaches may include, for example, a removable storage unit 1322 and an interface 1320 .
- Examples of the removable storage unit 1322 and the interface 1320 may include a program cartridge and cartridge interface (such as that found in video game devices), a removable memory chip (such as an EPROM or PROM) and associated socket, a memory stick and USB or other port, a memory card and associated memory card slot, and/or any other removable storage unit and associated interface.
- Computer system 1300 may further include a communication or network interface 1324 .
- Communication interface 1324 may enable computer system 1300 to communicate and interact with any combination of external devices, external networks, external entities, etc. (individually and collectively referenced by reference number 1328 ).
- communication interface 1324 may allow computer system 1300 to communicate with external or remote devices 1328 over communications path 1326 , which may be wired and/or wireless (or a combination thereof), and which may include any combination of LANs, WANs, the Internet, etc.
- Control logic and/or data may be transmitted to and from computer system 1300 via communication path 1326 .
- Computer system 1300 may also be any of a personal digital assistant (PDA), desktop workstation, laptop or notebook computer, netbook, tablet, smart phone, smart watch or other wearable, appliance, part of the Internet-of-Things, and/or embedded system, to name a few non-limiting examples, or any combination thereof.
- Computer system 1300 may be a client or server, accessing or hosting any applications and/or data through any delivery paradigm, including but not limited to remote or distributed cloud computing solutions; local or on-premises software (“on-premise” cloud-based solutions); “as a service” models (e.g., content as a service (CaaS), digital content as a service (DCaaS), software as a service (SaaS), managed software as a service (MSaaS), platform as a service (PaaS), desktop as a service (DaaS), framework as a service (FaaS), backend as a service (BaaS), mobile backend as a service (MBaaS), infrastructure as a service (IaaS), etc.); and/or a hybrid model including any combination of the foregoing examples or other services or delivery paradigms.
- Any applicable data structures, file formats, and schemas in computer system 1300 may be derived from standards including but not limited to JavaScript Object Notation (JSON), Extensible Markup Language (XML), Yet Another Markup Language (YAML), Extensible Hypertext Markup Language (XHTML), Wireless Markup Language (WML), MessagePack, XML User Interface Language (XUL), or any other functionally similar representations alone or in combination.
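As a concrete instance of one format named above, a settings payload could be serialized as JSON for transmission between a streaming device and a server. The field names below are illustrative assumptions, not taken from the patent.

```python
import json

# Hypothetical picture/sound settings payload; field names are
# illustrative, not defined by the patent.
payload = {
    "content_id": "movie-123",
    "display": {"brightness": 55, "contrast": 48},
    "audio": {"mode": "movie", "bass_boost": 3},
}

encoded = json.dumps(payload, sort_keys=True)   # wire format
decoded = json.loads(encoded)                   # round-trip back to a dict
```

The same structure could equally be encoded as XML, YAML, or MessagePack; JSON is shown only because it round-trips with the standard library.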
- a tangible, non-transitory apparatus or article of manufacture comprising a tangible, non-transitory computer useable or readable medium having control logic (software) stored thereon may also be referred to herein as a computer program product or program storage device.
- control logic when executed by one or more data processing devices (such as computer system 1300 or processor(s) 1304 ), may cause such data processing devices to operate as described herein.
- references herein to “one embodiment,” “an embodiment,” “an example embodiment,” or similar phrases indicate that the embodiment described may include a particular feature, structure, or characteristic, but every embodiment may not necessarily include the particular feature, structure, or characteristic. Moreover, such phrases are not necessarily referring to the same embodiment. Further, when a particular feature, structure, or characteristic is described in connection with an embodiment, it would be within the knowledge of persons skilled in the relevant art(s) to incorporate such feature, structure, or characteristic into other embodiments whether or not explicitly mentioned or described herein. Additionally, some embodiments can be described using the expression “coupled” and “connected” along with their derivatives. These terms are not necessarily intended as synonyms for each other.
- “Coupled” can also mean that two or more elements are not in direct contact with each other, but still cooperate or interact with each other.
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Human Computer Interaction (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Computing Systems (AREA)
- Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
Abstract
Description
Claims (17)
Priority Applications (4)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US17/743,898 US12316921B2 (en) | 2022-05-13 | 2022-05-13 | Content classifiers for automatic picture and sound modes |
| CA3199182A CA3199182A1 (en) | 2022-05-13 | 2023-05-10 | Content classifiers for automatic picture and sound modes |
| EP23173117.5A EP4277285A1 (en) | 2022-05-13 | 2023-05-12 | Content classifiers for automatic picture and sound modes |
| US19/176,355 US20250240495A1 (en) | 2022-05-13 | 2025-04-11 | Content classifiers for automatic picture and sound modes |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US17/743,898 US12316921B2 (en) | 2022-05-13 | 2022-05-13 | Content classifiers for automatic picture and sound modes |
Related Child Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US19/176,355 Continuation US20250240495A1 (en) | 2022-05-13 | 2025-04-11 | Content classifiers for automatic picture and sound modes |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| US20230370689A1 US20230370689A1 (en) | 2023-11-16 |
| US12316921B2 true US12316921B2 (en) | 2025-05-27 |
Family
ID=86378529
Family Applications (2)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US17/743,898 Active 2042-09-28 US12316921B2 (en) | 2022-05-13 | 2022-05-13 | Content classifiers for automatic picture and sound modes |
| US19/176,355 Pending US20250240495A1 (en) | 2022-05-13 | 2025-04-11 | Content classifiers for automatic picture and sound modes |
Family Applications After (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US19/176,355 Pending US20250240495A1 (en) | 2022-05-13 | 2025-04-11 | Content classifiers for automatic picture and sound modes |
Country Status (3)
| Country | Link |
|---|---|
| US (2) | US12316921B2 (en) |
| EP (1) | EP4277285A1 (en) |
| CA (1) | CA3199182A1 (en) |
Families Citing this family (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| EP4311246A1 (en) * | 2022-07-21 | 2024-01-24 | Stryker Corporation | Systems and methods for routing multimedia signals |
| US20250274586A1 (en) * | 2024-02-26 | 2025-08-28 | Mellanox Technologies, Ltd. | Genre classification for video compression |
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20180234472A1 (en) * | 2017-02-13 | 2018-08-16 | Ooyala, Inc. | Adapting playback settings based on change history |
| US20190327526A1 (en) * | 2018-04-23 | 2019-10-24 | Free Stream Media Corporation d/b/a Samba TV | Device setting adjustment based on content recognition |
| US20220030102A1 (en) | 2020-07-27 | 2022-01-27 | Samsung Electronics Co., Ltd. | Providing situational device settings for consumer electronics and discovering user-preferred device settings for consumer electronics |
-
2022
- 2022-05-13 US US17/743,898 patent/US12316921B2/en active Active
-
2023
- 2023-05-10 CA CA3199182A patent/CA3199182A1/en active Pending
- 2023-05-12 EP EP23173117.5A patent/EP4277285A1/en not_active Withdrawn
-
2025
- 2025-04-11 US US19/176,355 patent/US20250240495A1/en active Pending
Patent Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20180234472A1 (en) * | 2017-02-13 | 2018-08-16 | Ooyala, Inc. | Adapting playback settings based on change history |
| US20190327526A1 (en) * | 2018-04-23 | 2019-10-24 | Free Stream Media Corporation d/b/a Samba TV | Device setting adjustment based on content recognition |
| US20220030102A1 (en) | 2020-07-27 | 2022-01-27 | Samsung Electronics Co., Ltd. | Providing situational device settings for consumer electronics and discovering user-preferred device settings for consumer electronics |
Non-Patent Citations (2)
| Title |
|---|
| Donges, A Complete Guide to the Random Forest Algorithm (https://web.archive.org/web/20220402165808/https://builtin.com/data-science/random-forest-algorithm) (archived Apr. 21, 2022) (last accessed Jan. 25, 2025) (Year: 2022). * |
| Extended European Search Report for Application No. 23173117.5 mailed Jul. 24, 2023, 9 pages. |
Also Published As
| Publication number | Publication date |
|---|---|
| US20250240495A1 (en) | 2025-07-24 |
| US20230370689A1 (en) | 2023-11-16 |
| EP4277285A1 (en) | 2023-11-15 |
| CA3199182A1 (en) | 2023-11-13 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US20250240495A1 (en) | Content classifiers for automatic picture and sound modes | |
| US12137265B2 (en) | Machine learning for adaptive bitrate selection | |
| US12177520B2 (en) | HDMI customized ad insertion | |
| US12301897B2 (en) | Emotion evaluation of contents | |
| US20250106474A1 (en) | Replacement of digital content in data streams | |
| US20250150668A1 (en) | Real-time objects insertion into content based on frame identifiers | |
| US11627368B1 (en) | Automatic offering and switching to a higher quality media stream | |
| US20250220284A1 (en) | Optimizing automatic content recognition queries based on content understanding | |
| US12549815B2 (en) | Context classification of streaming content using machine learning | |
| US20250386074A1 (en) | Customization of targeted media content | |
| US20250008188A1 (en) | Context classification of streaming content using machine learning | |
| US20250355956A1 (en) | Pairwise comparison rating to reduce presentation bias in content recommendation | |
| US20260004323A1 (en) | Content presentation using an exploitation-exploration paradigm | |
| US20240397168A1 (en) | Media device simulator | |
| US12363367B2 (en) | Tailoring and censoring content based on a detected audience | |
| US20260006276A1 (en) | Dynamic rendering of a contextualized advertisement | |
| US20250133251A1 (en) | Recommendation system with reduced bias based on a view history | |
| US20240127106A1 (en) | Online automatic hyperparameter tuning | |
| US12190864B1 (en) | Interest-based conversational recommendation system | |
| US20240196064A1 (en) | Trigger activated enhancement of content user experience | |
| US20260039929A1 (en) | High quality metadata creation for content using noisy sources | |
| US20240273575A1 (en) | Reinforcement learning (rl) model for optimizing long term revenue | |
| EP4351155A2 (en) | Stochastic content candidate selection for content recommendation | |
| US20250358488A1 (en) | Dynamic advertisement placement based on content understanding and user data | |
| US20250097523A1 (en) | Customized audio filtering of content |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| AS | Assignment |
Owner name: ROKU, INC., CALIFORNIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:NANGERONI, PAUL;BELLERS, ERWIN BEN;CURTIS, ROBERT CASTON;AND OTHERS;SIGNING DATES FROM 20220511 TO 20220513;REEL/FRAME:059904/0821 |
|
| FEPP | Fee payment procedure |
Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: NON FINAL ACTION MAILED |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: FINAL REJECTION MAILED |
|
| AS | Assignment |
Owner name: CITIBANK, N.A., TEXAS Free format text: SECURITY INTEREST;ASSIGNOR:ROKU, INC.;REEL/FRAME:068982/0377 Effective date: 20240916 |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: ADVISORY ACTION MAILED |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS |
|
| STCF | Information on status: patent grant |
Free format text: PATENTED CASE |
|
| CC | Certificate of correction |