WO2024118624A1 - Computer-based tools and techniques for real-time optimization of audio and audiovisual content - Google Patents

Computer-based tools and techniques for real-time optimization of audio and audiovisual content

Info

Publication number
WO2024118624A1
Authority
WO
WIPO (PCT)
Prior art keywords
media content
audio
profile
optimizing
streaming media
Prior art date
Application number
PCT/US2023/081372
Other languages
English (en)
Inventor
Andrew Knox
Original Assignee
Akm Productions, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US18/071,367 (external priority, US20240089515A1)
Application filed by Akm Productions, Inc.
Publication of WO2024118624A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/233 Processing of audio elementary streams
    • H04N21/2335 Processing of audio elementary streams involving reformatting operations of audio signals, e.g. by converting from one coding standard to another
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16 Sound input; Sound output
    • G06F3/165 Management of the audio stream, e.g. setting of volume, audio stream path
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11B INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00 Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/02 Editing, e.g. varying the order of information signals recorded on, or reproduced from, record carriers
    • G11B27/031 Electronic editing of digitised analogue information signals, e.g. audio or video signals
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/27 Server based end-user applications
    • H04N21/274 Storing end-user multimedia data in response to end-user request, e.g. network recorder
    • H04N21/2743 Video hosting of uploaded data from client
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering

Definitions

  • Various embodiments of the present invention generally relate to systems, processes, devices, and techniques for modifying, enhancing, or optimizing audio, video, and audiovisual content.
  • the invention relates to mastering or adjusting parameters for audio signals associated with certain video or audiovisual content.
  • Audio quality is an important component of content in a wide spectrum of applications arising in many different commercial enterprises, including companies in industries such as medical, entertainment, education, and business, among others. Audio quality can vary dramatically based on the type of device used to play the content, for example, such as computers, mobile phones, laptops, radios, vehicle sound systems, and other types of devices. In addition, audio quality often depends and varies based on the distribution channel through which it is communicated, such as through social media networks, professional networks, Internet, television, radio, or other channels. For example, sound volume for content viewed on the Internet (e.g., a “YouTube” video) can vary drastically from video to video, usually requiring the viewer to frequently and inconveniently adjust the volume up or down accordingly.
  • volume for television programs can vary from program to program, or from commercial to commercial, likewise requiring the viewer to manually adjust settings on the television or computer to accommodate volume level differences.
  • another important factor to consider is the environment in which audio is experienced by the listener. For example, the acoustics of a room or other physical space occupied by the listener can significantly impact how the audio content is perceived.
  • FIGS. 1A and 1B schematically illustrate one example of a computer system and associated process flows for providing a media content mastering platform structured in accordance with various embodiments of the invention.
  • FIG. 1C schematically illustrates another example of a computer system and associated process flows for providing a media content mastering platform structured in accordance with various embodiments of the invention.
  • FIG. 2 displays an example of a landing page for a media content mastering platform.
  • FIG. 3 displays an example of adding original media content files to a media content mastering platform.
  • FIG. 4 includes an example of a screen display illustrating a tool for selecting original media content files to be optimized.
  • FIG. 5 includes an example of a screen display illustrating a list of original media content files to be optimized.
  • FIG. 6 includes an example of a screen display illustrating multiple currently available mastering profiles.
  • FIG. 7 includes an example of a screen display illustrating how a user has selected multiple mastering profiles for an uploaded media content file.
  • FIG. 8 includes an example of a screen display illustrating how the system can be programmed to facilitate copying a mastering profile between or among different media content files.
  • FIG. 9 includes an example of a screen display showing how a user can enter an e-mail address as a means for initiating the remastering process.
  • FIG. 10 includes an example of a screen display illustrating a “mastering in progress” message.
  • FIGS. 11A and 11B include examples of screen displays showing how a user can receive a list of files which have now been remastered using selected mastering profiles, and how the user is able to preview them through a user interface.
  • FIG. 12A includes an example of a screen display showing a remastered preview file.
  • FIG. 12B includes an example of a screen display showing an original preview file.
  • FIGS. 13 and 14 schematically illustrate a process for purchasing, downloading, and sharing remastered media content files.
  • FIGS. 15-24 include a series of screen displays illustrating examples of different aspects of the operation and processing performed by a media content mastering platform configured in accordance with certain embodiments of the invention.
  • FIG. 25 schematically illustrates an overview of the operational environment in which a media content optimization platform can be implemented in accordance with various embodiments of the invention.
  • FIG. 26 schematically illustrates examples of various functional components and associated process flows configured for implementing a media content optimization platform in accordance with various embodiments of the invention.
  • FIG. 27 includes an example of a media player and user interface functionality programmed for accessing certain aspects of a media content optimization platform in accordance with various embodiments of the invention.
  • the inventor has created a solution which provides a web-based platform that enables processing, analysis, and enhancement of different components of many types of media content, including audio, video, and audiovisual content.
  • the solution may employ a variety of mastering profiles, for example, which can be applied to selected media content files to adjust one or more sound parameters (e.g., volume) or other attributes associated with the content.
  • these tools can be used to optimize media content for a variety of different commercial applications.
  • FIGS. 1A and 1B illustrate an example of a computer system 101 and associated process flows for providing a media content mastering platform structured in accordance with various embodiments of the invention.
  • a user can access a user interface 102 (e.g., a web-based interface) to load and display a landing page (as shown in the example of FIG. 2) on a user access device 104.
  • By selecting a button 202, for example, the user can add media content files (e.g., audio, video, audiovisual, or other kinds of media content files or content) to the platform, such as from a file directory as shown in the example of FIG. 3.
  • These media content files may be checked by the system 101 directly in the browser, and a determination may be made whether they are of a file type which can be supported by the processing features of the system 101.
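The supported-file-type check described above can be illustrated with a minimal sketch. The source says the check may run directly in the browser and does not enumerate the supported formats, so the Python form, the extension allow-list, and the function names below are all assumptions made for illustration only.

```python
from pathlib import Path

# Hypothetical allow-list: the source does not say which formats are supported.
SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".flac", ".mp4", ".mov"}


def is_supported(filename: str) -> bool:
    """Return True when the file's extension is on the allow-list."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


def partition_uploads(filenames):
    """Split candidate uploads into (accepted, rejected) lists."""
    accepted = [f for f in filenames if is_supported(f)]
    rejected = [f for f in filenames if not is_supported(f)]
    return accepted, rejected
```

An extension test is only a first filter; the later validity, format, and duration checks described for the stored files would still be needed.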
  • the media content files to be optimized can be uploaded to file storage 106 directly from the user’s browser or interface 102, and then displayed on the access device 104 of the user, for example, as shown in the screen captures of FIG. 4 and FIG. 5 (which shows multiple files uploaded on the same screen display).
  • the media content files can be retrieved from file storage 106 by the user interface 102 to determine whether they are valid, to check format details, to detect duration, and/or to analyze other file attributes or media content.
  • the file storage 106 may include a variety of different kinds of electronic computer-readable media or devices.
  • the mastering profiles can be read from an application programming interface (API) 108 via the user interface 102.
  • the mastering profiles may comprise audio mastering profiles, for example, or other suitable mastering profiles containing data associated with parameters connected to different characteristics, attributes, or other aspects of media content.
  • the mastering profiles can then be displayed by the system 101 for each media file (if the media file passed all necessary checks, such as the supported file type check described above, and can be effectively processed by the system 101).
  • the audio mastering profiles may include one or more parameter settings that can be applied to a selected media file to alter or enhance one or more aspects of its sound components (e.g., bass, treble, loudness, quality, etc.). As shown in the example of FIG.
  • FIG. 8 illustrates an example of how the system 101 can be programmed to facilitate copying a mastering profile 704 between or among different media content files 706, 802.
  • FIG. 9 includes an example of how a user can enter an e-mail address 902 as a means for initiating the mastering process.
  • the mastering process involves the system 101 applying the selected mastering profiles to the uploaded media content files 706, 802 to generate one or more modified or adjusted media content files.
  • the screen displays a “mastering in progress” message and several processes can be performed on the content by the system 101.
  • processing performed by the system 101 may involve one or more computer servers 202, for example, and/or other types of computer processors or computing devices 204 programmed to perform or facilitate the various tasks and features described herein.
  • the user interface 102 can be programmed to monitor via the API 108 whether and when the designated mastering tasks have been completed by the system 101 by checking for the presence of mastered files in the file storage 106, for example.
  • the user interface 102 can be programmed for creating mastering tasks for appropriate media content files using selected profiles communicated via the API 108.
  • the API 108 may save information to a database 110 or other data storage media, generate and send email notifications to the user via a mail sender computer component 112, for example, and/or create jobs for each media file and profile in a job queue 114, among other tasks.
  • the job queue 114 can be programmed to deliver job messages to one or more available workers 116A-116D, which comprise computer-implemented modules or other computer components programmed for assisting with preparing modified or adjusted media content by applying mastering profiles to the data stored in the media content file.
  • the job queue 114 can use a control process 118 (e.g., a module or another set of computer-readable instructions) to monitor and manage how many workers 116 are available for mastering based on the number of jobs in the job queue 114, to balance workload among the various workers 116A-116D, to determine whether a given job has been started, is in process, or has been completed, and/or other related processing tasks.
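As a rough illustration of the job queue and control process described above, the sketch below creates one job per media file and profile pair and derives a worker count from the queue depth. The scaling rule, the `jobs_per_worker` parameter, and the job dictionary layout are hypothetical; the cap of four workers simply mirrors the workers 116A-116D named in the text.

```python
import math
import queue
from itertools import product


def create_jobs(media_files, profiles):
    """Enqueue one job per (media file, profile) pair."""
    jobs = queue.Queue()
    for media_file, profile in product(media_files, profiles):
        jobs.put({"file": media_file, "profile": profile, "status": "pending"})
    return jobs


def workers_needed(pending_jobs, jobs_per_worker=2, max_workers=4):
    """Hypothetical scaling rule: one worker per few pending jobs, capped."""
    if pending_jobs <= 0:
        return 0
    return min(max_workers, math.ceil(pending_jobs / jobs_per_worker))
```

A control process could call `workers_needed(jobs.qsize())` periodically and start or stop workers to match the result.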
  • each worker 116 may open and process one or more file streams.
  • an original audio file can be demultiplexed into individual media streams for further processing, the audio stream can be decoded into raw data, and certain streams can be segregated or discarded.
  • the video file can be demultiplexed into individual media streams for further processing. In this manner, it can be seen that the video stream and the audio stream can be segregated during the process.
  • a destination file can be created as a final file suitable for receiving the remastered audio content associated with the media content file.
  • the audio and/or video streams can be encoded to the formats and parameters in accordance with mastering profiles selected by the user during the mastering process.
  • the audio and video streams can then be multiplexed or combined together in a manner that reflects the most desirable profile, combination of profiles, or profile parameters as selected by the user.
  • a preview file with original sound can be created by re-encoding the audio stream in low quality 128k bitrate, for example, to generate the preview format.
  • the video stream (which uses the original file video) can have its resolution scaled, for example, to 640px width and variable height to keep its aspect ratio.
  • the video stream can be kept in the original format, and the audio and video streams can then be multiplexed together.
  • a preview file with remastered sound can be created by re-encoding the audio stream in low quality 128k bit rate, for example, and in accordance with a selected mastering profile or profiles, to generate the preview format.
  • the video stream (which uses the original file video) can have its resolution scaled, for example, to 640px width and variable height to keep its aspect ratio.
  • the video stream is kept in the original format, and the remastered audio stream and the video stream can then be multiplexed or combined together to create a modified or remastered media content file.
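The preview scaling mentioned above (640px width, height chosen to keep the aspect ratio) reduces to simple arithmetic. The sketch below is one way to compute it; rounding the height up to an even number is an assumption added here because many video encoders expect even dimensions, and is not stated in the source.

```python
def preview_dimensions(width: int, height: int, target_width: int = 640):
    """Scale to target_width and derive the height from the source aspect ratio."""
    if width <= 0 or height <= 0:
        raise ValueError("dimensions must be positive")
    scaled_height = round(height * target_width / width)
    # Assumption: bump odd heights up by one pixel so the result is even.
    scaled_height += scaled_height % 2
    return target_width, scaled_height
```

For example, a 1920x1080 source maps to 640x360, while a portrait 1080x1920 source maps to 640x1138.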
  • each worker 116 can be programmed to read the data of the original media content file audio stream in portions or segments. For each portion or segment, the worker 116 can perform the following processing, repeating it until the entire media file is processed: remaster the portion using the mastering profile (or profiles) selected by the user; and send the original portion and the remastered portion to the destination file streams, according to whether each destination file uses original sound or remastered sound. Once the destination files have been finished, the worker 116 may set permissions on the preview files so that they are made available to the user 102. Also, once work has been completed on a media content file (or portion thereof), the worker 116 may be programmed to receive and process another job from the job queue 114.
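The portion-by-portion loop described above can be sketched as follows. This is a simplified model, not the platform's implementation: audio is represented as a list of float samples in the range -1.0 to 1.0, the "mastering profile" is reduced to a single hypothetical gain factor with clipping, and the destination streams are plain lists.

```python
def iter_portions(samples, portion_size):
    """Yield consecutive portions of the audio stream."""
    for start in range(0, len(samples), portion_size):
        yield samples[start:start + portion_size]


def remaster_portion(portion, gain):
    """Stand-in for a mastering profile: scale samples and clip to [-1.0, 1.0]."""
    return [max(-1.0, min(1.0, s * gain)) for s in portion]


def process_file(samples, portion_size=4, gain=2.0):
    """Route each portion to an original-sound and a remastered-sound destination."""
    original_out, remastered_out = [], []
    for portion in iter_portions(samples, portion_size):
        original_out.extend(portion)                            # original-sound stream
        remastered_out.extend(remaster_portion(portion, gain))  # remastered stream
    return original_out, remastered_out
```

Processing in portions keeps memory use bounded regardless of file length, which is presumably why the worker is described as reading the stream in segments.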
  • one or more logs 119 may be maintained and stored to reflect processing and tasks performed by the workers 116A-116D and/or other components of the system 101.
  • the user can receive a list of files which have now been remastered using the selected profiles, and the user is able to preview them through user interface 102.
  • the user can toggle between an “original” selection 1102 to preview the audio content for a file with original sound (as shown in FIG. 11B), and a “mastered” selection 1104 to preview the audio content of the file with remastered sound (as shown in FIG. 11A).
  • the user can preview the audio content by using the playback tools provided (e.g., play/pause button, mute button, volume control, etc.).
  • FIGS. 12A and 12B provide other examples of preview files, illustrating both an original preview file (see FIG. 12B) and a remastered preview file (see FIG. 12A).
  • the user can make payment for the service by applying a coupon code and/or navigating directly to checkout to process the payment through a computer-based payment processor 120, perhaps with cart and checkout functionality, for example.
  • a download link may be communicated to the user to initiate download of final files now incorporating the desired remastered media content (e.g., as a ZIP file) (see FIG. 14).
  • the final files can be downloaded through a suitable data storage method or component as directed by the user.
  • the final files can be shared or communicated through various distribution channels or media outlets as directed and desired by the user.
  • certain user interface functionalities such as discount codes, user account data, and other information associated with specific users may be stored and accessed in connection with a database 132 associated directly with the user interface 102 (e.g., the web-based UI component).
  • data and functions associated with the user interface 102 can be processed separately from data and functions associated with the API component 108.
  • tasks and data related to the web-based UI component 102 can be processed in parallel and more efficiently apart from the tasks and data processed by the API component 108.
  • the benefits of such parallel and separate computing functionality can be derived from dedicated database storage (e.g., a database for the API component 108, and a database 132 for the user interface 102), and dedicated e-mail notification capability (e.g., the mail sender 112 for the API component 108, and a mail sender 134 operatively associated with the user interface 102).
  • the API component 108 primarily handles functionality related to managing remastering jobs, such as handling data files and remastering profiles, communicating the status of remastering tasks (e.g., pending, in progress, completed, etc.), and generating optimized or modified media content (e.g., waveforms, video thumbnails, previews, full-length remasters, etc.).
  • FIG. 15 illustrates an example of an initial screen accessible by the user through the user interface component 102, for example, which allows the user to initiate the process of uploading media content to the system 101 through an access device 104.
  • the user can select media content from a file directory, for example, or another data source by selecting the button 1602.
  • the user can also specify an e-mail address in data field 1604, which can be used at a later stage by the system 101 to communicate generated content or remastered media content to the user.
  • a “start remastering” button 1606 can then be selected to initiate the next phase of the process of remastering the uploaded media content, which is uploading the selected media content to the system 101 as shown in FIG. 17.
  • FIG. 18 shows a variety of profiles (e.g., mastering profiles and pre-mastering profiles) which a user can select for the uploaded media content to modify or remaster different aspects of the media content.
  • Audio frequency is the rate at which a sound wave completes a cycle from peak, through trough, and then back to peak. Frequency can be considered a different term for pitch.
  • Part of the audio recording, mixing, and mastering (and remastering) processes may involve assembling, shaping, and refining different frequency ranges (pitches) into a consonant (pleasurable) arrangement.
  • Another characteristic of sound is amplitude which can also be considered intensity or volume (e.g., the loudness or softness of the sound).
  • Sound quality or timbre describes those characteristics of sound which allow the listener to distinguish sounds which have the same pitch and loudness. Timbre can be considered a general term for the distinguishable characteristics of a tone. Timbre is why two different musical instruments can play the same exact note, at the same exact volume, and yet they sound different. Timbre recognizes that sound possesses subtle frequencies (pitches) in addition to a fundamental tone. Some are lower in pitch than the fundamental tone (subtones), and some are higher in pitch than the fundamental tone (overtones). These subtones and overtones collectively can be considered harmonics, and these different harmonics are what can “color” a sound and give the sound certain unique timbres.
  • An envelope (sometimes referred to as “ADSR envelope”) characteristic of sound reflects how sound behaves over time. Envelope can be divided into four separate characteristics.
  • An “attack” characteristic represents how quickly a sound reaches peak volume after the sound is activated. A “slow” attack means that the sound takes a longer time to reach the loudest point, and a “fast” attack means that the sound takes a comparatively shorter time to reach the loudest point. “Decay” addresses how quickly the sound drops to a “sustain” level after the sound hits its peak. Sustain relates to the steady state, constant volume that a sound achieves after decay until the note is released.
  • the “release” characteristic represents how quickly the sound will fade to nothing after a note has ended (e.g., after the key on a musical instrument has been released).
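The four envelope stages described above (attack, decay, sustain, release) can be made concrete with a piecewise-linear model. The sketch below is only one common simplification; real envelopes are often curved, and the parameter names are chosen here for illustration. It assumes the note is held for at least `attack + decay` seconds.

```python
def adsr_level(t, attack, decay, sustain_level, release, note_length):
    """Return the envelope amplitude (0.0 to 1.0) at time t, in seconds.

    Assumes attack, decay and release are positive and that
    note_length >= attack + decay.
    """
    if min(attack, decay, release) <= 0:
        raise ValueError("attack, decay and release must be positive")
    if t < 0:
        return 0.0
    if t < attack:                     # rise from silence to peak volume
        return t / attack
    if t < attack + decay:             # fall from peak to the sustain level
        progress = (t - attack) / decay
        return 1.0 - (1.0 - sustain_level) * progress
    if t <= note_length:               # hold steady until the note is released
        return sustain_level
    if t < note_length + release:      # fade from the sustain level to silence
        progress = (t - note_length) / release
        return sustain_level * (1.0 - progress)
    return 0.0
```

A "fast" attack corresponds to a small `attack` value, so the level reaches 1.0 sooner; a long `release` makes the sound linger after the note ends.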
  • Velocity is the speed at which sound travels and it may be affected by different factors such as humidity, density, and temperature.
  • Wavelength is the distance between successive crests of a sound wave. Phase is defined as a location in a given waveform cycle for a sound wave (with 360 degrees representing one complete cycle).
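Frequency, velocity, wavelength, and phase as defined above are linked by two standard relations: wavelength equals velocity divided by frequency, and phase is the fraction of the current cycle expressed in degrees. The sketch below works through both; the default speed of sound (about 343 m/s in dry air near 20 °C) and the function names are assumptions introduced for the example.

```python
SPEED_OF_SOUND_AIR_M_S = 343.0  # approximate value for dry air at about 20 degrees C


def wavelength_m(frequency_hz, speed_m_s=SPEED_OF_SOUND_AIR_M_S):
    """Wavelength in metres: distance travelled during one cycle."""
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    return speed_m_s / frequency_hz


def phase_degrees(time_s, frequency_hz):
    """Position within the current cycle, with 360 degrees as one full cycle."""
    cycles = time_s * frequency_hz
    return (cycles % 1.0) * 360.0
```

At 440 Hz, for instance, the wavelength in air is roughly 0.78 m, and a quarter of a period into a cycle corresponds to a phase of about 90 degrees.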
  • selecting one or more different profiles depends on a number of factors such as the nature of the media content (e.g., audio or audiovisual), its planned distribution medium or channel (e.g., broadcast channel versus social media channel), the type of access device used to communicate the content (e.g., mobile device versus laptop or notebook computing device), the acoustical environment in which the media content will be communicated (e.g., home setting, large department store, coffee shop, etc.), and/or other factors.
  • a “Max Presence” mastering profile may involve applying enhanced equalization (EQ) to make sound and music more immersive and engaging to audiences such as for social media channels (e.g., Instagram) or video distribution platforms (e.g., YouTube).
  • a “Balanced Clarity” mastering profile can provide a balanced EQ level enhancement for music and voice over audio applications, for example.
  • a “Smooth Boost” mastering profile may include a slight EQ enhancement to audio content, such as when a user wants to make a product video more immersive while also keeping distance between music and a voice over used in the video.
  • a “Deep Detail” mastering profile can be employed as a balanced mix in situations where hearing different people engaged in dialogue, for example, is important for the media content.
  • a “Level Boost” profile can be applied to raise the volume level associated with the media content, for example.
  • “Special Applications” profiles 1804 can be provided for remastering purposes.
  • a “Voice Focus” profile can be applied to reduce or eliminate background noise and provide clarity to content such as interviews between people, conference calls, voice calls, or for content that will be transcribed by a transcriptionist.
  • a “Podcast” profile can be used for reducing or eliminating undesirable background noise, especially in situations where recorded voices need to be enhanced or distinguished in the content.
  • different kinds of broadcast television or radio profiles 1806 may be made available for selection by the user.
  • one or more different kinds of “Noise Reduction” profiles 1808 can be applied to the media content, which may occur in a pre-mastering stage before another mastering profile is applied to the content, for example.
  • These “Noise Reduction” profiles 1808 can be used to reduce or eliminate background noise such as microphone hissing or buzzing.
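Of the profiles listed above, the “Level Boost” profile is the simplest to illustrate, since raising volume amounts to multiplying samples by a gain factor. The source does not say how the boost is computed; the sketch below assumes peak normalization to a target level in dBFS, using the standard conversion between decibels and linear amplitude. Samples are modelled as floats where 1.0 is full scale.

```python
import math


def db_to_linear(gain_db):
    """Convert a gain in decibels to a linear amplitude factor."""
    return 10.0 ** (gain_db / 20.0)


def peak_dbfs(samples):
    """Peak level in dB relative to full scale (1.0); -inf for silence."""
    peak = max((abs(s) for s in samples), default=0.0)
    return -math.inf if peak == 0.0 else 20.0 * math.log10(peak)


def level_boost(samples, target_peak_dbfs=-1.0):
    """Assumed behaviour: scale the signal so its peak reaches the target level."""
    current = peak_dbfs(samples)
    if current == -math.inf:
        return list(samples)  # nothing to boost in a silent signal
    gain = db_to_linear(target_peak_dbfs - current)
    return [s * gain for s in samples]
```

Production loudness processing usually targets perceived loudness rather than peak level, so this should be read as an illustration of gain arithmetic only.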
  • FIG. 19 illustrates the system 101 in the process of generating media content using the selected profiles.
  • FIG. 20 depicts an optimized media preview screen in which the user can view the now remastered or optimized media content 2002.
  • the user may select one or more options 2004 for previewing the content, as remastered or optimized by a variety of different profiles, including reviewing the media content in its original pre-mastered form. It can be appreciated how this facilitates a comparison of the original content versus the remastered or optimized content in the context of a variety of different profiles.
  • a visual representation of the audio signal 2006 associated with each preview version of the media content can also be displayed on this preview screen to enhance understanding of the remastered or optimized content.
  • FIG. 21 illustrates an example of how other profiles or optimization techniques can be customized as may be desired by the user.
  • FIG. 22 illustrates an example of processing payment for the remastered versions of the media content selected for purchase by the user.
  • FIG. 23 illustrates an example of the system 101 processing the entirety of the media content in accordance with the profiles previewed, selected, and purchased by the user.
  • FIG. 24 shows an example of making the remastered or optimized media content available for download to a storage location determined by the user.
  • FIG. 25 schematically illustrates an overview of an operational environment in which a media content optimization platform can be implemented in accordance with various embodiments of the invention.
  • a media content optimization platform 2502 can be integrated into and accessed in an overall operating environment to facilitate optimizing audio aspects of audio and/or audiovisual media content 2504 created as signals or data which can be derived from various kinds of devices (e.g., microphones, cameras, content on computer-based storage devices, and other devices or sources).
  • the media content 2504 can be derived from a live stream of audio or audiovisual content, such as a broadcast or stream of content to a computer device using an audio or audiovisual player, for example.
  • the optimization platform 2502 may be configured with components programmed for enabling the various optimization features and executing the various optimization functions previously described hereinabove.
  • the optimization platform 2502 may include components configured for assisting with live stream audio optimization, for example, in real-time or near real-time applications of the technology.
  • the optimization platform 2502 may include a stream processing module 2506 operatively connected to a media server 2508 and a media processor 2510 (among other possible components).
  • the media server 2508 can be configured with multiple slots each of which can be dedicated to listening to incoming streams of media content 2504 of a certain kind, designation, or media type, for example.
  • the media server 2508 can be programmed for using the Real-Time Messaging Protocol (RTMP), for example.
  • Each slot of the media server 2508 can be pre-configured with settings associated with one or more different kinds of optimizing profiles 2512 (e.g., audio mastering profiles, noise reduction profiles, and/or many others as described hereinabove), which can be made available for performing audio optimization on the media content 2504.
  • Other settings or parameters of the media server 2508 may include data associated with the destination where the optimized stream is to be communicated after it has been optimized by the platform 2502, for example.
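The slot configuration described above (a media type to listen for, one or more pre-configured optimizing profiles, and a destination for the optimized stream) can be pictured as a small data structure. The sketch below is an assumption about how such settings might be represented; the class, field names, slot names, and the `example.invalid` URLs are all hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StreamSlot:
    """One listening slot on the media server (hypothetical representation)."""
    name: str
    media_type: str        # kind of incoming stream this slot listens for
    profiles: tuple        # optimizing profiles pre-configured for the slot
    destination_url: str   # where the optimized stream is sent afterwards


# Placeholder configuration for illustration only.
EXAMPLE_SLOTS = [
    StreamSlot("slot-1", "audiovisual", ("Voice Focus",),
               "rtmp://example.invalid/live/out1"),
    StreamSlot("slot-2", "audio", ("Noise Reduction", "Podcast"),
               "rtmp://example.invalid/live/out2"),
]


def slot_for(slots, media_type):
    """Return the first slot dedicated to the given media type, or None."""
    for slot in slots:
        if slot.media_type == media_type:
            return slot
    return None
```

Keeping the profile list and destination together per slot means an incoming stream can be routed and optimized from a single lookup.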
  • the media server 2508 extracts audio and/or video data from the received content 2504 and then passes the extracted content to the media processor 2510.
  • the media processor 2510 divides and processes the audio and/or video data in multiple segments and then demultiplexes the audio and video from each segment (such as in accordance with components, tools, and/or techniques described hereinabove).
  • the size of each segment can be selected or predetermined to facilitate optimum processing speed and efficiency for real-time or near real-time processing levels in association with delivery of optimized content to various end users 2514.
  • the media content can be optimized in real-time using one or more of the profiles 2512, for example, and profile settings can be selected by users 2514 and/or predetermined in the platform 2502.
  • a streaming control component 2516 can be included in the platform 2502 to facilitate selection and processing of optimizing profile data, among other parameters or configuration aspects of optimizing the media content 2504.
  • Optimized audio content can then be multiplexed on a segment-by-segment basis with the original media content 2504 to reconstruct a stream segment.
  • If the media content 2504 is embodied in an audio-only stream, for example, then there may be no need for the demultiplexing and multiplexing processes described above, and the audio data can be passed directly to the media processor 2510.
  • Each optimized stream segment can then be streamed to its next destination, which may be a stream broadcast component 2518 of a media content provider streaming system 2520, for example. From that point, the content can be transmitted to a media content provider delivery system 2522, which includes a computer-implemented player 2524 configured for playing and displaying the content for the user 2514.
  • the delivery system 2522 can be configured with one or more types of user interfaces 2526 programmed for communicating profile 2521 settings, for example, received from different users 2514.
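
The segment-by-segment flow described above (divide, demultiplex, optimize the audio, multiplex again) can be sketched as follows. This is a minimal illustration and not the platform's implementation: the `Frame` type, the `gain_profile` function, and the in-memory list of frames are all assumptions made for the example, and a real deployment would operate on container formats received over RTMP rather than Python lists.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, List, Optional

Samples = List[float]                   # PCM samples, assumed normalized to [-1.0, 1.0]
Profile = Callable[[Samples], Samples]  # an optimizing profile modeled as a pure function


@dataclass(frozen=True)
class Frame:
    """Hypothetical unit of stream data: audio samples plus an opaque video payload."""
    audio: Samples
    video: Optional[bytes]  # None models an audio-only stream


def split_segments(frames: List[Frame], frames_per_segment: int) -> Iterator[List[Frame]]:
    """Divide the stream into fixed-size segments (the last one may be shorter)."""
    for start in range(0, len(frames), frames_per_segment):
        yield frames[start:start + frames_per_segment]


def gain_profile(gain: float) -> Profile:
    """Hypothetical profile: scale each sample and clip it to the valid range."""
    def apply(samples: Samples) -> Samples:
        return [max(-1.0, min(1.0, s * gain)) for s in samples]
    return apply


def optimize_stream(frames: List[Frame], profile: Profile,
                    frames_per_segment: int) -> Iterator[List[Frame]]:
    """Yield optimized segments one at a time, ready to forward downstream."""
    for segment in split_segments(frames, frames_per_segment):
        audio = [f.audio for f in segment]         # demultiplex: audio side
        video = [f.video for f in segment]         # demultiplex: video side (untouched)
        optimized = [profile(a) for a in audio]    # apply the optimizing profile
        yield [Frame(a, v) for a, v in zip(optimized, video)]  # multiplex again
```

Because `optimize_stream` is a generator, each segment can be forwarded as soon as it is processed, which is one way to keep latency bounded by the segment size.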
  • FIG. 27 illustrates one example of a player 2702 which can be accessed by a user 2514 in connection with accessing and viewing optimized media content 2504.
  • the functionality and features of the optimization platform 2502 can be initiated by selecting a button 2704 (or an equivalent access feature) which has been incorporated into the player 2702.
  • a user interface 2706 provides a menu of choices to facilitate user selection of predetermined settings for optimizing the media content 2504.
  • settings for various optimizing profiles 2512 can be selected by the user 2514, including a movie profile, a music profile, a voice clarity profile, a “loud and proud” profile, a spatial audio profile, and/or a noise reduction profile.
  • This integration with the media content 2504 allows the user 2514 to select the audio enhancement for different applications and environments, including for example, live streaming, stored content streaming, metaverse, web browser, mobile applications, and others.
  • a media content optimization platform can be configured which comprises a media server programmed for receiving original media content comprising streaming media content, extracting audio and/or video data from the received streaming media content, and transmitting the extracted audio and/or video data to a media processor.
  • the media processor can be programmed for dividing the transmitted audio and/or video data media content into multiple audio component segments and multiple video component segments, and applying at least one optimizing profile to at least the audio component segments of the streaming media content to generate one or more optimized audio component segments.
  • At least one of the optimizing profiles may include an audio mastering profile including one or more parameter settings configured for application to a selected audio component segment to alter or enhance one or more aspects of sound components of the streaming media content, and the audio mastering profile can be configured for optimizing at least a portion of the streaming media content in response to a type of access device to be used to communicate optimized media content. Also, the platform can be configured for combining the video component segments with their corresponding optimized audio component segment to generate at least one optimized streaming media content segment.
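
One way to picture an audio mastering profile that responds to the type of access device is a base set of parameter settings plus per-device adjustments. The sketch below is purely illustrative: the field names (`gain_db`, `bass_boost_db`, `compress_ratio`), the profile names, the device names, and every numeric value are hypothetical and do not come from the specification.

```python
from dataclasses import dataclass, replace
from typing import Dict


@dataclass(frozen=True)
class MasteringProfile:
    """Hypothetical parameter settings for one optimizing profile."""
    name: str
    gain_db: float
    bass_boost_db: float
    compress_ratio: float


# Illustrative base profiles; the values are placeholders, not recommendations.
BASE_PROFILES: Dict[str, MasteringProfile] = {
    "movie": MasteringProfile("movie", gain_db=0.0, bass_boost_db=3.0, compress_ratio=2.0),
    "voice_clarity": MasteringProfile("voice_clarity", gain_db=1.0, bass_boost_db=-2.0, compress_ratio=3.0),
}

# Illustrative per-device offsets added to the base profile's numeric fields.
DEVICE_ADJUSTMENTS: Dict[str, Dict[str, float]] = {
    "phone_speaker": {"gain_db": 2.0, "bass_boost_db": -3.0},
    "headphones": {"bass_boost_db": 1.0},
}


def profile_for(profile_name: str, device_type: str) -> MasteringProfile:
    """Return the base profile adjusted for the device; unknown devices get the base profile."""
    base = BASE_PROFILES[profile_name]
    offsets = DEVICE_ADJUSTMENTS.get(device_type, {})
    changes = {field: getattr(base, field) + delta for field, delta in offsets.items()}
    return replace(base, **changes)
```

Keeping device adjustments separate from the base profiles means a new device type can be supported without redefining every profile.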
  • any element expressed herein as a means for performing a specified function is intended to encompass any way of performing that function including, for example, a combination of elements that performs that function.
  • the invention as may be defined by such means-plus-function claims, resides in the fact that the functionalities provided by the various recited means are combined and brought together in a manner as defined by the appended claims. Therefore, any means that can provide such functionalities may be considered equivalents to the means shown herein.
  • various models or platforms can be used to practice certain aspects of the invention.
  • software-as-a-service (SaaS) models or application service provider (ASP) models may be employed as software application delivery models to communicate software applications to clients or other users.
  • Such software applications can be downloaded through an Internet connection, for example, and operated either independently (e.g., downloaded to a laptop or desktop computer system) or through a third-party service provider (e.g., accessed through a third-party web site).
  • cloud computing techniques may be employed in connection with various embodiments of the invention.
  • the processes associated with the present embodiments may be executed by programmable equipment, such as computers.
  • Software or other sets of instructions that may be employed to cause programmable equipment to execute the processes may be stored in any storage device, such as a computer system (non-volatile) memory.
  • some of the processes may be programmed when the computer system is manufactured or via a computer-readable memory storage medium.
  • a computer-readable medium may include, for example, memory devices such as diskettes, compact discs of both read-only and read/write varieties, optical disk drives, and hard disk drives.
  • a computer-readable medium may also include memory storage that may be physical, virtual, permanent, temporary, semi-permanent and/or semitemporary.
  • Memory and/or storage components may be implemented using any computer-readable media capable of storing data such as volatile or non-volatile memory, removable or non-removable memory, erasable or non-erasable memory, writeable or re-writeable memory, and so forth.
  • Examples of computer-readable storage media may include, without limitation, RAM, dynamic RAM (DRAM), Double-Data-Rate DRAM (DDRAM), synchronous DRAM (SDRAM), static RAM (SRAM), read-only memory (ROM), programmable ROM (PROM), erasable programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), flash memory (e.g., NOR or NAND flash memory), content addressable memory (CAM), polymer memory (e.g., ferroelectric polymer memory), phase-change memory, ovonic memory, ferroelectric memory, silicon-oxide-nitride-oxide-silicon (SONOS) memory, magnetic or optical cards, or any other type of media suitable for storing information.
  • the invention may employ optical character recognition (OCR) technology, such as to capture data and other information from documents scanned by different components of the platform.
  • This OCR technology may be derived from conventional OCR techniques, customized OCR technology (i.e., modified for the current platform solution), and/or some combination thereof.
  • a “computer,” “computer system,” “computing apparatus,” “component,” or “computer processor” may be, for example and without limitation, a processor, microcomputer, minicomputer, server, mainframe, laptop, personal data assistant (PDA), wireless e-mail device, smart phone, mobile phone, electronic tablet, cellular phone, pager, fax machine, scanner, or any other programmable device or computer apparatus configured to transmit, process, and/or receive data.
  • Computer systems and computer-based devices disclosed herein may include memory and/or storage components for storing certain software applications used in obtaining, processing, and communicating information. It can be appreciated that such memory may be internal or external with respect to operation of the disclosed embodiments.
  • a “host,” “engine,” “loader,” “filter,” “platform,” or “component” may include various computers or computer systems, or may include a reasonable combination of software, firmware, and/or hardware.
  • a “module” may include software, firmware, hardware, or any reasonable combination thereof.
  • a single component may be replaced by multiple components, and multiple components may be replaced by a single component, to perform a given function or functions.
  • Any of the servers described herein, for example, may be replaced by a “server farm” or other grouping of networked servers (e.g., a group of server blades) that are located and configured for cooperative functions. It can be appreciated that a server farm may serve to distribute workload between/among individual components of the farm and may expedite computing processes by harnessing the collective and cooperative power of multiple servers.
  • Such server farms may employ load-balancing software that accomplishes tasks such as, for example, tracking demand for processing power from different machines, prioritizing and scheduling tasks based on network demand, and/or providing backup contingency in the event of component failure or reduction in operability.
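
As a rough illustration of how load-balancing software might schedule work across a server farm, the sketch below assigns each task to whichever server currently has the least accumulated load. The function name, the task cost model, and the server names are assumptions for the example; real load balancers also account for health checks, priorities, and failover, which are omitted here.

```python
import heapq
from typing import Dict, List


def assign_tasks(task_costs: List[float], servers: List[str]) -> Dict[str, List[int]]:
    """Greedy least-loaded assignment: each task goes to the server with the smallest load so far."""
    # Min-heap of (current_load, server_name); ties break on the server name.
    heap = [(0.0, name) for name in servers]
    heapq.heapify(heap)
    assignment: Dict[str, List[int]] = {name: [] for name in servers}
    for task_id, cost in enumerate(task_costs):
        load, name = heapq.heappop(heap)        # least-loaded server right now
        assignment[name].append(task_id)
        heapq.heappush(heap, (load + cost, name))
    return assignment
```

For example, with task costs `[5, 1, 1, 1]` and two servers, the expensive first task occupies one server while the three cheap tasks all land on the other.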
  • Examples of assembly languages include ARM, MIPS, and x86; examples of high-level languages include Ada, BASIC, C, C++, C#, COBOL, Fortran, Java, Lisp, Pascal, Object Pascal; and examples of scripting languages include Bourne script, JavaScript, Python, Ruby, PHP, and Perl.
  • Such software may be stored on any type of suitable computer-readable medium or media such as, for example, a magnetic or optical storage medium.
  • Various embodiments of the systems and methods described herein may employ one or more electronic computer networks to promote communication among different components, transfer data, or to share resources and information.
  • Such computer networks can be classified according to the hardware and software technology that is used to interconnect the devices in the network, such as optical fiber, Ethernet, wireless LAN, HomePNA, power line communication or G.hn.
  • Wireless communications described herein may be conducted with Wi-Fi and Bluetooth enabled networks and devices, among other types of suitable wireless communication protocols.
  • the computer networks may also be embodied as one or more of the following types of networks: local area network (LAN); metropolitan area network (MAN); wide area network (WAN); virtual private network (VPN); storage area network (SAN); or global area network (GAN), among other network varieties.
  • a WAN computer network may cover a broad area by linking communications across metropolitan, regional, or national boundaries.
  • the network may use routers and/or public communication links.
  • One type of data communication network may cover a relatively broad geographic area (e.g., city-to-city or country-to-country) which uses transmission facilities provided by common carriers, such as telephone service providers.
  • a GAN computer network may support mobile communications across multiple wireless LANs or satellite networks.
  • a VPN computer network may include links between nodes carried by open connections or virtual circuits in another network (e.g., the Internet) instead of by physical wires. The link-layer protocols of the VPN can be tunneled through the other network.
  • One VPN application can promote secure communications through the Internet.
  • the VPN can also be used to conduct the traffic of different user communities separately and securely over an underlying network.
  • the VPN may provide users with the virtual experience of accessing the network through an IP address location other than the actual IP address which connects the wireless device to the network.
  • the computer network may be characterized based on functional relationships among the elements or components of the network, such as active networking, client-server, or peer-to-peer functional architecture.
  • the computer network may be classified according to network topology, such as bus network, star network, ring network, mesh network, star-bus network, or hierarchical topology network, for example.
  • the computer network may also be classified based on the method employed for data communication, such as digital and analog networks.
  • Embodiments of the methods and systems described herein may employ internetworking for connecting two or more distinct electronic computer networks or network segments through a common routing technology.
  • the type of internetwork employed may depend on administration and/or participation in the internetwork.
  • Non-limiting examples of internetworks include intranet, extranet, and Internet.
  • Intranets and extranets may or may not have connections to the Internet. If connected to the Internet, the intranet or extranet may be protected with appropriate authentication technology or other security measures.
  • an intranet can be a group of networks which employ Internet Protocol, web browsers and/or file transfer applications, under common control by an administrative entity. Such an administrative entity could restrict access to the intranet to only authorized users, for example, or another internal network of an organization or commercial entity.
  • an extranet may include a network or internetwork generally limited to a primary organization or entity, but which also has limited connections to the networks of one or more other trusted organizations or entities (e.g., customers of an entity may be given access to an intranet of the entity thereby creating an extranet).
  • Computer networks may include hardware elements to interconnect network nodes, such as network interface cards (NICs) or Ethernet cards, repeaters, bridges, hubs, switches, routers, and other like components. Such elements may be physically wired for communication and/or data connections may be provided with microwave links (e.g., IEEE 802.12) or fiber optics, for example.
  • a network card, network adapter or NIC can be designed to allow computers to communicate over the computer network by providing physical access to a network and an addressing system through the use of MAC addresses, for example.
  • a repeater can be embodied as an electronic device that receives and retransmits a communicated signal at a boosted power level to allow the signal to cover a telecommunication distance with reduced degradation.
  • a network bridge can be configured to connect multiple network segments at the data link layer of a computer network while learning which addresses can be reached through which specific ports of the network.
  • the bridge may associate a port with an address and then send traffic for that address only to that port.
  • local bridges may be employed to directly connect local area networks (LANs); remote bridges can be used to create a wide area network (WAN) link between LANs; and/or, wireless bridges can be used to connect LANs and/or to connect remote stations to LANs.
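
The address-learning behavior of a network bridge described above can be sketched in a few lines. This is a simplified model under stated assumptions: the `LearningBridge` class and its method names are hypothetical, addresses are plain strings, and table aging and spanning-tree handling are left out.

```python
from typing import Dict, Iterable, Set


class LearningBridge:
    """Toy model of a bridge that learns which address is reachable through which port."""

    def __init__(self, ports: Iterable[int]) -> None:
        self.ports: Set[int] = set(ports)
        self.table: Dict[str, int] = {}  # learned address -> port

    def handle_frame(self, src: str, dst: str, in_port: int) -> Set[int]:
        """Return the set of ports the frame should be sent out of."""
        self.table[src] = in_port            # learn where the sender lives
        out_port = self.table.get(dst)
        if out_port is None:
            return self.ports - {in_port}    # unknown destination: flood
        if out_port == in_port:
            return set()                     # same segment: no forwarding needed
        return {out_port}                    # known destination: send only there
```

After the bridge has seen one frame from each station, traffic between them uses a single port instead of being flooded.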
  • Embodiments of the methods and systems described herein may divide functions between separate CPUs, creating a multiprocessing configuration. For example, multiprocessor and multi-core (multiple CPUs on a single integrated circuit) computer systems with co-processing capabilities may be employed. Also, multitasking may be employed as a computer processing technique to handle simultaneous execution of multiple computer programs.
  • the functional components such as software, engines, and/or modules may be implemented by hardware elements that may include processors, microprocessors, circuits, circuit elements (e.g., transistors, resistors, capacitors, inductors, and so forth), integrated circuits, application specific integrated circuits (ASIC), programmable logic devices (PLD), digital signal processors (DSP), field programmable gate array (FPGA), logic gates, registers, semiconductor device, chips, microchips, chip sets, and so forth.
  • Examples of software, engines, and/or modules may include software components, programs, applications, computer programs, application programs, system programs, machine programs, operating system software, middleware, firmware, software modules, routines, subroutines, functions, methods, procedures, software interfaces, application programming interfaces (API), instruction sets, computing code, computer code, code segments, computer code segments, words, values, symbols, or any combination thereof. Determining whether an embodiment is implemented using hardware elements and/or software elements may vary in accordance with any number of factors, such as desired computational rate, power levels, heat tolerances, processing cycle budget, input data rates, output data rates, memory resources, data bus speeds and other design or performance constraints.
  • various embodiments may be implemented as an article of manufacture.
  • the article of manufacture may include a computer readable storage medium arranged to store logic, instructions and/or data for performing various operations of one or more embodiments.
  • the article of manufacture may comprise a magnetic disk, optical disk, flash memory or firmware containing computer program instructions suitable for execution by a processor or application specific processor.
  • processing refers to the action and/or processes of a computer or computing system, or similar electronic computing device, a DSP, ASIC, FPGA or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein that manipulates and/or transforms data represented as physical quantities (e.g., electronic) within registers and/or memories into other data similarly represented as physical quantities within the memories, registers or other such information storage, transmission or display devices.
  • Some embodiments may be described using the expressions “coupled” and “connected” along with their derivatives. These terms are not necessarily intended as synonyms for each other. For example, some embodiments may be described using the terms “connected” and/or “coupled” to indicate that two or more elements are in direct physical or electrical contact with each other. The term “coupled,” however, also may mean that two or more elements are not in direct contact with each other, but yet still co-operate or interact with each other. With respect to software elements, for example, the term “coupled” may refer to interfaces, message interfaces, application program interface (API), exchanging messages, and so forth.
  • each block, step, or action may represent a module, segment, or portion of code that comprises program instructions to implement the specified logical functions.
  • the program instructions may be embodied in the form of source code that comprises human-readable statements written in a programming language or machine code that comprises numerical instructions recognizable by a suitable execution system such as a processing component in a computer system.
  • each block may represent a circuit or a number of interconnected circuits to implement the specified logical functions.
  • references to “one embodiment” or “an embodiment” mean that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment.
  • the appearances of the phrase “in one embodiment” or “in one aspect” in the specification are not necessarily all referring to the same embodiment.
  • the terms “a” and “an” and “the” and similar referents used in the context of the present disclosure are to be construed to cover both the singular and the plural, unless otherwise indicated herein or clearly contradicted by context. Recitation of ranges of values herein is merely intended to serve as a shorthand method of referring individually to each separate value falling within the range.
  • search and optimization tools including search algorithms, mathematical optimization, and evolutionary computation methods can be used for intelligently searching through many possible solutions.
  • logical operations can involve searching for a path that leads from premises to conclusions, where each step is the application of an inference rule.
  • Planning algorithms can search through trees of goals and subgoals, attempting to find a path to a target goal, in a process called means-ends analysis.
  • Heuristics can be used that prioritize choices in favor of those more likely to reach a goal and to do so in a shorter number of steps. In some search methodologies heuristics can also serve to eliminate some choices unlikely to lead to a goal. Heuristics can supply a computer system with a best estimate for the path on which the solution lies. Heuristics can limit the search for solutions into a smaller sample size, thereby increasing overall computer system processing efficiency.
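
To make the role of heuristics concrete, the sketch below uses A* search, one well-known heuristic search method (the text does not name a specific algorithm, so this choice is an assumption). The graph, edge costs, and heuristic values in the usage note are made up for illustration.

```python
import heapq
from typing import Dict, List, Optional, Tuple

Graph = Dict[str, Dict[str, float]]  # node -> {neighbor: step cost}


def a_star(graph: Graph, heuristic: Dict[str, float],
           start: str, goal: str) -> Optional[Tuple[List[str], float]]:
    """Return (path, cost) for a cheapest path, or None if the goal is unreachable.

    The heuristic is assumed admissible (it never overestimates the remaining cost).
    """
    frontier = [(heuristic[start], 0.0, start, [start])]  # (estimate, cost so far, node, path)
    best_cost = {start: 0.0}
    while frontier:
        _, cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return path, cost
        if cost > best_cost.get(node, float("inf")):
            continue  # stale entry: a cheaper route to this node was already found
        for neighbor, step in graph.get(node, {}).items():
            new_cost = cost + step
            if new_cost < best_cost.get(neighbor, float("inf")):
                best_cost[neighbor] = new_cost
                estimate = new_cost + heuristic[neighbor]  # heuristic steers the search
                heapq.heappush(frontier, (estimate, new_cost, neighbor, path + [neighbor]))
    return None
```

With `graph = {"S": {"A": 1, "B": 4}, "A": {"G": 5, "B": 1}, "B": {"G": 1}, "G": {}}` and `heuristic = {"S": 2, "A": 2, "B": 1, "G": 0}`, the search explores the promising branch through `A` and `B` first and returns the path `S, A, B, G` with cost 3.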
  • Propositional logic can be used which involves truth functions such as “or” and “not” search terms, and first-order logic can add quantifiers and predicates, and can express facts about objects, their properties, and their relationships with each other. Fuzzy logic assigns a degree of truth (e.g., between 0 and 1) to vague statements which may be too linguistically imprecise to be completely true or false. Default logics, non-monotonic logics and circumscription are forms of logic designed to help with default reasoning and the qualification problem. Several extensions of logic can be used to address specific domains of knowledge, such as description logics, situation calculus, event calculus and fluent calculus (for representing events and time), causal calculus, belief calculus (belief revision); and modal logics. Logic for modeling contradictory or inconsistent statements arising in multi-agent systems can also be used, such as paraconsistent logics.
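
The idea of a degree of truth between 0 and 1 can be illustrated with a small fuzzy-logic sketch. The min/max/complement operators used here are the common Zadeh definitions, chosen as an assumption since the text does not specify operators; the `loudness_membership` function and its thresholds are hypothetical.

```python
def fuzzy_and(a: float, b: float) -> float:
    """Degree to which both statements hold (Zadeh minimum)."""
    return min(a, b)


def fuzzy_or(a: float, b: float) -> float:
    """Degree to which at least one statement holds (Zadeh maximum)."""
    return max(a, b)


def fuzzy_not(a: float) -> float:
    """Degree to which the statement does not hold."""
    return 1.0 - a


def loudness_membership(level_db: float, quiet_db: float = -40.0, loud_db: float = -10.0) -> float:
    """Hypothetical membership function: how true is 'this audio is loud'?

    Returns 0.0 at or below quiet_db, 1.0 at or above loud_db, and a linear ramp in between.
    """
    if level_db <= quiet_db:
        return 0.0
    if level_db >= loud_db:
        return 1.0
    return (level_db - quiet_db) / (loud_db - quiet_db)
```

A level halfway between the two thresholds is "loud" to degree 0.5, rather than being forced into a strict true or false.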
  • Bayesian networks are tools that can be used for various problems: reasoning (using the Bayesian inference algorithm), learning (using the expectation-maximization algorithm), planning (using decision networks), and perception (using dynamic Bayesian networks).
  • Probabilistic algorithms can be used for filtering, prediction, smoothing and finding explanations for streams of data, helping perception systems to analyze processes that occur over time (e.g., hidden Markov models or Kalman filters).
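
As an example of filtering a stream of data over time, the following is a minimal one-dimensional Kalman filter. It assumes a constant-state model (the true value drifts only slightly between measurements), and the variance parameters are inputs the caller must choose; none of these values come from the specification.

```python
from typing import Iterable, List


def kalman_1d(measurements: Iterable[float], process_var: float, measurement_var: float,
              initial_estimate: float = 0.0, initial_var: float = 1.0) -> List[float]:
    """Return the filtered estimate after each measurement (scalar constant-state model)."""
    estimate, var = initial_estimate, initial_var
    estimates: List[float] = []
    for z in measurements:
        var += process_var                      # predict: uncertainty grows over time
        gain = var / (var + measurement_var)    # how much to trust the new measurement
        estimate += gain * (z - estimate)       # update: move toward the measurement
        var *= (1.0 - gain)                     # uncertainty shrinks after the update
        estimates.append(estimate)
    return estimates
```

A larger `measurement_var` makes the filter smoother but slower to react; a larger `process_var` makes it track changes faster at the cost of passing through more noise.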
  • Artificial intelligence can use the concept of utility as a measure of how valuable something is to an intelligent agent.
  • Mathematical tools can analyze how an agent can make choices and plan, using decision theory, decision analysis, and information value theory. These tools include models such as Markov decision processes, dynamic decision networks, game theory and mechanism design.
  • the artificial intelligence techniques applied to embodiments of the invention may leverage classifiers and controllers.
  • Classifiers are functions that use pattern matching to determine a closest match. They can be tuned according to examples known as observations or patterns. In supervised learning, each pattern belongs to a certain predefined class which represents a decision to be made. All of the observations combined with their class labels are known as a data set. When a new observation is received, that observation is classified based on previous experience.
  • a classifier can be trained in various ways; there are many statistical and machine learning approaches.
  • the decision tree is one kind of symbolic machine learning algorithm.
  • the naive Bayes classifier is one kind of classifier useful for its scalability, in particular. Neural networks can also be used for classification.
  • Classifier performance depends in part on the characteristics of the data to be classified, such as the data set size, distribution of samples across classes, dimensionality, and the level of noise. Model-based classifiers perform optimally when the assumed model is an optimized fit for the actual data. Otherwise, if no matching model is available, and if accuracy (rather than speed or scalability) is a primary concern, then discriminative classifiers (e.g., SVM) can be used to enhance accuracy.
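
The notion of classifying a new observation by its closest match in a labeled data set can be sketched with a one-nearest-neighbor classifier. The feature vectors and class labels in the usage note are invented for illustration, and nearest-neighbor is only one of the many approaches the text alludes to.

```python
import math
from typing import List, Sequence, Tuple

LabeledObservation = Tuple[Sequence[float], str]  # (feature vector, class label)


def nearest_neighbor_classify(data_set: List[LabeledObservation],
                              observation: Sequence[float]) -> str:
    """Return the label of the stored observation closest to the new one (Euclidean distance)."""
    _, label = min(data_set, key=lambda pair: math.dist(pair[0], observation))
    return label
```

For instance, given `data_set = [((1.0, 10.0), "speech"), ((3.0, 25.0), "music")]`, the observation `(1.2, 12.0)` lies closer to the first entry and is classified as `"speech"`.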
  • a neural network is an interconnected group of nodes which can be used in connection with various embodiments of the invention, such as execution of various methods, processes, or algorithms disclosed herein.
  • Each neuron of the neural network can accept inputs from other neurons, each of which when activated casts a weighted vote for or against whether the first neuron should activate.
  • Learning achieved by the network involves using an algorithm to adjust these weights based on the training data. For example, one algorithm increases the weight between two connected neurons when the activation of one triggers the successful activation of another.
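
The weight-adjustment rule mentioned above, where a connection is strengthened when two connected neurons are active together, corresponds to the classic Hebbian update. A minimal sketch follows; the learning rate and the list-of-lists weight layout are assumptions for the example.

```python
from typing import List


def hebbian_update(weights: List[List[float]], pre: List[float], post: List[float],
                   learning_rate: float = 0.1) -> List[List[float]]:
    """Return new weights where weights[i][j] grows by learning_rate * post[i] * pre[j].

    weights[i][j] connects input neuron j to output neuron i.
    """
    return [
        [w + learning_rate * post[i] * pre[j] for j, w in enumerate(row)]
        for i, row in enumerate(weights)
    ]
```

Only connections whose input and output neurons are both active change, so a weight from an inactive input stays where it was.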
  • Neurons have a continuous spectrum of activation, and neurons can process inputs in a non-linear way rather than weighing straightforward votes.
  • Neural networks can model complex relationships between inputs and outputs or find patterns in data.
  • Neural networks can learn continuous functions and even digital logical operations.
  • Neural networks can be viewed as a type of mathematical optimization which performs a gradient descent on a multi-dimensional topology that was created by training the network. Another type of algorithm is a backpropagation algorithm.
  • Other examples of learning techniques for neural networks include Hebbian learning, group method of data handling (GMDH), or competitive learning.
  • the main categories of networks are acyclic or feedforward neural networks (where the signal passes in only one direction), and recurrent neural networks (which allow feedback and short-term memories of previous input events). Examples of feedforward networks include perceptrons, multi-layer perceptrons, and radial basis networks.
  • Deep learning techniques applied to various embodiments of the invention can use several layers of neurons between the network's inputs and outputs.
  • the multiple layers can progressively extract higher-level features from the raw input. For example, in image processing, lower layers may identify edges, while higher layers may identify the concepts relevant to a human such as digits or letters or faces.
  • Deep learning may involve convolutional neural networks for many or all of its layers.
  • each neuron receives input from only a restricted area of the previous layer called the neuron's receptive field. This can substantially reduce the number of weighted connections between neurons.
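
The effect of a restricted receptive field on the number of weighted connections can be shown with a small one-dimensional example. The helper names are hypothetical, and the operation is written as a sliding dot product (cross-correlation), which is what most deep learning libraries call convolution.

```python
from typing import List, Tuple


def conv1d(signal: List[float], kernel: List[float]) -> List[float]:
    """Each output sees only len(kernel) neighboring inputs: its receptive field."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def connection_counts(n_inputs: int, kernel_size: int) -> Tuple[int, int]:
    """Compare connections for a fully connected layer and a locally connected one
    producing the same number of outputs."""
    n_outputs = n_inputs - kernel_size + 1
    dense = n_inputs * n_outputs        # every output connected to every input
    local = kernel_size * n_outputs     # every output connected to its receptive field only
    return dense, local
```

With 100 inputs and a receptive field of 3, the fully connected layer needs 9800 connections and the locally connected one needs 294; sharing the same kernel across positions, as convolutional layers do, reduces the number of distinct weights further to 3.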
  • the signal will propagate through a layer more than once.
  • a recurrent neural network is another example of a deep learning technique which can be trained by gradient descent, for example.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

According to the present application, a computer-implemented media content optimization platform comprises a media server programmed for receiving original media content comprising streaming media content and for extracting audio and/or video data from the received streaming media content. A media processor of the platform is programmed for dividing the transmitted audio and/or video data media content into multiple audio component segments and multiple video component segments, applying at least one optimizing profile to at least the audio component segments of the streaming media content, and combining each of the video component segments with their corresponding optimized audio component segments to generate at least one optimized streaming media content segment.
PCT/US2023/081372 2022-11-29 2023-11-28 Computer-based tools and techniques for real-time optimization of audio and audiovisual content WO2024118624A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US18/071,367 US20240089515A1 (en) 2019-07-26 2022-11-29 Computer-based tools and techniques for real-time optimization of audio and audiovisual content
US18/071,367 2022-11-29

Publications (1)

Publication Number Publication Date
WO2024118624A1 true WO2024118624A1 (fr) 2024-06-06

Family

ID=91324837

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2023/081372 WO2024118624A1 (fr) 2022-11-29 2023-11-28 Computer-based tools and techniques for real-time optimization of audio and audiovisual content

Country Status (1)

Country Link
WO (1) WO2024118624A1 (fr)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170300289A1 (en) * 2016-04-13 2017-10-19 Comcast Cable Communications, Llc Dynamic Equalizer
US20190215540A1 (en) * 2016-07-22 2019-07-11 Dolby International Ab Network-based processing and distribution of multimedia content of a live musical performance
US20200152234A1 (en) * 2018-11-12 2020-05-14 Netflix, Inc. Systems and methods for adaptive streaming of multimedia content

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23898708

Country of ref document: EP

Kind code of ref document: A1