WO2023121850A1 - Automatic in-game subtitles and closed captions - Google Patents
- Publication number
- WO2023121850A1 (application PCT/US2022/051581)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- game
- subtitle
- overlay
- audio stream
- audio
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- A—HUMAN NECESSITIES
- A63—SPORTS; GAMES; AMUSEMENTS
- A63F—CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
- A63F13/00—Video games, i.e. games using an electronically generated display having two or more dimensions
- A63F13/50—Controlling the output signals based on the game progress
- A63F13/53—Controlling the output signals based on the game progress involving additional visual information provided to the game scene, e.g. by overlay to simulate a head-up display [HUD] or displaying a laser sight in a shooting game
-
- A—HUMAN NECESSITIES
- A63—SPORTS; GAMES; AMUSEMENTS
- A63F—CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
- A63F13/00—Video games, i.e. games using an electronically generated display having two or more dimensions
- A63F13/30—Interconnection arrangements between game servers and game devices; Interconnection arrangements between game devices; Interconnection arrangements between game servers
- A63F13/35—Details of game servers
- A63F13/355—Performing operations on behalf of clients with restricted processing capabilities, e.g. servers transform changing game scene into an encoded video stream for transmitting to a mobile phone or a thin client
-
- A—HUMAN NECESSITIES
- A63—SPORTS; GAMES; AMUSEMENTS
- A63F—CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
- A63F13/00—Video games, i.e. games using an electronically generated display having two or more dimensions
- A63F13/40—Processing input control signals of video game devices, e.g. signals generated by the player or derived from the environment
- A63F13/42—Processing input control signals of video game devices, e.g. signals generated by the player or derived from the environment by mapping the input signals into game commands, e.g. mapping the displacement of a stylus on a touch screen to the steering angle of a virtual vehicle
- A63F13/424—Processing input control signals of video game devices, e.g. signals generated by the player or derived from the environment by mapping the input signals into game commands, e.g. mapping the displacement of a stylus on a touch screen to the steering angle of a virtual vehicle involving acoustic input signals, e.g. by using the results of pitch or rhythm extraction or voice recognition
-
- A—HUMAN NECESSITIES
- A63—SPORTS; GAMES; AMUSEMENTS
- A63F—CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
- A63F13/00—Video games, i.e. games using an electronically generated display having two or more dimensions
- A63F13/85—Providing additional services to players
- A63F13/87—Communicating with other players during game play, e.g. by e-mail or chat
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/06—Decision making techniques; Pattern matching strategies
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/26—Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/439—Processing of audio elementary streams
- H04N21/4394—Processing of audio elementary streams involving operations for analysing the audio stream, e.g. detecting features or characteristics in audio streams
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/47—End-user applications
- H04N21/478—Supplemental services, e.g. displaying phone caller identification, shopping application
- H04N21/4781—Games
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/47—End-user applications
- H04N21/488—Data services, e.g. news ticker
- H04N21/4884—Data services, e.g. news ticker for displaying subtitles
-
- A—HUMAN NECESSITIES
- A63—SPORTS; GAMES; AMUSEMENTS
- A63F—CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
- A63F2300/00—Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game
- A63F2300/30—Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game characterized by output arrangements for receiving control signals generated by the game device
- A63F2300/303—Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game characterized by output arrangements for receiving control signals generated by the game device for displaying additional data, e.g. simulating a Head Up Display
-
- A—HUMAN NECESSITIES
- A63—SPORTS; GAMES; AMUSEMENTS
- A63F—CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
- A63F2300/00—Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game
- A63F2300/50—Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game characterized by details of game servers
- A63F2300/57—Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game characterized by details of game servers details of game services offered to the player
- A63F2300/572—Communication between players during game play of non game information, e.g. e-mail, chat, file transfer, streaming of audio and streaming of video
Definitions
- Subtitles or closed captions for interactive content can provide a key accessibility feature for users with hearing impairments or difficult listening environments. Users that are deaf, hard of hearing, or affected by tinnitus or other hearing conditions may not be able to fully understand audio cues and spoken dialogue. Noisy environments can exacerbate the problem, such as when a user is using public transport, traversing crowded spaces, or is in proximity to construction, traffic, musical performances, or other sources of background noise. Conversely, in environments where silence must be maintained, such as at offices or libraries, or late at night when noise ordinances may be in effect, audio may need to be played at low volume or muted, rendering audio difficult to hear clearly.
- headphones may assist in hearing audio
- headphones may be misplaced, forgotten, or incompatible with hearing aids or other devices.
- Even if spoken dialogue is clearly audible to the user, it may be spoken in a foreign language or in a dialect or accent that is not readily understood by the user.
- subtitles or closed captions can assist the user in better understanding audio.
- FIG. 1 is a block diagram that depicts a system for implementing automatic in-game subtitles, as described herein.
- FIG. 2A is a diagram that depicts an example graphical user interface (GUI) of a video game application.
- FIG. 2B is a diagram that depicts an example graphical user interface (GUI) of a video game application with automatic in-game subtitles.
- FIG. 2C is a diagram that depicts an example graphical user interface (GUI) of a video game application with automatic in-game subtitles positioned in proximity to sound sources.
- FIG. 3 is a flow diagram that depicts an approach for implementing automatic in-game subtitles.
- An approach is provided for a gaming overlay application to provide automatic in-game subtitles and/or closed captions for video game applications.
- the overlay application accesses an audio stream and a video stream generated by an executing game application.
- the video stream comprises frames of image data that are rendered during the executing of the game application.
- the overlay application processes the audio stream through a text conversion engine, which, in implementations, includes a speech-to-text engine, to generate at least one subtitle.
- the overlay application determines a display position to associate with the at least one subtitle.
- the overlay application generates a subtitle overlay comprising the at least one subtitle located at the associated display position.
- the overlay application causes at least a portion of the video stream to be displayed with the subtitle overlay.
- Techniques discussed herein enable a gaming overlay application to analyze real-time audio streams from a video game to generate subtitles to be displayed, even when the video game does not natively support subtitles.
- various cues such as multi-channel surround sound information and machine-learning-based voice profile matching
- dialogue and audio cues are associated with specific characters, multiplayer users, or other elements shown in-game, and subtitles are positioned onscreen at a user preferred location or in proximity to the associated sound source.
- a user quickly identifies a speaker and their associated dialogue even if audio is difficult to hear or muted. This enables the user to react more quickly and efficiently by understanding and reacting to audio cues even with hearing impediments or challenging listening environments.
- subtitles are shown in a variety of contexts, including cutscenes, in matching lobbies or during gameplay.
- FIG. 1 is a block diagram that depicts a system 100 for implementing automatic in-game subtitles, as described herein.
- Subtitles include transcriptions or translations of dialogue or speech of a video, video game, etc. and descriptions of sound effects, musical cues or other relevant audio information from the video/video game.
- references to subtitles also include closed captions or subtitles with additional context such as speaker identification and non-speech elements such as descriptions of sound effects and audio cues.
- system 100 includes computing device 110, network 160, input/output (I/O) devices 170, and display 180.
- computing device 110 includes processor 120, graphics processing unit (GPU) 122, data bus 124, and memory 130.
- GPU 122 includes memory for storing one or more frame buffers 123 .
- memory 130 stores game application 140 and gaming overlay application 150.
- game application 140 outputs audio stream 142 and video stream 144.
- Gaming overlay application 150 includes text conversion engine 152, subtitle compositor 154, voice profile database 156, and user preferences 158.
- I/O devices 170 include microphone 172 and speakers 174.
- Display 180 includes an interface to receive game graphics 182 from computing device 110.
- game graphics 182 includes subtitle overlay 190.
- the components of system 100 are only exemplary and any configuration of system 100 is usable according to the requirements of game application 140.
- Game application 140 is executed on computing device 110 by one or more of processor 120, GPU 122, or other computing resources not specifically depicted.
- Processor 120 is any type of general-purpose single or multi core processor, or a specialized processor such as application-specific integrated circuit (ASIC) or field programmable gate array (FPGA). In implementations, more than one processor 120 is present.
- GPU 122 is any type of specialized hardware for graphics processing, which is addressable using various graphics application programming interfaces (APIs) such as DirectX, Vulkan, OpenGL, and OpenCL.
- GPU 122 includes frame buffers 123, where finalized video frames are stored before outputting to display 180.
- Data bus 124 is any high-speed interconnect for communications between components of computing device 110, such as a Peripheral Component Interconnect (PCI) Express bus, an Infinity Fabric, or an Infinity Architecture.
- Memory 130 is any type of memory, such as a random access memory (RAM) or other storage device.
- game application 140 generates audio stream 142 and video stream 144, corresponding to real-time audio and video content.
- audio stream 142 and video stream 144 are combined into a single audiovisual stream.
- Audio stream 142 corresponds to internally generated in-game audio and in implementations includes multiple channels for surround sound and/or 3D positional audio information.
- game application 140 supports multiplayer gaming via network 160.
- voice chat streams from game participants are embedded in audio stream 142, either combined with existing in-game audio or as separate channels to be mixed by the operating system.
- microphone 172 is used to record voice chat from participants.
- Although gaming overlay application 150 is depicted as receiving audio stream 142 from game application 140, in implementations, audio stream 142 is received from an audio mixer output provided by an operating system of computing device 110.
- video stream 144 corresponds to in-game visuals which are generated by GPU 122 and exposed for access via a video capture service provided by GPU 122.
- completed frame buffers 123 are buffered in memory 130 for access by a video streaming application.
- gaming overlay application 150 is depicted as accessing video stream 144 from game application 140.
- gaming overlay application 150 corresponds to any program that includes functionality to display an overlay on top of in-game video content. This includes programs provided by the manufacturer of GPU 122, such as Radeon Software Crimson ReLive Edition or GeForce Experience, gaming clients such as Steam with Steam Overlay, voice chat tools such as Discord, or operating system features such as Windows Xbox Game Bar.
- gaming overlay application allows the user to enable options, such as displaying in-game overlay for configuring video capture, video streaming, audio mixing, voice chat, game profile settings, friend lists, and other options.
- gaming overlay application 150 includes functionality for video and audio capture and streaming. In implementations, this functionality is utilized to capture audio stream 142 and video stream 144 from game application 140. In implementations, gaming overlay application 150 is further extended to support automatic in-game subtitles by implementing or accessing text conversion engine 152 and subtitle compositor 154. In implementations, text conversion engine 152 accesses audio stream 142 and generates text corresponding to detected speech or sound effects.
- text conversion engine 152 includes a speech-to-text engine and a video game sound effect detection engine.
- Example speech-to-text engines include DeepSpeech, Wav2Letter++, OpenSeq2Seq, Vosk, and ESPnet. By using alternative models that are trained with video game sound effects and other non-dialogue audio cues, the speech-to-text engines are also adaptable for use as video game sound effect detection engines.
- audio stream 142 is loaded into buffers of a limited size for processing through text conversion engine 152.
- the buffers are capped at a maximum size or length, such as no longer than 5 seconds, and buffers are split opportunistically according to pauses or breaks detected in audio stream 142.
- dialogue is processed in buffers containing short dialogue phrases and processed for displaying as quickly as possible.
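The buffering behavior described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `split_audio`, the amplitude-based silence test, and the `silence_threshold` and `min_pause_seconds` parameters are assumptions introduced here. Only the 5-second cap and the idea of splitting at detected pauses come from the text.

```python
from typing import List


def split_audio(samples: List[float], sample_rate: int,
                max_seconds: float = 5.0,
                silence_threshold: float = 0.01,
                min_pause_seconds: float = 0.2) -> List[List[float]]:
    """Split mono samples into buffers of at most max_seconds each.

    A buffer is closed early when a run of quiet samples (a pause) follows
    some non-quiet audio. The silence test is a hypothetical stand-in for
    whatever pause detection a real system would use.
    """
    max_len = int(max_seconds * sample_rate)
    min_pause = max(1, int(min_pause_seconds * sample_rate))
    buffers: List[List[float]] = []
    current: List[float] = []
    quiet_run = 0  # number of consecutive quiet samples at the end of `current`
    for s in samples:
        current.append(s)
        quiet_run = quiet_run + 1 if abs(s) < silence_threshold else 0
        # Only treat the quiet run as a pause if the buffer holds more than silence.
        at_pause = quiet_run >= min_pause and len(current) > quiet_run
        if at_pause or len(current) >= max_len:
            buffers.append(current)
            current = []
            quiet_run = 0
    if current:
        buffers.append(current)
    return buffers


# Example: 2 s of speech, a 0.5 s pause, then 6 s of continuous speech at 100 Hz.
demo = [0.5] * 200 + [0.0] * 50 + [0.5] * 600
demo_lengths = [len(b) for b in split_audio(demo, 100, min_pause_seconds=0.5)]
```

Each resulting buffer would then be handed to the text conversion engine independently, so a short phrase followed by a pause can be subtitled without waiting for the full 5-second cap.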
- subtitle compositor 154 determines display positions associated with the subtitles.
- user preferences 158 define a preferred area of the screen for displaying subtitles, such as near the bottom of the screen.
- video stream 144 is scanned for user interface elements of game application 140, such as health indicators or other in-game indicators that are preferably kept unobscured, and these areas are marked as exclusion areas or keep-out zones that should not display subtitles.
- subtitle compositor 154 positions the subtitles in proximity to an in-game object associated with the in-game speaker, as described in conjunction with FIG. 2C below.
- voices detected in audio stream 142 are matched to machine learned classifications stored in voice profile database 156.
- spatial audio cues from audio stream 142 are utilized to triangulate a position of an in-game object associated with the in-game speaker.
- Although text conversion engine 152 and voice profile database 156 are shown as integral to gaming overlay application 150, in implementations, components of gaming overlay application 150 are implemented by a remote service (e.g., cloud server) that is accessed via network 160. This enables offloading of various tasks, such as text conversion, foreign language translation, and/or machine learning matching tasks to external cloud services.
- subtitle overlay 190 is generated accordingly. Display characteristics of the subtitles, such as font color and size, are set according to one or more of user preferences 158, readability considerations, or speaker intent detected from audio stream 142 as discussed further herein.
- subtitle overlay 190 is merged with data from one or more frame buffers 123 that are finalized prior to output to display 180, for example as one or more processing steps in a rendering pipeline within GPU 122, or by a desktop compositor of an operating system running on computing device 110. In this manner, subtitle support is provided via gaming overlay application 150 even when game application 140 does not natively support subtitles.
- Display 280A represents a display of game application 140 when subtitle overlay 190 is not generated or is disabled, or when gaming overlay application 150 is not running. In these cases, no subtitles appear and only in-game elements are shown, including character 284A positioned to the left side of display 280A, character 284B positioned to the right side of display 280A, and user interface element 286 displaying gameplay status including user health and ammo.
- subtitle overlay 290B is overlaid on top of game graphics 282 and includes the subtitles of “(Explosion sound from the right)” and “That doesn’t sound good. Let’s proceed down the left hallway instead.”
- subtitle overlay 290B is positioned near the bottom of display 280B, which is set, in implementations, according to user preferences 158. Further, note that subtitle overlay 290B avoids placement of subtitles over user interface element 286, thereby maintaining visibility of vital in-game information.
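One way to picture the placement just described is a search that starts at the user-preferred location near the bottom of the screen and moves upward until the subtitle no longer overlaps any exclusion area. The sketch below is illustrative only. `Rect`, `place_subtitle`, the `margin` and `step` values, and the upward search strategy are assumptions, not details from the source.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Rect:
    x: int
    y: int
    w: int
    h: int

    def overlaps(self, other: "Rect") -> bool:
        # Axis-aligned rectangles overlap when they intersect on both axes.
        return (self.x < other.x + other.w and other.x < self.x + self.w and
                self.y < other.y + other.h and other.y < self.y + self.h)


def place_subtitle(screen_w: int, screen_h: int, sub_w: int, sub_h: int,
                   exclusions: List[Rect], margin: int = 20,
                   step: int = 10) -> Optional[Rect]:
    """Return a horizontally centered subtitle box near the bottom of the
    screen that avoids every exclusion area, or None if none is found."""
    x = (screen_w - sub_w) // 2
    y = screen_h - sub_h - margin
    while y >= 0:
        candidate = Rect(x, y, sub_w, sub_h)
        if not any(candidate.overlaps(zone) for zone in exclusions):
            return candidate
        y -= step  # try slightly higher on the screen
    return None


# Example: a hypothetical status bar occupying the bottom 100 pixels.
hud = Rect(0, 980, 1920, 100)
placed = place_subtitle(1920, 1080, 800, 60, [hud])
unobstructed = place_subtitle(1920, 1080, 800, 60, [])
```

A real compositor might also search sideways or shrink the text; the vertical-only search is used here to keep the example short.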
- subtitle overlays 290C and 290D are overlaid on top of game graphics 282.
- Subtitle overlay 290C contains the subtitle “That doesn’t sound good. Let’s proceed down the left hallway instead.” Further, subtitle overlay 290C is positioned to be proximate to an in-game object (e.g., character 284A) associated with an in-game speaker and appears in a speech bubble.
- Subtitle overlay 290D contains the closed caption “(Explosion sound)” and is positioned proximate to the right of display 280C. In this example, subtitle overlay 290D points offscreen since the explosion itself was determined to occur at a position to the right of the user that is not visible in game graphics 282.
- the positions of audio sources in the game world are estimated according to positional cues in audio stream 142. For example, stereo audio panning position is used to determine whether an audio source is located to the left, right, or center of the user’s current viewpoint in the game world represented by video stream 144.
- the position of audio sources is estimated with greater accuracy, such as in front, behind, above, or below the user’s current viewpoint.
- multichannel or positional 3D audio in audio stream 142 indicates that the current in-game speaker is heard primarily from the left channels of speakers 174.
- the in-game object associated with the in-game speaker is more likely to be character 284A, to the left, rather than character 284B, to the right.
- audio stream 142 indicates that the explosion sound is heard primarily from the right channels of speakers 174.
- the explosion itself is determined to be offscreen and further to the right.
- These positional audio cues are factors used to determine the positioning of subtitle overlays 290C and 290D within the display such that they are proximate to their sound source or in-game object associated with the in-game speaker. For example, sounds heard primarily from center or rear surround channels indicate sound sources positioned in the front center or behind the user in a game world rendered by game application 140, whereas sounds heard primarily from height channels indicate sound sources positioned above the user.
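For the simplest case mentioned above, stereo panning, the left/right/center decision can be sketched by comparing channel energy. The helper names, the RMS energy measure, and the `center_band` threshold are assumptions made for illustration. Surround or height channels would need additional inputs that this example omits.

```python
import math
from typing import Sequence


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square level of a block of samples (0.0 for an empty block)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def pan_position(left: Sequence[float], right: Sequence[float]) -> float:
    """Estimate panning in [-1.0, 1.0]: -1.0 is fully left, +1.0 fully right.

    Silence on both channels is reported as 0.0 (center).
    """
    l, r = rms(left), rms(right)
    total = l + r
    if total == 0.0:
        return 0.0
    return (r - l) / total


def classify_direction(pan: float, center_band: float = 0.2) -> str:
    """Map a pan estimate to a coarse direction label."""
    if pan < -center_band:
        return "left"
    if pan > center_band:
        return "right"
    return "center"


# Example: a voice that is much louder in the left channel.
voice_left = [0.8, -0.8] * 4
voice_right = [0.1, -0.1] * 4
voice_direction = classify_direction(pan_position(voice_left, voice_right))
```

In the FIG. 2C scenario, a "left" result would favor character 284A as the speaker, while a strongly "right" result for the explosion would support an offscreen caption on the right edge.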
- To illustrate an example process for implementing automatic in-game subtitles in a gaming overlay application, flow diagram 300 of FIG. 3 is described with respect to FIG. 1, FIG. 2B, and FIG. 2C.
- displays 280B and 280C reflect examples of display 180 after gaming overlay application 150 generates subtitle overlay 190 for displaying with game graphics 182.
- Flow diagram 300 depicts an approach for implementing automatic in-game subtitles in a gaming overlay application.
- blocks 302, 304, 306, 308, and 310 are performed by one or more processors.
- blocks 302, 304, 306, 308, and 310 are performed by a single processor of a computing device, similar to FIG. 1.
- one or more of the blocks of flow diagram 300 are performed by one or more cloud servers or other computing devices distributed across a wireless or wired network.
- an audio stream 142 and video stream 144 generated as the result of executing game application 140 are accessed.
- a gaming overlay application executing on a processor receives the audio stream and video stream.
- the processor executes gaming overlay application 150 concurrently with game application 140.
- game application 140 executes on a remote server. For example, when using a cloud-based gaming streaming service, audio stream 142 and video stream 144 are received from a remote server via network 160.
- the audio stream 142 is processed through a text conversion engine 152 to generate at least one subtitle.
- text conversion engine 152 is part of gaming overlay application 150, and in other implementations, text conversion engine 152 is accessed using a cloud-based service via network 160.
- a display position is determined to associate with the at least one subtitle from block 304.
- subtitle compositor 154 uses one or more factors to determine the display position. One factor includes a user defined preference for subtitle location, such as near the bottom of the screen.
- This user preference is retrieved from user preferences 158.
- Another factor includes avoiding exclusion areas detected in video stream 144. For example, as previously described, video stream 144 is scanned for user interface elements generated by game application 140, and the portions of the display that include these user interface elements are marked as exclusion areas that should not include subtitles.
- Yet another factor includes positioning the subtitle in proximity to the sound source or in-game speaker. For example, computer vision processing is performed to identify in-game characters, multiplayer users, and other objects within the video stream 144 that are potential sound sources associated with subtitles or closed captions. Once characters and objects are identified, the at least one subtitle from block 304 is matched to its most likely sound source and positioned proximate to its sound source within the video stream 144.
- Matching to the most likely sound source for the at least one subtitle is based on various considerations. As discussed above, in implementations matching is based on triangulation using spatial audio cues from audio stream 142. Thus, in-game objects (e.g., characters) positioned in the in-game world consistent with the spatial audio cues are more strongly correlated with the sound source.
- voice profile database 156 includes classifications such as age range, gender, and dialect.
- traits analyzed from audio stream 142 and matched to voice profile database 156 are used to classify the in-game speaker as more or less likely to be a child, an adult, an elderly person, a male, a female, or a speaker with a regional dialect.
- the computer vision processing described above is used to confirm whether a potential sound source, or in-game character, is consistent with the matched classifications.
- If audio stream 142 is classified as likely to be “female” in voice profile database 156, and computer vision processing of the video stream 144 identifies a potential in-game character as likely to be a female character, then matching the potential in-game character to the at least one subtitle is more strongly correlated.
- Yet another consideration includes matching audio stream 142 to a specific user.
- game application 140 is a multiplayer game wherein participants use voice chat to communicate with other participants.
- audio stream 142 includes multiple voice chat streams associated with specific users, and thus the user speaking at any given time is readily determined according to the originating voice chat stream.
- If audio stream 142 is only available as a single mixed stream, then the other considerations described above are still usable to determine the in-game speaker. Further, since gaming overlay application 150 includes identifying information such as usernames or handles for each participant, the subtitles also include such identifying information when available.
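The matching considerations above (spatial consistency and agreement between voice classifications and what computer vision sees) can be combined into a single score per candidate. The sketch below is a hypothetical scoring scheme: the `Candidate` type, the linear weighting, and the weight values are assumptions chosen for illustration, and the source does not specify how the factors are combined.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Candidate:
    name: str
    screen_x: float  # horizontal position, 0.0 (left edge) to 1.0 (right edge)
    traits: Dict[str, str] = field(default_factory=dict)  # e.g. from vision


def match_source(candidates: List[Candidate], pan: float,
                 voice_traits: Dict[str, str],
                 spatial_weight: float = 1.0,
                 trait_weight: float = 0.5) -> Optional[Candidate]:
    """Pick the candidate most consistent with the audio cues.

    `pan` is an audio panning estimate in [-1.0, 1.0]. `voice_traits` holds
    classifications inferred from the voice, such as {"gender": "female"}.
    """
    best: Optional[Candidate] = None
    best_score = float("-inf")
    for c in candidates:
        cand_pan = 2.0 * c.screen_x - 1.0            # map 0..1 to -1..1
        spatial = 1.0 - abs(cand_pan - pan) / 2.0    # 1.0 aligned, 0.0 opposite
        agreeing = sum(1 for k, v in voice_traits.items()
                       if c.traits.get(k) == v)
        score = spatial_weight * spatial + trait_weight * agreeing
        if score > best_score:
            best, best_score = c, score
    return best


# Example loosely modeled on FIG. 2C: one character on each side of the screen.
character_a = Candidate("284A", 0.2, {"gender": "female"})
character_b = Candidate("284B", 0.8, {"gender": "male"})
speaker = match_source([character_a, character_b], pan=-0.6,
                       voice_traits={"gender": "female"})
```

When separate voice chat streams identify the speaking user directly, as described above, no scoring of this kind would be needed for those streams.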
- a subtitle overlay 190 is generated comprising the at least one subtitle from block 304 located at the associated display position from block 306.
- subtitle compositor 154 generates subtitle overlay 190 along with various visual characteristics of the subtitles.
- these visual characteristics include font attribute (e.g. italic, bold, outline), font color, font size, and speech bubble type.
- Speech bubble type includes, for example, speech bubbles, floating text, or other text presentation methods.
- Visual characteristics are set according to user preferences 158, for example user preferred font size and color. Visual characteristics are set according to readability considerations, for example by ensuring that the subtitles have high contrast according to colors in the associated area of video stream 144.
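The readability consideration can be illustrated with a contrast check. The sketch below uses the relative luminance and contrast ratio formulas from the WCAG 2.x guidelines to choose between white and black text for a given background color. Using WCAG here is an assumption made for the example (the source only says that subtitles should have high contrast), and the background color is assumed to be an average of the frame region behind the subtitle.

```python
from typing import Tuple

RGB = Tuple[int, int, int]


def _linear(channel: int) -> float:
    """Convert an 8-bit sRGB channel to linear light (WCAG 2.x formula)."""
    c = channel / 255.0
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb: RGB) -> float:
    r, g, b = (_linear(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(a: RGB, b: RGB) -> float:
    """Contrast ratio between two colors, from 1.0 (same) up to 21.0."""
    la, lb = relative_luminance(a), relative_luminance(b)
    lighter, darker = max(la, lb), min(la, lb)
    return (lighter + 0.05) / (darker + 0.05)


def readable_text_color(background: RGB) -> RGB:
    """Return white or black, whichever contrasts more with the background."""
    white, black = (255, 255, 255), (0, 0, 0)
    if contrast_ratio(white, background) >= contrast_ratio(black, background):
        return white
    return black


# Example: a dark corridor scene versus a bright sky.
on_dark = readable_text_color((10, 10, 40))
on_bright = readable_text_color((240, 240, 200))
```

A user-preferred font color could be kept whenever its contrast ratio against the background is already high enough, falling back to this choice otherwise.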
- Visual characteristics are also set according to the in-game speaker, for example by mapping specific font colors for each in-game character.
- visual characteristics are also set according to speaker intent detected from audio stream 142. For example, audio stream 142 is analyzed for loudness, speech tempo, syllable emphasis, voice pitch, and other elements to determine whether the in-game speaker is calm, and in this case the display characteristics use default values.
- the display characteristics emphasize this by using a bold font, a larger font size, or a speech bubble that is emphasized using spiked lines or other visual indicators.
- the intent of the speaker is better understood in a visual manner.
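The mapping from detected speaker intent to display characteristics can be sketched as a simple rule. Everything specific here is assumed for illustration: the `SubtitleStyle` type, the use of only loudness and speech tempo, and the threshold values. The source lists further signals (syllable emphasis, voice pitch) and does not state how they are combined.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubtitleStyle:
    bold: bool = False
    font_scale: float = 1.0
    bubble: str = "plain"  # e.g. "plain" or "spiked"


def style_for_speech(loudness_db: float, words_per_second: float,
                     loud_db: float = -10.0,
                     fast_wps: float = 3.5) -> SubtitleStyle:
    """Return default styling for calm speech and emphasized styling otherwise.

    Speech is treated as emphatic when it is loud or fast. Both thresholds
    are hypothetical values, not taken from the source.
    """
    if loudness_db >= loud_db or words_per_second >= fast_wps:
        return SubtitleStyle(bold=True, font_scale=1.25, bubble="spiked")
    return SubtitleStyle()


# Example: a quiet, slow line versus a shouted warning.
calm_style = style_for_speech(loudness_db=-30.0, words_per_second=2.0)
urgent_style = style_for_speech(loudness_db=-5.0, words_per_second=2.0)
```

A trained classifier could replace the thresholds while keeping the same output type, so the compositor only ever consumes a `SubtitleStyle`.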
- a portion of video stream 144 is caused to be displayed with subtitle overlay 190.
- this is performed by modifying a rendering pipeline within GPU 122, or using a desktop compositor of an operating system, among other methods.
- display 180 outputs game graphics 182 with subtitle overlay 190.
- the subtitle overlay 290B is placed according to a user preference for subtitle placement.
- the subtitle overlays 290C and 290D are placed according to proximity to the sound source. In this manner, subtitle support is provided via gaming overlay application 150 even when game application 140 does not natively support subtitles.
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Physics & Mathematics (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Health & Medical Sciences (AREA)
- Optics & Photonics (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Business, Economics & Management (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Game Theory and Decision Science (AREA)
- Processing Or Creating Images (AREA)
- Studio Devices (AREA)
- User Interface Of Digital Computer (AREA)
- Studio Circuits (AREA)
Priority Applications (4)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| KR1020247024720A KR20240131376A (ko) | 2021-12-23 | 2022-12-01 | Automatic in-game subtitles and closed captions |
| JP2024535349A JP2025504748A (ja) | 2021-12-23 | 2022-12-01 | Automatic in-game subtitles and closed captions |
| EP22912270.0A EP4452432A4 (en) | 2021-12-23 | 2022-12-01 | CLOSED SUBTITLES AND AUTOMATIC IN-GAME SUBTITLES |
| CN202280084788.1A CN118414200A (zh) | 2021-12-23 | 2022-12-01 | Automatic in-game subtitles and closed captions |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US17/561,477 US11857877B2 (en) | 2021-12-23 | 2021-12-23 | Automatic in-game subtitles and closed captions |
| US17/561,477 | 2021-12-23 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2023121850A1 (en) | 2023-06-29 |
Family
ID=86898719
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/US2022/051581 (Ceased), WO2023121850A1 (en) | 2021-12-23 | 2022-12-01 | Automatic in-game subtitles and closed captions |
Country Status (6)
| Country | Link |
|---|---|
| US (2) | US11857877B2 |
| EP (1) | EP4452432A4 |
| JP (1) | JP2025504748A |
| KR (1) | KR20240131376A |
| CN (1) | CN118414200A |
| WO (1) | WO2023121850A1 |
Families Citing this family (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US11857877B2 (en) * | 2021-12-23 | 2024-01-02 | Ati Technologies Ulc | Automatic in-game subtitles and closed captions |
| US20240022682A1 (en) * | 2022-07-13 | 2024-01-18 | Sony Interactive Entertainment LLC | Systems and methods for communicating audio data |
| GB2622405A (en) * | 2022-09-15 | 2024-03-20 | Sony Interactive Entertainment Inc | Systems and methods for controlling dialogue complexity in video games |
| TWI891080B (zh) * | 2023-10-05 | 2025-07-21 | 宏碁股份有限公司 | 電子裝置與其影像片段萃取方法 |
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| KR20180087009A (ko) * | 2017-01-24 | 2018-08-01 | 주식회사 소리자바 | 실시간 오디오 스트리밍 분석을 통한 자막 제공 시스템, 단말기 및 자막 서버 |
| CN111556372A (zh) * | 2020-04-20 | 2020-08-18 | 北京甲骨今声科技有限公司 | 为视音频节目实时添加字幕的方法和装置 |
| KR20200123988A (ko) * | 2019-04-23 | 2020-11-02 | 주식회사 비포에이 | Vr 영상 콘텐츠의 자막 처리 기기 |
| US20210136459A1 (en) * | 2019-11-04 | 2021-05-06 | Sling Media, L.L.C. | System to correct closed captioning display using context from audio/video |
| KR20210151874A (ko) * | 2019-05-02 | 2021-12-14 | 구글 엘엘씨 | 컴퓨팅 디바이스에서 콘텐츠의 청각적 부분을 자동으로 자막화하기 |
Family Cites Families (27)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US10987597B2 (en) * | 2002-12-10 | 2021-04-27 | Sony Interactive Entertainment LLC | System and method for managing audio and video channels for video game players and spectators |
| US8620139B2 (en) * | 2011-04-29 | 2013-12-31 | Microsoft Corporation | Utilizing subtitles in multiple languages to facilitate second-language learning |
| EP2525568B1 (en) * | 2011-05-19 | 2017-11-15 | EchoStar Technologies L.L.C. | Automatic subtitle resizing |
| US8839292B1 (en) * | 2011-12-13 | 2014-09-16 | Google Inc. | Systems and methods for rendering multiple applications on television screens |
| US10304458B1 (en) * | 2014-03-06 | 2019-05-28 | Board of Trustees of the University of Alabama and the University of Alabama in Huntsville | Systems and methods for transcribing videos using speaker identification |
| EP3220374A4 (en) * | 2014-11-12 | 2018-07-18 | Fujitsu Limited | Wearable device, display control method, and display control program |
| KR102202576B1 (ko) * | 2014-12-12 | 2021-01-13 | 삼성전자주식회사 | 음향 출력을 제어하는 디바이스 및 그 방법 |
| US9922095B2 (en) * | 2015-06-02 | 2018-03-20 | Microsoft Technology Licensing, Llc | Automated closed captioning using temporal data |
| US10332506B2 (en) * | 2015-09-02 | 2019-06-25 | Oath Inc. | Computerized system and method for formatted transcription of multimedia content |
| KR20170035502A (ko) * | 2015-09-23 | 2017-03-31 | 삼성전자주식회사 | 디스플레이 장치 및 이의 제어 방법 |
| US10179291B2 (en) * | 2016-12-09 | 2019-01-15 | Microsoft Technology Licensing, Llc | Session speech-to-text conversion |
| US10299008B1 (en) * | 2017-11-21 | 2019-05-21 | International Business Machines Corporation | Smart closed caption positioning system for video content |
| CN108491127B (zh) * | 2018-03-12 | 2020-02-07 | Oppo广东移动通信有限公司 | 输入法界面显示方法、装置、终端及存储介质 |
| CN112154658B (zh) * | 2018-05-29 | 2024-07-23 | 索尼公司 | 图像处理装置、图像处理方法和存储介质 |
| US12451154B2 (en) * | 2018-08-08 | 2025-10-21 | Comcast Cable Communications, Llc | Generating and/or displaying synchronized captions |
| EP3719613B1 (en) * | 2019-04-01 | 2026-05-06 | Nokia Technologies Oy | Rendering captions for media content |
| US11094324B2 (en) * | 2019-05-14 | 2021-08-17 | Motorola Mobility Llc | Accumulative multi-cue activation of domain-specific automatic speech recognition engine |
| US10885893B2 (en) * | 2019-06-06 | 2021-01-05 | Sony Corporation | Textual display of aural information broadcast via frequency modulated signals |
| US20210074298A1 (en) * | 2019-09-11 | 2021-03-11 | Soundhound, Inc. | Video conference captioning |
| US11295497B2 (en) * | 2019-11-25 | 2022-04-05 | International Business Machines Corporation | Dynamic subtitle enhancement |
| US11557121B2 (en) * | 2020-04-26 | 2023-01-17 | Cloudinary Ltd. | System, device, and method for generating and utilizing content-aware metadata |
| US11475895B2 (en) * | 2020-07-06 | 2022-10-18 | Meta Platforms, Inc. | Caption customization and editing |
| US20230055421A1 (en) * | 2020-09-16 | 2023-02-23 | Meta Platforms, Inc. | Caption customization and editing |
| US11418849B2 (en) * | 2020-10-22 | 2022-08-16 | Rovi Guides, Inc. | Systems and methods for inserting emoticons within a media asset |
| US20240064485A1 (en) * | 2020-11-30 | 2024-02-22 | The Regents Of The University Of California | Systems and methods for sound-enhanced meeting platforms |
| US12342102B2 (en) * | 2021-11-19 | 2025-06-24 | Apple Inc. | Systems and methods for managing captions |
| US11857877B2 (en) * | 2021-12-23 | 2024-01-02 | Ati Technologies Ulc | Automatic in-game subtitles and closed captions |
2021
- 2021-12-23 US: application US17/561,477, patent US11857877B2 (en), status: Active
2022
- 2022-12-01 EP: application EP22912270.0A, patent EP4452432A4 (en), status: Pending
- 2022-12-01 WO: application PCT/US2022/051581, patent WO2023121850A1 (en), status: Ceased
- 2022-12-01 CN: application CN202280084788.1A, patent CN118414200A (zh), status: Pending
- 2022-12-01 KR: application KR1020247024720A, patent KR20240131376A (ko), status: Pending
- 2022-12-01 JP: application JP2024535349A, patent JP2025504748A (ja), status: Pending
2023
- 2023-11-28 US: application US18/520,717, patent US12427413B2 (en), status: Active
Non-Patent Citations (1)
| Title |
|---|
| See also references of EP4452432A4 * |
Also Published As
| Publication number | Publication date |
|---|---|
| US20230201717A1 (en) | 2023-06-29 |
| EP4452432A4 (en) | 2025-12-31 |
| US12427413B2 (en) | 2025-09-30 |
| EP4452432A1 (en) | 2024-10-30 |
| US11857877B2 (en) | 2024-01-02 |
| US20240091640A1 (en) | 2024-03-21 |
| CN118414200A (zh) | 2024-07-30 |
| JP2025504748A (ja) | 2025-02-19 |
| KR20240131376A (ko) | 2024-08-30 |
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| US12427413B2 (en) | Automatic in-game subtitles and closed captions | |
| Peng et al. | Speechbubbles: Enhancing captioning experiences for deaf and hard-of-hearing people in group conversations | |
| CN110473525B (zh) | 获取语音训练样本的方法和装置 | |
| US6925438B2 (en) | Method and apparatus for providing an animated display with translated speech | |
| US11514924B2 (en) | Dynamic creation and insertion of content | |
| KR102136059B1 (ko) | 그래픽 객체를 이용한 자막 생성 시스템 | |
| US12141902B2 (en) | System and methods for resolving audio conflicts in extended reality environments | |
| US11600279B2 (en) | Transcription of communications | |
| JP2023059937A (ja) | データインタラクション方法、装置、電子機器、記憶媒体、および、プログラム | |
| JP2025504748A5 | | |
| JPWO2018037956A1 (ja) | 情報処理装置及び情報処理方法 | |
| JP2023184519A (ja) | 情報処理システム、情報処理方法およびコンピュータプログラム | |
| CN116582664A (zh) | 一种基于裸眼3d的智能交互虚拟展示系统 | |
| WO2010140254A1 (ja) | 映像音声出力装置及び音声定位方法 | |
| WO2025075827A1 (en) | Sonifying visual content for vision-impaired users | |
| KR102583986B1 (ko) | 목소리에 기반한 감정 분류가 반영된 음성 메시지의 말풍선 표현 방법 및 시스템 | |
| Yamamoto et al. | Audiovisual emotion perception develops differently from audiovisual phoneme perception during childhood | |
| WO2025025564A1 (zh) | 一种虚拟形象控制方法、装置及相关设备 | |
| CN116403583A (zh) | 语音数据处理方法和装置、非易失性存储介质及车辆 | |
| US12562006B2 (en) | System(s) and method(s) for training a sign language captioning model and subsequent use thereof | |
| JP7425243B1 (ja) | 情報処理装置及び情報処理方法 | |
| EP4535353A1 (en) | Method and apparatus for rendering audio data using adaptive fonts | |
| US20220351727A1 (en) | Conversaton method, conversation system, conversation apparatus, and program | |
| KR20190075765A (ko) | 문자음성자동변환을 이용한 웹툰의 음성출력 시스템 | |
| EP4623436A1 (en) | Separation of conversational clusters in automatic speech recognition transcriptions |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 22912270; Country of ref document: EP; Kind code of ref document: A1 |
| | WWE | WIPO information: entry into national phase | Ref document number: 2024535349; Country of ref document: JP |
| | WWE | WIPO information: entry into national phase | Ref document number: 202280084788.1; Country of ref document: CN |
| | WWE | WIPO information: entry into national phase | Ref document number: 202417052425; Country of ref document: IN |
| | ENP | Entry into the national phase | Ref document number: 20247024720; Country of ref document: KR; Kind code of ref document: A |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | ENP | Entry into the national phase | Ref document number: 2022912270; Country of ref document: EP; Effective date: 20240723 |