WO2020136435A1 - Autopanoramique - Google Patents

Autopanoramique

Info

Publication number
WO2020136435A1
WO2020136435A1 PCT/IB2019/001361
Authority
WO
WIPO (PCT)
Prior art keywords
video
processor
sound
movement
daw
Prior art date
Application number
PCT/IB2019/001361
Other languages
English (en)
Other versions
WO2020136435A4 (fr)
Inventor
Jaroslav BECK
Original Assignee
Beck Jaroslav
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beck Jaroslav filed Critical Beck Jaroslav
Publication of WO2020136435A1 publication Critical patent/WO2020136435A1/fr
Publication of WO2020136435A4 publication Critical patent/WO2020136435A4/fr

Links

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S 7/00 Indicating arrangements; Control arrangements, e.g. balance control
    • H04S 7/30 Control circuits for electronic adaptation of the sound field
    • H04S 7/302 Electronic adaptation of stereophonic sound system to listener position or orientation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S 2400/00 Details of stereophonic systems covered by H04S but not provided for in its groups
    • H04S 2400/11 Positioning of individual sound objects, e.g. moving airplane, within a sound field

Definitions

  • the invention relates generally to audio mixing, and more particularly to controlling audio.
  • Panning is often used in audio mixing to distribute a sound signal into a new stereo or multi-channel sound field determined by a pan control setting. Audio panning can be used in audio mixing to create the impression that a source is moving from one side of a soundstage to the other.
  • a system embodiment may include: a processor having addressable memory, the processor configured to: receive a selection of a first object of one or more objects in a video; receive a selection of a first sound of one or more sounds, where the selected first sound may be connected to the selected first object; track a movement of the selected first object in the video; and generate a position data for the tracked movement of the selected first object.
  • the processor may be further configured to: track an x-axis location of the selected first object based on an x-axis position of the selected first object in each frame of the video; track a y-axis location of the selected first object based on a y-axis position of the selected first object in each frame of the video; and track a z-axis location of the selected first object based on a change in a size of the selected first object between frames of the video.
  • the processor may be further configured to: generate the x-axis location, the y-axis location, and the z-axis location of the selected first object for each frame of the video.
  • the processor may be further configured to: generate a three-dimensional representation of the generated x-axis location, y-axis location, and z-axis location of the selected first object for each frame of the video. In additional system embodiments, the processor may be further configured to: modify the tracked z-axis location of the selected first object. In additional system embodiments, the processor may be further configured to: recognize the selected first object in a frame of the video.
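  • As an illustrative, non-limiting sketch (not the claimed implementation), the following Python code shows one way per-frame x-axis, y-axis, and z-axis locations could be derived from a tracked object's bounding box, with the z-axis estimated from the change in apparent size; the frame dimensions, the bounding-box source, and the size-to-depth mapping are assumptions.

```python
from dataclasses import dataclass

@dataclass
class Box:
    # Axis-aligned bounding box of the tracked object, in pixel coordinates.
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height

def box_to_position(box: Box, frame_w: int, frame_h: int, ref_area: float) -> tuple:
    """Map a bounding box to normalized (x, y, z) coordinates.

    x, y: box center mapped to [-1, 1] across the frame.
    z:    estimated from the apparent size of the box relative to a reference
          area captured when tracking started (a larger box means the object
          is closer to the camera, giving a smaller z value in this model).
    """
    cx = box.x + box.w / 2.0
    cy = box.y + box.h / 2.0
    x = 2.0 * cx / frame_w - 1.0   # -1 = far left, +1 = far right
    y = 1.0 - 2.0 * cy / frame_h   # -1 = bottom,  +1 = top
    area = max(box.w * box.h, 1e-6)
    z = ref_area / area            # 1.0 at the start, < 1.0 when the object grows
    return (x, y, z)

# Example: per-frame boxes produced by any object tracker.
boxes = [Box(100, 200, 80, 160), Box(140, 190, 100, 200), Box(200, 180, 130, 260)]
ref = boxes[0].w * boxes[0].h
positions = [box_to_position(b, frame_w=1920, frame_h=1080, ref_area=ref) for b in boxes]
```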
  • Additional system embodiments may include: a digital audio workstation (DAW) comprising a DAW processor and DAW addressable memory; where the DAW processor may be further configured to import the generated position data for the tracked movement of the selected first object into the DAW.
  • the DAW processor may be configured to: generate a spatial audio for the first sound based on the imported position data.
  • the processor may be further configured to: select a second object of the one or more objects in the video; connect the selected second object to a second sound of one or more sounds; track a movement of the selected second object in the video; and generate a position data for the tracked movement of the selected second object.
  • a method embodiment may include: selecting a first object of one or more objects in a video; selecting a first sound of one or more sounds; connecting the selected first object to the selected first sound; tracking a movement of the selected first object in the video; and generating a position data for the tracked movement of the selected first object.
  • tracking the movement of the selected first object in the video may further include: tracking an x-axis location of the selected first object based on an x-axis position of the selected first object in each frame of the video; tracking a y-axis location of the selected first object based on a y-axis position of the selected first object in each frame of the video; and tracking a z-axis location of the selected first object based on a change in a size of the selected first object between frames of the video.
  • generating the position data for the tracked movement of the selected first object may further include: generating the x-axis location, the y-axis location, and the z-axis location of the selected first object for each frame of the video.
  • Additional method embodiments may include: generating a three-dimensional representation of the generated x-axis location, y-axis location, and z-axis location of the selected first object for each frame of the video. Additional method embodiments may include: modifying the tracked z-axis location of the selected first object. In additional method embodiments, selecting the first object of one or more objects in the video may further include: recognizing the first object in a frame of the video. In additional method embodiments, selecting the first object of one or more objects in the video may further include: painting one or more borders around the first object in a frame of the video.
  • Additional method embodiments may include: importing the generated position data for the tracked movement of the selected first object into a digital audio workstation (DAW). Additional method embodiments may include: generating, via the DAW, a spatial audio for the first sound based on the imported position data. Additional method embodiments may include: selecting a second object of one or more objects in the video; connecting the selected second object to a second sound of one or more sounds;
  • Another system embodiment may include: one or more videos, where each video comprises one or more objects; one or more sounds associated with the one or more objects; an object tracking component comprising a processor having addressable memory, the processor configured to: receive a selection of at least one object of the one or more objects; receive a selection of at least one sound of the one or more sounds; connect the selected at least one object to the selected at least one sound; track a movement of selected at least one object in at least one video of the one or more videos; and generate a position data for the tracked movement of the selected at least one object in the at least one video; a digital audio workstation (DAW) component comprising a DAW processor having addressable memory, the DAW processor configured to: import the generated position data for the tracked movement of the selected at least one object into the DAW component; and generate a spatial audio for the at least one sound connected to the at least one object based on the imported position data.
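  • The disclosure does not specify an interchange format for the generated position data; as one hedged example, the position data could be serialized per frame and read by a DAW plug-in as an automation source, as in the Python sketch below (the JSON schema and file name are assumptions).

```python
import json

def export_position_data(positions, fps, path):
    """Write per-frame (x, y, z) coordinates to a JSON file that a DAW plug-in
    could import as an automation source. The schema is an assumption; the
    disclosure does not define a specific file format."""
    payload = {
        "fps": fps,
        "frames": [
            {"frame": i, "time": i / fps, "x": x, "y": y, "z": z}
            for i, (x, y, z) in enumerate(positions)
        ],
    }
    with open(path, "w") as f:
        json.dump(payload, f, indent=2)

# Hypothetical usage with positions computed as in the earlier sketch:
# export_position_data(positions, fps=24, path="object1_pan.json")
```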
  • FIG. 1 depicts a high-level block diagram of an auto panner system, according to one embodiment
  • FIG. 2 depicts a high-level flowchart of a method embodiment of generating spatial audio based on one or more tracked objects, according to one embodiment
  • FIGS. 3A-3I depict an auto panner system for generating spatial audio based on one or more tracked objects, according to one embodiment
  • FIG. 4 illustrates an example top-level functional block diagram of a computing device embodiment
  • FIG. 5 shows a high-level block diagram and process of a computing system for implementing an embodiment of the system and process
  • FIG. 6 shows a block diagram and process of an exemplary system in which an embodiment may be implemented
  • FIG. 7 depicts a cloud computing environment for implementing an embodiment of the system and process disclosed herein.
  • FIG. 8 depicts an alternate embodiment of an auto panner system for generating spatial audio based on one or more tracked objects, according to one embodiment.
  • the present system allows for tracking the movement of one or more objects in two-dimensional, three-dimensional, and/or 360 degree video.
  • Each tracked object may be associated with one or more sounds or audio tracks.
  • Coordinates for an x-axis, y-axis, and z-axis of each tracked object may be recorded based on the movement of the object in the video.
  • the coordinates may be used to provide spatial audio such that the movement of each object corresponds to sound to be played in one or more speakers.
  • the disclosed system and method may use computer vision, a tracking mechanism, one or more video files, and one or more processors having addressable memory.
  • the disclosed system and method may be used for spatializing audio by tracking objects found in two-dimensional, three-dimensional, or 360 degree video sources.
  • the tracking mechanism may be activated so that the sound moves in space together with the movement of the object on the screen.
  • the disclosed system and method may be used in audio postproduction for mixing of audio for surround sound in 3D space, e.g., 5.1, 7.1, and/or systems with more speakers, for example Dolby Atmos, Auro 3D, and similar formats that support an arbitrary number of speakers.
  • the disclosed system and method utilize tracking of selected objects directly from the video source and translate the motion from the picture into 3D space.
  • the disclosed system allows selecting and tracking, for panning, as many objects as appear on the screen at the same time, e.g., cars, planes, people, and the like.
  • the system provides extreme precision, as the position of the sound in space comes directly from the picture.
  • FIG. 1 depicts a high-level block diagram of an auto panner system 100, according to one embodiment.
  • the system includes an object-tracking component 102.
  • the object-tracking component 102 may include a processor 104, addressable memory 106, and a display 108.
  • the object tracking component 102 may also include a digital signal processor (DSP) 103, neural networks 105, and/or acceleration 107 via an acceleration card, an additional central processing unit (CPU), or the like.
  • the object-tracking component 102 is in communication with the media component 110.
  • the object-tracking component 102 may be a part of a computing device, a separate computing device, or the like.
  • the media component 110 may include a first video 112, a second video 114, one or more additional videos 116, a first sound 118, a second sound 120, and one or more additional sounds 122.
  • the media component 110 may include one or more videos 112, 114, 116 and associated sounds 118, 120, 122.
  • the videos 112, 114, 116 may include portions of a video, movie, or other visual content, such as single cuts from a movie being edited.
  • the sounds 118, 120, 122 may include audio tracks, sound effects, or other audio content, such as an audio track from one actor in a movie.
  • each sound 118, 120, 122 may be associated with a corresponding video 112, 114, 116, such as audio from a single take of a scene. In other embodiments, each sound 118, 120, 122 may be associated with one or more videos 112, 114, 116, such as a sound effect that may be applied to multiple takes of a scene in postproduction.
  • the media component 110 may be a part of the object tracking component 102, the digital audio workstation (DAW) component 124, a separate device, a file management system, a database, or the like.
  • a digital audio workstation (DAW) component 124 may include a DAW processor 126, a DAW memory 128, and a DAW display 130.
  • the DAW and object-tracking component may share a processor, memory, and/or display.
  • the object tracking component 102 may be a plug-in for the DAW component 124. In other embodiments, the object-tracking component 102 may be separate from the DAW component 124.
  • the DAW component 124 may be used to edit sound for media, such as a movie.
  • the object-tracking component 102 may be in communication with the DAW component 124.
  • the object-tracking component 102 may be in communication with the media component 110.
  • the DAW component 124 may also be in communication with the media component 110.
  • the object-tracking component 102 may perform processing on one or more videos 112, 114, 116 and/or sounds 118, 120, 122 of the media component 110.
  • the object tracking component 102 may then provide data on the position of one or more objects in each video 112, 114, 116 with one or more sounds 118, 120, 122 corresponding to each of the one or more objects to the DAW component 124.
  • FIG. 2 depicts a high-level flowchart of a method embodiment 200 of generating spatial audio based on one or more tracked objects, according to one embodiment.
  • the method 200 may include selecting a first object of one or more objects in a video (step 202).
  • the method may include receiving a selection of the first object.
  • the selection may be by a user, a neural network, machine learning, or the like.
  • multiple objects may be selected.
  • the selected first object may be a person, an animal, any object associated with making a noise, or the like.
  • the selected first object may be a person who is delivering dialogue, a rocket being fired, or the like.
  • the method 200 may then include connecting the selected first object to a first sound of one or more sounds (step 204).
  • the first sound may be selected by a user, a neural network, machine learning, or the like.
  • two or more sounds may be connected to an object, such as a dialogue track and a sound effects track, such that any sounds associated with the object are connected to the object.
  • the first sound may be a sound associated with the first object. For example, if the selected first object is a person, then the selected first sound may be an edited dialogue audio track for the person.
  • the method 200 may then include tracking a movement of the selected first object in the video (step 206).
  • the video may be a two-dimensional video, a three-dimensional video, and/or a 360 degree video.
  • the video may be a fixed image having movement, such as a picture with a zoom and pan effect applied.
  • the movement may be tracked in an x-axis and a y-axis based on the movement of the object in the x-axis and y-axis of the frame.
  • the movement may also be tracked in the z-axis based on a change in size of the tracked object in the video frame. For example, as a person approaches the camera, the size of that person in the video frame increases, and the disclosed system and method can determine that the person is closer, and track and record this change in the z-axis location.
  • the method 200 may include accounting for movement of one or more objects and/or movement of the camera recording the one or more objects.
  • a spatial curve may be selected for a selected audio source and connected to a tracked object.
  • the method 200 may allow selection of a precise curve, i.e., an exact x, y, and z position in time, or an approximate, i.e., smooth, position.
  • the method 200 may use a smooth curve to generate an average trajectory of the object from all position data.
  • the method 200 may allow for the elimination, or minimization, of any fast movement peaks.
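  • A minimal sketch of how a smooth curve could be produced from the raw tracker output is shown below: each coordinate is averaged over a sliding window of frames, which damps fast movement peaks. The window length is a tunable assumption, not a value from the disclosure.

```python
def smooth_curve(positions, window=9):
    """Average each (x, y, z) coordinate over a sliding window of frames to
    turn the raw tracked curve into a smooth trajectory and suppress sudden
    peaks. `window` is the number of frames to average over."""
    smoothed = []
    n = len(positions)
    half = window // 2
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        chunk = positions[lo:hi]
        smoothed.append(tuple(sum(p[k] for p in chunk) / len(chunk) for k in range(3)))
    return smoothed
```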
  • Position data for the tracked movement of the selected first object may be generated (step 208).
  • the position data may include movement of the object in the x, y, and z axes for each frame of a video.
  • the generated position data may then be imported into a digital audio workstation (DAW) (step 210).
  • the DAW may generate spatial data for the first sound based on the imported position data (step 212).
  • the DAW may generate spatial data for two or more sounds associated with the object.
  • the DAW may record the trajectory of an object into position data.
  • the recorded trajectory may be viewed and/or edited in realtime or near realtime in the DAW.
  • the trajectory may also be edited within the DAW once it is recorded, such as via a manual edit using a pen tool.
  • a desired audio effect may not match the position data.
  • a selected object could move along the z-axis such that the associated dialogue audio track becomes difficult to hear or becomes too loud. Editing only the z-axis, so as to keep the audio from being too quiet or too loud, allows the x-axis and y-axis tracking to remain in effect.
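  • One way to express that z-axis-only edit, assuming positions are stored as (x, y, z) tuples, is to clamp only the z coordinate while leaving x and y untouched; the limits below are placeholder assumptions.

```python
def clamp_z(positions, z_min=0.5, z_max=2.0):
    """Limit only the z coordinate so the connected dialogue track never
    becomes too loud (object very close) or too quiet (object very far),
    while the x-axis and y-axis tracking remain in effect."""
    return [(x, y, min(max(z, z_min), z_max)) for (x, y, z) in positions]
```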
  • the disclosed method may provide position data to the DAW for x, y, and z coordinates for each time frame. This position data may be “recorded” into an automatization curve, and this automatization curve provides information to the DAW, which may then generate directions as to which speaker should be playing, at what volume, and from which audio source.
  • the generated directions may depend on the number of speakers, placement of speakers, speaker capabilities, or the like.
  • the speaker arrangement, placement, and/or type may be based on a surround sound standard.
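  • The disclosure leaves the speaker-gain calculation to the DAW; purely as a hedged stand-in, the sketch below drives a constant-power stereo pan from the x coordinate and scales it by a simple distance attenuation. A real surround layout (5.1, 7.1, Dolby Atmos, Auro 3D) would use more speakers and a proper multichannel panner.

```python
import math

def stereo_gains(x, z):
    """Toy speaker-gain decision: constant-power left/right pan driven by the
    normalized x coordinate (x in [-1, 1]), scaled by a 1/z distance falloff.
    Not the claimed algorithm; an illustrative simplification only."""
    theta = (x + 1.0) * math.pi / 4.0   # x = -1 -> 0 rad (hard left), x = +1 -> pi/2 (hard right)
    distance_gain = 1.0 / max(z, 0.1)
    left = math.cos(theta) * distance_gain
    right = math.sin(theta) * distance_gain
    return left, right

# stereo_gains(-1.0, 1.0) -> (1.0, 0.0); stereo_gains(0.0, 1.0) -> (~0.707, ~0.707)
```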
  • the method may include adding a Doppler effect, which is a change in frequency (e.g., sound) emitted by an object caused by the motion of the object.
  • For example, if an ambulance with an associated siren noise moves across the screen from left to right, the system may pan the siren audio from far left to far right.
  • If the ambulance changes in depth relative to the observer (e.g., directly approaches or directly recedes from the camera), the system may increase the sound level of the siren audio as the ambulance approaches the observer or decrease the sound level of the siren audio as the ambulance recedes from the observer.
  • the system may also increase the sound level prior to passing a position in the screen and decrease after passing to account for this sound effect.
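  • The Doppler behavior described above follows the standard Doppler relation; a small sketch, assuming a stationary listener and the usual speed of sound, is:

```python
SPEED_OF_SOUND = 343.0  # m/s at roughly 20 degrees C

def doppler_pitch_ratio(radial_velocity):
    """Pitch ratio to apply to the source sound for a given radial velocity of
    the object relative to the listener (positive = approaching), from the
    standard Doppler formula f_obs = f_src * c / (c - v)."""
    return SPEED_OF_SOUND / (SPEED_OF_SOUND - radial_velocity)

# An ambulance approaching at 20 m/s sounds about 6% higher in pitch:
# doppler_pitch_ratio(20.0) -> ~1.062; receding at -20 m/s -> ~0.945
```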
  • FIGS. 3A-3I depict an auto-panner system 300 for generating spatial audio based on one or more tracked objects, according to one embodiment.
  • the system 300 may include a user interface 302 for selecting options, generating data, exporting data, and the like.
  • the user interface 302 may include any number of windows, screens, or the like.
  • the user interface 302 may include a first window 304 and a second window 306.
  • the first window 304 may include a first video 308 having a first object 310.
  • a user may select 312 the first object 310, such as by drawing a border around the first object 310.
  • the user may select 312 the first object 310 by clicking on the object.
  • a neural network, machine learning, and/or object recognition may be used to select and/or identify one or more objects in the video.
  • the second window 306 may include a space visualization 314 of the first video 308.
  • the space visualization 314 may be a three-dimensional space visualization.
  • the selected first object 310 may be seen in the space visualization 314.
  • the space visualization may be rotated to see the movement of one or more selected objects as they are tracked in the first video 308.
  • the space visualization 314 may be used to confirm that the object 310 is being tracked correctly, identify the movement of the object 310 in three-dimensional space relative to one or more other tracked objects, or the like.
  • the user interface 302 may include a tracking button 316 and a three-dimensional head-tracking button 318. These buttons 316, 318 may be selected depending on whether tracking is performed on a two-dimensional video, a three-dimensional video, or a 360 degree video. Other buttons or controls may be added to select different features or options in the disclosed system 300.
  • a list of objects 320 may be provided with an object column 322, a sound column 324, and a settings column 326.
  • the tracked movements of the objects may be adjusted between a smooth and raw curve or a path.
  • In the settings column 326, there may be an option to select a type of curve. If a precise curve, i.e., not as smooth, is required, then a raw curve may be selected.
  • a smooth curve may calculate an average from data received from the tracker. Additional settings may be available in the settings column 326 depending on the video, object selected, or the like. For example, the settings column 326 may be used to select whether a selected object is a person, a talking person, a non-talking person, a moving object, a stationary object, or the like so as to apply different effects based on the object type.
  • the user may select a new target button 328.
  • the user may then name the target in the object column 322 and select 312 the first object 310 in the first video 308.
  • the user may then select a first sound of one or more sounds in the sound column 324.
  • the first object 310 may be a person and the first sound may be a dialogue track of this person talking.
  • Additional settings or options 330 may be available for selection by the user.
  • the user may manually adjust the tracking of an object in the system 300.
  • the user may manually adjust the tracking in the DAW and/or in the system or plugin disclosed herein.
  • the tracking may be adjusted via a pencil tool, joystick controller, or the like.
  • the system may include a pencil tool, eraser, or the like to adjust the trajectory of tracking objects.
  • the adjusted tracking may be displayed in the space visualization 314.
  • the system may also include options for selecting the angle from which the 3D space is viewed in the second window 306.
  • the space visualization 314 may display a first object visualization 332.
  • the user may select a current or starting Z-axis 333 for the selected first object 310.
  • This starting Z-axis 333 may be manually adjusted based on a desired position of the object, movement of the object in the screen, and the like.
  • the Z-axis impacts where in the space the object 310 is located.
  • the location of the object 310 may be on or close to the screen, on or close to the back wall of the cinema, or somewhere in between.
  • the starting Z-axis 333 may be set at the beginning of the panning.
  • the system may determine the starting Z-axis 333 and/or provide an option based on where the system estimates the object is located, e.g., close to the screen or somewhere in the back.
  • This starting Z-axis 333 can be adjusted for each object.
  • the z-axis may then change based on relative size changes of the object 310 in the frame, which can be used to determine a precise position in the Z-axis, as shown in FIG. 3E.
  • the starting Z-axis 333 may be dynamically adjusted as additional objects are selected and targets are added.
  • the user may adjust the starting Z-axis 333 as additional objects are selected and targets are added.
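  • Tying the starting Z-axis and the size-based tracking together, a minimal sketch (the linear size-to-distance mapping is an assumption, not the claimed method) is:

```python
def z_from_size(start_z, start_size, sizes):
    """Derive a per-frame z position from the user-selected starting Z-axis
    value and the relative change in the object's apparent size. In this
    simple model, doubling the apparent area halves the estimated distance."""
    return [start_z * (start_size / max(s, 1e-6)) for s in sizes]

# sizes could be the per-frame bounding-box areas reported by the tracker:
# z_from_size(start_z=1.0, start_size=12800, sizes=[12800, 20000, 33800])
```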
  • the first object 310 performs a first movement 304 in the first video.
  • the first object 310 is shown moving to the left of the screen.
  • the space visualization 314 depicts the first object visualization 332 moving to the left to track the movement of the first object 310.
  • the path 334 of the first object may also be depicted in the space visualization 314.
  • the resulting tracked movement may be used by the system 300 to pan audio to the left.
  • the audio would be panned toward the left-sided speakers to match the movement of the first object 310 in the video 308 so as to create a more immersive viewing experience.
  • the amount of audio panning from each speaker may be set by the system, by a user in the DAW, or the like.
  • the first object 310 does a second movement 336 toward the right side of the screen in the first video 308.
  • the space visualization 314 depicts the first object visualization 332 moving to the right to track the movement of the first object 310.
  • the updated path 334 of the first object may also be depicted in the space visualization 314.
  • the resulting tracked movement may be used by the system 300 to pan audio to the right. For example, in a surround sound speaker system, the audio would then be panned to the right-sided speakers to match the movement of the first object 310 in the video 308 so as to create a more immersive viewing experience.
  • the first object 310 does a third movement 338 toward the middle of the video 308.
  • the first object 310 also moves closer to the camera as the first object 310 becomes larger in the first video 308. If the first object 310 became smaller in the first video 308, then this may indicate that the first object 310 is moving farther from the camera.
  • the distance between the camera and the first object 310 may be caused by the first object 310 moving closer to or away from the camera and/or the camera moving closer to or away from the first object 310.
  • the movement shown in FIG. 3E could be caused by the object 310 moving toward the camera, by the camera moving toward the object 310, or a combination of movements. From the perspective of a viewer, the object 310 and the camera are closer together in FIG. 3E.
  • This change in distance between the first object 310 and the camera, as shown by the change in the relative size of the first object 310, may be used to determine a z-axis tracking of the first object 310.
  • the space visualization 314 depicts the first object visualization 332 moving to the middle and closer to track the movement of the first object 310.
  • the updated path 334 of the first object may also be depicted in the space visualization 314.
  • the resulting tracked movement may be used by the system 300 to pan audio to the center and at an increased volume. For example, in a surround sound speaker system, the audio would be panned to the middle and increased to match the movement of the first object 310 in the video 308 so as to create a more immersive viewing experience.
  • FIG. 3F depicts a second object 340 appearing in the first video 308.
  • the user may select the new target button 328.
  • the user may then name the target in the object column 322 and select the second object 340 in the first video 308.
  • the user may then select a second sound of one or more sounds in the sound column 324.
  • the second object 340 may be a second person and the second sound may be a dialogue track of this second person talking.
  • the space visualization 314 may depict the first object visualization 332 and the second object visualization 342.
  • the system may determine a starting z-axis position for the second object 340 based on a relative size of the first object.
  • If both objects are the same or similar in size, then a larger object in the video frame would be further out in the z-axis, while a smaller object in the video frame would be further in along the z-axis.
  • Their respective z-axis positions may be adjusted accordingly by a user, machine learning, object recognition, or the like. While one sound is depicted for each object, two or more sounds may be applied to each object in some embodiments. While each object is depicted as selected, the system may use neural networks, machine learning, object recognition, or the like to select and/or offer selection of one or more objects in the video.
  • FIG. 3G depicts the second object 340 moving 344 to a position on the left side of the first video 308.
  • the space visualization 314 may depict the first object
  • the path of movement may be tracked and displayed in the space visualization 314.
  • the path of movement is depicted as wavy to show that the second object 340 has not moved 344 in a straight line, but instead in a meandering pattern. If the second object 340 moved in a straight line relative to the camera, then the path of movement would also be depicted as a straight line.
  • FIG. 3H depicts a third object 346 appearing in the first video 308.
  • the user may select the new target button 328.
  • the user may then name the target in the object column 322 and select the third object 346 in the first video 308.
  • the user may then select a third sound of one or more sounds in the sound column 324.
  • the third object 346 may be a rocket fired from a rocket launcher and the third sound may be a sound effect of a rocket launching.
  • the space visualization 314 may also depict the third object visualization 348.
  • FIG. 3I depicts a movement 350 of the third object in the first video 308.
  • the space visualization 314 depicts the third object visualization 348 moving to the middle and inward in the z-axis to track the movement of the third object 346.
  • the path of the third object may also be depicted in the space visualization 314.
  • the resulting tracked movement may be used by the system 300 to pan audio to the center and increase in volume.
  • the audio would be panned to the middle and increased in volume to match the movement of the third object 346 in the video 308 so as to create a more immersive viewing experience.
  • the user may manually adjust the border of the Z-axis for each object, for a selection of objects, or for all objects.
  • the sound may remain after it has exited the video frame.
  • a rocket fired toward an observer will still make sound after it has passed the observer, but will be quieter as it recedes and lower in pitch due to the Doppler effect.
  • the third object 346 may continue to play sound, albeit at a lower level, once it has exited the video frame.
  • the z-axis may be extended to account for objects that have exited the video frame, but are still desired to produce sound from a sound track.
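  • As a hedged illustration of that extended z-axis behavior, the sketch below keeps a sound playing at a reduced level once its object has left the normalized in-frame region, rather than cutting it; the in-frame bounds and falloff factor are assumptions.

```python
def offscreen_gain(x, y, base_gain, falloff=0.5):
    """Continue playing a sound after its object exits the frame, but at a
    lower level. The frame is taken as x, y in [-1, 1]; `falloff` scales the
    gain for off-screen objects."""
    in_frame = -1.0 <= x <= 1.0 and -1.0 <= y <= 1.0
    return base_gain if in_frame else base_gain * falloff
```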
  • FIG. 4 illustrates an example of a top-level functional block diagram of a computing device embodiment 400.
  • the example operating environment is shown as a computing device 420 comprising a processor 424, such as a central processing unit (CPU), addressable memory 427, an external device interface 426, e.g., an optional universal serial bus port and related processing, and/or an Ethernet port and related processing, and an optional user interface 429, e.g., an array of status lights and one or more toggle switches, and/or a display, and/or a keyboard and/or a pointer-mouse system and/or a touch screen.
  • the addressable memory may, for example, be: flash memory, eprom, and/or a disk drive or other hard drive.
  • these elements may be in communication with one another via a data bus 428.
  • the processor 424 may be configured to execute steps of a process establishing a communication channel and processing according to the embodiments described above.
  • System embodiments include computing devices such as a server computing device, a buyer computing device, and a seller computing device, each comprising a processor and addressable memory and in electronic communication with each other.
  • the embodiments provide a server computing device that may be configured to: register one or more buyer computing devices and associate each buyer computing device with a buyer profile; register one or more seller computing devices and associate each seller computing device with a seller profile; determine search results of one or more registered buyer computing devices matching one or more buyer criteria via a seller search component.
  • the service computing device may then transmit a message from the registered seller computing device to a registered buyer computing device from the determined search results and provide access to the registered buyer computing device of a property from the one or more properties of the registered seller via a remote access component based on the transmitted message and the associated buyer computing device; and track movement of the registered buyer computing device in the accessed property via a viewer tracking component.
  • the system may facilitate the tracking of buyers by the system and sellers once they are on the property and aid in the seller’s search for finding buyers for their property.
  • the figures described below provide more details about the implementation of the devices and how they may interact with each other using the disclosed technology.
  • FIG. 5 is a high-level block diagram 500 showing a computing system comprising a computer system useful for implementing an embodiment of the system and process, disclosed herein.
  • the computer system includes one or more processors 502, and can further include an electronic display device 504 (e.g., for displaying graphics, text, and other data), a main memory 506 (e.g., random access memory (RAM)), storage device 508, a removable storage device 510 (e.g., removable storage drive, a removable memory module, a magnetic tape drive, an optical disk drive, a computer readable medium having stored therein computer software and/or data), user interface device 511 (e.g., keyboard, touch screen, keypad, pointing device), and a communication interface 512 (e.g., modem, a network interface (such as an Ethernet card), a communications port, or a PCMCIA slot and card).
  • the communication interface 512 allows software and data to be transferred between the computer system and external devices.
  • the system further includes a communications infrastructure 514 (e.g., a communications bus, cross-over bar, or network) to which the aforementioned devices/modules are connected as shown.
  • Information transferred via the communication interface 512 may be in the form of signals such as electronic, electromagnetic, optical, or other signals capable of being received by the communication interface 512, via a communication link 516 that carries signals and may be implemented using wire or cable, fiber optics, a phone line, a cellular/mobile phone link, a radio frequency (RF) link, and/or other communication channels.
  • Computer program instructions representing the block diagram and/or flowcharts herein may be loaded onto a computer, programmable data processing apparatus, or processing devices to cause a series of operations performed thereon to produce a computer implemented process.
  • Embodiments have been described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments.
  • Each block of such illustrations/diagrams, or combinations thereof, can be implemented by computer program instructions.
  • the computer program instructions when provided to a processor produce a machine, such that the instructions, which execute via the processor, create means for implementing the functions/operations specified in the flowchart and/or block diagram.
  • Each block in the flowchart/block diagrams may represent a hardware and/or software module or logic, implementing embodiments. In alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures, concurrently, etc.
  • Computer programs are stored in main memory and/or secondary memory. Computer programs may also be received via a communications interface 512. Such computer programs, when executed, enable the computer system to perform the features of the embodiments as discussed herein. In particular, the computer programs, when executed, enable the processor and/or multi-core processor to perform the features of the computer system. Such computer programs represent controllers of the computer system.
  • FIG. 6 shows a block diagram of an example system 600 in which an embodiment may be implemented.
  • the system 600 includes one or more client devices 601 such as consumer electronics devices, connected to one or more server computing systems 630.
  • a server 630 includes a bus 602 or other communication mechanism for communicating information, and a processor 604 coupled with the bus 602 for processing information.
  • the server 630 also includes a main memory 606, such as a random access memory (RAM) or other dynamic storage device, coupled to the bus 602 for storing information and instructions to be executed by the processor 604.
  • the main memory 606 also may be used for storing temporary variables or other intermediate information during execution of instructions to be executed by the processor 604.
  • the server computer system 630 further includes a read only memory (ROM) 608 or other static storage device coupled to the bus 602 for storing static information and instructions for the processor 604.
  • a storage device 610 such as a magnetic disk or optical disk, is provided and coupled to the bus 602 for storing information and instructions.
  • the bus 602 may contain, for example, thirty-two address lines for addressing video memory or main memory 606.
  • the bus 602 can also include, for example, a 32-bit data bus for transferring data between and among the components, such as the CPU 604, the main memory 606, video memory and the storage 610.
  • multiplex data/address lines may be used instead of separate data and address lines.
  • the server 630 may be coupled via the bus 602 to a display 612 for displaying information to a computer user.
  • An input device 614 is coupled to the bus 602 for communicating information and command selections to the processor 604.
  • a cursor control 616, such as a mouse, a trackball, or cursor direction keys, may be used for communicating direction information and command selections to the processor 604 and for controlling cursor movement on the display 612.
  • the functions are performed by the processor 604 executing one or more sequences of one or more instructions contained in the main memory 606. Such instructions may be read into the main memory 606 from another computer-readable medium, such as the storage device 610. Execution of the sequences of instructions contained in the main memory 606 causes the processor 604 to perform the process steps described herein.
  • processors in a multi-processing arrangement may also be employed to execute the sequences of instructions contained in the main memory 606.
  • hard-wired circuitry may be used in place of or in combination with software instructions to implement the embodiments. Thus, embodiments are not limited to any specific combination of hardware circuitry and software.
  • The terms “computer readable medium” and “computer program product” are used to generally refer to media such as main memory, secondary memory, removable storage drive, a hard disk installed in hard disk drive, and signals. These computer program products are means for providing software to the computer system.
  • the computer readable medium allows the computer system to read data, instructions, messages or message packets, and other computer readable information from the computer readable medium.
  • the computer readable medium may include non-volatile memory, such as a floppy disk, ROM, flash memory, disk drive memory, a CD-ROM, and other permanent storage. It is useful, for example, for transporting information, such as data and computer instructions, between computer systems.
  • the computer readable medium may comprise computer readable information in a transitory state medium such as a network link and/or a network interface, including a wired network or a wireless network that allow a computer to read such computer readable information.
  • Computer programs (also called computer control logic) are stored in main memory and/or secondary memory. Computer programs may also be received via a communications interface. Such computer programs, when executed, enable the computer system to perform the features of the embodiments as discussed herein. In particular, the computer programs, when executed, enable the processor or multi-core processor to perform the features of the computer system. Accordingly, such computer programs represent controllers of the computer system.
  • Non-volatile media includes, for example, optical or magnetic disks, such as the storage device 610.
  • Volatile media includes dynamic memory, such as the main memory 606.
  • Transmission media includes coaxial cables, copper wire and fiber optics, including the wires that comprise the bus 602. Transmission media can also take the form of acoustic or light waves, such as those generated during radio wave and infrared data communications.
  • Common forms of computer-readable media include, for example, a floppy disk, a flexible disk, hard disk, magnetic tape, or any other magnetic medium, a CD-ROM, any other optical medium, punch cards, paper tape, any other physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EPROM, any other memory chip or cartridge, a carrier wave as described hereinafter, or any other medium from which a computer can read.
  • Various forms of computer readable media may be involved in carrying one or more sequences of one or more instructions to the processor 604 for execution.
  • the instructions may initially be carried on a magnetic disk of a remote computer.
  • the remote computer can load the instructions into its dynamic memory and send the instructions over a telephone line using a modem.
  • a modem local to the server 630 can receive the data on the telephone line and use an infrared transmitter to convert the data to an infrared signal.
  • An infrared detector coupled to the bus 602 can receive the data carried in the infrared signal and place the data on the bus 602.
  • the bus 602 carries the data to the main memory 606, from which the processor 604 retrieves and executes the instructions.
  • the instructions received from the main memory 606 may optionally be stored on the storage device 610 either before or after execution by the processor 604.
  • the server 630 also includes a communication interface 618 coupled to the bus 602.
  • the communication interface 618 provides a two-way data communication coupling to a network link 620 that is connected to the world wide packet data communication network now commonly referred to as the Internet 628.
  • the Internet 628 uses electrical, electromagnetic or optical signals that carry digital data streams.
  • the signals through the various networks and the signals on the network link 620 and through the communication interface 618, which carry the digital data to and from the server 630, are exemplary forms of carrier waves transporting the information.
  • the communication interface 618 is connected to a network 622 via a communication link 620.
  • the communication interface 618 may be an integrated services digital network (ISDN) card or a modem to provide a data communication connection to a corresponding type of telephone line, which can comprise part of the network link 620.
  • the communication interface 618 may be a local area network (LAN) card to provide a data communication connection to a compatible LAN. Wireless links may also be implemented.
  • the communication interface 618 sends and receives electrical, electromagnetic, or optical signals that carry digital data streams representing various types of information.
  • the network link 620 typically provides data communication through one or more networks to other data devices.
  • the network link 620 may provide a connection through the local network 622 to a host computer 624 or to data equipment operated by an Internet Service Provider (ISP).
  • the ISP in turn provides data communication services through the Internet 628.
  • the local network 622 and the Internet 628 both use electrical, electromagnetic or optical signals that carry digital data streams.
  • the signals through the various networks and the signals on the network link 620 and through the communication interface 618, which carry the digital data to and from the server 630, are exemplary forms of carrier waves transporting the information.
  • the server 630 can send/receive messages and data, including e-mail, program code, through the network, the network link 620 and the communication interface 618.
  • the communication interface 618 can comprise a USB/Tuner and the network link 620 may be an antenna or cable for connecting the server 630 to a cable provider, satellite provider or other terrestrial transmission system for receiving messages, data and program code from another source.
  • the example versions of the embodiments described herein may be implemented as logical operations in a distributed processing system such as the system 600 including the servers 630.
  • the logical operations of the embodiments may be implemented as a sequence of steps executing in the server 630, and as interconnected machine modules within the system 600.
  • the implementation is a matter of choice and can depend on performance of the system 600 implementing the embodiments.
  • the logical operations constituting said example versions of the embodiments are referred to for e.g., as operations, steps or modules.
  • a client device 601 can include a processor, memory, storage device, display, input device and communication interface (e.g., e-mail interface) for connecting the client device to the Internet 628, the ISP, or LAN 622, for communication with the servers 630.
  • the system 600 can further include computers (e.g., personal computers, computing nodes) 605 operating in the same manner as client devices 601, where a user can utilize one or more computers 605 to manage data in the server 630.
  • cloud computing environment 50 comprises one or more cloud computing nodes 10 with which local computing devices used by cloud consumers, such as, for example, personal digital assistant (PDA), smartphone, smart watch, set-top box, video game system, tablet, mobile computing device, or cellular telephone 54A, desktop computer 54B, laptop computer 54C, and/or automobile computer system 54N may communicate.
  • Nodes 10 may communicate with one another. They may be grouped (not shown) physically or virtually, in one or more networks, such as Private, Community, Public, or Hybrid clouds as described hereinabove, or a combination thereof. This allows the cloud computing environment 50 to offer infrastructure, platforms, and/or software as services for which a cloud consumer does not need to maintain resources on a local computing device. It is understood that the types of computing devices 54A-N shown in FIG. 7 are intended to be illustrative only and that computing nodes 10 and cloud computing environment 50 can communicate with any type of computerized device over any type of network and/or network addressable connection (e.g., using a web browser).
  • FIG. 8 depicts an alternate embodiment of an auto panner system 800 for generating spatial audio based on one or more tracked objects, according to one embodiment.
  • the system 800 may include a user interface 802 for selecting options, generating data, exporting data, and the like.
  • the user interface 802 may include any number of windows, screens, or the like.
  • the user interface 802 may include a previous video scene 804, an upcoming video scene 806, and a current video scene 808.
  • Each video scene 804, 806, 808 may be a video take or cut as a part of a film, video, or scene. In some embodiments, only a current video scene 808 may be used in the system 800. In other embodiments, two or more video scenes 804, 806, 808 may be used in the system 800. If a previous video scene 804 has completed automatization 820, then an automatization completed indicator 822 may be present in the previous video scene 804 as an indication to a user of the system 800.
  • the user may select an object to be tracked via a target 828.
  • a neural network, machine learning, and/or object recognition may be used to select and/or identify the target 828 in the current video scene 808.
  • the user may adjust a size of the target 828 to match the size of the object to be tracked.
  • the target 828 may be expanded to the size of a user’s head if a user’s head is to be tracked.
  • the target 828 may be contracted to a size of a user’s eye if one of the user’s eyes is to be tracked.
  • the target 828 may follow the object to be tracked throughout the current video scene 808 to create an automatization curve having an automatization curve start 824 and an automatization curve end 826.
  • the start of tracking may be adjusted via an audio slider 834.
  • the user may want to start tracking an object after a start of the video.
  • the user can determine when to start object tracking via moving the audio slider 834 relative to an audio track 832 corresponding to the current video scene 808.
  • the audio track 832 may be shown to allow a user of the system 800 to select the desired start point for the automatization.
  • a timeclock 830 may display the time of the current video scene 808.
  • the timeclock may show the time in hours, minutes, seconds, and frames.
  • the current video scene 808 may have a set number of frames per second (FPS), such as twenty-four FPS.
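  • For reference, converting an absolute frame index into the hours, minutes, seconds, and frames display of the timeclock 830 is straightforward at a fixed integer frame rate; a small sketch assuming twenty-four FPS:

```python
def frames_to_timecode(frame_index, fps=24):
    """Convert an absolute frame index into an HH:MM:SS:FF timecode string,
    assuming a fixed integer frame rate such as twenty-four FPS."""
    total_seconds, frames = divmod(frame_index, fps)
    hours, rem = divmod(total_seconds, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}:{frames:02d}"

# frames_to_timecode(3661 * 24 + 5) -> "01:01:01:05"
```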
  • a video file name 816 of the current video scene 808 may also be displayed in the user interface 802.
  • the user may select the current video scene 808 by dragging and dropping a video file into the user interface 802 of the system 800.
  • the user interface 802 may include one or more tools that may be activated, such as a tracking tool 810, a pencil tool 812, and a hand tool 814.
  • the tracking tool 810 may be used to draw a target 828 and begin automatization.
  • the pencil 812 tool and/or hand 814 tool may be used to adjust the automatization curve.
  • the user interface 802 may also include a three-dimensional visualization 818 of the current video scene 808.
  • the three-dimensional visualization 818 may be rotated to see the movement of one or more targets 828 as they are tracked in the current video scene 808.
  • the three-dimensional visualization 818 may be used to confirm that the target 828 is being tracked correctly.
  • the three-dimensional visualization 818 may be used to identify the movement of the target 828 in three-dimensional space relative to one or more other tracked objects.
  • the three-dimensional visualization 818 may be used to adjust and/or confirm a tracking start point 843, a tracking end point 844, and one or more points 846 along a tracked curve 850.
  • the three-dimensional visualization 818 may include the current video scene in a three-dimensional mode 840.
  • the z-axis 842 may be adjusted using a z-axis slider 836. Adjusting the z-axis slider 836 via an adjustment 848 moves the position of the z-axis 842 in the three-dimensional visualization 818 along a z-axis visualization 838. By way of example, moving the adjustment 848 downward may move the z-axis 842 away from the video scene 840 and moving the adjustment 848 upward may move the z-axis 842 toward the video scene.
  • This change in distance between the target 828 and the camera, as shown by the change in the relative size of the target 828, may be used to determine a z-axis tracking of the target 828.
  • the user may manually adjust the z-axis 842 starting point so as to determine a desired audio panning. For example, a user may desire to keep audio panned to the front speakers at the start of a scene even if the target 828 would otherwise cause the audio to be panned primarily to the rear speakers.
  • the three-dimensional visualization 818 may include a tracking start point 843, a tracking end point 844, and one or more points 846 for the hand 814 tool along a tracked curve 850.
  • the tracked curve 850 shows the movement of the tracked object in a three-dimensional space. For example, the tracked object starts at tracking starting point 843, moves along the tracked curve 850 and ends at the tracking end point 844.
  • the resulting tracked movement may be used by the system 800 to pan audio to the left.
  • the audio would be panned toward the left-sided speakers to match the movement of the target 828 in the current video scene 808 so as to create a more immersive viewing experience.
  • the amount of audio panning from each speaker may be set by the system, by a user in the DAW, or the like.

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Stereophonic System (AREA)
  • Image Analysis (AREA)
  • Measurement Of Velocity Or Position Using Acoustic Or Ultrasonic Waves (AREA)

Abstract

Systems, devices, and methods are disclosed for a system comprising: a processor (104) having addressable memory (106), the processor (104) configured to: receive a selection (312) of a first object (310) of one or more objects (310, 340, 346) in a video (308); receive a selection of a first sound (324) of one or more sounds, where the selected first sound is connected to the selected first object; track a movement (304, 336, 338) of the selected first object in the video; and generate position data (208) for the tracked movement of the selected first object.
PCT/IB2019/001361 2018-12-26 2019-12-26 Autopanoramique WO2020136435A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201862784921P 2018-12-26 2018-12-26
US62/784,921 2018-12-26

Publications (2)

Publication Number Publication Date
WO2020136435A1 true WO2020136435A1 (fr) 2020-07-02
WO2020136435A4 WO2020136435A4 (fr) 2020-08-20

Family

ID=69723994

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2019/001361 WO2020136435A1 (fr) 2018-12-26 2019-12-26 Autopanoramique

Country Status (1)

Country Link
WO (1) WO2020136435A1 (fr)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2530957A2 (fr) * 2011-05-30 2012-12-05 Sony Mobile Communications AB Placement par capteur du son dans un enregistrement vidéo
US20170364752A1 (en) * 2016-06-17 2017-12-21 Dolby Laboratories Licensing Corporation Sound and video object tracking

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2530957A2 (fr) * 2011-05-30 2012-12-05 Sony Mobile Communications AB Placement par capteur du son dans un enregistrement vidéo
US20170364752A1 (en) * 2016-06-17 2017-12-21 Dolby Laboratories Licensing Corporation Sound and video object tracking

Also Published As

Publication number Publication date
WO2020136435A4 (fr) 2020-08-20

Similar Documents

Publication Publication Date Title
CN108989691B (zh) Video shooting method and apparatus, electronic device, and computer-readable storage medium
US8867886B2 Surround video playback
US9693009B2 Sound source selection for aural interest
JP6741873B2 (ja) Apparatus and associated methods in the field of virtual reality
US10638247B2 Audio processing
CN106296781B (zh) Special-effect image generation method and electronic device
KR102644833B1 (ko) Method and system for compensating for delay of a VR stream
US20220174107A1 Methods and apparatus for receiving virtual relocation during a network conference
CN112543344A (zh) Live-streaming control method and apparatus, computer-readable medium, and electronic device
CN112272817A (zh) Method and apparatus for providing audio content in immersive reality
CN114339405B (zh) Method, apparatus, device, and storage medium for remote production of an AR video data stream
CN104935866A (zh) Method, compositing device, and system for implementing video conferencing
JP2006041811A (ja) Free-viewpoint image streaming method
US10051403B2 Controlling audio rendering
CN112017264B (zh) Display control method and apparatus for a virtual studio, storage medium, and electronic device
CN109636917B (zh) Three-dimensional model generation method, apparatus, and hardware device
Oldfield et al. An object-based audio system for interactive broadcasting
WO2020136435A1 (fr) Autopanoramique
US11109151B2 Recording and rendering sound spaces
CN103442202A (zh) Video communication method and apparatus
KR20160125322A (ko) Apparatus and method for generating and managing advertising content
CN109691140B (zh) Audio processing
EP3422743B1 (fr) Apparatus and associated methods for presentation of spatial audio
CN112887653B (zh) Information processing method and information processing apparatus
KR20180092411A (ko) Method and apparatus for multi-source broadcast transmission

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19856477

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19856477

Country of ref document: EP

Kind code of ref document: A1