US20220157347A1 - Generation of audio-synchronized visual content - Google Patents
- Publication number
- US20220157347A1
- Authority
- US
- United States
- Prior art keywords
- content
- audio
- visual
- capture
- visual content
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N9/00—Details of colour television systems
- H04N9/79—Processing of colour television signals in connection with recording
- H04N9/80—Transformation of the television signal for recording, e.g. modulation, frequency changing; Inverse transformation for playback
- H04N9/804—Transformation of the television signal for recording, e.g. modulation, frequency changing; Inverse transformation for playback involving pulse code modulation of the colour picture signal components
- H04N9/806—Transformation of the television signal for recording, e.g. modulation, frequency changing; Inverse transformation for playback involving pulse code modulation of the colour picture signal components with processing of the sound signal
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B27/00—Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
- G11B27/10—Indexing; Addressing; Timing or synchronising; Measuring tape travel
- G11B27/34—Indicating arrangements
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B27/00—Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
- G11B27/02—Editing, e.g. varying the order of information signals recorded on, or reproduced from, record carriers
- G11B27/031—Electronic editing of digitised analogue information signals, e.g. audio or video signals
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B27/00—Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
- G11B27/10—Indexing; Addressing; Timing or synchronising; Measuring tape travel
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B27/00—Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
- G11B27/10—Indexing; Addressing; Timing or synchronising; Measuring tape travel
- G11B27/19—Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information detectable on the record carrier
- G11B27/28—Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information detectable on the record carrier by using information signals recorded by the same method as the main recording
- G11B27/30—Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information detectable on the record carrier by using information signals recorded by the same method as the main recording on the same track as the main recording
- G11B27/3009—Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information detectable on the record carrier by using information signals recorded by the same method as the main recording on the same track as the main recording used signal is a pilot signal inside the frequency band of the recorded main information signal
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B27/00—Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
- G11B27/10—Indexing; Addressing; Timing or synchronising; Measuring tape travel
- G11B27/19—Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information detectable on the record carrier
- G11B27/28—Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information detectable on the record carrier by using information signals recorded by the same method as the main recording
- G11B27/32—Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information detectable on the record carrier by using information signals recorded by the same method as the main recording on separate auxiliary tracks of the same or an auxiliary record carrier
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N9/00—Details of colour television systems
- H04N9/79—Processing of colour television signals in connection with recording
- H04N9/87—Regeneration of colour television signals
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R1/00—Details of transducers, loudspeakers or microphones
- H04R1/02—Casings; Cabinets; Supports therefor; Mountings therein
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N23/00—Cameras or camera modules comprising electronic image sensors; Control thereof
- H04N23/50—Constructional details
- H04N23/51—Housings
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N23/00—Cameras or camera modules comprising electronic image sensors; Control thereof
- H04N23/50—Constructional details
- H04N23/55—Optical parts specially adapted for electronic image sensors; Mounting thereof
-
- H04N5/2252—
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R1/00—Details of transducers, loudspeakers or microphones
- H04R1/02—Casings; Cabinets; Supports therefor; Mountings therein
- H04R1/028—Casings; Cabinets; Supports therefor; Mountings therein associated with devices performing functions other than acoustics, e.g. electric candles
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R2499/00—Aspects covered by H04R or H04S not otherwise provided for in their subgroups
- H04R2499/10—General applications
- H04R2499/11—Transducers incorporated or for use in hand-held devices, e.g. mobile phones, PDA's, camera's
Definitions
- This disclosure relates to generating audio-synchronized visual content based on playback of audio content during capture of the visual content.
- A user may wish to create a video edit from multiple video clips and synchronize the video edit to sound, such as music. Synchronizing a video edit to sound may be difficult and time consuming.
- An image capture device may include a housing.
- The housing may carry one or more of an image sensor, an optical element, a speaker, and/or other components.
- The optical element may guide light within a field of view to the image sensor.
- The image sensor may generate a visual output signal conveying visual information defining visual content based on light that becomes incident thereon.
- The speaker may provide playback of audio content for capture of the visual content.
- Audio information and/or other information may be obtained.
- The audio information may define the audio content.
- The audio content may have an audio progress length. Moments within the audio progress length may be associated with cue markers.
- The playback of at least a portion of the audio progress length of the audio content through the speaker may be effectuated for the capture of the visual content.
- The visual content may be captured during a capture duration with the playback of at least the portion of the audio progress length of the audio content.
- The visual content may have a visual progress length based on the capture duration and/or other information.
- The visual content captured during the capture duration may be synchronized with at least the portion of the progress length of the audio content.
- The visual content captured during the capture duration may be synchronized such that one or more moments within the visual progress length of the visual content are associated with one or more of the cue markers of the audio content.
- A video edit may be generated based on the cue markers and/or other information.
- A video edit may be generated such that at least a portion of the visual content captured during the capture duration is included within the video edit based on the association of the moment(s) within the visual progress length of the visual content with the cue marker(s) of the audio content and/or other information.
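The synchronization just described can be sketched in a few lines: cue markers are timestamps within the audio progress length, capture begins at some offset into the playback, and each marker falling inside the capture duration maps to a moment within the visual progress length. This is an illustrative reading of the disclosure, not code from the patent; the function name and the use of seconds are assumptions.

```python
def synced_moments(cue_markers, playback_start, capture_duration):
    """Map audio cue markers to moments within the captured visual content.

    cue_markers      -- times (s) of cue markers within the audio progress length
    playback_start   -- offset (s) into the audio playback where capture began
    capture_duration -- length (s) of the visual content captured
    """
    moments = []
    for marker in cue_markers:
        moment = marker - playback_start  # audio time -> visual-content time
        if 0 <= moment <= capture_duration:  # marker falls within the capture
            moments.append(moment)
    return moments

# Markers at 2, 4, 6, 8 s of a song; capture ran from 3 s to 8 s of playback.
print(synced_moments([2.0, 4.0, 6.0, 8.0], playback_start=3.0, capture_duration=5.0))
# -> [1.0, 3.0, 5.0]
```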
- An electronic storage may store visual information, information relating to visual content, audio information, information relating to audio content, information relating to cue markers, information relating to playback of audio content, information relating to capture of visual content with playback of audio content, information relating to synchronization of visual content captured with playback of audio content, information relating to association of moments within visual progress length of visual content with cue markers of the audio content, information relating to video edit, and/or other information.
- The housing may carry one or more components of the image capture device.
- The housing may carry (be attached to, support, hold, and/or otherwise carry) one or more of an image sensor, an optical element, a speaker, a processor, an electronic storage, and/or other components.
- The image sensor may be configured to generate a visual output signal and/or other output signals.
- The visual output signal may convey visual information based on light that becomes incident thereon and/or other information.
- The visual information may define visual content.
- The optical element may be configured to guide light within a field of view to the image sensor.
- The field of view may be less than 180 degrees.
- The field of view may be equal to 180 degrees.
- The field of view may be greater than 180 degrees.
- The speaker may be configured to provide playback of audio content.
- The playback of the audio content may be provided for capture of the visual content.
- The processor(s) may be configured by machine-readable instructions. Executing the machine-readable instructions may cause the processor(s) to facilitate generating audio-synchronized visual content.
- The machine-readable instructions may include one or more computer program components.
- The computer program components may include one or more of an audio information component, an audio playback component, a capture component, a synchronization component, and/or other computer program components.
- The audio information component may be configured to obtain audio information and/or other information.
- The audio information may define audio content.
- The audio content may have an audio progress length. Moments within the audio progress length may be associated with cue markers.
- The audio content may include music. The moments within the audio progress length associated with the cue markers may include bars and/or beats of the music.
- The audio content may include verbal direction.
- The audio playback component may be configured to effectuate playback of audio content through one or more speakers.
- The audio playback component may be configured to effectuate playback of at least a portion of the audio progress length of the audio content through the speaker(s) for capture of the visual content.
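Where the audio content is music and the cue markers fall on bars and beats, the marker times can be derived from the tempo. A minimal sketch, assuming a constant tempo and 4/4 meter (both assumptions; the patent does not prescribe how the markers are produced):

```python
def beat_markers(bpm, beats, beats_per_bar=4):
    """Cue-marker times (s) for the first `beats` beats of a track.

    Returns (beat_times, bar_times); assumes constant tempo and 4/4 meter.
    """
    beat_period = 60.0 / bpm                      # seconds per beat
    beat_times = [i * beat_period for i in range(beats)]
    bar_times = beat_times[::beats_per_bar]       # every 4th beat starts a bar
    return beat_times, bar_times

beats, bars = beat_markers(bpm=120, beats=8)
print(beats)  # [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
print(bars)   # [0.0, 2.0]
```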
- The playback of the audio content may pause at an end of the capture duration.
- The playback of the audio content may continue at a beginning of another capture duration.
- One or more previews of the audio content may be provided prior to the capture of the visual content. In some implementations, one or more previews of at least the portion of the audio progress length of the audio content may be provided prior to the capture of the visual content.
- An extent of the audio content to be played back during the capture of the visual content may be determined prior to the capture of the visual content. In some implementations, at least the portion of the audio progress length of the audio content played back during the capture duration may be determined prior to the capture of the visual content.
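The pause/continue behavior and the up-front determination of the playback extent can be illustrated together: before each capture, the device picks the next portion of the audio progress length, resuming where the previous capture left off. A hypothetical sketch; the names and the simple resume policy are assumptions, not the patent's method.

```python
def next_playback_portion(audio_length, resume_at, capture_duration):
    """Return (start, end) of the audio portion to play for the next capture.

    audio_length     -- audio progress length (s)
    resume_at        -- where the previous capture's playback paused (s)
    capture_duration -- planned length (s) of the next capture
    """
    start = resume_at
    end = min(start + capture_duration, audio_length)  # don't run past the track
    return start, end

# A 30 s track; previous capture paused playback at 12 s; next capture is 10 s.
print(next_playback_portion(30.0, 12.0, 10.0))  # -> (12.0, 22.0)
```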
- The capture component may be configured to capture the visual content during a capture duration.
- The capture component may be configured to capture the visual content with the playback of the audio content.
- The capture component may be configured to capture the visual content with the playback of at least the portion of the audio progress length of the audio content.
- The visual content may have a visual progress length based on the capture duration and/or other information.
- One or more audio tracks for the visual content captured during the capture duration may include at least the portion of the audio progress length of the audio content.
- The synchronization component may be configured to synchronize the visual content captured during the capture duration with the progress length of the audio content.
- The synchronization component may be configured to synchronize the visual content captured during the capture duration with at least the portion of the progress length of the audio content.
- The synchronization component may be configured to synchronize the visual content captured during the capture duration such that one or more moments within the visual progress length of the visual content are associated with one or more cue markers of the audio content.
- A video edit may be generated based on the cue markers and/or other information.
- A video edit may be generated such that at least a portion of the visual content captured during the capture duration is included within the video edit. At least the portion of the visual content captured during the capture duration may be included within the video edit based on the association of the moment(s) within the visual progress length of the visual content with the cue marker(s) of the audio content and/or other information.
- The video edit may be generated to include one or more bar-synced effects and/or one or more beat-synced effects based on the association of the moments within the audio progress lengths with the bars and/or the beats of the music, and/or other information.
- The video edit may be generated to include portions from multiple visual content captured during separate capture durations. Individual ones of the multiple visual content may be synchronized with a corresponding portion of the audio content. Transitions within the video edit between different portions from the multiple visual content may be determined based on association of moments within the corresponding visual progress lengths of the multiple visual content with the cue markers of the audio content, and/or other information.
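One way to read the transition behavior above: the cue markers partition the audio progress length into segments, and each segment is filled by a portion of one captured clip, so cuts land on the markers. A hedged sketch with hypothetical names; the one-clip-per-segment policy is an illustrative simplification.

```python
def build_edit(clips, cue_markers, total_length):
    """Return (clip_id, start, end) segments cut at the cue markers.

    clips        -- clip identifiers, used one per segment in order
    cue_markers  -- transition times (s) within the audio progress length
    total_length -- overall length (s) of the edit
    """
    boundaries = [0.0] + sorted(cue_markers) + [total_length]
    segments = []
    for i, clip in enumerate(clips):
        if i + 1 >= len(boundaries):  # more clips than segments: stop
            break
        segments.append((clip, boundaries[i], boundaries[i + 1]))
    return segments

print(build_edit(["clip_a", "clip_b", "clip_c"], [2.0, 5.5], 10.0))
# -> [('clip_a', 0.0, 2.0), ('clip_b', 2.0, 5.5), ('clip_c', 5.5, 10.0)]
```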
- FIG. 1 illustrates an example system that generates audio-synchronized visual content.
- FIG. 2 illustrates an example method for generating audio-synchronized visual content.
- FIG. 3 illustrates an example image capture device.
- FIG. 4 illustrates example cue markers for moments within audio progress length.
- FIG. 5 illustrates example audio content playback by an image capture device.
- FIG. 6 illustrates example synchronized moments within visual content.
- FIG. 7 illustrates example generation of a video edit.
- FIG. 1 illustrates a system 10 for generating audio-synchronized visual content.
- The system 10 may include one or more of a processor 11, an interface 12 (e.g., bus, wireless interface), an electronic storage 13, an optical element 14, an image sensor 15, a speaker 16, and/or other components.
- The system 10 may include and/or be part of an image capture device.
- The image capture device may include a housing, and one or more of the optical element 14, the image sensor 15, the speaker 16, and/or other components of the system 10 may be carried by the housing of the image capture device.
- The optical element 14 may guide light within a field of view to the image sensor 15.
- The image sensor 15 may generate a visual output signal conveying visual information defining visual content based on light that becomes incident thereon.
- The speaker 16 may provide playback of audio content for capture of the visual content.
- Audio information and/or other information may be obtained by the processor 11.
- The audio information may define the audio content.
- The audio content may have an audio progress length. Moments within the audio progress length may be associated with cue markers.
- The playback of at least a portion of the audio progress length of the audio content through the speaker may be effectuated by the processor 11 for the capture of the visual content.
- The visual content may be captured by the processor 11 during a capture duration with the playback of at least the portion of the audio progress length of the audio content.
- The visual content may have a visual progress length based on the capture duration and/or other information.
- The visual content captured during the capture duration may be synchronized with at least the portion of the progress length of the audio content by the processor 11.
- The visual content captured during the capture duration may be synchronized such that one or more moments within the visual progress length of the visual content are associated with one or more of the cue markers of the audio content.
- A video edit may be generated by the processor 11 based on the cue markers and/or other information.
- A video edit may be generated such that at least a portion of the visual content captured during the capture duration is included within the video edit based on the association of the moment(s) within the visual progress length of the visual content with the cue marker(s) of the audio content and/or other information.
- The electronic storage 13 may be configured to include an electronic storage medium that electronically stores information.
- The electronic storage 13 may store software algorithms, information determined by the processor 11, information received remotely, and/or other information that enables the system 10 to function properly.
- The electronic storage 13 may store visual information, information relating to visual content, audio information, information relating to audio content, information relating to cue markers, information relating to playback of audio content, information relating to capture of visual content with playback of audio content, information relating to synchronization of visual content captured with playback of audio content, information relating to association of moments within visual progress length of visual content with cue markers of the audio content, information relating to video edit, and/or other information.
- Visual content may be captured by an image capture device during playback of audio content.
- Visual content may refer to content of image(s), video frame(s), and/or video(s) that may be consumed visually.
- Visual content may be included within one or more images and/or one or more video frames of a video.
- The video frame(s) may define/contain the visual content of the video. That is, a video may include video frame(s) that define/contain the visual content of the video.
- Video frame(s) may define/contain visual content viewable as a function of progress through the progress length of the video content.
- A video frame may include an image of the video content at a moment within the progress length of the video.
- The term "video frame" may be used to refer to one or more of an image frame, frame of pixels, encoded frame (e.g., I-frame, P-frame, B-frame), and/or other types of video frame.
- Visual content may be generated based on light received within a field of view of a single image sensor or within fields of view of multiple image sensors.
- Visual content (of image(s), of video frame(s), of video(s)) with a field of view may be captured by an image capture device during a capture duration.
- A field of view of visual content may define a field of view of a scene captured within the visual content.
- A capture duration may be measured/defined in terms of time durations and/or frame numbers. For example, visual content may be captured during a capture duration of 60 seconds, and/or from one point in time to another point in time. As another example, 1800 images may be captured during a capture duration. If the images are captured at 30 images/second, then the capture duration may correspond to 60 seconds. Other capture durations are contemplated.
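The frame-count form of a capture duration converts to a time duration through the frame rate, as in the 1800-image example above:

```python
def capture_duration_seconds(num_frames, frame_rate):
    """Capture duration (s) implied by a frame count at a given frame rate."""
    return num_frames / frame_rate

# 1800 images captured at 30 images/second correspond to a 60-second capture.
print(capture_duration_seconds(1800, 30))  # -> 60.0
```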
- Audio content may refer to media content that may be consumed as one or more sounds.
- Audio content may include one or more sounds stored in one or more formats/containers, and/or other audio content.
- Audio content may include one or more sounds captured by one or more sound sensors (e.g., microphone).
- Audio content may include audio/sound provided/to be provided as an accompaniment for the visual content.
- Audio content may include one or more of voices, activities, songs, music, soundtrack, and/or other audio/sounds. For example, audio content may include music to be played during capture of visual content and/or playback of visual content.
- Visual content and/or audio content may be stored in one or more formats and/or one or more containers.
- A format may refer to one or more ways in which the information defining content (visual content, audio content) is arranged/laid out (e.g., file format).
- A container may refer to one or more ways in which information defining content is arranged/laid out in association with other information (e.g., wrapper format).
- Information defining visual content (visual information) and/or information defining audio content (audio information) may be stored within a single file or multiple files.
- Visual information defining an image or video frames of a video may be stored within a single file (e.g., image file, video file), multiple files (e.g., multiple image files, multiple video files), a combination of different files, and/or other files.
- The system 10 may be remote from the image capture device or local to the image capture device. One or more portions of the image capture device may be remote from or be a part of the system 10. One or more portions of the system 10 may be remote from or be a part of the image capture device.
- An image capture device may refer to a device that captures visual content.
- An image capture device may capture visual content in form of images, videos, and/or other forms.
- An image capture device may refer to a device for recording visual information in the form of images, videos, and/or other media.
- An image capture device may be a standalone device (e.g., camera, action camera, image sensor) or may be part of another device (e.g., part of a smartphone, tablet).
- FIG. 3 illustrates an example image capture device 302 .
- Visual content (e.g., of image(s), video frame(s)) may be captured by the image capture device 302.
- The image capture device 302 may include a housing 312.
- The housing 312 may refer to a device (e.g., casing, shell) that covers, protects, and/or supports one or more components of the image capture device 302.
- The housing 312 may include a single-piece housing or a multi-piece housing.
- The housing 312 may carry (be attached to, support, hold, and/or otherwise carry) one or more of an optical element 304, an image sensor 306, a speaker 308, a processor 310, and/or other components.
- One or more components of the image capture device 302 may be the same as, be similar to, and/or correspond to one or more components of the system 10 .
- The processor 310 may be the same as, be similar to, and/or correspond to the processor 11.
- The optical element 304 may be the same as, be similar to, and/or correspond to the optical element 14.
- The image sensor 306 may be the same as, be similar to, and/or correspond to the image sensor 15.
- The speaker 308 may be the same as, be similar to, and/or correspond to the speaker 16.
- The housing may carry other components, such as the electronic storage 13.
- The image capture device 302 may include other components not shown in FIG. 3.
- The image capture device 302 may not include one or more components shown in FIG. 3.
- Other configurations of image capture devices are contemplated.
- The optical element 304 may include instrument(s), tool(s), and/or medium that acts upon light passing through the instrument(s)/tool(s)/medium.
- The optical element 304 may include one or more of a lens, mirror, prism, and/or other optical elements.
- The optical element 304 may affect the direction, deviation, and/or path of the light passing through the optical element 304.
- The optical element 304 may have a field of view 305.
- The optical element 304 may be configured to guide light within the field of view 305 to the image sensor 306.
- The field of view 305 may include the field of view of a scene that is within the field of view of the optical element 304 and/or the field of view of the scene that is delivered to the image sensor 306.
- The optical element 304 may guide light within its field of view to the image sensor 306 or may guide light within a portion of its field of view to the image sensor 306.
- The field of view 305 of the optical element 304 may refer to the extent of the observable world that is seen through the optical element 304.
- The field of view 305 of the optical element 304 may include one or more angles (e.g., vertical angle, horizontal angle, diagonal angle) at which light is received and passed on by the optical element 304 to the image sensor 306.
- The field of view 305 may be greater than 180 degrees.
- The field of view 305 may be less than 180 degrees.
- The field of view 305 may be equal to 180 degrees.
- The image capture device 302 may include multiple optical elements.
- The image capture device 302 may include multiple optical elements that are arranged on the housing 312 to capture spherical images/videos (guide light within a spherical field of view to one or more image sensors).
- The image capture device 302 may include two optical elements positioned on opposing sides of the housing 312. The fields of view of the optical elements may overlap and enable capture of spherical images and/or spherical videos.
- The image sensor 306 may include sensor(s) that convert received light into output signals.
- The output signals may include electrical signals.
- The image sensor 306 may generate output signals conveying information that defines visual content of one or more images and/or one or more video frames of a video.
- The image sensor 306 may include one or more of a charge-coupled device sensor, an active pixel sensor, a complementary metal-oxide-semiconductor sensor, an N-type metal-oxide-semiconductor sensor, and/or other image sensors.
- The image sensor 306 may be configured to generate output signals conveying information that defines visual content of one or more images and/or one or more video frames of a video.
- The image sensor 306 may be configured to generate a visual output signal based on light that becomes incident thereon during a capture duration and/or other information.
- The visual output signal may convey visual information that defines visual content having the field of view 305.
- The optical element 304 may be configured to guide light within the field of view 305 to the image sensor 306, and the image sensor 306 may be configured to generate visual output signals conveying visual information based on light that becomes incident thereon via the optical element 304.
- The visual information may define visual content by including information that defines one or more content, qualities, attributes, features, and/or other aspects of the visual content.
- The visual information may define visual content of an image by including information that makes up the content of the image, and/or information that is used to determine the content of the image.
- The visual information may include information that makes up and/or is used to determine the arrangement of pixels, characteristics of pixels, values of pixels, and/or other aspects of pixels that define visual content of the image.
- The visual information may include information that makes up and/or is used to determine pixels of the image. Other types of visual information are contemplated.
- Capture of visual content by the image sensor 306 may include conversion of light received by the image sensor 306 into output signals/visual information defining visual content. Capturing visual content may include recording, storing, and/or otherwise capturing the visual content for use in generating video content (e.g., content of video frames). For example, during a capture duration, the visual output signal generated by the image sensor 306 and/or the visual information conveyed by the visual output signal may be used to record, store, and/or otherwise capture the visual content for use in generating video content.
- the image capture device 302 may include multiple image sensors.
- the image capture device 302 may include multiple image sensors carried by the housing 312 to capture spherical images/videos based on light guided thereto by multiple optical elements.
- the image capture device 302 may include two image sensors configured to receive light from two optical elements positioned on opposing sides of the housing 312 . The fields of view of the optical elements may overlap and enable capture of spherical images and/or spherical videos.
- the speaker 308 may refer to an electronic device that provides audible presentation of information.
- the speaker 308 may refer to an electronic device that makes sound.
- the speaker 308 may produce audio output in the form of sound waves.
- the speaker 308 may include one or more transducers that convert audio signals into sound.
- the speaker 308 may be configured to provide playback of audio content.
- the playback of the audio content may be provided for capture of visual content.
- the playback of the audio content may be provided during part(s) of or entirety of the capture duration for the visual content.
- the speaker 308 may provide playback of audio content, such as a song and/or verbal direction, during capture of visual content by the image capture device 302 .
- the processor 310 may include one or more processors (logic circuitry) that provide information processing capabilities in the image capture device 302 .
- the processor 310 may provide one or more computing functions for the image capture device 302 .
- the processor 310 may operate/send command signals to one or more components of the image capture device 302 to operate the image capture device 302 .
- the processor 310 may facilitate operation of the image capture device 302 in capturing image(s) and/or video(s), facilitate operation of the optical element 304 (e.g., change how light is guided by the optical element 304 ), facilitate operation of the image sensor 306 (e.g., change how the received light is converted into information that defines images/videos and/or how the images/videos are post-processed after capture), and/or facilitate operation of the speaker 308 (e.g., change how the speaker 308 produces sound).
- the processor 310 may obtain information from the image sensor 306 and/or facilitate transfer of information from the image sensor 306 to another device/component.
- the processor 310 may be remote from the processor 11 or local to the processor 11 .
- One or more portions of the processor 310 may be remote from the processor 11 and/or one or more portions of the processor 11 may be part of the processor 310 .
- the processor 310 may include and/or perform one or more functionalities of the processor 11 shown in FIG. 1 .
- the image capture device 302 may play audio content (e.g., music) through the speaker 308 during capture of visual content. Moments within an audio progress length of the audio content may be associated with cue markers. Visual content captured by the image capture device 302 during playback of the audio content may be synchronized with the audio content such that one or more moments within the visual progress length of the visual content are associated with one or more cue markers of the audio content. The synchronization of the captured visual content with audio content played back during visual content capture may be used in generating one or more video edits. For example, specific portions of the visual content captured during playback of the audio content may be included within the video edit based on association of the moment(s) within the visual progress length of the visual content with the cue marker(s) of the audio content.
- the video edit may include portions of single visual content or multiple visual content (e.g., multiple visual content captured at different times, with individual visual content synchronized with the audio content played during visual content capture).
- the video edit may include audio content captured with capture of the visual content and/or the audio content that was played back during capture of the visual content.
- visual content may be captured with playback of music, and a video edit of the visual content may be generated (1) using synchronization of the visual content with the music, and (2) with the music as the audio content of the video edit (e.g., insert the music in an audio track of the video edit).
- Such synchronization of visual content with audio content played back during capture of the visual content may enable generation of a video edit directly from the image capture device and/or from visual content provided by the image capture device.
- the processor 11 may be configured to obtain information to facilitate generating audio-synchronized visual content.
- Obtaining information may include one or more of accessing, acquiring, analyzing, determining, examining, identifying, loading, locating, opening, receiving, retrieving, reviewing, selecting, storing, and/or otherwise obtaining the information.
- the processor 11 may obtain information from one or more locations.
- the processor 11 may obtain information from a storage location, such as the electronic storage 13 , electronic storage of information and/or signals generated by one or more sensors, electronic storage of a device accessible via a network, and/or other locations.
- the processor 11 may obtain information from one or more hardware components (e.g., an image sensor) and/or one or more software components (e.g., software running on a computing device).
- the processor 11 may be configured to provide information processing capabilities in the system 10 .
- the processor 11 may comprise one or more of a digital processor, an analog processor, a digital circuit designed to process information, a central processing unit, a graphics processing unit, a microcontroller, an analog circuit designed to process information, a state machine, and/or other mechanisms for electronically processing information.
- the processor 11 may be configured to execute one or more machine-readable instructions 100 to facilitate generating audio-synchronized visual content.
- the machine-readable instructions 100 may include one or more computer program components.
- the machine-readable instructions 100 may include one or more of an audio information component 102 , an audio playback component 104 , a capture component 106 , a synchronization component 108 , and/or other computer program components.
- the audio information component 102 may be configured to obtain audio information and/or other information. Obtaining audio information may include one or more of accessing, acquiring, analyzing, determining, examining, generating, identifying, loading, locating, opening, receiving, retrieving, reviewing, storing, and/or otherwise obtaining the audio information.
- the audio information component 102 may obtain audio information from one or more locations. For example, the audio information component 102 may obtain audio information from a storage location, such as the electronic storage 13 , electronic storage of a device accessible via a network, and/or other locations.
- the audio information component 102 may obtain audio information from one or more hardware components (e.g., a physical storage device) and/or one or more software components (e.g., software running on a computing device).
- the audio information may be obtained based on a user's interaction with a user interface/application (e.g., video editing application, video capture application), and/or other information.
- a user interface/application may provide option(s) for a user to select one or more sounds (e.g., music, verbal direction) to be played back during capture of visual content.
- the audio information defining the sound(s) may be obtained based on the user's selection of the sound(s) through the user interface/application.
- the audio information may define audio content.
- the audio information may define audio content by including information that defines one or more content, qualities, attributes, features, and/or other aspects of the audio content.
- the audio information may define audio content by including information that makes up the content of the audio, and/or information that is used to determine the content of the audio.
- the audio content may include one or more reproductions of the received sounds.
- the audio information may define audio content in one or more formats, such as WAV, MP3, MP4, RAW, and/or other formats.
- the audio content may have an audio progress length.
- the audio progress length may be defined in terms of time duration and/or other measurable factors. For example, audio content may have a time duration of five minutes. Other progress lengths and time durations are contemplated.
- Moments within the audio progress length may be associated with cue markers.
- a cue marker being associated with a moment within the audio progress length may include the cue marker identifying the moment, the cue marker being tied to the moment, the cue marker being connected to the moment, and/or the cue marker being otherwise associated with the moment.
- a moment within the audio progress length may refer to a point in time or a duration of time within the audio progress length.
- a cue marker may refer to a marker that indicates location of a cue.
- a cue may signal that the corresponding location may be used in automatically generating a video edit.
- a cue marker may mark a point in time or a duration of time within the audio progress length as a location in which one or more edits may be made for generating a video edit.
- a cue marker may mark a point in time or a duration of time within the audio progress length as a location to guide making edits for generating a video edit. For example, a cue marker may mark a moment within the audio progress length as a location in which a video edit may transition from one video clip to another video clip. A cue marker may mark a moment within the audio progress length as a location in which something of interest is occurring for a video edit. A cue marker may mark a moment within the audio progress length as a location in which a video edit may include a visual effect. Other indications and/or edits are contemplated.
- Moments within the audio progress length may be associated with cue markers based on analysis of the audio content, user input, and/or other information. For example, audio content may be analyzed to determine locations of particular sounds, and the moments corresponding to the particular sounds may be associated with cue markers. As another example, a user may manually associate particular moments within the audio progress length with cue markers, manually move moments with which the cue markers are associated, manually delete association between moments and cue markers, and/or make other manual changes to the cue markers.
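The analysis-based association of moments with cue markers described above can be sketched as a simple threshold detector over the audio's amplitude values, alongside a manually supplied marker list. This is an illustrative sketch only; the function name, the threshold approach, and the sample-list representation are assumptions, not part of the disclosure.

```python
def detect_cue_moments(samples, sample_rate, threshold):
    """Return times (seconds) where the amplitude first rises above
    the threshold -- a stand-in for 'analysis of the audio content'
    that locates particular sounds."""
    moments = []
    above = False
    for i, s in enumerate(samples):
        if abs(s) >= threshold and not above:
            moments.append(i / sample_rate)  # moment within audio progress length
            above = True
        elif abs(s) < threshold:
            above = False
    return moments

# A user may also manually associate moments with cue markers:
manual_markers = [12.5, 47.0]  # hypothetical times, in seconds
```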
- the audio content may include music.
- Music may refer to vocal and/or instrumental sounds.
- music may include one or more songs, instrumental musical piece, soundtrack, and/or other music.
- moments corresponding to particular musical features within the audio progress length of music may be associated with cue markers.
- moments corresponding to bars and/or beats of the music may be associated with cue markers. That is, moments within the audio progress length of the music associated with cue markers may include bars and/or beats of the music.
- a song may be obtained for use in capturing visual content, and the song may be analyzed to identify the bars and/or beats of the song.
- the moments of the bars and/or beats of the song may be associated with cue markers/used as cues in generating a video edit.
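For music with a roughly constant tempo, the beat moments used as cue markers can be derived from the tempo alone. A minimal sketch, assuming the BPM is already known (real beat tracking would analyze the song itself, and the function name is hypothetical):

```python
def beat_cue_markers(bpm, duration):
    """Return beat times (seconds) within an audio progress length,
    assuming a constant tempo of `bpm` beats per minute."""
    interval = 60.0 / bpm      # seconds between beats
    t, markers = 0.0, []
    while t < duration:
        markers.append(round(t, 3))
        t += interval
    return markers
```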
- the audio content may include verbal direction.
- Verbal direction may refer to direction that is expressed in words.
- verbal direction may include spoken words that guide user(s) of the image capture device to perform one or more actions.
- verbal direction may include countdown to an action to be performed (e.g., 4-3-2-1-Dance!).
- Moments corresponding to particular words/direction of the verbal direction may be associated with cue markers. For example, in an example of verbal direction to start dancing, a moment corresponding to word “Dance” may be associated with a cue marker.
- FIG. 4 illustrates example cue markers for moments within audio progress length.
- audio content 400 may be shown to have an audio progress length.
- Two moments within the audio progress length may be associated with cue markers.
- a cue marker A 402 may be associated with a moment near the beginning of the audio progress length
- a cue marker B 404 may be associated with a moment near the middle of the audio progress length.
- the audio playback component 104 may be configured to effectuate playback of audio content through one or more speakers. Effectuating playback of audio content through speaker(s) may include using the speaker(s) to provide playback of the audio content.
- the audio playback component 104 may be configured to effectuate playback of at least a portion of the audio progress length of the audio content through the speaker(s) for capture of the visual content. For example, the audio playback component 104 may be configured to effectuate playback of the entire length of the audio content or one or more parts of the audio content through the speaker(s) for capture of the visual content.
- Providing playback of audio content for capture of visual content may include providing playback of the audio content while an image capture device is capturing visual content.
- the image capture device 302 may provide playback of audio content through the speaker 308 so that the audio content is heard by user(s) of the image capture device 302 while the image capture device 302 is capturing visual content.
- FIG. 5 illustrates example audio content playback by an image capture device.
- the image capture device 502 may be operating to provide audio content playback 504 while the image capture device 502 is capturing visual content.
- the image capture device 502 may provide playback of audio content for capture of visual content when it is operating in a particular mode (e.g., music playback & sync mode).
- the audio content playback 504 may include the image capture device 502 playing music via one or more speakers when the image capture device 502 is capturing a video.
- the playback of the audio content may be provided during the capture duration of the visual content.
- the playback of the audio content may pause at an end of the capture duration.
- the image capture device may provide playback of music while the image capture device is recording a video. When the recording of the video is stopped/paused, the playback of the music may be paused.
- the playback of the audio content may continue at a beginning of another capture duration. For example, returning to the example of the image capture device providing playback of the music, the playback of the music may have been paused due to the recording of the video being stopped/paused. When recording of the next video is started, the playback of the music may resume from the paused moment.
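The pause/resume behavior described above amounts to tracking a playback position that only advances while a capture duration is active, so the next recording resumes the audio from the paused moment. A minimal sketch (the class and method names are hypothetical):

```python
class AudioPlaybackState:
    """Tracks the audio playback position across capture durations:
    playback pauses when recording stops and resumes from the same
    moment when the next capture duration begins."""

    def __init__(self):
        self.position = 0.0    # seconds into the audio progress length
        self.capturing = False

    def start_capture(self):
        self.capturing = True   # playback resumes from self.position

    def stop_capture(self):
        self.capturing = False  # playback pauses at self.position

    def advance(self, dt):
        if self.capturing:      # position advances only while recording
            self.position += dt
```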
- one or more previews of the audio content may be provided prior to the capture of the visual content.
- the image capture device may provide playback of audio content to be played during capture of the video.
- Such provision of the audio content may allow the user(s) of the image capture device to anticipate the sounds that will be played.
- Preview(s) of the audio content may include preview of the entire length of the audio content or one or more parts of the audio content. For example, preview(s) of a part of the audio content that will be provided for capture of visual content may be provided prior to the capture of the visual content.
- an extent of the audio content to be played back during the capture of the visual content may be determined prior to the capture of the visual content.
- the extent of the audio content to be played back may refer to an amount/part of the progress length of the audio content to be played back. For example, a user may choose a predefined amount of audio content to be played back during the capture of the visual content. That is, a portion of the audio progress length of the audio content to be played back during the capture duration for the visual content may be determined prior to the capture of the visual content.
- the capture component 106 may be configured to capture the visual content during one or more capture durations.
- a capture duration may refer to a time duration in which visual content is captured.
- the visual content may be captured through one or more optical elements (e.g., the optical element 14 ). Capturing visual content during a capture duration may include using, recording, storing, and/or otherwise capturing the visual content during the capture duration.
- the visual content may be captured for use in generating images and/or video frames.
- the images/video frames may be stored in electronic storage and/or deleted after use (e.g., after preview).
- the visual content may be captured for use in generating audio-synchronized visual content.
- the capture component 106 may use the visual output signal generated by the image sensor 15 and/or the visual information conveyed by the visual output signal to record, store, and/or otherwise capture the visual content.
- the capture component 106 may store, in the electronic storage 13 and/or other (permanent and/or temporary) electronic storage medium, information (e.g., the visual information) defining the visual content based on the visual output signal generated by the image sensor 15 and/or the visual information conveyed by the visual output signal during the capture duration.
- information defining the captured visual content may be stored in one or more visual tracks.
- the visual content may have a visual progress length based on the capture duration and/or other information.
- the visual progress length of the visual content may be the same as or different from the capture duration. For example, based on the capture of the visual content at regular speed (e.g., the capture framerate is the same as the playback framerate), the visual progress length of the visual content may be the same as the capture duration. Based on the capture of the visual content at non-regular speed(s) (e.g., slow motion capture, time-lapse capture), the visual progress length of the visual content may be different from the capture duration.
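The relationship between capture duration and visual progress length can be sketched as dividing the capture duration by a capture speed factor (the factor convention and function name are assumptions for illustration: a factor above 1 is time-lapse, below 1 is slow motion):

```python
def visual_progress_length(capture_duration, speed_factor):
    """Playback length of visual content captured over `capture_duration`
    seconds at `speed_factor` times regular speed (1 = regular,
    4 = 4x time-lapse, 0.5 = slow motion)."""
    return capture_duration / speed_factor

# 60 s of 4x time-lapse capture plays back in 15 s;
# 10 s of 0.5x slow-motion capture plays back in 20 s.
```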
- speed of the playback of the audio content may be changed based on the speed with which the visual content is captured. For example, based on the visual content being captured at regular speed (e.g., 1× speed), regular speed of playback of the audio content may be used (e.g., audio content played at 1× speed). Based on the capture of the visual content at non-regular speed(s) (e.g., 0.5× speed, 4× speed), the speed of the playback of the audio content may be changed to match the speed of capture of the visual content. The speed of the playback of the audio content may be changed to be the inverse of the speed of the capture of the visual content.
- the speed of the playback of the audio content may be decreased when time-lapse capture is used to capture visual content (e.g., ¼× speed of audio content playback for 4× time-lapse capture), and the speed of the playback of the audio content may be increased when slow motion capture is used to capture visual content (e.g., 2× speed of audio content playback for 0.5× slow-motion capture).
- Such change in speed of playback of the audio content may allow the audio content to remain in synchronization with the visual content.
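The inverse relationship between capture speed and audio playback speed can be expressed directly (the function name is illustrative; this simply restates the ¼× / 2× examples above):

```python
def audio_playback_speed(capture_speed_factor):
    """Audio playback speed that keeps audio synchronized with visual
    content captured at `capture_speed_factor` times regular speed."""
    return 1.0 / capture_speed_factor

# 4x time-lapse capture -> audio played at 1/4x speed;
# 0.5x slow-motion capture -> audio played at 2x speed.
```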
- the capture component 106 may be configured to capture the visual content with the playback of the audio content. That is, the capture component 106 may capture the visual content while the playback of the audio content is being provided. If a portion of the audio progress length is played back, the capture component may be configured to capture the visual content with the playback of the portion of the audio progress length of audio content. For example, referring to FIG. 5 , the image capture device 502 may capture visual content while it is providing the audio content playback 504 .
- the capture component 106 may be configured to capture audio content during one or more capture durations.
- the image capture device may include one or more sound sensors (e.g., microphone), and the capture component 106 may use the sound sensors to capture sounds that are heard during capture of the visual content.
- information defining the captured audio content may be stored in one or more audio tracks.
- audio content may not be captured during capture of the visual content.
- the image capture device may capture visual content without capturing audio content.
- one or more audio tracks for the visual content captured during the capture duration may include the audio content.
- the audio track(s) for the visual content captured during the capture duration may include the audio content that was played back during the capture of the visual content. If a portion of the audio progress length is played back during the capture duration, then that portion of the audio progress length of the audio content may be included in the audio track(s) for the visual content.
- the audio content included in the audio track(s) may be a copy of the original audio content that was played back. Rather than including the audio content that was heard through a microphone of the image capture device during capture of the visual content, the portion of the original audio content file that was played back during capture of the visual content may be copied into the audio track(s).
- the portion of the music file defining the portion of the music that was played may be copied into the audio track(s).
- Such generation of audio track(s) may result in a video file that includes visual content synchronized to the audio content. For instance, if the visual content was captured while playing a song, then such generation of audio track(s) may result in a lip-synced video.
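Copying the played-back portion of the original audio content into the audio track, rather than re-using sound picked up by a microphone, can be sketched as slicing the decoded sample buffer. The function name and the plain sample-list representation are assumptions for illustration:

```python
def copy_audio_portion(samples, sample_rate, start_s, end_s):
    """Copy the portion of the original audio content that was played
    back between `start_s` and `end_s` seconds into an audio track,
    given decoded samples at `sample_rate` samples per second."""
    start = int(start_s * sample_rate)
    end = int(end_s * sample_rate)
    return list(samples[start:end])
```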
- the synchronization component 108 may be configured to synchronize the visual content captured during the capture duration with the audio content that was played back during the capture duration.
- the visual content captured during the capture duration may be synchronized with the portion of the progress length of the audio content that was played during the capture duration. For example, if a portion of the audio progress length is played back during the capture duration, then the visual content may be synchronized with that portion of the audio progress length of the audio content.
- Synchronization of the visual content with the audio content may include identification, determination, and/or recording of moments within the visual progress length that occur at the same time as moments within the audio progress length.
- Synchronization of the visual content with the audio content may include identification, determination, and/or recording of moments within the visual progress length that occur at the same time as moments within the audio progress length that are associated with cue marker(s). Synchronization of the visual content with the audio content may take into account changes in capture speed of the visual content/playback speed of the audio content during capture of the visual content (e.g., slow motion capture, time-lapse capture).
- the synchronization component 108 may synchronize the visual content and the audio content by identifying, determining, and/or recording when a particular moment in the visual content should occur at the same time with a particular moment in the audio content during playback.
- the synchronization component 108 may synchronize the visual content and the audio content by storing information about the cue markers/moments associated with the cue marker(s) with the visual content. For example, information that identifies the cue markers and/or the moments associated with the cue markers may be stored as metadata for the visual content.
- metadata of the audio content may identify the cue markers and/or the moments associated with the cue markers, and some or entirety of the metadata may be stored as metadata of the visual content.
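Storing cue-marker information as metadata of the visual content, copied from the metadata of the audio content, might look like the following sketch; all keys and values here are hypothetical, not a format defined by the disclosure:

```python
import json

# Hypothetical metadata stored with the captured visual content.
visual_metadata = {
    "audio_content_id": "song-123",   # identifies the played-back audio
    "playback_offset_s": 30.0,        # where in the audio capture began
    "cue_markers_s": [2.0, 92.5],     # cue-marker moments, in seconds
}

serialized = json.dumps(visual_metadata)  # e.g., embedded in the video file
```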
- the visual content captured during the capture duration may automatically be synchronized with the audio content played during the capture duration based on operation of a single image capture device. That is, because the image capture device is both playing the audio content while it is capturing visual content, the timing of the visual content capture may automatically be matched with the timing of the audio content playback. For example, when the image capture device captures visual content of a video frame at a particular moment within the capture duration, the image capture device will also know which moment of the audio content is being played. The visual content captured by the image capture device may automatically be time-synchronized to the audio content that is played back during capture when the visual content is captured and stored in memory.
- the image capture device may automatically synchronize the visual content with the audio content without analyzing the audio content that was captured with the visual content. That is, rather than analyzing the audio content captured with the visual content to determine how the timing of visual content capture matches the timing of the audio content playback, the image capture device may utilize its knowledge regarding the timing of the audio content that it played back during the capture duration to perform the synchronization.
- the audio content captured with the visual content may be used to modify and/or augment the synchronization of visual content with audio content. For example, there may be a time difference between the timing of the audio content playback internally tracked by the image capture device and the timing of the audio content playback actually provided through speaker(s) of the image capture device. The difference between the timings may be determined based on analysis of the audio content captured with the visual content, and the synchronization of visual content with audio content may be adjusted so that the synchronization matches the timing of the audio content playback actually provided through speaker(s) of the image capture device.
- the synchronization component 108 may be configured to synchronize the visual content captured during the capture duration such that one or more moments within the visual progress length of the visual content are associated with one or more cue markers of the audio content. Synchronization of the visual content with the audio content may result in one or more moments within the visual progress length being synchronized with one or more moments of the audio progress length that are associated with cue marker(s).
- FIG. 6 illustrates example synchronized moments within two different visual content.
- visual content A 610 and visual content B 620 may be synchronized with audio content 400 .
- the visual content A 610 and the visual content B 620 may have been captured separately.
- the visual content A 610 and the visual content B 620 may have been captured while the image capture device provided playback of different portions of the audio content 400 .
- the visual content A 610 may have been captured during playback of the entirety of the audio content 400 .
- the visual content B 620 may have been captured during playback of a portion in the middle of the audio content 400 .
- the visual content A 610 may be synchronized with the entirety of the audio content 400 . Such synchronization may result in a synchronized moment A- 1 612 in the visual progress length A being synchronized with the moment in the audio progress length corresponding to the cue marker A 402 , and a synchronized moment A- 2 614 in the visual progress length A being synchronized with the moment in the audio progress length corresponding to the cue marker B 404 .
- the visual content B 620 may be synchronized with the portion of the audio content 400 . Such synchronization may result in a synchronized moment B- 1 622 in the visual progress length B being synchronized with the moment in the audio progress length corresponding to the cue marker A 402 , and a synchronized moment B- 2 624 in the visual progress length B being synchronized with the moment in the audio progress length corresponding to the cue marker B 404 .
- the synchronized moment A- 1 612 and the synchronized moment B- 1 622 may be synchronized to the same moment in the audio progress length corresponding to the cue marker A 402
- the synchronized moment A- 2 614 and the synchronized moment B- 2 624 may be synchronized to the same moment in the audio progress length corresponding to the cue marker B 404 .
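Mapping cue markers in the audio progress length to synchronized moments in a visual progress length, given where in the audio playback began for that capture, can be sketched as follows (names are illustrative; the arithmetic mirrors the FIG. 6 example, where visual content B covers only a middle portion of the audio):

```python
def synchronized_moments(cue_markers, playback_offset, visual_length):
    """Map cue-marker times in the audio progress length to moments in
    the visual progress length, keeping only moments that fall within
    the captured visual content."""
    return [cue - playback_offset
            for cue in cue_markers
            if 0 <= cue - playback_offset <= visual_length]
```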
- Such synchronization of different visual content may enable synchronization of visual content captured at different times, at different locations, by different image capture devices, and/or by different users.
- Such synchronization of different visual content may enable automatic generation of video edits that transition between different visual content, with the transitions using the synchronized moments.
- a video edit may be generated based on the cue markers and/or other information.
- a video edit may refer to an arrangement and/or a manipulation of one or more portions of one or more visual content.
- a video edit may refer to an arrangement and/or a manipulation of one or more time portion(s) (e.g., time duration(s)) and/or spatial portion(s) (e.g., punchouts) of one or more video clips.
- a video edit may define which portions (e.g., temporal portions, spatial portions) of visual content are included for playback and the order in which the portions are to be presented on playback.
- a video edit may be generated as an encoded version of the video edit and/or as instructions for rendering the video edit.
- the video edit may be encoded as a video clip, and the video clip may be opened in a video player for presentation.
- the video edit may be generated as instructions for presenting the video edit, such as instructions that identify arrangements and/or manipulations of visual content portions included in the video edit.
- the video edit may be generated as information defining a director track that includes information as to which portions of visual content are included in the video edit, the order in which the portions are to be presented on playback, and the edits to be applied to the different portions.
- a video player may use the director track to retrieve the portions of the visual content identified in the video edit for presentation, arrangement, and/or editing when the video edit is opened/to be presented.
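A director track can be sketched as an ordered list of segment descriptors identifying which portion of which visual content to present and what edit to apply; the field names here are hypothetical, not the format of any actual player:

```python
# Hypothetical director track: each entry names a clip, the temporal
# portion of it to present, and an optional effect to apply.
director_track = [
    {"clip": "visual_A", "in_s": 0.0,  "out_s": 10.0, "effect": None},
    {"clip": "visual_B", "in_s": 20.0, "out_s": 92.5, "effect": "fade"},
]

def total_edit_length(track):
    """Total playback length of the video edit defined by the track."""
    return sum(seg["out_s"] - seg["in_s"] for seg in track)
```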
- Generating a video edit based on the cue markers may include using moments in the visual progress length of the visual content synchronized with moments in the audio progress length of the audio content corresponding to the cue markers to generate the video edit.
- the synchronized moments in the visual content may be used to provide one or more effects in the video edit.
- An effect in the visual content may provide for visual changes and/or temporal changes in the video edit. For example, an effect may change one or more visual characteristics and/or one or more temporal characteristics of the visual content included in the video edit.
- a video edit may include transitions between different visual content.
- a video edit may be generated such that at least a portion of the visual content captured during the capture duration is included within the video edit.
- the portion(s) of the visual content captured during the capture duration may be included within the video edit based on the association of the moment(s) within the visual progress length of the visual content with the cue marker(s) of the audio content and/or other information. That is, which portions of the visual content are included in the video edit may be determined based on synchronization of the visual content with the audio content that was played back during capture of the visual content.
- the audio content may include music, and moments within the audio progress length associated with the cue markers may include bars, beats, and/or other musical features of the music.
- the video edit may be generated to include one or more bar-synced effects, one or more beat-synced effects, and/or other musical-feature effects based on the association of the moments within the audio progress lengths with the bars, the beats, and/or other musical features of the music, and/or other information.
- a bar-synced effect may refer to an effect that is used for a moment in the visual content that is synchronized to a moment in the audio content corresponding to a bar.
- a beat-synced effect may refer to an effect that is used for a moment in the visual content that is synchronized to a moment in the audio content corresponding to a beat.
- Same or different effects may be used for different visual content. For example, the same bar-synced effects may be applied across all visual content included in the video edit. As another example, different bar-synced effects may be applied to different visual content included in the video edit.
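As a hedged illustration of placing bar-synced and beat-synced effects, the sketch below assumes a 4/4 meter so that every fourth beat marker starts a bar; the marker times, effect names, and function signature are assumptions rather than anything specified by the disclosure.

```python
# Assign an effect to each moment synchronized to a beat cue marker; moments
# synchronized to markers that start a bar get the bar-synced effect instead.
def effect_events(cue_markers_s, beats_per_bar=4,
                  bar_effect="flash", beat_effect="zoom"):
    events = []
    for i, t in enumerate(cue_markers_s):
        # In an assumed 4/4 meter, every beats_per_bar-th beat starts a new bar.
        kind = bar_effect if i % beats_per_bar == 0 else beat_effect
        events.append((t, kind))
    return events

beats = [0.0, 0.5, 1.0, 1.5, 2.0]  # hypothetical beat markers at 120 BPM
print(effect_events(beats))
# [(0.0, 'flash'), (0.5, 'zoom'), (1.0, 'zoom'), (1.5, 'zoom'), (2.0, 'flash')]
```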
- the video edit may be generated to include portions from multiple visual content captured during separate capture durations.
- the video edit may be generated to include portions from different visual content captured at different times, at different locations, by different image capture devices, and/or by different users.
- Individual visual content may be synchronized with a corresponding portion of the same audio content (portion of the audio content that was played during capture of the visual content).
- individual visual content may be synchronized with portion(s) of same music played during capture of the visual content.
- the video edit may be generated to include transitions between different visual content. Transitions within the video edit between different portions from the multiple visual content may be determined based on association of moments within corresponding visual progress length of the multiple visual content with the cue markers of the audio content, and/or other information. Moments in different visual content that are synchronized to the same cue marker of the audio content may be used to transition from one visual content to another visual content. Because the multiple visual content are synchronized to the same audio content, the synchronized moments may provide transition points that preserve the timing of the visual content and the audio content. For example, multiple visual content may have been captured while image capture device(s) provided playback of the same song. Different visual content may include depiction of different persons lip-synching to the same song.
- creating transitions between different visual content at the same synchronized moments may enable generation of a video edit that preserves timing of the different visual content to the song. Such transitions may preserve the depiction of lip-synching by different persons in different visual content.
- FIG. 7 illustrates example generation of a video edit.
- the visual content A 610 and the visual content B 620 may be synchronized with audio content (as shown in FIG. 6 ).
- the synchronized moment A- 1 612 and the synchronized moment B- 1 622 may be synchronized to the same moment in the audio content, and the synchronized moment A- 2 614 and the synchronized moment B- 2 624 may be synchronized to the same moment in the audio content.
- the synchronized moment A- 1 612 and the synchronized moment B- 1 622 may be used to transition between the visual content A 610 and the visual content B 620 .
- the synchronized moment A- 2 614 and the synchronized moment B- 2 624 may be used to transition between the visual content A 610 and the visual content B 620 .
- portions of the visual content A 610 and the visual content B 620 may be used to generate a video edit 700 .
- the video edit 700 may include a portion A- 1 712 (a portion of the visual content A 610 preceding the synchronized moment A- 1 612 ), a portion B 722 (a portion of the visual content B 620 between the synchronized moment B- 1 622 and the synchronized moment B- 2 624 ), and a portion A- 2 714 (a portion of the visual content A 610 following the synchronized moment A- 2 614 ).
- Use of the synchronized moments to provide transition between different visual content may result in the video edit 700 preserving the synchronized timing of the visual content A 610 and the visual content B 620 with the audio content.
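The FIG. 7 arrangement can be sketched as follows. The marker times and clip length are hypothetical; the point is that both clips are cut at moments synchronized to the same cue markers, so each cut lands on the same audio time in both clips and the edit spans a continuous stretch of the audio timeline.

```python
# Sketch of video edit 700: portion A-1 (clip A before the moment synchronized
# to the first shared marker), portion B (clip B between the moments
# synchronized to the first and second markers), and portion A-2 (clip A after
# the second). Times are moments within the shared audio progress length.
def assemble_edit(marker_1_s, marker_2_s, clip_len_s):
    return [
        ("A", 0.0, marker_1_s),         # portion A-1
        ("B", marker_1_s, marker_2_s),  # portion B
        ("A", marker_2_s, clip_len_s),  # portion A-2
    ]

edit_700 = assemble_edit(3.0, 7.0, 10.0)  # hypothetical marker times (seconds)
total_s = sum(end - start for _, start, end in edit_700)
print(total_s)  # 10.0 — cutting at shared markers keeps the audio timing intact
```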
- Other ways of generating video edits are contemplated.
- Implementations of the disclosure may be made in hardware, firmware, software, or any suitable combination thereof. Aspects of the disclosure may be implemented as instructions stored on a machine-readable medium, which may be read and executed by one or more processors.
- a machine-readable medium may include any mechanism for storing or transmitting information in a form readable by a machine (e.g., a computing device).
- a tangible (non-transitory) machine-readable storage medium may include read-only memory, random access memory, magnetic disk storage media, optical storage media, flash memory devices, and others, and a machine-readable transmission media may include forms of propagated signals, such as carrier waves, infrared signals, digital signals, and others.
- Firmware, software, routines, or instructions may be described herein in terms of specific exemplary aspects and implementations of the disclosure, and as performing certain actions.
- External resources may include hosts/sources of information, computing, and/or processing and/or other providers of information, computing, and/or processing outside of the system 10 .
- any communication medium may be used to facilitate interaction between any components of the system 10 .
- One or more components of the system 10 may communicate with each other through hard-wired communication, wireless communication, or both.
- one or more components of the system 10 may communicate with each other through a network.
- the processor 11 may wirelessly communicate with the electronic storage 13 .
- wireless communication may include one or more of radio communication, Bluetooth communication, Wi-Fi communication, cellular communication, infrared communication, or other wireless communication. Other types of communications are contemplated by the present disclosure.
- the processor 11 is shown in FIG. 1 as a single entity, this is for illustrative purposes only. In some implementations, the processor 11 may comprise a plurality of processing units. These processing units may be physically located within the same device, or the processor 11 may represent processing functionality of a plurality of devices operating in coordination.
- the processor 11 may be configured to execute one or more components by software; hardware; firmware; some combination of software, hardware, and/or firmware; and/or other mechanisms for configuring processing capabilities on the processor 11 .
- FIG. 1 it should be appreciated that although computer components are illustrated in FIG. 1 as being co-located within a single processing unit, in implementations in which processor 11 comprises multiple processing units, one or more of computer program components may be located remotely from the other computer program components.
- While computer program components are described herein as being implemented via processor 11 through machine-readable instructions 100 , this is merely for ease of reference and is not meant to be limiting. In some implementations, one or more functions of computer program components described herein may be implemented via hardware (e.g., dedicated chip, field-programmable gate array) rather than software. One or more functions of computer program components described herein may be software-implemented, hardware-implemented, or software and hardware-implemented.
- processor 11 may be configured to execute one or more additional computer program components that may perform some or all of the functionality attributed to one or more of computer program components described herein.
- the electronic storage media of the electronic storage 13 may be provided integrally (i.e., substantially non-removable) with one or more components of the system 10 and/or as removable storage that is connectable to one or more components of the system 10 via, for example, a port (e.g., a USB port, a Firewire port, etc.) or a drive (e.g., a disk drive, etc.).
- the electronic storage 13 may include one or more of optically readable storage media (e.g., optical disks, etc.), magnetically readable storage media (e.g., magnetic tape, magnetic hard drive, floppy drive, etc.), electrical charge-based storage media (e.g., EPROM, EEPROM, RAM, etc.), solid-state storage media (e.g., flash drive, etc.), and/or other electronically readable storage media.
- the electronic storage 13 may be a separate component within the system 10 , or the electronic storage 13 may be provided integrally with one or more other components of the system 10 (e.g., the processor 11 ).
- the electronic storage 13 is shown in FIG. 1 as a single entity, this is for illustrative purposes only.
- the electronic storage 13 may comprise a plurality of storage units. These storage units may be physically located within the same device, or the electronic storage 13 may represent storage functionality of a plurality of devices operating in coordination.
- FIG. 2 illustrates method 200 for generating audio-synchronized visual content.
- the operations of method 200 presented below are intended to be illustrative. In some implementations, method 200 may be accomplished with one or more additional operations not described, and/or without one or more of the operations discussed. In some implementations, two or more of the operations may occur substantially simultaneously.
- method 200 may be implemented in one or more processing devices (e.g., a digital processor, an analog processor, a digital circuit designed to process information, a central processing unit, a graphics processing unit, a microcontroller, an analog circuit designed to process information, a state machine, and/or other mechanisms for electronically processing information).
- the one or more processing devices may include one or more devices executing some or all of the operation of method 200 in response to instructions stored electronically on one or more electronic storage media.
- the one or more processing devices may include one or more devices configured through hardware, firmware, and/or software to be specifically designed for execution of one or more of the operations of method 200 .
- an image capture device may include a housing.
- the housing may carry one or more of an image sensor, an optical element, a speaker, and/or other components.
- the optical element may guide light within a field of view to the image sensor.
- the image sensor may generate a visual output signal conveying visual information defining visual content based on light that becomes incident thereon.
- the speaker may provide playback of audio content for capture of the visual content.
- At operation 201, audio information and/or other information may be obtained.
- the audio information may define the audio content.
- the audio content may have an audio progress length. Moments within the audio progress length may be associated with cue markers.
- operation 201 may be performed by a processor component the same as or similar to the audio information component 102 (shown in FIG. 1 and described herein).
- At operation 202, the playback of at least a portion of the audio progress length of the audio content through the speaker may be effectuated for the capture of the visual content.
- operation 202 may be performed by a processor component the same as or similar to the audio playback component 104 (shown in FIG. 1 and described herein).
- At operation 203, the visual content may be captured during a capture duration with the playback of the at least the portion of the audio progress length of audio content.
- the visual content may have a visual progress length based on the capture duration and/or other information.
- operation 203 may be performed by a processor component the same as or similar to the capture component 106 (shown in FIG. 1 and described herein).
- At operation 204, the visual content captured during the capture duration may be synchronized with the at least the portion of the progress length of the audio content.
- the visual content captured during the capture duration may be synchronized such that one or more moments within the visual progress length of the visual content are associated with one or more of the cue markers of the audio content.
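One way to picture this association, as a sketch under the assumption that capture begins at a known offset into the audio playback: each cue marker's time within the audio progress length maps to a moment within the visual progress length by subtracting that offset, and markers falling outside the capture duration have no synchronized moment. The function name and all numbers are illustrative assumptions.

```python
# Associate moments within the visual progress length with cue markers of the
# audio content. Capture may start partway into the audio playback, so each
# marker time is shifted by that offset; markers outside the captured span
# are dropped.
def synchronized_moments(cue_markers_s, playback_offset_s, capture_duration_s):
    moments = []
    for t in cue_markers_s:
        v = t - playback_offset_s  # moment within the visual progress length
        if 0.0 <= v <= capture_duration_s:
            moments.append(v)
    return moments

markers = [2.0, 4.0, 6.0, 8.0]  # hypothetical cue markers in the audio timeline
print(synchronized_moments(markers, 3.0, 4.0))  # [1.0, 3.0]
```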
- a video edit may be generated based on the cue markers and/or other information.
- a video edit may be generated such that at least a portion of the visual content captured during the capture duration is included within the video edit based on the association of the moment(s) within the visual progress length of the visual content with the cue marker(s) of the audio content and/or other information.
- operation 204 may be performed by a processor component the same as or similar to the synchronization component 108 (shown in FIG. 1 and described herein).
Abstract
Description
- This disclosure relates to generating audio-synchronized visual content based on playback of audio content during capture of the visual content.
- A user may wish to create a video edit from multiple video clips and synchronize the video edit to sound, such as music. Synchronizing a video edit to sound may be difficult and time consuming.
- This disclosure relates to image capture devices that generate audio-synchronized visual content. An image capture device may include a housing. The housing may carry one or more of an image sensor, an optical element, a speaker, and/or other components. The optical element may guide light within a field of view to the image sensor. The image sensor may generate a visual output signal conveying visual information defining visual content based on light that becomes incident thereon. The speaker may provide playback of audio content for capture of the visual content.
- Audio information and/or other information may be obtained. The audio information may define the audio content. The audio content may have an audio progress length. Moments within the audio progress length may be associated with cue markers. The playback of at least a portion of the audio progress length of the audio content through the speaker may be effectuated for the capture of the visual content. The visual content may be captured during a capture duration with the playback of the at least the portion of the audio progress length of audio content. The visual content may have a visual progress length based on the capture duration and/or other information. The visual content captured during the capture duration may be synchronized with the at least the portion of the progress length of the audio content. The visual content captured during the capture duration may be synchronized such that one or more moments within the visual progress length of the visual content are associated with one or more of the cue markers of the audio content.
- A video edit may be generated based on the cue markers and/or other information. A video edit may be generated such that at least a portion of the visual content captured during the capture duration is included within the video edit based on the association of the moment(s) within the visual progress length of the visual content with the cue marker(s) of the audio content and/or other information.
- An electronic storage may store visual information, information relating to visual content, audio information, information relating to audio content, information relating to cue markers, information relating to playback of audio content, information relating to capture of visual content with playback of audio content, information relating to synchronization of visual content captured with playback of audio content, information relating to association of moments within visual progress length of visual content with cue markers of the audio content, information relating to video edit, and/or other information.
- The housing may carry one or more components of the image capture device. The housing may carry (be attached to, support, hold, and/or otherwise carry) one or more of an image sensor, an optical element, a speaker, a processor, an electronic storage, and/or other components.
- The image sensor may be configured to generate a visual output signal and/or other output signals. The visual output signal may convey visual information based on light that becomes incident thereon and/or other information. The visual information may define visual content.
- The optical element may be configured to guide light within a field of view to the image sensor. The field of view may be less than 180 degrees. The field of view may be equal to 180 degrees. The field of view may be greater than 180 degrees.
- The speaker may be configured to provide playback of audio content. The playback of the audio content may be provided for capture of the visual content.
- The processor(s) may be configured by machine-readable instructions. Executing the machine-readable instructions may cause the processor(s) to facilitate generating audio-synchronized visual content. The machine-readable instructions may include one or more computer program components. The computer program components may include one or more of an audio information component, an audio playback component, a capture component, a synchronization component, and/or other computer program components.
- The audio information component may be configured to obtain audio information and/or other information. The audio information may define audio content. The audio content may have an audio progress length. Moments within the audio progress length may be associated with cue markers. In some implementations, the audio content may include music. The moments within the audio progress length associated with the cue markers may include bars and/or beats of the music. In some implementations, the audio content may include verbal direction.
- The audio playback component may be configured to effectuate playback of audio content through one or more speakers. The audio playback component may be configured to effectuate playback of at least a portion of the audio progress length of the audio content through the speaker(s) for capture of the visual content.
- In some implementations, the playback of the audio content may pause at an end of the capture duration. The playback of the audio content may continue at a beginning of another capture duration.
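A minimal sketch of this pause/continue behavior, under the assumption that the device tracks a single playback position across capture durations so that separately captured clips stay aligned to one audio timeline (the class and method names are hypothetical):

```python
# Audio playback pauses at the end of one capture duration and continues from
# the same position at the beginning of the next capture duration.
class AudioPlayback:
    def __init__(self):
        self.position_s = 0.0  # current position within the audio progress length

    def play_for(self, capture_duration_s):
        """Play audio during one capture duration; return the portion of the
        audio progress length that was played back."""
        start = self.position_s
        self.position_s += capture_duration_s  # pause here at end of capture
        return (start, self.position_s)

pb = AudioPlayback()
print(pb.play_for(4.0))  # (0.0, 4.0)
print(pb.play_for(3.0))  # (4.0, 7.0) — playback continues where it paused
```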
- In some implementations, one or more previews of the audio content may be provided prior to the capture of the visual content. In some implementations, one or more previews of at least the portion of the audio progress length of the audio content may be provided prior to the capture of the visual content.
- In some implementations, an extent of the audio content to be played back during the capture of the visual content may be determined prior to the capture of the visual content. In some implementations, at least the portion of the audio progress length of the audio content played back during the capture duration may be determined prior to the capture of the visual content.
- The capture component may be configured to capture the visual content during a capture duration. The capture component may be configured to capture the visual content with the playback of the audio content. The capture component may be configured to capture the visual content with the playback of at least the portion of the audio progress length of audio content. The visual content may have a visual progress length based on the capture duration and/or other information.
- In some implementations, one or more audio tracks for the visual content captured during the capture duration may include at least the portion of the audio progress length of the audio content.
- The synchronization component may be configured to synchronize the visual content captured during the capture duration with the progress length of the audio content. The synchronization component may be configured to synchronize the visual content captured during the capture duration with at least the portion of the progress length of the audio content. The synchronization component may be configured to synchronize the visual content captured during the capture duration such that one or more moments within the visual progress length of the visual content are associated with one or more cue markers of the audio content.
- A video edit may be generated based on the cue markers and/or other information. A video edit may be generated such that at least a portion of the visual content captured during the capture duration is included within the video edit. At least the portion of the visual content captured during the capture duration may be included within the video edit based on the association of the moment(s) within the visual progress length of the visual content with the cue marker(s) of the audio content and/or other information.
- In some implementations, the video edit may be generated to include one or more bar-synced effects and/or one or more beat-synced effects based on the association of the moments within the audio progress lengths with the bars and/or the beats of the music, and/or other information.
- In some implementations, the video edit may be generated to include portions from multiple visual content captured during separate capture durations. Individual ones of the multiple visual content may be synchronized with a corresponding portion of the audio content. Transitions within the video edit between different portions from the multiple visual content may be determined based on association of moments within corresponding visual progress length of the multiple visual content with the cue markers of the audio content, and/or other information.
- These and other objects, features, and characteristics of the system and/or method disclosed herein, as well as the methods of operation and functions of the related elements of structure and the combination of parts and economies of manufacture, will become more apparent upon consideration of the following description and the appended claims with reference to the accompanying drawings, all of which form a part of this specification, wherein like reference numerals designate corresponding parts in the various figures. It is to be expressly understood, however, that the drawings are for the purpose of illustration and description only and are not intended as a definition of the limits of the invention. As used in the specification and in the claims, the singular form of “a,” “an,” and “the” include plural referents unless the context clearly dictates otherwise.
- FIG. 1 illustrates an example system that generates audio-synchronized visual content.
- FIG. 2 illustrates an example method for generating audio-synchronized visual content.
- FIG. 3 illustrates an example image capture device.
- FIG. 4 illustrates example cue markers for moments within audio progress length.
- FIG. 5 illustrates example audio content playback by an image capture device.
- FIG. 6 illustrates example synchronized moments within visual content.
- FIG. 7 illustrates example generation of a video edit.
- FIG. 1 illustrates a system 10 for generating audio-synchronized visual content. The system 10 may include one or more of a processor 11 , an interface 12 (e.g., bus, wireless interface), an electronic storage 13 , an optical element 14 , an image sensor 15 , a speaker 16 , and/or other components. The system 10 may include and/or be part of an image capture device. The image capture device may include a housing, and one or more of the optical element 14 , the image sensor 15 , the speaker 16 , and/or other components of the system 10 may be carried by the housing of the image capture device. The optical element 14 may guide light within a field of view to the image sensor 15 . The image sensor 15 may generate a visual output signal conveying visual information defining visual content based on light that becomes incident thereon. The speaker 16 may provide playback of audio content for capture of the visual content.
- Audio information and/or other information may be obtained by the processor 11 . The audio information may define the audio content. The audio content may have an audio progress length. Moments within the audio progress length may be associated with cue markers. The playback of at least a portion of the audio progress length of the audio content through the speaker may be effectuated by the processor 11 for the capture of the visual content. The visual content may be captured by the processor 11 during a capture duration with the playback of the at least the portion of the audio progress length of audio content. The visual content may have a visual progress length based on the capture duration and/or other information. The visual content captured during the capture duration may be synchronized with the at least the portion of the progress length of the audio content by the processor 11 . The visual content captured during the capture duration may be synchronized such that one or more moments within the visual progress length of the visual content are associated with one or more of the cue markers of the audio content.
- A video edit may be generated by the processor 11 based on the cue markers and/or other information. A video edit may be generated such that at least a portion of the visual content captured during the capture duration is included within the video edit based on the association of the moment(s) within the visual progress length of the visual content with the cue marker(s) of the audio content and/or other information.
- The electronic storage 13 may be configured to include electronic storage medium that electronically stores information. The electronic storage 13 may store software algorithms, information determined by the processor 11 , information received remotely, and/or other information that enables the system 10 to function properly. For example, the electronic storage 13 may store visual information, information relating to visual content, audio information, information relating to audio content, information relating to cue markers, information relating to playback of audio content, information relating to capture of visual content with playback of audio content, information relating to synchronization of visual content captured with playback of audio content, information relating to association of moments within visual progress length of visual content with cue markers of the audio content, information relating to video edit, and/or other information.
- Visual content may be captured by an image capture device during playback of audio content. Visual content may refer to content of image(s), video frame(s), and/or video(s) that may be consumed visually. For example, visual content may be included within one or more images and/or one or more video frames of a video. The video frame(s) may define/contain the visual content of the video. That is, video may include video frame(s) that define/contain the visual content of the video. Video frame(s) may define/contain visual content viewable as a function of progress through the progress length of the video content. A video frame may include an image of the video content at a moment within the progress length of the video. As used herein, the term video frame may be used to refer to one or more of an image frame, frame of pixels, encoded frame (e.g., I-frame, P-frame, B-frame), and/or other types of video frame.
- Visual content may be generated based on light received within a field of view of a single image sensor or within fields of view of multiple image sensors.
- Visual content (of image(s), of video frame(s), of video(s)) with a field of view may be captured by an image capture device during a capture duration. A field of view of visual content may define a field of view of a scene captured within the visual content. A capture duration may be measured/defined in terms of time durations and/or frame numbers. For example, visual content may be captured during a capture duration of 60 seconds, and/or from one point in time to another point in time. As another example, 1800 images may be captured during a capture duration. If the images are captured at 30 images/second, then the capture duration may correspond to 60 seconds. Other capture durations are contemplated.
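The arithmetic in the example above is simply the frame count divided by the capture rate; a one-line sketch (the function name is an assumption):

```python
def capture_duration_s(frame_count, capture_rate_fps):
    # 1800 images captured at 30 images/second correspond to a 60-second
    # capture duration.
    return frame_count / capture_rate_fps

print(capture_duration_s(1800, 30))  # 60.0
```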
- Audio content may refer to media content that may be consumed as one or more sounds. Audio content may include one or more sounds stored in one or more formats/containers, and/or other audio content. Audio content may include one or more sounds captured by one or more sound sensors (e.g., microphone). Audio content may include audio/sound provided/to be provided as an accompaniment for the visual content. Audio content may include one or more of voices, activities, songs, music, soundtrack, and/or other audio/sounds. For example, audio content may include music to be played during capture of visual content and/or playback of visual content.
- Visual content and/or audio content may be stored in one or more formats and/or one or more containers. A format may refer to one or more ways in which the information defining content (visual content, audio content) is arranged/laid out (e.g., file format). A container may refer to one or more ways in which information defining content is arranged/laid out in association with other information (e.g., wrapper format). Information defining visual content (visual information) and/or information defining audio content (audio information) may be stored within a single file or multiple files. For example, visual information defining an image or video frames of a video may be stored within a single file (e.g., image file, video file), multiple files (e.g., multiple image files, multiple video files), a combination of different files, and/or other files.
- The system 10 may be remote from the image capture device or local to the image capture device. One or more portions of the image capture device may be remote from or be a part of the system 10 . One or more portions of the system 10 may be remote from or be a part of the image capture device.
- An image capture device may refer to a device that captures visual content. An image capture device may capture visual content in form of images, videos, and/or other forms. An image capture device may refer to a device for recording visual information in the form of images, videos, and/or other media. An image capture device may be a standalone device (e.g., camera, action camera, image sensor) or may be part of another device (e.g., part of a smartphone, tablet).
FIG. 3 illustrates an example image capture device 302. Visual content (e.g., of image(s), video frame(s)) may be captured by the image capture device 302. The image capture device 302 may include a housing 312. The housing 312 may refer to a device (e.g., casing, shell) that covers, protects, and/or supports one or more components of the image capture device 302. The housing 312 may include a single-piece housing or a multi-piece housing. The housing 312 may carry (be attached to, support, hold, and/or otherwise carry) one or more of an optical element 304, an image sensor 306, a speaker 308, a processor 310, and/or other components.
- One or more components of the image capture device 302 may be the same as, be similar to, and/or correspond to one or more components of the system 10. For example, the processor 310 may be the same as, be similar to, and/or correspond to the processor 11. The optical element 304 may be the same as, be similar to, and/or correspond to the optical element 14. The image sensor 306 may be the same as, be similar to, and/or correspond to the image sensor 15. The speaker 308 may be the same as, be similar to, and/or correspond to the speaker 16. The housing may carry other components, such as the electronic storage 13. The image capture device 302 may include other components not shown in FIG. 3. The image capture device 302 may not include one or more components shown in FIG. 3. Other configurations of image capture devices are contemplated.
- The
optical element 304 may include instrument(s), tool(s), and/or medium that acts upon light passing through the instrument(s)/tool(s)/medium. For example, the optical element 304 may include one or more of lens, mirror, prism, and/or other optical elements. The optical element 304 may affect direction, deviation, and/or path of the light passing through the optical element 304. The optical element 304 may have a field of view 305. The optical element 304 may be configured to guide light within the field of view 305 to the image sensor 306.
- The field of
view 305 may include the field of view of a scene that is within the field of view of the optical element 304 and/or the field of view of the scene that is delivered to the image sensor 306. For example, the optical element 304 may guide light within its field of view to the image sensor 306 or may guide light within a portion of its field of view to the image sensor 306. The field of view 305 of the optical element 304 may refer to the extent of the observable world that is seen through the optical element 304. The field of view 305 of the optical element 304 may include one or more angles (e.g., vertical angle, horizontal angle, diagonal angle) at which light is received and passed on by the optical element 304 to the image sensor 306. In some implementations, the field of view 305 may be greater than 180-degrees. In some implementations, the field of view 305 may be less than 180-degrees. In some implementations, the field of view 305 may be equal to 180-degrees.
- In some implementations, the
image capture device 302 may include multiple optical elements. For example, the image capture device 302 may include multiple optical elements that are arranged on the housing 312 to capture spherical images/videos (guide light within a spherical field of view to one or more image sensors). For instance, the image capture device 302 may include two optical elements positioned on opposing sides of the housing 312. The fields of view of the optical elements may overlap and enable capture of spherical images and/or spherical videos.
- The
image sensor 306 may include sensor(s) that convert received light into output signals. The output signals may include electrical signals. The image sensor 306 may generate output signals conveying information that defines visual content of one or more images and/or one or more video frames of a video. For example, the image sensor 306 may include one or more of a charge-coupled device sensor, an active pixel sensor, a complementary metal-oxide semiconductor sensor, an N-type metal-oxide-semiconductor sensor, and/or other image sensors.
- The
image sensor 306 may be configured to generate output signals conveying information that defines visual content of one or more images and/or one or more video frames of a video. The image sensor 306 may be configured to generate a visual output signal based on light that becomes incident thereon during a capture duration and/or other information. The visual output signal may convey visual information that defines visual content having the field of view 305. The optical element 304 may be configured to guide light within the field of view 305 to the image sensor 306, and the image sensor 306 may be configured to generate visual output signals conveying visual information based on light that becomes incident thereon via the optical element 304.
- The visual information may define visual content by including information that defines one or more content, qualities, attributes, features, and/or other aspects of the visual content. For example, the visual information may define visual content of an image by including information that makes up the content of the image, and/or information that is used to determine the content of the image. For instance, the visual information may include information that makes up and/or is used to determine the arrangement of pixels, characteristics of pixels, values of pixels, and/or other aspects of pixels that define visual content of the image. For example, the visual information may include information that makes up and/or is used to determine pixels of the image. Other types of visual information are contemplated.
- Capture of visual content by the
image sensor 306 may include conversion of light received by the image sensor 306 into output signals/visual information defining visual content. Capturing visual content may include recording, storing, and/or otherwise capturing the visual content for use in generating video content (e.g., content of video frames). For example, during a capture duration, the visual output signal generated by the image sensor 306 and/or the visual information conveyed by the visual output signal may be used to record, store, and/or otherwise capture the visual content for use in generating video content.
- In some implementations, the
image capture device 302 may include multiple image sensors. For example, the image capture device 302 may include multiple image sensors carried by the housing 312 to capture spherical images/videos based on light guided thereto by multiple optical elements. For instance, the image capture device 302 may include two image sensors configured to receive light from two optical elements positioned on opposing sides of the housing 312. The fields of view of the optical elements may overlap and enable capture of spherical images and/or spherical videos.
- The
speaker 308 may refer to an electronic device that provides audible presentation of information. The speaker 308 may refer to an electronic device that makes sound. The speaker 308 may produce audio output in the form of sound waves. The speaker 308 may include one or more transducers that convert audio signals into sound. The speaker 308 may be configured to provide playback of audio content. The playback of the audio content may be provided for capture of visual content. The playback of the audio content may be provided during part(s) of or the entirety of the capture duration for the visual content. For example, the speaker 308 may provide playback of audio content, such as a song and/or verbal direction, during capture of visual content by the image capture device 302.
- The
processor 310 may include one or more processors (logic circuitry) that provide information processing capabilities in the image capture device 302. The processor 310 may provide one or more computing functions for the image capture device 302. The processor 310 may operate/send command signals to one or more components of the image capture device 302 to operate the image capture device 302. For example, the processor 310 may facilitate operation of the image capture device 302 in capturing image(s) and/or video(s), facilitate operation of the optical element 304 (e.g., change how light is guided by the optical element 304), facilitate operation of the image sensor 306 (e.g., change how the received light is converted into information that defines images/videos and/or how the images/videos are post-processed after capture), and/or facilitate operation of the speaker 308 (e.g., change how the speaker 308 produces sound).
- The
processor 310 may obtain information from the image sensor 306 and/or facilitate transfer of information from the image sensor 306 to another device/component. The processor 310 may be remote from the processor 11 or local to the processor 11. One or more portions of the processor 310 may be remote from the processor 11 and/or one or more portions of the processor 11 may be part of the processor 310. The processor 310 may include and/or perform one or more functionalities of the processor 11 shown in FIG. 1.
- The
image capture device 302 may play audio content (e.g., music) through the speaker 308 during capture of visual content. Moments within an audio progress length of the audio content may be associated with cue markers. Visual content captured by the image capture device 302 during playback of the audio content may be synchronized with the audio content such that one or more moments within the visual progress length of the visual content are associated with one or more cue markers of the audio content. The synchronization of the captured visual content with the audio content played back during visual content capture may be used in generating one or more video edits. For example, specific portions of the visual content captured during playback of the audio content may be included within the video edit based on association of the moment(s) within the visual progress length of the visual content with the cue marker(s) of the audio content. The video edit may include portions of single visual content or multiple visual content (e.g., multiple visual content captured at different times, with individual visual content synchronized with the audio content played during visual content capture). The video edit may include audio content captured with capture of the visual content and/or the audio content that was played back during capture of the visual content. For example, visual content may be captured with playback of music, and a video edit of the visual content may be generated (1) using synchronization of the visual content with the music, and (2) with the music as the audio content of the video edit (e.g., insert the music in an audio track of the video edit). Such synchronization of visual content with audio content played back during capture of the visual content may enable generation of a video edit directly from the image capture device/from visual content provided by the image capture device.
- Referring back to
FIG. 1, the processor 11 (or one or more components of the processor 11) may be configured to obtain information to facilitate generating audio-synchronized visual content. Obtaining information may include one or more of accessing, acquiring, analyzing, determining, examining, identifying, loading, locating, opening, receiving, retrieving, reviewing, selecting, storing, and/or otherwise obtaining the information. The processor 11 may obtain information from one or more locations. For example, the processor 11 may obtain information from a storage location, such as the electronic storage 13, electronic storage of information and/or signals generated by one or more sensors, electronic storage of a device accessible via a network, and/or other locations. The processor 11 may obtain information from one or more hardware components (e.g., an image sensor) and/or one or more software components (e.g., software running on a computing device).
- The
processor 11 may be configured to provide information processing capabilities in the system 10. As such, the processor 11 may comprise one or more of a digital processor, an analog processor, a digital circuit designed to process information, a central processing unit, a graphics processing unit, a microcontroller, an analog circuit designed to process information, a state machine, and/or other mechanisms for electronically processing information. The processor 11 may be configured to execute one or more machine-readable instructions 100 to facilitate generating audio-synchronized visual content. The machine-readable instructions 100 may include one or more computer program components. The machine-readable instructions 100 may include one or more of an audio information component 102, an audio playback component 104, a capture component 106, a synchronization component 108, and/or other computer program components.
- The
audio information component 102 may be configured to obtain audio information and/or other information. Obtaining audio information may include one or more of accessing, acquiring, analyzing, determining, examining, generating, identifying, loading, locating, opening, receiving, retrieving, reviewing, storing, and/or otherwise obtaining the audio information. The audio information component 102 may obtain audio information from one or more locations. For example, the audio information component 102 may obtain audio information from a storage location, such as the electronic storage 13, electronic storage of a device accessible via a network, and/or other locations. The audio information component 102 may obtain audio information from one or more hardware components (e.g., a physical storage device) and/or one or more software components (e.g., software running on a computing device).
- In some implementations, the audio information may be obtained based on a user's interaction with a user interface/application (e.g., video editing application, video capture application), and/or other information. For example, a user interface/application may provide option(s) for a user to select one or more sounds (e.g., music, verbal direction) to be played back during capture of visual content. The audio information defining the sound(s) may be obtained based on the user's selection of the sound(s) through the user interface/application.
- The audio information may define audio content. The audio information may define audio content by including information that defines one or more content, qualities, attributes, features, and/or other aspects of the audio content. For example, the audio information may define audio content by including information that makes up the content of the audio, and/or information that is used to determine the content of the audio. The audio content may include one or more reproductions of the received sounds. The audio information may define audio content in one or more formats, such as WAV, MP3, MP4, RAW, and/or other formats.
- The audio content may have an audio progress length. The audio progress length may be defined in terms of time duration and/or other measurable factors. For example, audio content may have a time duration of five minutes. Other progress lengths and time durations are contemplated.
- Moments within the audio progress length may be associated with cue markers. A cue marker being associated with a moment within the audio progress length may include the cue marker identifying the moment, the cue marker being tied to the moment, the cue marker being connected to the moment, and/or the cue marker being otherwise associated with the moment. A moment within the audio progress length may refer to a point in time or a duration of time within the audio progress length. A cue marker may refer to a marker that indicates location of a cue. A cue may signal that the corresponding location may be used in automatically generating a video edit. A cue marker may mark a point in time or a duration of time within the audio progress length as a location in which one or more edits may be made for generating a video edit. A cue marker may mark a point in time or a duration of time within the audio progress length as a location to guide making edits for generating a video edit. For example, a cue marker may mark a moment within the audio progress length as a location in which a video edit may transition from one video clip to another video clip. A cue marker may mark a moment within the audio progress length as a location in which something of interest is occurring for a video edit. A cue marker may mark a moment within the audio progress length as a location in which a video edit may include a visual effect. Other indications and/or edits are contemplated.
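The cue-marker concept above can be illustrated with a small sketch. Nothing here comes from the disclosure itself; the class and function names are hypothetical stand-ins for however an implementation might record marked moments:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CueMarker:
    """A marker tied to a moment (a point in time) within the audio progress length."""
    time_s: float  # moment within the audio progress length, in seconds
    kind: str      # e.g., "transition", "interest", "effect"

def markers_in_range(markers, start_s, end_s):
    """Return the cue markers whose moments fall within [start_s, end_s)."""
    return [m for m in markers if start_s <= m.time_s < end_s]

markers = [CueMarker(2.0, "transition"), CueMarker(7.5, "effect")]
print(markers_in_range(markers, 0.0, 5.0))  # only the 2.0 s transition marker
```

A video-edit generator could then query `markers_in_range` with a clip's audio span to decide where transitions or effects may be placed.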
- Moments within the audio progress length may be associated with cue markers based on analysis of the audio content, user input, and/or other information. For example, audio content may be analyzed to determine locations of particular sounds, and the moments corresponding to the particular sounds may be associated with cue markers. As another example, a user may manually associate particular moments within the audio progress length with cue markers, manually move moments with which the cue markers are associated, manually delete association between moments and cue markers, and/or make other manual changes to the cue markers.
- In some implementations, the audio content may include music. Music may refer to vocal and/or instrumental sounds. For example, music may include one or more songs, instrumental musical pieces, soundtracks, and/or other music. In some implementations, moments corresponding to particular musical features within the audio progress length of music may be associated with cue markers. For example, moments corresponding to bars and/or beats of the music may be associated with cue markers. That is, moments within the audio progress length of the music associated with cue markers may include bars and/or beats of the music. For instance, a song may be obtained for use in capturing visual content, and the song may be analyzed to identify the bars and/or beats of the song. The moments of the bars and/or beats of the song may be associated with cue markers/used as cues in generating a video edit.
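For a fixed-tempo song, bar and beat moments can be derived arithmetically from the tempo rather than from signal analysis. A minimal sketch under that assumption (real beat tracking would estimate the tempo from the audio itself; the function name is hypothetical):

```python
def beat_cue_times(bpm, duration_s, beats_per_bar=4):
    """List (time_s, is_bar_start) beat moments for a constant-tempo track."""
    beat_period = 60.0 / bpm  # seconds per beat
    cues = []
    i = 0
    t = 0.0
    while t < duration_s:
        cues.append((round(t, 3), i % beats_per_bar == 0))
        i += 1
        t = i * beat_period
    return cues

# At 120 BPM, beats fall every 0.5 s; the first beat opens a bar.
print(beat_cue_times(120, 2.0))  # [(0.0, True), (0.5, False), (1.0, False), (1.5, False)]
```

Each returned moment could be associated with a cue marker, with bar starts favored for clip transitions.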
- In some implementations, the audio content may include verbal direction. Verbal direction may refer to direction that is expressed in words. For example, verbal direction may include spoken words that guide user(s) of the image capture device to perform one or more actions. For instance, verbal direction may include a countdown to an action to be performed (e.g., 4-3-2-1-Dance!). Moments corresponding to particular words/direction of the verbal direction may be associated with cue markers. For example, for verbal direction to start dancing, a moment corresponding to the word "Dance" may be associated with a cue marker.
-
FIG. 4 illustrates example cue markers for moments within an audio progress length. In FIG. 4, audio content 400 may be shown to have an audio progress length. Two moments within the audio progress length may be associated with cue markers. For example, a cue marker A 402 may be associated with a moment near the beginning of the audio progress length, and a cue marker B 404 may be associated with a moment near the middle of the audio progress length.
- The
audio playback component 104 may be configured to effectuate playback of audio content through one or more speakers. Effectuating playback of audio content through speaker(s) may include using the speaker(s) to provide playback of the audio content. The audio playback component 104 may be configured to effectuate playback of at least a portion of the audio progress length of the audio content through the speaker(s) for capture of the visual content. For example, the audio playback component 104 may be configured to effectuate playback of the entire length of the audio content or one or more parts of the audio content through the speaker(s) for capture of the visual content. Providing playback of audio content for capture of visual content may include providing playback of the audio content while an image capture device is capturing visual content. For example, referring to FIG. 3, the image capture device 302 may provide playback of audio content through the speaker 308 so that the audio content is heard by user(s) of the image capture device 302 while the image capture device 302 is capturing visual content.
-
FIG. 5 illustrates example audio content playback by an image capture device. In FIG. 5, the image capture device 502 may be operating to provide audio content playback 504 while the image capture device 502 is capturing visual content. In some implementations, the image capture device 502 may provide playback of audio content for capture of visual content when it is operating in a particular mode (e.g., music playback & sync mode). For example, the audio content playback 504 may include the image capture device 502 playing music via one or more speakers when the image capture device 502 is capturing a video.
- The playback of the audio content may be provided during the capture duration of the visual content. In some implementations, the playback of the audio content may pause at an end of the capture duration. For example, the image capture device may provide playback of music while the image capture device is recording a video. When the recording of the video is stopped/paused, the playback of the music may be paused.
- The playback of the audio content may continue at a beginning of another capture duration. For example, returning to the example of the image capture device providing playback of the music, the playback of the music may have been paused due to the recording of the video being stopped/paused. When recording of the next video is started, the playback of the music may resume from the paused moment.
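The pause-and-resume behavior amounts to keeping a playback cursor that survives across capture durations. A book-keeping sketch (the class and method names are hypothetical; a real device would drive its audio output from this position):

```python
class PlaybackCursor:
    """Track the audio playback position across stop/start capture cycles."""

    def __init__(self):
        self.position_s = 0.0  # moment within the audio progress length

    def advance(self, played_s):
        # Called while a capture duration is running and audio is playing.
        self.position_s += played_s

    def pause(self):
        # Playback pauses when recording stops; the position is retained
        # so that the next capture duration resumes from this moment.
        return self.position_s

cursor = PlaybackCursor()
cursor.advance(12.5)      # first clip: 12.5 s of the song plays
resume_at = cursor.pause()
cursor.advance(4.0)       # next clip resumes from 12.5 s
print(cursor.position_s)  # 16.5
```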
- In some implementations, one or more previews of the audio content may be provided prior to the capture of the visual content. For example, before the image capture device starts capture of a video, the image capture device may provide playback of audio content to be played during capture of the video. Such provision of the audio content may allow the user(s) of the image capture device to anticipate the sounds that will be played. Preview(s) of the audio content may include preview of the entire length of the audio content or one or more parts of the audio content. For example, preview(s) of a part of the audio content that will be provided for capture of visual content may be provided prior to the capture of the visual content.
- In some implementations, an extent of the audio content to be played back during the capture of the visual content may be determined prior to the capture of the visual content. The extent of the audio content to be played back may refer to an amount/part of the progress length of the audio content to be played back. For example, a user may choose a predefined amount of audio content to be played back during the capture of the visual content. That is, a portion of the audio progress length of the audio content to be played back during the capture duration for the visual content may be determined prior to the capture of the visual content.
- The capture component 106 may be configured to capture the visual content during one or more capture durations. A capture duration may refer to a time duration in which visual content is captured. The visual content may be captured through one or more optical elements (e.g., the optical element 14). Capturing visual content during a capture duration may include using, recording, storing, and/or otherwise capturing the visual content during the capture duration. The visual content may be captured for use in generating images and/or video frames. The images/video frames may be stored in electronic storage and/or deleted after use (e.g., after preview). The visual content may be captured for use in generating audio-synchronized visual content.
- For example, during a capture duration, the
capture component 106 may use the visual output signal generated by the image sensor 15 and/or the visual information conveyed by the visual output signal to record, store, and/or otherwise capture the visual content. For instance, the capture component 106 may store, in the electronic storage 13 and/or other (permanent and/or temporary) electronic storage medium, information (e.g., the visual information) defining the visual content based on the visual output signal generated by the image sensor 15 and/or the visual information conveyed by the visual output signal during the capture duration. In some implementations, information defining the captured visual content may be stored in one or more visual tracks.
- The visual content may have a visual progress length based on the capture duration and/or other information. The visual progress length of the visual content may be the same as or different from the capture duration. For example, based on the capture of the visual content at regular speed (e.g., capture framerate is the same as playback framerate), the visual progress length of the visual content may be the same as the capture duration. Based on the capture of the visual content at non-regular speed(s) (e.g., slow motion capture, time-lapse capture), the visual progress length of the visual content may be different from the capture duration.
- In some implementations, the speed of the playback of the audio content may be changed based on the speed with which the visual content is captured. For example, based on the visual content being captured at regular speed (e.g., 1× speed), regular speed of playback of the audio content may be used (e.g., audio content played at 1× speed). Based on the capture of the visual content at non-regular speed(s) (e.g., 0.5× speed, 4× speed), the speed of the playback of the audio content may be changed to match the speed of capture of the visual content. The speed of the playback of the audio content may be changed to be the inverse of the speed of the capture of the visual content. For example, the speed of the playback of the audio content may be decreased when time-lapse capture is used to capture visual content (e.g., ¼× speed of audio content playback for 4× time-lapse capture), and the speed of the playback of the audio content may be increased when slow motion capture is used to capture visual content (e.g., 2× speed of audio content playback for 0.5× slow-motion capture). Such change in the speed of playback of the audio content may allow the audio content to remain in synchronization with the visual content.
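The inverse relationship described above is simple to state in code. A sketch (the function name is a placeholder, not from the disclosure):

```python
def audio_playback_speed(capture_speed):
    """Playback speed is the inverse of capture speed, so the audio stays in
    sync with the resulting footage (e.g., 4x time-lapse -> 1/4x audio)."""
    if capture_speed <= 0:
        raise ValueError("capture speed must be positive")
    return 1.0 / capture_speed

print(audio_playback_speed(4.0))  # 0.25 for 4x time-lapse capture
print(audio_playback_speed(0.5))  # 2.0 for 0.5x slow-motion capture
```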
- The capture component 106 may be configured to capture the visual content with the playback of the audio content. That is, the capture component 106 may capture the visual content while the playback of the audio content is being provided. If a portion of the audio progress length is played back, the capture component may be configured to capture the visual content with the playback of the portion of the audio progress length of audio content. For example, referring to
FIG. 5, the image capture device 502 may capture visual content while it is providing the audio content playback 504.
- In some implementations, the capture component 106 may be configured to capture audio content during one or more capture durations. For example, the image capture device may include one or more sound sensors (e.g., microphone), and the capture component 106 may use the sound sensors to capture sounds that are heard during capture of the visual content. In some implementations, information defining the captured audio content may be stored in one or more audio tracks. In some implementations, audio content may not be captured during capture of the visual content. For example, the image capture device may capture visual content without capturing audio content.
- In some implementations, one or more audio tracks for the visual content captured during the capture duration may include the audio content. The audio track(s) for the visual content captured during the capture duration may include the audio content that was played back during the capture of the visual content. If a portion of the audio progress length is played back during the capture duration, then that portion of the audio progress length of the audio content may be included in the audio track(s) for the visual content. The audio content included in the audio track(s) may be a copy of the original audio content that was played back. Rather than including the audio content that was heard through the microphone of the image capture device during capture of the visual content, the portion of the original audio content file that was played back during capture of the visual content may be copied into the audio track(s). For example, if music is played during capture of a video, with the music stored in a music file, then the portion of the music file defining the portion of the music that was played may be copied into the audio track(s). Such generation of audio track(s) may result in a video file that includes visual content synchronized to the audio content. For instance, if the visual content was captured while playing a song, then such generation of audio track(s) may result in a lip-synced video.
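Copying the played portion of the original audio into the track reduces to slicing the decoded samples by the playback interval. An illustrative sketch (a plain list of decoded samples stands in for a real audio file):

```python
def copy_played_portion(samples, sample_rate, start_s, end_s):
    """Slice the portion of the original audio that was played back, for
    insertion into the video's audio track instead of re-recorded sound."""
    lo = int(start_s * sample_rate)
    hi = int(end_s * sample_rate)
    return samples[lo:hi]

song = list(range(48))  # stand-in for 6 s of samples at 8 samples/s
track = copy_played_portion(song, 8, 1.0, 3.0)
print(len(track))  # 16 samples covering seconds 1-3 of the song
```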
- The
synchronization component 108 may be configured to synchronize the visual content captured during the capture duration with the audio content that was played back during the capture duration. The visual content captured during the capture duration may be synchronized with the portion of the progress length of the audio content that was played during the capture duration. For example, if a portion of the audio progress length is played back during the capture duration, then the visual content may be synchronized with that portion of the audio progress length of the audio content. Synchronization of the visual content with the audio content may include identification, determination, and/or recording of moments within the visual progress length that occur at the same time as moments within the audio progress length. Synchronization of the visual content with the audio content may include identification, determination, and/or recording of moments within the visual progress length that occur at the same time as moments within the audio progress length that are associated with cue marker(s). Synchronization of the visual content with the audio content may take into account changes in capture speed of the visual content/playback speed of the audio content during capture of the visual content (e.g., slow motion capture, time-lapse capture). - For example, the
synchronization component 108 may synchronize the visual content and the audio content by identifying, determining, and/or recording when a particular moment in the visual content should occur at the same time with a particular moment in the audio content during playback. The synchronization component 108 may synchronize the visual content and the audio content by storing information about the cue markers/moments associated with the cue marker(s) with the visual content. For example, information that identifies the cue markers and/or the moments associated with the cue markers may be stored as metadata for the visual content. For instance, metadata of the audio content may identify the cue markers and/or the moments associated with the cue markers, and some or the entirety of the metadata may be stored as metadata of the visual content.
- In some implementations, the visual content captured during the capture duration may automatically be synchronized with the audio content played during the capture duration based on operation of a single image capture device. That is, because the image capture device is both playing the audio content while it is capturing visual content, the timing of the visual content capture may automatically be matched with the timing of the audio content playback. For example, when the image capture device captures visual content of a video frame at a particular moment within the capture duration, the image capture device will also know which moment of the audio content is being played. The visual content captured by the image capture device may automatically be time-synchronized to the audio content that is played back during capture when the visual content is captured and stored in memory.
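Storing cue information with the visual content can be pictured as mapping audio cue moments onto the clip's frame timeline. A hedged sketch (the mapping and names are illustrative, not the disclosed metadata format):

```python
def cue_frames(cue_times_s, audio_start_s, capture_len_s, fps=30.0):
    """Map audio cue moments to frame indices of a clip captured while the
    audio played from audio_start_s for capture_len_s seconds.
    Cues falling outside the clip are dropped."""
    frames = []
    for t in cue_times_s:
        offset = t - audio_start_s  # cue position relative to clip start
        if 0.0 <= offset < capture_len_s:
            frames.append(int(offset * fps))
    return frames

# Clip recorded while the song played from 10 s to 20 s:
print(cue_frames([2.0, 12.0, 19.5, 25.0], 10.0, 10.0))  # [60, 285]
```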
- The image capture device may automatically synchronize the visual content with the audio content without analyzing the audio content that was captured with the visual content. That is, rather than analyzing the audio content captured during capture of the visual content to determine how the timing of visual content capture matches the timing of the audio content playback, the image capture device may utilize its knowledge of the timing of the audio content that is played back during the capture duration to perform the synchronization.
- In some implementations, the audio content captured with the visual content may be used to modify and/or augment the synchronization of visual content with audio content. For example, there may be a time difference between the timing of the audio content playback internally tracked by the image capture device and the timing of the audio content playback actually provided through speaker(s) of the image capture device. The difference between the timings may be determined based on analysis of the audio content captured with the visual content, and the synchronization of visual content with audio content may be adjusted so that the synchronization matches the timing of the audio content playback actually provided through the speaker(s) of the image capture device.
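One way such a timing difference might be measured, assuming the reference audio track is available, is by locating the peak of the cross-correlation between the captured audio and the reference; this is an illustrative sketch under that assumption, not the device's actual method:

```python
import numpy as np

def playback_latency(reference, recorded, sample_rate):
    """Estimate the delay (seconds) of the recorded audio relative to
    the reference track via the peak of their cross-correlation."""
    corr = np.correlate(recorded, reference, mode="full")
    # In 'full' mode, index len(reference)-1 corresponds to zero lag.
    lag = int(np.argmax(corr)) - (len(reference) - 1)
    return lag / sample_rate

# Synthetic example: a windowed tone, re-recorded 25 ms late.
rate = 1000
t = np.arange(0, 1.0, 1 / rate)
ref = np.sin(2 * np.pi * 7.0 * t) * np.hanning(len(t))
delay_samples = 25  # simulated speaker-to-microphone latency
rec = np.concatenate([np.zeros(delay_samples), ref])
offset = playback_latency(ref, rec, rate)
print(round(offset, 3))  # → 0.025
```

The estimated offset could then be subtracted from the internally tracked playback timing so the synchronization matches what the speaker actually emitted.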
- The
synchronization component 108 may be configured to synchronize the visual content captured during the capture duration such that one or more moments within the visual progress length of the visual content are associated with one or more cue markers of the audio content. Synchronization of the visual content with the audio content may result in one or more moments within the visual progress length being synchronized with one or more moments of the audio progress length that are associated with cue marker(s). - For example,
FIG. 6 illustrates example synchronized moments within two different visual content. In FIG. 6, visual content A 610 and visual content B 620 may be synchronized with audio content 400. The visual content A 610 and the visual content B 620 may have been captured separately. The visual content A 610 and the visual content B 620 may have been captured while the image capture device provided playback of different portions of the audio content 400. For instance, the visual content A 610 may have been captured during playback of the entirety of the audio content 400. The visual content B 620 may have been captured during playback of a portion in the middle of the audio content 400. - The
visual content A 610 may be synchronized with the entirety of the audio content 400. Such synchronization may result in a synchronized moment A-1 612 in the visual progress length A being synchronized with the moment in the audio progress length corresponding to the cue marker A 402, and a synchronized moment A-2 614 in the visual progress length A being synchronized with the moment in the audio progress length corresponding to the cue marker B 404. - The
visual content B 620 may be synchronized with the portion of the audio content 400. Such synchronization may result in a synchronized moment B-1 622 in the visual progress length B being synchronized with the moment in the audio progress length corresponding to the cue marker A 402, and a synchronized moment B-2 624 in the visual progress length B being synchronized with the moment in the audio progress length corresponding to the cue marker B 404. Thus, the synchronized moment A-1 612 and the synchronized moment B-1 622 may be synchronized to the same moment in the audio progress length corresponding to the cue marker A 402, and the synchronized moment A-2 614 and the synchronized moment B-2 624 may be synchronized to the same moment in the audio progress length corresponding to the cue marker B 404. Such synchronization of different visual content may enable synchronization of visual content captured at different times, at different locations, by different image capture devices, and/or by different users. Such synchronization of different visual content may enable automatic generation of video edits that transition between different visual content, with the transitions using the synchronized moments. - A video edit may be generated based on the cue markers and/or other information. A video edit may refer to an arrangement and/or a manipulation of one or more portions of one or more visual content. For example, a video edit may refer to an arrangement and/or a manipulation of one or more time portion(s) (e.g., time duration(s)) and/or spatial portion(s) (e.g., punchouts) of one or more video clips. A video edit may define which portions (e.g., temporal portions, spatial portions) of visual content are included for playback and the order in which the portions are to be presented on playback. A video edit may be generated as an encoded version of the video edit and/or as instructions for rendering the video edit.
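The FIG. 6 arrangement above can be illustrated with a small sketch that maps cue-marker positions in the audio progress length into each video's own visual progress length; the cue-marker positions and playback offsets below are hypothetical values, not figures from the patent:

```python
def synced_moments(cue_markers, playback_start, capture_duration):
    """Positions within a video's visual progress length that line up
    with the given cue markers of the audio content, for a capture that
    began playback_start seconds into the audio progress length."""
    return {
        m: m - playback_start
        for m in cue_markers
        if 0.0 <= m - playback_start <= capture_duration
    }

cues = [12.0, 48.0]  # cue marker A 402 and cue marker B 404 (positions assumed)
a = synced_moments(cues, playback_start=0.0, capture_duration=60.0)   # whole audio played
b = synced_moments(cues, playback_start=10.0, capture_duration=45.0)  # middle portion played
print(a)  # → {12.0: 12.0, 48.0: 48.0}
print(b)  # → {12.0: 2.0, 48.0: 38.0}
```

The same cue markers land at different positions in the two separately captured videos, which is exactly what makes cue-marker-based transitions between them possible.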
For example, the video edit may be encoded as a video clip, and the video clip may be opened in a video player for presentation. The video edit may be generated as instructions for presenting the video edit, such as instructions that identify arrangements and/or manipulations of visual content portions included in the video edit. For example, the video edit may be generated as information defining a director track that includes information as to which portions of visual content are included in the video edit, the order in which the portions are to be presented on playback, and the edits to be applied to the different portions. A video player may use the director track to retrieve the portions of the visual content identified in the video edit for presentation, arrangement, and/or editing when the video edit is opened/to be presented.
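A director track of the kind described above might be represented as a simple ordered list of entries; the `DirectorEntry` layout, field names, and effect labels are a hypothetical illustration, not the actual track format:

```python
from dataclasses import dataclass

@dataclass
class DirectorEntry:
    """One instruction in a director track (hypothetical layout): which
    visual content to show, which time portion of it, and what edit to
    apply to that portion."""
    source: str   # identifier of the visual content
    start: float  # start of the portion (seconds in its visual progress length)
    end: float    # end of the portion
    effect: str = "none"

def render_order(track):
    """A player would walk the track in order, retrieving each portion of
    the identified visual content and applying its edit before presenting."""
    return [(e.source, e.start, e.end, e.effect) for e in track]

track = [
    DirectorEntry("clip_a", 0.0, 12.0),
    DirectorEntry("clip_b", 2.0, 38.0, effect="crossfade"),
    DirectorEntry("clip_a", 48.0, 60.0),
]
print(render_order(track))
```

Because the track only references portions, the same source visual content can appear multiple times without being re-encoded.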
- Generating a video edit based on the cue markers may include using moments in the visual progress length of the visual content synchronized with moments in the audio progress length of the audio content corresponding to the cue markers to generate the video edit. The synchronized moments in the visual content may be used to provide one or more effects in the video edit. An effect in the visual content may provide for visual changes and/or temporal changes in the video edit. For example, an effect may change one or more visual characteristics and/or one or more temporal characteristics of the visual content included in the video edit. A video edit may include transitions between different visual content.
- A video edit may be generated such that at least a portion of the visual content captured during the capture duration is included within the video edit. The portion(s) of the visual content captured during the capture duration may be included within the video edit based on the association of the moment(s) within the visual progress length of the visual content with the cue marker(s) of the audio content and/or other information. That is, which portions of the visual content are included in the video edit may be determined based on synchronization of the visual content with the audio content that was played back during capture of the visual content.
- In some implementations, the audio content may include music, and moments within the audio progress length may be associated with bars, beats, and/or other musical features of the music. The video edit may be generated to include one or more bar-synced effects, one or more beat-synced effects, and/or other musical-feature effects based on the association of the moments within the audio progress length with the bars, the beats, and/or other musical features of the music, and/or other information. A bar-synced effect may refer to an effect that is used for a moment in the visual content that is synchronized to a moment in the audio content corresponding to a bar. A beat-synced effect may refer to an effect that is used for a moment in the visual content that is synchronized to a moment in the audio content corresponding to a beat. Same or different effects may be used for different visual content. For example, the same bar-synced effects may be applied across all visual content included in the video edit. As another example, different bar-synced effects may be applied to different visual content included in the video edit.
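Bar-synced and beat-synced effect assignment might be sketched as matching synchronized moments against lists of bar and beat positions; the effect names, the tolerance, and the bar-before-beat priority are all assumptions for illustration:

```python
def assign_effects(moments, beats, bars, tolerance=0.05):
    """Tag each synchronized moment (position in the audio progress
    length, seconds) with a bar-synced or beat-synced effect when it
    falls on a musical feature, preferring bars over beats."""
    def near(moment, features):
        return any(abs(moment - f) <= tolerance for f in features)
    effects = {}
    for moment in moments:
        if near(moment, bars):
            effects[moment] = "bar_synced_flash"  # hypothetical effect name
        elif near(moment, beats):
            effects[moment] = "beat_synced_zoom"  # hypothetical effect name
    return effects

bars = [0.0, 2.0, 4.0]
beats = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
print(assign_effects([2.0, 2.5, 3.1], beats, bars))
# → {2.0: 'bar_synced_flash', 2.5: 'beat_synced_zoom'}
# 3.1 s is more than 0.05 s from any feature, so it gets no effect.
```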
- In some implementations, the video edit may be generated to include portions from multiple visual content captured during separate capture durations. For example, the video edit may be generated to include portions from different visual content captured at different times, at different locations, by different image capture devices, and/or by different users. Individual visual content may be synchronized with a corresponding portion of the same audio content (the portion of the audio content that was played during capture of the visual content). For example, individual visual content may be synchronized with portion(s) of the same music played during capture of the visual content.
- The video edit may be generated to include transitions between different visual content. Transitions within the video edit between different portions from the multiple visual content may be determined based on association of moments within the corresponding visual progress lengths of the multiple visual content with the cue markers of the audio content, and/or other information. Moments in different visual content that are synchronized to the same cue marker of the audio content may be used to transition from one visual content to another visual content. Because the multiple visual content are synchronized to the same audio content, the synchronized moments may provide transition points that preserve the timing of the visual content and the audio content. For example, multiple visual content may have been captured while image capture device(s) provided playback of the same song. Different visual content may include depictions of different persons lip-synching to the same song. Because the different visual content are synchronized to the same song, creating transitions between different visual content at the same synchronized moments may enable generation of a video edit that preserves the timing of the different visual content to the song. Such transitions may preserve the depiction of lip-synching by different persons in different visual content.
-
FIG. 7 illustrates example generation of a video edit. In FIG. 7, the visual content A 610 and the visual content B 620 may be synchronized with audio content (as shown in FIG. 6). The synchronized moment A-1 612 and the synchronized moment B-1 622 may be synchronized to the same moment in the audio content, and the synchronized moment A-2 614 and the synchronized moment B-2 624 may be synchronized to the same moment in the audio content. The synchronized moment A-1 612 and the synchronized moment B-1 622 may be used to transition between the visual content A 610 and the visual content B 620. The synchronized moment A-2 614 and the synchronized moment B-2 624 may be used to transition between the visual content A 610 and the visual content B 620. For example, portions of the visual content A 610 and the visual content B 620 may be used to generate a video edit 700. The video edit 700 includes a portion A-1 712 (a portion of the visual content A 610 preceding the synchronized moment A-1 612), a portion B 722 (a portion of the visual content B 620 between the synchronized moment B-1 622 and the synchronized moment B-2 624), and a portion A-2 714 (a portion of the visual content A 610 following the synchronized moment A-2 614). Use of the synchronized moments to provide transitions between different visual content may result in the video edit 700 preserving the synchronized timing of the visual content A 610 and the visual content B 620 with the audio content. Other generations of video edits are contemplated. - Implementations of the disclosure may be made in hardware, firmware, software, or any suitable combination thereof. Aspects of the disclosure may be implemented as instructions stored on a machine-readable medium, which may be read and executed by one or more processors. A machine-readable medium may include any mechanism for storing or transmitting information in a form readable by a machine (e.g., a computing device).
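The FIG. 7 assembly can be sketched as building a segment list from the synchronized moments; the positions below are hypothetical values chosen to be consistent with both videos being synchronized to the same two cue markers, not figures from the patent:

```python
def build_edit(sync_a1, sync_a2, sync_b1, sync_b2):
    """Assemble the three-portion edit of FIG. 7: visual content A up to
    its first synchronized moment, visual content B between its two
    synchronized moments, then visual content A after its second
    synchronized moment. Positions are seconds within each video's own
    visual progress length; None means 'to the end of the video'."""
    return [
        ("A", 0.0, sync_a1),      # portion A-1 712
        ("B", sync_b1, sync_b2),  # portion B 722
        ("A", sync_a2, None),     # portion A-2 714
    ]

# A was captured over the whole song; B's capture began 10 s in, so the
# same two cue markers land at different positions in each video.
edit = build_edit(sync_a1=12.0, sync_a2=48.0, sync_b1=2.0, sync_b2=38.0)
for portion in edit:
    print(portion)
```

Note that portion B spans 36 s of visual progress length (38.0 minus 2.0), matching the 36 s of audio between the two cue markers (48.0 minus 12.0), so cutting at the synchronized moments preserves the timing against the song.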
For example, a tangible (non-transitory) machine-readable storage medium may include read-only memory, random access memory, magnetic disk storage media, optical storage media, flash memory devices, and others, and machine-readable transmission media may include forms of propagated signals, such as carrier waves, infrared signals, digital signals, and others. Firmware, software, routines, or instructions may be described herein in terms of specific exemplary aspects and implementations of the disclosure, and as performing certain actions.
- In some implementations, some or all of the functionalities attributed herein to the
system 10 may be provided by external resources not included in the system 10. External resources may include hosts/sources of information, computing, and/or processing and/or other providers of information, computing, and/or processing outside of the system 10. - Although the
processor 11 and the electronic storage 13 are shown to be connected to the interface 12 in FIG. 1, any communication medium may be used to facilitate interaction between any components of the system 10. One or more components of the system 10 may communicate with each other through hard-wired communication, wireless communication, or both. For example, one or more components of the system 10 may communicate with each other through a network. For example, the processor 11 may wirelessly communicate with the electronic storage 13. By way of non-limiting example, wireless communication may include one or more of radio communication, Bluetooth communication, Wi-Fi communication, cellular communication, infrared communication, or other wireless communication. Other types of communications are contemplated by the present disclosure. - Although the
processor 11 is shown in FIG. 1 as a single entity, this is for illustrative purposes only. In some implementations, the processor 11 may comprise a plurality of processing units. These processing units may be physically located within the same device, or the processor 11 may represent processing functionality of a plurality of devices operating in coordination. The processor 11 may be configured to execute one or more components by software; hardware; firmware; some combination of software, hardware, and/or firmware; and/or other mechanisms for configuring processing capabilities on the processor 11. - It should be appreciated that although computer components are illustrated in
FIG. 1 as being co-located within a single processing unit, in implementations in which processor 11 comprises multiple processing units, one or more of computer program components may be located remotely from the other computer program components. - While computer program components are described herein as being implemented via
processor 11 through machine-readable instructions 100, this is merely for ease of reference and is not meant to be limiting. In some implementations, one or more functions of computer program components described herein may be implemented via hardware (e.g., dedicated chip, field-programmable gate array) rather than software. One or more functions of computer program components described herein may be software-implemented, hardware-implemented, or software- and hardware-implemented. - The description of the functionality provided by the different computer program components described herein is for illustrative purposes, and is not intended to be limiting, as any of computer program components may provide more or less functionality than is described. For example, one or more of computer program components may be eliminated, and some or all of its functionality may be provided by other computer program components. As another example,
processor 11 may be configured to execute one or more additional computer program components that may perform some or all of the functionality attributed to one or more of computer program components described herein. - The electronic storage media of the
electronic storage 13 may be provided integrally (i.e., substantially non-removable) with one or more components of the system 10 and/or as removable storage that is connectable to one or more components of the system 10 via, for example, a port (e.g., a USB port, a Firewire port, etc.) or a drive (e.g., a disk drive, etc.). The electronic storage 13 may include one or more of optically readable storage media (e.g., optical disks, etc.), magnetically readable storage media (e.g., magnetic tape, magnetic hard drive, floppy drive, etc.), electrical charge-based storage media (e.g., EPROM, EEPROM, RAM, etc.), solid-state storage media (e.g., flash drive, etc.), and/or other electronically readable storage media. The electronic storage 13 may be a separate component within the system 10, or the electronic storage 13 may be provided integrally with one or more other components of the system 10 (e.g., the processor 11). Although the electronic storage 13 is shown in FIG. 1 as a single entity, this is for illustrative purposes only. In some implementations, the electronic storage 13 may comprise a plurality of storage units. These storage units may be physically located within the same device, or the electronic storage 13 may represent storage functionality of a plurality of devices operating in coordination. -
FIG. 2 illustrates method 200 for generating audio-synchronized visual content. The operations of method 200 presented below are intended to be illustrative. In some implementations, method 200 may be accomplished with one or more additional operations not described, and/or without one or more of the operations discussed. In some implementations, two or more of the operations may occur substantially simultaneously. - In some implementations,
method 200 may be implemented in one or more processing devices (e.g., a digital processor, an analog processor, a digital circuit designed to process information, a central processing unit, a graphics processing unit, a microcontroller, an analog circuit designed to process information, a state machine, and/or other mechanisms for electronically processing information). The one or more processing devices may include one or more devices executing some or all of the operations of method 200 in response to instructions stored electronically on one or more electronic storage media. The one or more processing devices may include one or more devices configured through hardware, firmware, and/or software to be specifically designed for execution of one or more of the operations of method 200. - Referring to
FIG. 2 and method 200, an image capture device may include a housing. The housing may carry one or more of an image sensor, an optical element, a speaker, and/or other components. The optical element may guide light within a field of view to the image sensor. The image sensor may generate a visual output signal conveying visual information defining visual content based on light that becomes incident thereon. The speaker may provide playback of audio content for capture of the visual content. - At
operation 201, audio information and/or other information may be obtained. The audio information may define the audio content. The audio content may have an audio progress length. Moments within the audio progress length may be associated with cue markers. In some implementations, operation 201 may be performed by a processor component the same as or similar to the audio information component 102 (shown in FIG. 1 and described herein). - At
operation 202, the playback of at least a portion of the audio progress length of the audio content through the speaker may be effectuated for the capture of the visual content. In some implementations, operation 202 may be performed by a processor component the same as or similar to the audio playback component 104 (shown in FIG. 1 and described herein). - At
operation 203, the visual content may be captured during a capture duration with the playback of the at least the portion of the audio progress length of the audio content. The visual content may have a visual progress length based on the capture duration and/or other information. In some implementations, operation 203 may be performed by a processor component the same as or similar to the capture component 106 (shown in FIG. 1 and described herein). - At
operation 204, the visual content captured during the capture duration may be synchronized with the at least the portion of the progress length of the audio content. The visual content captured during the capture duration may be synchronized such that one or more moments within the visual progress length of the visual content are associated with one or more of the cue markers of the audio content. A video edit may be generated based on the cue markers and/or other information. A video edit may be generated such that at least a portion of the visual content captured during the capture duration is included within the video edit based on the association of the moment(s) within the visual progress length of the visual content with the cue marker(s) of the audio content and/or other information. In some implementations, operation 204 may be performed by a processor component the same as or similar to the synchronization component 108 (shown in FIG. 1 and described herein). - Although the system(s) and/or method(s) of this disclosure have been described in detail for the purpose of illustration based on what is currently considered to be the most practical and preferred implementations, it is to be understood that such detail is solely for that purpose and that the disclosure is not limited to the disclosed implementations, but, on the contrary, is intended to cover modifications and equivalent arrangements that are within the spirit and scope of the appended claims. For example, it is to be understood that the present disclosure contemplates that, to the extent possible, one or more features of any implementation can be combined with one or more features of any other implementation.
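Operations 201 through 204 of method 200 can be sketched end-to-end as follows; the data layout, the `capture` callable, and the field names are hypothetical stand-ins for the components described above, not the claimed implementation:

```python
def method_200(audio_info, capture):
    """Sketch of method 200: obtain audio information (201), play back a
    portion of the audio content (202), capture visual content during the
    playback (203), and synchronize the captured moments with the cue
    markers (204). `capture` is a hypothetical callable that performs the
    capture and returns the capture duration in seconds."""
    cue_markers = audio_info["cue_markers"]               # operation 201
    playback_start = audio_info.get("playback_start", 0)  # operation 202
    capture_duration = capture()                          # operation 203
    return {                                              # operation 204
        m: m - playback_start
        for m in cue_markers
        if 0 <= m - playback_start <= capture_duration
    }

synced = method_200({"cue_markers": [12.0, 48.0], "playback_start": 10.0},
                    capture=lambda: 45.0)
print(synced)  # → {12.0: 2.0, 48.0: 38.0}
```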
Claims (20)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/587,980 US20220157347A1 (en) | 2020-09-03 | 2022-01-28 | Generation of audio-synchronized visual content |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/948,130 US11238901B1 (en) | 2020-09-03 | 2020-09-03 | Generation of audio-synchronized visual content |
US17/587,980 US20220157347A1 (en) | 2020-09-03 | 2022-01-28 | Generation of audio-synchronized visual content |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/948,130 Continuation US11238901B1 (en) | 2020-09-03 | 2020-09-03 | Generation of audio-synchronized visual content |
Publications (1)
Publication Number | Publication Date |
---|---|
US20220157347A1 true US20220157347A1 (en) | 2022-05-19 |
Family
ID=80034512
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/948,130 Active US11238901B1 (en) | 2020-09-03 | 2020-09-03 | Generation of audio-synchronized visual content |
US17/587,980 Abandoned US20220157347A1 (en) | 2020-09-03 | 2022-01-28 | Generation of audio-synchronized visual content |
Family Applications Before (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/948,130 Active US11238901B1 (en) | 2020-09-03 | 2020-09-03 | Generation of audio-synchronized visual content |
Country Status (1)
Country | Link |
---|---|
US (2) | US11238901B1 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11462248B1 (en) * | 2021-06-29 | 2022-10-04 | Gopro, Inc. | Uttilizing multiple versions of music for video playback |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11778129B1 (en) * | 2022-07-28 | 2023-10-03 | Gopro, Inc. | Synchronization of image capture devices prior to capture |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180012618A1 (en) * | 2015-02-10 | 2018-01-11 | Sony Semiconductor Solutions Corporation | Image processing apparatus, image pickup device, image processing method, and program |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9749709B2 (en) * | 2010-03-23 | 2017-08-29 | Apple Inc. | Audio preview of music |
US9838730B1 (en) * | 2016-04-07 | 2017-12-05 | Gopro, Inc. | Systems and methods for audio track selection in video editing |
US20180295427A1 (en) * | 2017-04-07 | 2018-10-11 | David Leiberman | Systems and methods for creating composite videos |
-
2020
- 2020-09-03 US US16/948,130 patent/US11238901B1/en active Active
-
2022
- 2022-01-28 US US17/587,980 patent/US20220157347A1/en not_active Abandoned
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180012618A1 (en) * | 2015-02-10 | 2018-01-11 | Sony Semiconductor Solutions Corporation | Image processing apparatus, image pickup device, image processing method, and program |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11462248B1 (en) * | 2021-06-29 | 2022-10-04 | Gopro, Inc. | Uttilizing multiple versions of music for video playback |
Also Published As
Publication number | Publication date |
---|---|
US11238901B1 (en) | 2022-02-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10939069B2 (en) | Video recording method, electronic device and storage medium | |
KR101516850B1 (en) | Creating a new video production by intercutting between multiple video clips | |
US20220157347A1 (en) | Generation of audio-synchronized visual content | |
US20170339447A1 (en) | System and method for visual editing | |
KR101711009B1 (en) | Apparatus to store image, apparatus to play image, method to store image, method to play image, recording medium, and camera | |
US10924636B1 (en) | Systems and methods for synchronizing information for videos | |
US11790952B2 (en) | Pose estimation for video editing | |
WO2015148972A1 (en) | Improving the sound quality of the audio portion of audio/video files recorded during a live event | |
US20170032823A1 (en) | System and method for automatic video editing with narration | |
US11837259B2 (en) | Interface for indicating video editing decisions | |
US20230402064A1 (en) | Systems and methods for switching between video views | |
JP2007035121A (en) | Reproduction controller and method, and program | |
US11689692B2 (en) | Looping presentation of video content | |
JP2013131871A (en) | Editing device, remote controller, television receiver, specific audio signal, editing system, editing method, program, and recording medium | |
JP2010200079A (en) | Photography control device | |
US11373686B1 (en) | Systems and methods for removing commands from sound recordings | |
KR101078367B1 (en) | Synchronous apparatus of image data and sound data for karaoke player and method for the same | |
US11955142B1 (en) | Video editing using music characteristics | |
JP2021101525A5 (en) | ||
US11462248B1 (en) | Uttilizing multiple versions of music for video playback | |
US11967346B1 (en) | Systems and methods for identifying events in videos | |
JP2019096950A5 (en) | ||
JP7051923B2 (en) | Video generator, video generation method, video generator, playback device, video distribution device, and video system | |
US20240038276A1 (en) | Systems and methods for presenting videos and video edits | |
JP6715907B2 (en) | Image editing apparatus, image editing method, and program |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: GOPRO, INC., CALIFORNIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:LEFEBVRE, ALEXIS;OULES, GUILLAUME;CHINNAIYAN, ANANDHAKUMAR;AND OTHERS;SIGNING DATES FROM 20200826 TO 20200902;REEL/FRAME:058816/0207 |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: NON FINAL ACTION MAILED |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: FINAL REJECTION MAILED |
|
STCB | Information on status: application discontinuation |
Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |