WO2022042387A1 - Video processing method and electronic device - Google Patents
Video processing method and electronic device
- Publication number
- WO2022042387A1 (PCT/CN2021/113153)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- audio
- delay
- image
- timestamp
- electronic device
- Prior art date
Classifications
- G—PHYSICS; G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B27/34—Indicating arrangements (editing; indexing; timing or synchronising; monitoring)
- H—ELECTRICITY; H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/4223—Cameras
- H04N21/42203—Sound input device, e.g. microphone
- H04N21/4305—Synchronising client clock from received content stream, e.g. locking decoder clock with encoder clock, extraction of the PCR packets
- H04N21/4307—Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen
- H04N21/43072—Synchronising multiple content streams on the same device
- H04N21/4394—Processing of audio elementary streams involving operations for analysing the audio stream, e.g. detecting features or characteristics in audio streams
- H04N21/85406—Content authoring involving a specific file format, e.g. MP4 format
- H04N21/8547—Content authoring involving timestamps for synchronizing content
- H04N23/62—Control of camera parameters via user interfaces
- H04N23/631—Graphical user interfaces [GUI] specially adapted for controlling image capture or setting capture parameters
- H04N23/632—GUIs for displaying or modifying preview images prior to image capturing, e.g. variety of image resolutions or capturing parameters
- H04N23/667—Camera operation mode switching, e.g. between still and video, sport and normal or high- and low-resolution modes
- H04N23/67—Focus control based on electronic image sensor signals
- H04N23/69—Control of means for changing angle of the field of view, e.g. optical zoom objectives or electronic zooming
Definitions
- the embodiments of the present application relate to the field of video processing, and in particular, to a video processing method and an electronic device.
- the optical zoom capability of a mobile phone camera can reach ten times or even higher during video recording.
- the processing capability of the mobile phone microphone has improved in step; that is, the microphone can collect the sound within the zoom range to realize directional sound pickup.
- the present application provides a video processing method and electronic device.
- during video recording, the electronic device can correct the audio timestamp based on the delay corresponding to the audio, so as to synchronize the audio with the video images and improve the user experience.
- a video processing method includes: detecting a user's first instruction; displaying a shooting interface in response to the first instruction; and acquiring a first zoom factor and a first video corresponding to the first zoom factor, where the first video includes a first audio and a first image, the first audio corresponds to the first image, the first image includes a photographed object, and the first audio is generated according to the sound emitted by a sound source; the sound source is the photographed object, or the distance between the sound source and the photographed object is within a set range.
- acquire the first delay corresponding to the first audio, where the first delay includes the first sound propagation delay, or the first sound propagation delay and a set algorithm processing delay; the first sound propagation delay is the delay caused by the sound emitted by the sound source travelling from the sound source to the electronic device.
- based on the first delay, a first audio timestamp of the first audio is determined.
- on the shooting interface, the first image is displayed, and the first image and the correspondence between the first audio and the first audio timestamp are saved.
- the electronic device as the execution body can also be replaced with at least one software module, or with at least one software module plus at least one hardware module, such as the modules in FIG. 1A and FIG. 2.
- the electronic device can read, at the trigger moment of each reading cycle, the audio that has been collected by the collection point; the collection point can be integrated on the chip of the electronic device or located outside the chip, and the collection point can be a microphone.
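As an illustration of this reading-cycle mechanism, the sketch below polls a collection point once per cycle; the read_from_microphone helper and the 20 ms cycle duration are hypothetical, not taken from the patent text.

```python
import time

# Minimal sketch of the periodic read described above: at the trigger moment
# of each reading cycle, the audio collected so far is read from the
# collection point (e.g., a microphone). Details are assumed, per the lead-in.
READ_CYCLE_S = 0.02  # cycle duration l, e.g. 20 ms of audio per read

def record(read_from_microphone, num_cycles: int):
    frames = []
    for n in range(1, num_cycles + 1):       # n is the reading cycle index N
        time.sleep(READ_CYCLE_S)             # wait for this cycle's trigger moment
        frames.append((n, read_from_microphone()))  # audio collected in cycle n
    return frames
```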
- the electronic device acquires the first video corresponding to the first zoom factor, specifically, the electronic device reads the audio collected by the microphone in the first reading cycle.
- the correspondence between the first audio and the first image means that the sound corresponding to the first audio and the picture corresponding to the first image occur synchronously in the shooting scene, for example, as shown in FIG. 7c.
- the photographed object included in the first image is a moving or stationary person or object.
- the distance between the object to be photographed and the sound source is within a set range.
- the set range may be 1 meter, that is, the sound source may be at any position within a circle whose center is the photographed object and whose radius is 1 meter.
- the first delay includes the first sound propagation delay. Since it takes a certain time for the sound emitted by the sound source to travel from the sound source to the electronic device, there is a time difference between when the sound is emitted and when it is collected, which results in the first sound propagation delay. The first sound propagation delay can be obtained by dividing the distance from the sound source to the collection point or collection device by the propagation speed of sound. The distance from the sound source to the collection point or collection device can be obtained by sensor ranging. Since the sound source and the photographed object are relatively close, the distance from the photographed object to the collection point or collection device can also be used as an approximation.
- the electronic device may store the correspondence between the first audio and the first audio timestamp by writing the first audio and the first audio timestamp into a corresponding video file.
- the video file may be an MP4 file.
- the electronic device can determine the audio timestamp corresponding to the audio based on the acquired delay, thereby correcting the audio timestamp. Because it takes time for the sound emitted by the sound source to travel from the sound source to the electronic device, there is a sound propagation delay between the audio obtained by the electronic device and the actual occurrence time of the corresponding sound, and processing the audio may add an algorithm processing delay.
- the correction effectively cancels the influence of the sound propagation delay and the algorithm processing delay on the audio timestamp, thereby reducing the error between the audio timestamp and the time when the sound corresponding to the audio was actually emitted from the sound source (relative to the start of the video recording).
- the stored correspondence between the first image and the first audio is thus consistent with the correspondence between the picture of the first image and the sound of the first audio, realizing the synchronization of the audio and the image.
- the method further includes: playing the first audio and the first image synchronously according to the first audio timestamp.
- the electronic device may also obtain a first image timestamp corresponding to the first image; during playback of the first video, the electronic device may align the first image timestamp with the first audio timestamp, and play the first image corresponding to the first image timestamp and the audio corresponding to the first audio timestamp.
- the moment when the first video is played may be any moment after the video recording is completed.
- the electronic device detects the user's play instruction and plays the first video, or the electronic device detects the user's stop recording instruction, stops the video recording, and plays the first video. It should be noted that, the electronic device that plays the first video may also be another electronic device.
- the audio timestamp corresponding to each audio after the delay correction is consistent with the actual occurrence time of the sound corresponding to the audio, or in other words, the occurrence time of the image corresponding to the audio.
- the first audio and the first image are played synchronously, that is, the synchronization of the audio and the image (ie, the video picture) is realized.
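A minimal sketch of this timestamp-based playback alignment, assuming each image is paired with the audio frame whose corrected timestamp is nearest (the half-cycle matching tolerance is an assumption):

```python
# Pair each image timestamp with the closest corrected audio timestamp.
def align_for_playback(image_ts: list, audio_ts: list, l: float):
    pairs = []
    for it in image_ts:
        at = min(audio_ts, key=lambda t: abs(t - it))  # nearest audio frame
        if abs(at - it) <= l / 2:                      # assumed tolerance
            pairs.append((it, at))
    return pairs

print(align_for_playback([0.02, 0.04], [0.021, 0.041, 0.061], 0.02))
```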
- acquiring the first delay corresponding to the first audio includes: acquiring, according to the first zoom factor and the correspondence between zoom factors and imaging distances, the first imaging distance corresponding to the first zoom factor.
- the first sound propagation delay is calculated based on the following formula: first sound propagation delay = d1 / c, where:
- d1 is the first imaging distance;
- c is the propagation speed of the sound in the photographed medium.
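To make this concrete, here is a small sketch that looks up an imaging distance for a zoom factor and divides it by the speed of sound; the table values are invented for illustration and are not from the patent.

```python
SPEED_OF_SOUND_AIR = 343.0  # m/s; the text notes c differs when shooting in water

# Hypothetical correspondence between zoom factor and imaging distance (meters).
ZOOM_TO_IMAGING_DISTANCE = {1.0: 0.0, 2.0: 5.0, 5.0: 15.0, 10.0: 40.0}

def sound_propagation_delay(zoom_factor: float, c: float = SPEED_OF_SOUND_AIR) -> float:
    """First sound propagation delay = d1 / c, with d1 the imaging distance."""
    d1 = ZOOM_TO_IMAGING_DISTANCE[zoom_factor]
    return d1 / c

print(sound_propagation_delay(10.0))  # about 0.117 s at 40 m
```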
- determining the first audio timestamp of the first audio includes: calculating the first audio timestamp based on the following formula, where:
- latency1 is the first delay;
- l is the cycle duration of the reading cycle;
- the reading cycle is the cycle of periodically reading, from the start of video recording, the audio that has been collected by the collection point;
- N1 is the reading cycle corresponding to the first audio, and N1 is an integer greater than or equal to 1.
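The timestamp formula itself was carried by a figure and is not reproduced in the extracted text. The sketch below therefore assumes the form N1 * l + latency1, chosen only because it is consistent with the jitter conditions stated later in this section (drop when the delay shrinks sharply, insert when it grows sharply); treat the sign of the offset as unconfirmed.

```python
def first_audio_timestamp(n1: int, l: float, latency1: float) -> float:
    """n1: reading cycle of the first audio (integer >= 1);
    l: reading-cycle duration in seconds; latency1: the first delay.
    The offset direction is an assumption, as noted above."""
    assert n1 >= 1
    return n1 * l + latency1

# e.g. the 10th 20 ms frame with a 117 ms first delay
print(first_audio_timestamp(10, 0.020, 0.117))  # 0.317
```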
- the electronic device is configured with one or more zoom factors and the imaging distance corresponding to each zoom factor, from which the first imaging distance is obtained.
- the first imaging distance may be greater than the actual distance between the sound source and the electronic device, may also be less than the actual distance, and may also be equal to the actual distance.
- the electronic device periodically reads audio from the collection point to acquire one or more audios whose duration is equal to the reading period, wherein the first audio belongs to the one or more audios.
- the start time of video recording may be denoted as a reading cycle n, and then each subsequent reading cycle is denoted as N+n.
- c may be the speed of sound in air; when shooting in water, c is the propagation speed of the sound in water.
- the electronic device can obtain the delay corresponding to the zoom factor based on the imaging distance corresponding to that zoom factor and, based on the obtained delay, correct the audio timestamp corresponding to the audio obtained under the zoom factor, so as to achieve audio and image synchronization.
- the method further includes: detecting a second instruction of the user; according to the second instruction, acquiring a second zoom factor and a second video corresponding to the second zoom factor, where the second video includes a second audio and a second image; the second image includes another photographed object, and the second audio is generated according to the sound emitted by another sound source; the other sound source is the other photographed object, or the distance between the other sound source and the other photographed object is within a set range; and the second zoom factor is different from the first zoom factor.
- Based on the second delay, a second audio timestamp of the second audio is determined.
- the electronic device may adjust the zoom factor based on the user's instruction, so that the adjusted zoom factor (i.e., the second zoom factor) differs from the zoom factor at the previous moment (for example, the first zoom factor).
- the electronic device may acquire the second video based on the second zoom factor.
- another photographed object may be the same object as the photographed object described in the first aspect, or may be a different object.
- the other sound source may be the same sound source as the sound source in the first aspect, or may be a different sound source.
- when the zoom factor changes, the corresponding delay changes. Therefore, the second image and the second audio acquired by the electronic device may not correspond; that is, the sound corresponding to the second audio and the picture corresponding to the second image do not occur synchronously.
- based on the delay corresponding to the audio acquired under different zoom factors, the electronic device can correct the audio timestamp corresponding to that audio, so that each audio saved during video recording corresponds to its corresponding image.
- acquiring the second delay corresponding to the second audio includes: acquiring, according to the second zoom factor and the correspondence between zoom factors and imaging distances, the second imaging distance corresponding to the second zoom factor, and calculating the second sound propagation delay based on the following formula: second sound propagation delay = d2 / c, where:
- d2 is the second imaging distance.
- the electronic device can acquire, based on the imaging distances corresponding to different zoom factors, the time delays corresponding to the audio obtained under different zoom factors.
- determining the second audio timestamp of the second audio based on the second delay includes: calculating the second audio timestamp based on the following formula, where:
- latency2 is the second delay;
- N2 is the reading cycle corresponding to the second audio;
- N2 and N1 are adjacent cycles, and N2 is greater than N1.
- N2 and N1 are adjacent cycles and N2 is greater than N1; that is, the reading cycle corresponding to the first audio is the cycle immediately preceding the reading cycle corresponding to the second audio.
- the electronic device can obtain, based on the imaging distances corresponding to different zoom factors, the delays corresponding to the audio obtained under those zoom factors, and then modify the audio timestamp corresponding to the audio based on its delay, so as to synchronize the audio with the image.
- the method includes: obtaining the difference between the second audio timestamp and the first audio timestamp based on the following formula: difference = second audio timestamp - first audio timestamp.
- if the difference is greater than 0 and less than 2l, the second image is displayed on the shooting interface, and the second image and the correspondence between the second audio and the second audio timestamp are saved.
- 2l may represent twice the reading-cycle duration, or twice the audio frame length, where the audio frame length is equal to the reading-cycle duration.
- the electronic device can determine, based on the difference between two adjacent audio timestamps, whether the acquired audio timestamp (that is, the audio timestamp corresponding to the audio obtained in the current reading cycle) suffers from audio timestamp jitter. If the difference between two adjacent audio timestamps is greater than 0 and less than 2l, it is determined that audio timestamp jitter has not occurred, and the obtained audio and audio timestamp are stored correspondingly.
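The three-way check on the difference between adjacent audio timestamps can be sketched directly from the thresholds above (behaviour at a difference of exactly 0 is not specified in the text):

```python
def classify_timestamp_step(ts1: float, ts2: float, l: float) -> str:
    diff = ts2 - ts1
    if diff < 0:
        return "drop_audio"    # discard the second audio and its timestamp
    if diff >= 2 * l:
        return "insert_audio"  # insert an audio frame between the two
    return "normal"            # 0 < diff < 2l: no audio-timestamp jitter
```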
- if the difference is less than 0, the second image is displayed on the shooting interface, the second image is saved, and the second audio and the second audio timestamp are discarded.
- the second image in the second video will then correspond to the audio acquired in the next reading cycle (for example, the third audio); that is, during video playback, the second image is played synchronously with the third audio.
- the electronic device can solve the problem of audio timestamp jitter by discarding the second audio and the second audio timestamp.
- after saving the second image and discarding the second audio and the second audio timestamp, the method further includes: playing the third audio synchronously with the second image according to the third audio timestamp, where the third audio timestamp corresponds to the third audio, and the third audio is acquired in the reading cycle next to the reading cycle corresponding to the second audio.
- the electronic device may play the saved video, that is, display the corresponding image based on the image timestamp corresponding to each image, and play the corresponding audio based on the audio timestamp corresponding to each audio.
- the saved second image will correspond to the third audio; that is, the original correspondence between the second image and the second audio is updated to a correspondence between the second image and the third audio. In other words, the picture corresponding to the second image and the sound corresponding to the third audio occur synchronously.
- if the difference is greater than or equal to 2l, the second image is displayed, and the correspondence between the second image, the second audio and the second audio timestamp is saved, together with the correspondence between the inserted audio and the inserted audio timestamp, where the inserted audio is obtained based on the second audio.
- the inserted audio timestamp is calculated based on the following formula, where p is the algorithm processing delay.
- the inserted audio frame may be obtained by fading the second audio in and out.
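One possible realization of that fade-in/fade-out insertion is sketched below; the text does not specify the fade shape, so a linear window is assumed.

```python
import numpy as np

def make_inserted_audio(second_audio: np.ndarray) -> np.ndarray:
    """Build the inserted frame by fading the second audio in and then out."""
    n = len(second_audio)
    fade_in = np.linspace(0.0, 1.0, n // 2)
    fade_out = np.linspace(1.0, 0.0, n - n // 2)
    window = np.concatenate([fade_in, fade_out])
    return second_audio * window  # same length, smoothed toward both ends
```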
- the electronic device can solve the audio timestamp jitter problem by inserting the audio, with an inserted audio timestamp that lies between the first audio timestamp and the second audio timestamp.
- the method further includes: playing the inserted audio and the second image synchronously according to the inserted audio timestamp.
- the saved second image corresponds to the inserted audio.
- the second audio may correspond to the third image, that is, the sound corresponding to the second audio and the picture corresponding to the third image occur synchronously.
- the problem of audio timestamp jitter caused by the increase of the zoom factor is thereby suppressed, so that the audio is synchronized with the image during playback.
- determining the second audio timestamp of the second audio based on the second delay includes the following (a sketch of this variant follows the decision rules below): if the second sound propagation delay is smaller than the first sound propagation delay, obtaining the first delay difference based on the following formula: first delay difference = first sound propagation delay - second sound propagation delay; or, if the second sound propagation delay is greater than the first sound propagation delay, obtaining the second delay difference based on the following formula: second delay difference = second sound propagation delay - first sound propagation delay.
- the reading cycle of the first audio is the cycle immediately preceding the reading cycle of the second audio.
- the second audio timestamp is calculated based on the following formula, where:
- latency2 is the second delay;
- N2 is the reading cycle corresponding to the second audio;
- N2 and N1 are adjacent cycles, and N2 is greater than N1.
- on the shooting interface, the second image is displayed, and the correspondence between the second image, the second audio and the second audio timestamp is saved.
- if the first delay difference is greater than or equal to l, the second image is displayed on the shooting interface, the second image is saved, and the second audio is discarded.
- after the second image is saved and the second audio is discarded, the method further includes: playing the third audio and the second image synchronously according to the third audio timestamp, where the third audio timestamp corresponds to the third audio, and the third audio is acquired in the reading cycle next to the reading cycle corresponding to the second audio.
- if the second delay difference is greater than or equal to l, the second image is displayed, and the correspondence between the second image, the second audio and the second audio timestamp is saved, together with the correspondence between the inserted audio and the inserted audio timestamp, where the inserted audio is obtained based on the second audio.
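The delay-difference variant referred to above can be sketched as follows, mirroring the conditions as they appear in this section; cases not covered by the stated conditions fall through to the normal branch.

```python
def classify_by_delay_difference(delay1: float, delay2: float, l: float) -> str:
    """delay1/delay2: first/second sound propagation delays; l: cycle duration."""
    if delay2 < delay1 and (delay1 - delay2) >= l:   # first delay difference >= l
        return "drop_audio"
    if delay2 > delay1 and (delay2 - delay1) >= l:   # second delay difference >= l
        return "insert_audio"
    return "normal"  # save the second image, second audio and its timestamp
```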
- the second audio timestamp is calculated based on the following formula, where:
- latency2 is the second delay;
- N2 is the reading cycle corresponding to the second audio;
- N2 and N1 are adjacent cycles, and N2 is greater than N1;
- p is the algorithm processing delay.
- the method further includes: playing the inserted audio and the second image synchronously according to the inserted audio timestamp.
- acquiring the first delay corresponding to the first audio includes: when the first zoom factor is greater than a set zoom factor, acquiring the first delay corresponding to the first audio.
- the electronic device may detect whether the obtained zoom factor is greater than the set zoom factor and, when it is, acquire the first delay corresponding to the first audio.
- the set zoom factor may be 2 times, or 2.5 times, or the like. That is to say, in a scenario where the zoom factor is smaller than the set zoom factor, the solution for determining the audio timestamp based on the delay described in this application is not triggered. In this way, the processing pressure of the device can be effectively reduced to save system resources.
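A small sketch of this gating rule, using the 2x example from the text (the function shape and the combined propagation-plus-algorithm delay are assumptions):

```python
SET_ZOOM_FACTOR = 2.0  # example threshold from the text ("2 times, or 2.5 times")

def maybe_first_delay(zoom_factor: float, imaging_distance: float,
                      c: float = 343.0, p: float = 0.0):
    """Return the first delay only when the correction path is triggered."""
    if zoom_factor <= SET_ZOOM_FACTOR:
        return None                      # below threshold: correction not triggered
    return imaging_distance / c + p      # propagation delay plus algorithm delay
```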
- acquiring the first zoom factor includes: reading the last stored zoom factor obtained before the end of the previous video recording; or detecting the user's zoom instruction and, in response to the zoom instruction, obtaining the first zoom factor; or detecting the user's mode setting instruction, determining the first zoom mode in response to the mode setting instruction, and obtaining, according to the correspondence between zoom modes and zoom factors, the first zoom factor corresponding to the first zoom mode.
- the first zoom factor may be set by the user in the process of displaying the preview image on the shooting interface.
- the first zoom factor may also be set by the user in the zoom factor adjustment option in the shooting interface of the electronic device at any moment during the video recording process.
- the first zoom factor may also be the last zoom factor obtained in the previous video recording, reused in the current recording; that is, the electronic device can obtain the first zoom factor at the start moment of the video recording.
- the first zoom factor may also correspond to a long-focal-length mode set by the user during preview or video recording, for example a telephoto shooting mode whose corresponding zoom factor is 5 times.
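The three sources of the first zoom factor listed above can be combined as in this sketch; the precedence order and the mode-to-zoom mapping are assumptions for illustration.

```python
from typing import Optional

MODE_TO_ZOOM = {"telephoto": 5.0}  # hypothetical zoom-mode correspondence

def resolve_first_zoom_factor(stored_last_zoom: float,
                              zoom_instruction: Optional[float] = None,
                              zoom_mode: Optional[str] = None) -> float:
    if zoom_instruction is not None:   # user's zoom instruction on the interface
        return zoom_instruction
    if zoom_mode is not None:          # user's mode setting instruction
        return MODE_TO_ZOOM[zoom_mode]
    return stored_last_zoom            # last zoom factor from the previous recording
```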
- during preview, the electronic device may only collect and display the image, without collecting the audio.
- the electronic device can start to acquire the zoom factor based on the received user instruction during the video recording process, that is, trigger the video processing method in the embodiment of the present application.
- the algorithm processing delay is a set fixed delay.
- when correcting the audio timestamp, the electronic device can use the algorithm processing delay configured in the electronic device (e.g., in the memory of the electronic device) to obtain the delay corresponding to the audio.
- an embodiment of the present application provides an electronic device.
- the electronic device includes: one or more processors; a memory; and one or more computer programs, where the one or more computer programs are stored in the memory and, when executed by the one or more processors, cause the electronic device to perform the following steps: detecting a first instruction of the user; displaying a shooting interface in response to the first instruction; acquiring a first zoom factor, and acquiring a first video corresponding to the first zoom factor, where the first video includes a first audio and a first image; the first audio corresponds to the first image, the first image includes the photographed object, the first audio is generated according to the sound emitted by the sound source, and the sound source is the photographed object, or the distance between the sound source and the photographed object is within the set range; acquiring the first delay corresponding to the first audio, where the first delay includes the first sound propagation delay, or the first sound propagation delay and the set algorithm processing delay, and the first sound propagation delay is the delay caused by the sound emitted by the sound source travelling from the sound source to the electronic device; determining, based on the first delay, a first audio timestamp of the first audio; and displaying the first image on the shooting interface and saving the first image and the correspondence between the first audio and the first audio timestamp.
- when the computer program is executed by the one or more processors, the electronic device is caused to perform the following step: playing the first audio and the first image synchronously according to the first audio timestamp.
- when the computer program is executed by the one or more processors, the electronic device is caused to perform the following steps: acquiring, according to the first zoom factor and the correspondence between zoom factors and imaging distances, the first imaging distance corresponding to the first zoom factor; and calculating the first sound propagation delay based on the following formula: first sound propagation delay = d1 / c, where:
- d1 is the first imaging distance;
- c is the propagation speed of the sound in the photographed medium.
- the electronic device also performs the following step: calculating the first audio timestamp based on the following formula, where:
- latency1 is the first delay;
- l is the cycle duration of the reading cycle;
- the reading cycle is the cycle of periodically reading, from the start of video recording, the audio that has been collected by the collection point;
- N1 is the reading cycle corresponding to the first audio, and N1 is an integer greater than or equal to 1.
- when the computer program is executed by the one or more processors, the electronic device is caused to perform the following steps: detecting the second instruction of the user; according to the second instruction, acquiring a second zoom factor, and acquiring a second video corresponding to the second zoom factor, where the second video includes a second audio and a second image; the second image includes another photographed object, and the second audio is generated according to the sound emitted by another sound source; the other sound source is the other photographed object, or the distance between the other sound source and the other photographed object is within the set range; the second zoom factor is different from the first zoom factor; acquiring the second delay corresponding to the second audio, where the second delay includes the second sound propagation delay, or the second sound propagation delay and the algorithm processing delay, and the second sound propagation delay is the delay caused by the sound emitted by the other sound source travelling from that sound source to the electronic device; and determining, based on the second delay, a second audio timestamp of the second audio.
- when the computer program is executed by the one or more processors, the electronic device is caused to perform the following steps: acquiring, according to the second zoom factor and the correspondence between zoom factors and imaging distances, the second imaging distance corresponding to the second zoom factor; and calculating the second sound propagation delay based on the following formula: second sound propagation delay = d2 / c, where:
- d2 is the second imaging distance.
- when the computer program is executed by the one or more processors, the electronic device is caused to perform the following step: calculating the second audio timestamp based on the following formula, where:
- latency2 is the second delay;
- N2 is the reading cycle corresponding to the second audio;
- N2 and N1 are adjacent cycles, and N2 is greater than N1.
- when the computer program is executed by the one or more processors, the electronic device is caused to perform the following step: obtaining the difference between the second audio timestamp and the first audio timestamp based on the following formula: difference = second audio timestamp - first audio timestamp.
- if the difference is greater than 0 and less than 2l, the second image is displayed on the shooting interface, and the second image and the correspondence between the second audio and the second audio timestamp are saved.
- when the computer program is executed by the one or more processors, the electronic device is caused to perform the following steps: if the difference is less than 0, displaying the second image on the shooting interface, saving the second image, and discarding the second audio and the second audio timestamp.
- when the computer program is executed by the one or more processors, the electronic device is caused to perform the following step: playing the third audio and the second image synchronously according to the third audio timestamp, where the third audio timestamp corresponds to the third audio, and the third audio is acquired in the reading cycle next to the reading cycle corresponding to the second audio.
- when the computer program is executed by the one or more processors, the electronic device is caused to perform the following steps: if the difference is greater than or equal to 2l, displaying the second image, and saving the second image, the correspondence between the second audio and the second audio timestamp, and the correspondence between the inserted audio and the inserted audio timestamp, where the inserted audio is obtained based on the second audio.
- the inserted audio timestamp is calculated based on the following formula, where p is the algorithm processing delay.
- when the computer program is executed by the one or more processors, the electronic device is caused to perform the following step: playing the inserted audio and the second image synchronously according to the inserted audio timestamp.
- when the computer program is executed by the one or more processors, the electronic device is caused to perform the following steps: if the second sound propagation delay is smaller than the first sound propagation delay, obtaining the first delay difference based on the following formula: first delay difference = first sound propagation delay - second sound propagation delay; or, if the second sound propagation delay is greater than the first sound propagation delay, obtaining the second delay difference based on the following formula: second delay difference = second sound propagation delay - first sound propagation delay.
- the reading cycle of the first audio is the cycle immediately preceding the reading cycle of the second audio.
- the second audio timestamp is calculated based on the following formula, where:
- latency2 is the second delay;
- N2 is the reading cycle corresponding to the second audio;
- N2 and N1 are adjacent cycles, and N2 is greater than N1.
- on the shooting interface, the second image is displayed, and the correspondence between the second image, the second audio and the second audio timestamp is saved.
- when the computer program is executed by the one or more processors, the electronic device is caused to perform the following steps: if the first delay difference is greater than or equal to l, displaying the second image on the shooting interface, saving the second image, and discarding the second audio.
- when the computer program is executed by the one or more processors, the electronic device is caused to perform the following step: playing the third audio and the second image synchronously according to the third audio timestamp, where the third audio timestamp corresponds to the third audio, and the third audio is acquired in the reading cycle next to the reading cycle corresponding to the second audio.
- when the computer program is executed by the one or more processors, the electronic device is caused to perform the following steps: if the second delay difference is greater than or equal to l, displaying the second image, and saving the correspondence between the second image, the second audio and the second audio timestamp, together with the correspondence between the inserted audio and the inserted audio timestamp, where the inserted audio is obtained based on the second audio.
- the second audio timestamp is calculated based on the following formula, where:
- latency2 is the second delay;
- N2 is the reading cycle corresponding to the second audio;
- N2 and N1 are adjacent cycles, and N2 is greater than N1;
- p is the algorithm processing delay.
- when the computer program is executed by the one or more processors, the electronic device is caused to perform the following step: playing the inserted audio and the second image synchronously according to the inserted audio timestamp.
- when the computer program is executed by the one or more processors, the electronic device is caused to perform the following step: when the first zoom factor is greater than the set zoom factor, acquiring the first delay corresponding to the first audio.
- when the computer program is executed by the one or more processors, the electronic device is caused to perform the following steps: reading the last stored zoom factor obtained before the end of the previous video recording; or, detecting the user's zoom instruction, and acquiring the first zoom factor in response to the zoom instruction; or, detecting the user's mode setting instruction, determining the first zoom mode in response to the mode setting instruction, and obtaining, according to the correspondence between zoom modes and zoom factors, the first zoom factor corresponding to the first zoom mode.
- the second aspect and any implementation manner of the second aspect correspond to the first aspect and any implementation manner of the first aspect, respectively.
- for the technical effects corresponding to the second aspect and any implementation manner of the second aspect, reference may be made to the technical effects corresponding to the first aspect and any implementation manner of the first aspect, which are not repeated here.
- in a third aspect, a computer-readable storage medium includes a computer program that, when executed on an electronic device, causes the electronic device to perform the video processing method of the first aspect or any implementation manner of the first aspect.
- the third aspect and any implementation manner of the third aspect correspond to the first aspect and any implementation manner of the first aspect, respectively.
- for the technical effects corresponding to the third aspect and any implementation manner of the third aspect, reference may be made to the technical effects corresponding to the first aspect and any implementation manner of the first aspect, which are not repeated here.
- in a fourth aspect, a chip includes at least one processing circuit and an interface; the processing circuit can execute the video processing method of the first aspect or any implementation manner of the first aspect.
- the fourth aspect and any implementation manner of the fourth aspect correspond to the first aspect and any implementation manner of the first aspect, respectively.
- for the technical effects corresponding to the fourth aspect and any implementation manner of the fourth aspect, reference may be made to the technical effects corresponding to the first aspect and any implementation manner of the first aspect, which are not repeated here.
- 1A is a schematic structural diagram of an exemplary electronic device
- FIG. 1B is a schematic diagram of a microphone layout on an exemplary electronic device
- FIG. 2 is a block diagram of the software structure of an exemplary electronic device
- FIG. 3 is a schematic flowchart of an exemplary creation process
- FIG. 5 is a schematic diagram of an exemplary time stamp-based audio and image playback process
- FIG. 6 is a schematic diagram of an exemplary module interaction flow
- FIG. 7a is a schematic diagram of an exemplary application scenario
- FIG. 7b is an exemplary schematic diagram of a shooting scene
- Fig. 7c is one of the exemplary schematic diagrams of the correspondence between images and audio;
- FIG. 7d is one of the exemplary schematic diagrams of the correspondence between images and audio;
- FIG. 8 is a schematic diagram of a video processing method provided by an embodiment of the present application;
- FIG. 9 is one of the schematic flowcharts of a video processing method provided by an embodiment of the present application.
- Fig. 10a is one of the exemplary schematic diagrams comparing audio timestamps in a scene where the zoom factor is increased;
- FIG. 10b is one of the exemplary schematic diagrams comparing audio timestamps in a scene where the zoom factor is increased;
- FIG. 11a is one of the exemplary schematic diagrams comparing audio timestamps in a scene where the zoom factor is reduced;
- FIG. 11b is one of the exemplary schematic diagrams comparing audio timestamps in a scene where the zoom factor is reduced;
- FIG. 13 is one of the schematic flowcharts of a video processing method provided by an embodiment of the present application.
- FIG. 14 is a schematic structural diagram of an apparatus provided by an embodiment of the present application.
- the terms “first” and “second” in the description and claims of the embodiments of the present application are used to distinguish different objects, rather than to describe a specific order of the objects.
- for example, the first target object, the second target object, etc. are used to distinguish different target objects, rather than to describe a specific order of the target objects.
- words such as “exemplary” or “for example” are used to represent examples, illustrations or illustrations. Any embodiments or designs described in the embodiments of the present application as “exemplary” or “such as” should not be construed as preferred or advantageous over other embodiments or designs. Rather, the use of words such as “exemplary” or “such as” is intended to present the related concepts in a specific manner.
- “multiple processing units” refers to two or more processing units; “multiple systems” refers to two or more systems.
- the video processing method provided in the embodiment of the present application can be applied to an electronic device, and the electronic device may also be referred to as a terminal, a terminal device, or the like.
- the electronic device may specifically be a mobile phone, a tablet computer, a wearable device, a vehicle-mounted device, an augmented reality (AR)/virtual reality (VR) device, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a personal digital assistant (PDA), or a dedicated camera (for example, a single-lens reflex camera or a compact camera), etc.
- the embodiments of the present application do not impose any restrictions on the specific type of the electronic device.
- FIG. 1A shows a schematic structural diagram of an electronic device 100 .
- the electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, a headphone jack 170D, a sensor module 180, buttons 190, a motor 191, an indicator 192, a camera 193, a display screen 194, a subscriber identification module (SIM) card interface 195, and so on.
- the processor 110 may include one or more processing units; for example, the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), etc. Different processing units may be independent devices, or may be integrated in one or more processors.
- the controller may be the nerve center and command center of the electronic device 100 .
- the controller can generate an operation control signal according to the instruction operation code and timing signal, and complete the control of fetching and executing instructions.
- a memory may also be provided in the processor 110 for storing instructions and data.
- the memory in processor 110 is cache memory. This memory may hold instructions or data that have just been used or recycled by the processor 110 . If the processor 110 needs to use the instruction or data again, it can be called directly from memory. Repeated accesses are avoided and the latency of the processor 110 is reduced, thereby increasing the efficiency of the system.
- the electronic device 100 implements a display function through a GPU, a display screen 194, an application processor, and the like.
- the GPU is a microprocessor for image processing, and is connected to the display screen 194 and the application processor.
- Processor 110 may include one or more GPUs that execute program instructions to generate or alter display information.
- Display screen 194 is used to display images, videos, and the like.
- Display screen 194 includes a display panel.
- the electronic device 100 may include one or N display screens 194 , where N is a positive integer greater than one.
- the display screen 194 may display the camera's shooting preview interface, video recording preview interface, and shooting interface, and may also display a video playback interface during video playback.
- the electronic device 100 may implement a shooting function through an ISP, a camera 193, a video codec, a GPU, a display screen 194, an application processor, and the like.
- the ISP is used to process the data fed back by the camera 193 .
- when the shutter is opened, light is transmitted through the lens to the camera photosensitive element; the light signal is converted into an electrical signal, and the photosensitive element transmits the electrical signal to the ISP for processing, which converts it into an image visible to the naked eye.
- Camera 193 is used to capture still images or video. An object is projected through the lens to generate an optical image on the photosensitive element.
- the electronic device 100 may include 1 or N cameras 193 , where N is a positive integer greater than 1.
- the camera 193 may be located in the edge area of the electronic device, may be an under-screen camera, or may be a camera that can be raised and lowered.
- the camera 193 may include a rear camera, and may also include a front camera. The embodiment of the present application does not limit the specific position and shape of the camera 193 .
- Video codecs are used to compress or decompress digital video.
- the electronic device 100 may support one or more video codecs.
- the electronic device 100 can play or record videos in various encoding formats, such as Moving Picture Experts Group (MPEG)-1, MPEG-2, MPEG-3, MPEG-4, and so on.
- Internal memory 121 may be used to store computer executable program code, which includes instructions.
- the processor 110 executes various functional applications and data processing of the electronic device 100 by executing the instructions stored in the internal memory 121 .
- the internal memory 121 may include a storage program area and a storage data area.
- the storage program area can store an operating system, an application program required for at least one function (such as a sound playback function, an image playback function, etc.), and the like.
- the storage data area can store data created during the use of the electronic device 100 (such as images collected by the electronic device 100, audio data, phone book, etc.) and the like.
- the processor 110 causes the electronic device to execute the video processing method in the present application by running the instructions stored in the internal memory 121, specifically, adjusting the time stamp of the audio according to the size of the delay.
- the electronic device 100 may implement audio functions, such as music playback and recording, through an audio module 170, a speaker 170A, a receiver 170B, multiple microphones 170C, an earphone interface 170D, an application processor, and the like.
- the audio module 170 is used for converting digital audio data into analog audio electrical signal output, and also for converting analog audio electrical signal input into digital audio data.
- the audio module 170 is used to convert the analog audio electrical signal output by the microphone 170C into digital audio data.
- the audio module 170 may further include an audio processing module.
- the audio processing module is used for, in the video recording mode, to perform audio processing on the digital audio data to generate audio. Audio module 170 may also be used to encode and decode audio data.
- the audio module 170 may be provided in the processor 110 , or some functional modules of the audio module 170 may be provided in the processor 110 .
- the speaker 170A, also referred to as a “horn”, is used to convert analog audio electrical signals into sound signals.
- the electronic device 100 can listen to music through the speaker 170A, or listen to a hands-free call.
- The receiver 170B, also referred to as the "earpiece", is used to convert the analog audio electrical signal into a sound signal.
- When the electronic device 100 answers a call or plays a voice message, the voice can be heard by placing the receiver 170B close to the human ear.
- The microphone 170C, also called a "mic" or "sound transmitter", is used to convert sound signals into analog audio electrical signals.
- A user can speak with the mouth close to the microphone 170C, thereby inputting a sound signal into the microphone 170C.
- the electronic device 100 may include at least three microphones 170C, which can collect sound signals from various directions and convert the collected sound signals into analog audio electrical signals, and can also implement noise reduction, sound source identification, directional recording, and the like.
- the layout of the microphone 170C on the electronic device 100 may be as shown in FIG. 1B , and the electronic device 100 may include a microphone 1 arranged at the bottom, a microphone 2 arranged at the top, and a microphone 3 arranged at the back.
- the combination of the microphones 1-3 can collect sound signals from all directions around the electronic device 100 .
- the electronic device 100 may further include a larger number of microphones 170C.
- the electronic device 100 may include one or more microphones provided on the bottom, one or more microphones provided on the top, one or more microphones provided on the back, and one or more microphones provided on the front of the screen. These microphones can collect sound signals from all directions around the electronic device 100.
- The screen here refers to the display screen 194 or a touch screen.
- the microphone 170C may be a built-in component of the electronic device 100 or an external accessory of the electronic device 100 .
- the electronic device 100 may include a microphone 1 provided at the bottom, a microphone 2 provided at the top, and external accessories.
- the external accessory may be a miniature microphone connected to the electronic device 100 (wired connection or wireless connection), or an earphone with a microphone (such as a wired earphone or a TWS earphone, etc.).
- the structures illustrated in the embodiments of the present application do not constitute a specific limitation on the electronic device 100 .
- the electronic device 100 may include more or less components than shown, or combine some components, or separate some components, or arrange different components.
- the illustrated components may be implemented in hardware, software, or a combination of software and hardware.
- the software system of the electronic device 100 may adopt a layered architecture, an event-driven architecture, a microkernel architecture, a microservice architecture, or a cloud architecture.
- the embodiments of the present application take an Android system with a layered architecture as an example to exemplarily describe the software structure of the electronic device 100 .
- FIG. 2 is a block diagram of the software structure of the electronic device 100 according to the embodiment of the present application.
- the layered architecture divides the software into several layers, and each layer has a clear role and division of labor. Layers communicate with each other through software interfaces.
- the Android system is divided into four layers, which are, from top to bottom, an application layer, an application framework layer, an Android runtime (Android runtime) and a system library, and a hardware abstraction layer (HAL) .
- the application layer can include a series of application packages.
- the application package can include applications such as camera, gallery, calendar, call, map, navigation, WLAN, Bluetooth, music, video, short message and so on.
- the application framework (Framework) layer provides an application programming interface (application programming interface, API) and a programming framework for applications in the application layer.
- the application framework layer includes some predefined functions.
- the application framework layer may include media services (Media Server), audio services (Audio Server), camera services (Camera Server), system services (System Server), and the like.
- the Media Server is used to manage audio data and image data, such as controlling the data flow direction of audio data and image data, and writing audio streams and image streams into MP4 files.
- audio data and image data may also be referred to as audio streams and image streams, respectively, or audio information and image information, which are not limited in this application.
- the Audio Server is used to process the audio stream accordingly, for example, to obtain the audio timestamp corresponding to the audio stream.
- Camera Server is used to perform corresponding processing on the image stream, for example, to obtain the video timestamp corresponding to the image stream and other processing.
- the system library and runtime layer includes the system library and the Android Runtime.
- a system library can include multiple functional modules. For example: browser kernel, 3D graphics library (eg: OpenGL ES), font library, etc.
- the browser kernel is responsible for interpreting the syntax of a web page (such as HTML, an application under the standard generalized markup language, and JavaScript) and rendering (displaying) the web page.
- the 3D graphics library is used to implement 3D graphics drawing, image rendering, compositing and layer processing, etc.
- the font library is used to implement the input of different fonts.
- the Android runtime includes core libraries and a virtual machine. The Android runtime is responsible for the scheduling and management of the Android system.
- the core library consists of two parts: one part is the functions that the Java language needs to call, and the other part is the core library of Android.
- the application layer and the application framework layer run in virtual machines.
- the virtual machine executes the Java files of the application layer and the application framework layer as binary files.
- the virtual machine is used to perform functions such as object life cycle management, stack management, thread management, safety and exception management, and garbage collection.
- the components included in the system framework layer, system library and runtime layer shown in FIG. 2 do not constitute a specific limitation on the electronic device 100 .
- the electronic device 100 may include more or less components than shown, or combine some components, or separate some components, or arrange different components.
- the HAL layer is the interface layer between the operating system kernel and the hardware circuit.
- the HAL layer includes but is not limited to: audio hardware abstraction layer (Audio HAL) and camera hardware abstraction layer (Camera HAL).
- Audio HAL is used to process the audio stream, for example, noise reduction and directional enhancement of the audio stream.
- Camera HAL is used to process the image stream.
- the kernel layer is the layer between the hardware and the aforementioned software layers.
- the kernel layer contains at least display drivers, camera drivers, audio drivers, and sensor drivers.
- the hardware may include devices such as a camera, a display screen, a microphone, a processor, and a memory.
- the display screen in the hardware may display a shooting preview interface, a video preview interface and a shooting interface during video recording.
- a camera in the hardware can be used to capture images.
- Microphones in hardware can be used to collect sound signals and generate analog audio electrical signals.
- the recording process of the electronic device can be divided into two parts.
- the first part is the creation process, that is, each module creates a corresponding instance, which can also be understood as the preparation process, as shown in Figure 3; the second part is the recording process, that is, each instance processes the acquired data (audio or image), as shown in FIG. 4.
- the creation process is mainly to create a corresponding instance of each module.
- the recording process is the processing of data (including audio stream and image stream) by each instance.
- The first part: the creation process.
- In response to a user operation, the electronic device starts the camera application.
- the camera application responds to the user's instruction and creates a MediaRecorder instance in the Framework layer through the interface with the Framework layer to start the recording process.
- the Media Recorder instance instructs the Media Server to create the corresponding instance.
- the "instance” described in the embodiments of the present application can also be understood as program code or process code running in a process for correspondingly processing received data (eg, audio stream or image stream).
- here, the camera application is used as an example for description; this application is not limited thereto.
- the Media Server creates instances corresponding to audio and images in response to the indication of the Media Recorder instance.
- Media Server creates an instance of StagefrightRecorder (recording processing).
- the StagefrightRecorder instance is used to manage the initialization and data flow of audio and image data.
- the StagefrightRecorder instance creates a Camera Source (camera source) instance, an Audio Record (audio recording) instance, a Video Encoder (video encoding) instance, an Audio Encoder (audio encoding) instance, and an Mpeg4Writer (MPEG-4 writer) instance.
- Media Server instructs Camera Server (camera service) and Audio Server (audio service) to create corresponding instances.
- the CameraSource instance instructs Camera Server to create a Camera instance
- the Audio Record instance instructs Audio Server to create a Record Thread (recording process) instance.
- Camera Server creates Camera instances
- Audio Server creates Record Thread instances.
- Camera Server instructs Camera Hal (camera hardware abstraction layer) to create a corresponding instance
- Audio Server instructs Audio Hal (audio hardware abstraction layer) to create a corresponding instance.
- the Camera instance instructs Camera Hal to create a Camera 3Device (a camera device, where the number 3 represents the version number of the camera service, which can be updated with the version) instance
- the Record Thread instance instructs Audio Hal to create an Input Stream (input stream) instance and the recording directional enhancement algorithm instance
- the recording directional enhancement algorithm instance can only be enabled in the zoom shooting scene, and the specific implementation process will be described in detail below.
- Camera Hal calls the camera driver
- Audio Hal calls the microphone driver.
- the Camera 3Device instance triggers the startup of the camera driver
- the Input Stream instance triggers the startup of the microphone driver.
- the camera driver calls the camera to collect the image stream
- the microphone driver calls the microphone to collect the audio stream
- the camera outputs the captured image stream to the camera driver, and the microphone outputs the picked-up audio stream to the microphone driver.
- the camera driver outputs the image stream and the corresponding system time to the Camera Hal
- the microphone driver outputs the audio stream to the Audio Hal.
- the Camera 3Device instance obtains the image stream input by the camera and the corresponding system time
- the recording orientation enhancement algorithm instance obtains the audio stream input by the microphone driver, and outputs the obtained audio stream to the Input Stream instance.
- the image stream includes multiple images
- the system time is the system time corresponding to each image captured by the camera.
- the system time may coincide with real time (ie, physical world time).
- Camera Hal outputs the acquired image stream and system time to CameraServer
- Audio Hal outputs the acquired audio stream to Audio Server.
- the Camera instance obtains the image stream and system time input by the Camera 3Device instance
- the Record Thread instance obtains the audio stream input by the Input Stream instance.
- the Camera Server obtains the time stamps corresponding to each image in the image stream (hereinafter referred to as video time stamps), and outputs each image in the image stream and the video time stamp corresponding to each image to the Media Server.
- the Audio Server obtains the timestamp corresponding to each audio stream (hereinafter referred to as the audio timestamp), and outputs each audio stream and the audio timestamp corresponding to each audio stream to the Media Server.
- the Camera Source instance obtains each image in the image stream input by the Camera instance and the video time stamp corresponding to each image
- the Audio Record instance obtains the audio stream input by the Record Thread instance and the audio timestamp corresponding to each audio stream.
- the video timestamp and the audio timestamp are relative times calculated from the recording start time, where the recording start time is 0 ms. That is to say, the video timestamp can be understood as the relative time converted from the recording start time and the system time corresponding to each image; the audio timestamp is similar. For example, an audio timestamp of 60 ms means that the acquisition time of the corresponding audio stream is 60 ms after the recording start time (i.e., 0 ms). This description is not repeated below; the acquisition method of the audio timestamp is described in detail below.
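- As a minimal sketch of this relative-time conversion (the function name and the millisecond clock are illustrative assumptions, not the framework's actual API):

```cpp
#include <cstdint>

// Illustrative helper: convert the system time at which a frame was captured
// into a timestamp relative to the recording start time (start = 0 ms).
int64_t relativeTimestampMs(int64_t captureSystemTimeMs,
                            int64_t recordStartSystemTimeMs) {
    // E.g. an audio chunk captured 60 ms after recording started gets 60 ms.
    return captureSystemTimeMs - recordStartSystemTimeMs;
}
```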
- Media Server generates MP4 files based on the obtained multiple images and the corresponding video timestamps of each image, as well as multiple audio streams and the corresponding audio timestamps of each audio stream.
- the Camera Source instance outputs the acquired multiple images and the video timestamps corresponding to each image to the Video Encoder instance
- the Audio Record instance outputs the acquired multiple audio streams and the audio timestamps corresponding to each audio stream to the Audio Encoder instance.
- the Video Encoder instance encodes the multiple images to generate corresponding image frames, where each image frame corresponds to a video timestamp (that is, the video timestamp corresponding to the image described above), and outputs the multiple image frames and the video timestamp corresponding to each image frame to the Mpeg4Writer instance.
- the Audio Encoder instance encodes the multiple audio streams to generate corresponding audio frames, where each audio frame corresponds to an audio timestamp (that is, the audio timestamp corresponding to the audio stream described above), and outputs the multiple audio frames and the audio timestamp corresponding to each audio frame to the Mpeg4Writer instance.
- the Mpeg4Writer instance generates an MP4 file based on the acquired image frames and the video timestamps corresponding to each image frame, as well as the audio frames and the audio timestamps corresponding to each audio frame.
- the MP4 file includes image data (ie, multiple image frames) and audio data (ie, multiple audio frames).
- the player will decode the image frame and audio frame respectively according to the MPEG4 standard to obtain the original image corresponding to the image frame and the original audio corresponding to the audio frame.
- the player can align the decoded image and audio based on the video timestamp corresponding to the image frame and the audio timestamp corresponding to the audio frame, so that the image and the audio are played synchronously.
- the MP4 file includes image data and audio data.
- the decoding of the image data and the audio data is independent, and after decoding, they are also played independently.
- the playback speed of the video picture is determined by the frame rate, which is the frequency (rate) at which bitmap images continuously appear on the display per unit time.
- the frame rate may also be called the frame frequency and is expressed in hertz (Hz).
- the audio playback speed is determined by the audio sampling rate, which refers to the number of times the sound signal is sampled by the recording device in one second.
- here, the playback duration of each audio frame is 20 ms, and the playback duration of each image frame is 17 ms. It should be noted that in other embodiments, the playback duration of an audio frame may be 23.22 ms and the playback duration of an image frame may be 33 ms, etc., which is not limited in this application.
- Ideally, the audio and the video images are played completely synchronously (referred to as audio and image synchronization).
- In practice, due to the influence of decoding and other factors, the audio and the images may be out of synchronization.
- In the prior art, timestamps are therefore introduced, and the audio and the video images are calibrated by timestamps to achieve audio and image synchronization.
- the calculation method of the video timestamp is as follows (taking a single image as the granularity): the Camera Server (specifically, the Camera instance) calculates the video timestamp corresponding to an image from the system time corresponding to that image input by Camera Hal (specifically, the Camera3Device instance), combined with the video recording start time; that is, the video timestamp is the relative duration between the system time and the start time. For example, if the relative duration between the system time and the start time is 17 ms, the video timestamp corresponding to the image is 17 ms.
- In the Media Server (specifically, the Media Encoder instance), the video timestamp is associated with the image frame corresponding to the image.
- AudioServer (specifically, the Record Thread instance) calculates the audio timestamp corresponding to the audio frame according to the general formula of the audio timestamp.
- the general formula of the audio timestamp is as follows:
- Audio timestamp = N × l (Formula 1)
- where N is the number of readings and l is the audio frame length (in milliseconds (ms)).
- the number of readings is the number of times that the AudioServer (specifically, the Record Thread instance) periodically reads audio from AudioHal (specifically, the Input Stream instance), and the audio frame length is equal to the reading period of the AudioServer, that is, the period at which the AudioServer reads audio from AudioHal. This can be understood as follows: the audio stream collected by the microphone and output to AudioHal is continuous, and the AudioServer reads the audio accumulated in AudioHal at the trigger time of each reading period, so what the AudioServer obtains each time is one or more audio streams whose duration equals the reading period.
- the reading period (i.e., the audio frame length) can be set by each manufacturer according to the actual situation. For example, if the reading period of the AudioServer is 20 ms, the frame length of each audio frame is 20 ms and the AudioServer reads audio from AudioHal every 20 ms; the audio timestamp corresponding to the audio read for the third time is 3 × 20 ms = 60 ms.
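- The uncorrected timestamp computation can be sketched as follows (a minimal C++ example; the names are illustrative and not the actual Record Thread code):

```cpp
#include <cstdint>
#include <cstdio>

// General formula for the uncorrected audio timestamp: timestamp = N * l,
// where N is the number of reads and l the audio frame length (read period).
int64_t audioTimestampMs(int64_t readCount, int64_t frameLengthMs) {
    return readCount * frameLengthMs;
}

int main() {
    // With a 20 ms read period, the third read yields 3 * 20 = 60 ms.
    std::printf("%lld ms\n", static_cast<long long>(audioTimestampMs(3, 20)));
    return 0;
}
```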
- Figure 5 is a schematic diagram of a time stamp-based audio and image playback process.
- the reference clock is used as a reference to realize the synchronization of audio and images. For example, when the audio frame with a timestamp of 20 ms is played, the image frame with the 17 ms video timestamp (that is, the image frame whose video timestamp is closest to the 20 ms reference time) is played.
- here, the audio time axis is taken as the reference time axis as an example; that is to say, during playback, the audio timestamp is used as the reference clock to synchronize the image to the audio, which can also be understood as synchronizing the image based on the playback speed of the audio.
- the reference time axis may also be an image time axis or an external clock, which is not limited in this application.
- reference may be made to the audio and image synchronization solutions in the prior art, which will not be repeated in this application.
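- A minimal sketch of this audio-master synchronization decision (illustrative only; real players use more elaborate clock models):

```cpp
#include <cstdint>
#include <cstdlib>
#include <vector>

// With the audio timeline as the reference clock, select the decoded image
// frame whose video timestamp is closest to the audio timestamp currently
// being played (e.g. the 17 ms image frame for the 20 ms audio frame).
size_t frameIndexFor(int64_t audioTsMs, const std::vector<int64_t>& videoTsMs) {
    size_t best = 0;
    for (size_t i = 1; i < videoTsMs.size(); ++i) {
        if (std::llabs(videoTsMs[i] - audioTsMs) <
            std::llabs(videoTsMs[best] - audioTsMs)) {
            best = i;
        }
    }
    return best;
}
```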
- audio and image synchronization is achieved by aligning the audio timestamps with the video timestamps; however, when the audio and image timestamps are offset from the actual recorded physical time (i.e., real-world time), the audio is still out of sync with the image even if the timestamps are aligned.
- the reason for the above problem may be the focal length change during shooting.
- the camera application may determine that the current zoom factor is 15 (times) (15x shown in the figure means the zoom factor is 15).
- the mobile phone can adjust the shooting focal length based on the current zoom factor (for example, 15 times).
- the shooting focal length may also be referred to as the imaging distance, that is, the distance between the camera and the zoom range corresponding to the zoom factor.
- the correspondence between different zoom factors and imaging distances is stored in the memory of the mobile phone, for example, in the form of a relationship table.
- a module or instance (for example, Audio HAL or Camera HAL) can determine the imaging distance corresponding to the current zoom factor by querying the relation table. For example, if the zoom factor is X and X is 3, the corresponding imaging distance is 10 meters; that is, the camera captures a picture at an imaging distance of 10 meters from the camera.
- the conversion method between the zoom factor and the imaging distance may be the same or different for electronic devices of different manufacturers, which is not limited in this application.
- the imaging distance corresponding to the zoom factor X configured by the electronic device of manufacturer A is 8 meters
- the imaging distance corresponding to the zoom factor X configured by the electronic device of manufacturer B is 10 meters.
- the corresponding relationship between the zoom factor and the imaging distance involved in the embodiments of the present application is only a schematic example, which is not limited in the present application.
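- A minimal sketch of such a relation-table lookup (the table values here merely echo the examples in this document and are otherwise assumed; real tables are manufacturer-specific):

```cpp
#include <iterator>
#include <map>

// Illustrative zoom-factor -> imaging-distance relation table (meters).
double imagingDistanceMeters(double zoomFactor) {
    static const std::map<double, double> kTable = {
        {1.0, 0.0},   // no zoom: no directional pickup distance
        {3.0, 10.0},  // example from this document: 3x -> 10 m
        {6.1, 10.0},  // example from this document: 6.1x -> 10 m
        {15.0, 30.0}, // assumed value for illustration
    };
    // Use the entry for the nearest configured factor not above the request.
    auto it = kTable.upper_bound(zoomFactor);
    if (it == kTable.begin()) return 0.0;
    return std::prev(it)->second;
}
```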
- the module interaction process in the zoom factor adjustment scene will be described below with reference to FIG. 6 .
- after the camera application obtains the zoom factor, it outputs the zoom factor to Camera Hal and Audio Hal.
- Camera Hal inputs the zoom factor to the camera driver, and the camera driver controls the camera to collect images within the zoom range corresponding to the current zoom factor based on the zoom factor; the specific control manner is not limited in this application.
- Audio Hal can process the audio collected by one or more microphones through the recording directional enhancement algorithm instance according to the received zoom factor, in order to narrow the mono beam or narrow the included angle of the stereo beam, so that the sound within the zoom range is preserved and the sound outside the zoom range is suppressed to highlight the sound within the zoom range, thereby realizing directional sound pickup.
- the zoom range refers to the shooting range corresponding to the current zoom factor.
- When Audio Hal (specifically, the recording directional enhancement algorithm instance) processes the audio, an algorithm processing delay is introduced: the recording directional enhancement algorithm in AudioHal waits for at least one audio frame when calculating, for example, the length of two audio frames (40 ms), in order to determine the previous audio frame.
- FIG. 7a is an exemplary schematic diagram of a scene
- the user holds the mobile phone 10 meters away from the photographed subject; the photographed subject is dancing, and a mobile phone beside the photographed subject (for example, within a range of 1 meter) is synchronously playing music.
- the mobile phone screen displays a shooting interface (or a shooting preview interface), as shown in Figure 7b; the shooting preview interface not only displays the currently shot video picture, but also includes, without limitation, a focus adjustment option, a recording start option, a recording pause option, a recording stop option, and the like.
- the camera application determines that the current zoom factor is 6.1 times in response to the user's instruction.
- the camera application outputs the zoom factor (ie 6.1 times) to Camera Hal and Audio Hal.
- Camera Hal and Audio Hal can query the correspondence table between zoom factor and imaging distance to determine that the imaging distance corresponding to a zoom factor of 6.1 times is 10 meters, that is, the image and sound collected by the current camera and microphone are 10 meters away from the mobile phone.
- the increase of the zoom factor has no effect on the video image; that is to say, there is no error between the video timestamp corresponding to each image frame in the MP4 file generated by the mobile phone and the actual occurrence time of the image frame, or the error is negligible. However, due to the influence of the sound propagation delay and the algorithm processing delay, there is a difference between the audio timestamp corresponding to each audio stream in the MP4 file (referring to the audio streams collected after zooming) and the actual sound occurrence time corresponding to the audio stream.
- the delay here includes the sound propagation delay and the algorithm processing delay. Exemplarily, in the scene shown in Figure 7a, that is, when the imaging distance is 10 meters, the sound transmission delay is about 30 ms; that is, the sound collected by the microphone at the current moment was actually emitted by the mobile phone next to the subject 30 ms earlier.
- Figure 7c shows the actual correspondence between the image and audio captured in the scene shown in Figure 7b.
- the audio frame corresponding to image frame 1 (the image frame corresponding to action 1) is audio frame 1; that is to say, when the subject performs action 1, the mobile phone beside the subject plays the audio corresponding to audio frame 1.
- image frame 2 (i.e., the image frame corresponding to action 2) corresponds to audio frame 2
- image frame 3 corresponds to audio frame 3
- image frame 4 corresponds to audio frame 4.
- the correct audio timestamp of audio frame 1 (that is, the duration between the actual sound occurrence time and the recording start time) is 20ms
- the correct audio timestamp of audio frame 2 is 40ms
- the correct audio timestamp of audio frame 3 is 60ms
- the correct audio timestamp of audio frame 4 is 80ms.
- the audio and image in the MP4 file generated by the mobile phone are out of sync.
- When an electronic device (such as a mobile phone) plays the MP4 file, the image frame 2 displayed by the mobile phone corresponds to audio frame 1; that is, while the video picture of image frame 2 is displayed, the speaker plays the audio corresponding to audio frame 1, and while image frame 3 is displayed, the speaker plays the audio corresponding to audio frame 2, and so on.
- the written audio timestamp of audio frame 1 in the MP4 file (that is, the audio timestamp corresponding to audio frame 1 recorded in the MP4 file) is 40 ms, the written audio timestamp of audio frame 2 is 60 ms, the written audio timestamp of audio frame 3 is 80 ms, and the written audio timestamp of audio frame 4 is 100 ms. Therefore, the mobile phone plays the recorded audio frames and image frames based on the audio timestamps (that is, the written audio timestamps) and the corresponding video timestamps.
- When these recorded audio frames and image frames are played, the problem shown in Figure 7d occurs: the dance movements played do not match the music, and the music lags behind the dance movements.
- the delay may be tens or even hundreds of milliseconds.
- the influence of the propagation speed of light and of the system processing time on the video timestamp is negligible; that is to say, the video timestamp and the actual occurrence time of the video picture corresponding to it can be considered to be unbiased.
- However, even if the audio timestamp is aligned with the video timestamp, due to the influence of the delay (including the sound propagation delay and the algorithm processing delay), the audio and the video images are still out of synchronization.
- To this end, the audio timestamp that the AudioServer generates and outputs to the Media Server is the result of subtracting the delay (latency, also called the audio delay) from the original audio timestamp (that is, the audio timestamp calculated based on Formula 1), where the delay includes the sound propagation delay, or the sound propagation delay and the algorithm processing delay, so as to calibrate the audio timestamp and achieve audio and image synchronization.
- here, the Audio Server (specifically, the Record Thread instance) is used as an example of the execution body that processes audio timestamps; the execution body may also be another module or instance in the electronic device, such as the Media Server (optionally, the Audio Record instance) or AudioHal (optionally, the Input Stream instance), which is not limited in this application.
- the mobile phone starts the camera application in response to the user's operation behavior, and the camera application recognizes that the initial zoom factor is X.
- here, a zoom factor of 3 is taken as an example for description; in other embodiments, the zoom factor can also be 5, 1.5, etc., which is not limited in this application.
- the creation process of each instance is performed. For details, refer to the relevant description in FIG. 3 , which will not be repeated here.
- the camera application outputs the zoom factor X to Camera Hal and Audio Hal. For details, please refer to the relevant description of FIG. 6 , which will not be repeated here.
- the Audio HAL may determine whether it is necessary to trigger the video processing method of the embodiment of the present application based on the received zoom factor.
- the camera application outputs an initial zoom factor to the Audio HAL, optionally, the initial zoom factor may be greater than or equal to 1.
- the Audio HAL receives the zoom factor X input by the camera application, and detects that the zoom factor X is greater than the set zoom factor, such as 2 times, and the Audio HAL determines that the video processing method of the present application needs to be triggered.
- the Audio HAL detects that the zoom factor is less than or equal to the set zoom factor, it is determined not to trigger the technical solution described in this application, that is, processing is performed according to the recording process shown in FIG. 3 above.
- the initial zoom factor can be the zoom factor obtained and saved by the camera application for the last time during the last recording, or the initial zoom factor can be the zoom factor set by the user in the preview interface (the interface when recording has not yet started) .
- the user can set the corresponding zoom factor through the zoom factor option provided on the shooting interface.
- the user can set the corresponding zoom factor through the mode setting option provided by the shooting interface.
- different modes correspond to different zoom factors.
- for example, the zoom factor corresponding to the long-range shooting mode is 5; this application does not limit the modes and their corresponding zoom factors.
- In other embodiments, the Audio HAL may also execute the technical solutions of the embodiments of the present application whenever recording starts; that is, even when the zoom factor is less than the set zoom factor, for example, in a scene where the zoom factor is 1x, the Audio HAL still executes the technical solutions of the present application, such as correcting the audio timestamp based on the delay. In the scene where the zoom factor is 1, the obtained delay is 0, and the result after correction of the audio timestamp is consistent with the result before correction.
- the processing flow of the image thread can still refer to the flow shown in FIG. 3; only the processing flow of the audio thread is described in detail in the following embodiments.
- the microphone driver acquires the audio stream collected by the microphone from the microphone, and the microphone driver inputs the acquired audio stream to Audio Hal.
- the example of the recording directional enhancement algorithm in Audio Hal can process the audio stream according to the zoom factor X.
- the processing of the audio stream by the example of the recording directional enhancement algorithm includes, but is not limited to, processing such as enhancing the audio within the zoom range, and reducing noise.
- the recording directional enhancement algorithm instance outputs the processed audio stream to the Input Stream instance, and the Input Stream instance further processes the audio stream, including but not limited to: resampling, channel transformation and other processing.
- the Input Stream instance of Audio Hal not only processes the audio stream but also obtains the delay (including the sound propagation delay, or the sound propagation delay and the algorithm processing delay); when the Record Thread instance of the Audio Server periodically reads data from the Input Stream instance, the Input Stream instance outputs the audio stream and the delay (also called delay information) to the Record Thread instance.
- the delay may include the sound propagation delay, or the sound propagation delay and the algorithm processing delay. It should be noted that if the sound propagation delay is much greater than the algorithm processing delay, the algorithm processing delay can be ignored. For example, if the effective acquisition range of the microphone reaches 30 meters or more, the sound propagation delay reaches more than 100 ms; an algorithm processing delay of 20-40 ms is then negligible relative to the sound propagation delay.
- the Input Stream instance can still output the audio stream, sound propagation delay and algorithm processing delay to the Record Thread instance, and the Record Thread instance ignores the algorithm processing delay when calculating the audio timestamp.
- the Input Stream instance can output only the audio stream and sound propagation delay to the Record Thread instance.
- the calculation method of the sound propagation delay is:
- Sound propagation delay (in ms) = d / c × 1000 (Formula 2)
- where d represents the imaging distance (in meters (m)) corresponding to the zoom factor X, and c is the propagation speed of sound in the air (340 meters per second (m/s)).
- the algorithm processing delay is a fixed value, which is obtained through experiments, and ranges from about 1 to 100 ms.
- the specific value is set according to the experimental results, which is not limited in this application.
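- A minimal sketch of the delay computation under these definitions (illustrative names; the algorithm delay constant is manufacturer-chosen):

```cpp
// Sound propagation delay: latency = d / c, reported in milliseconds, where
// d is the imaging distance (m) and c = 340 m/s is the speed of sound in air.
constexpr double kSpeedOfSoundMps = 340.0;

double propagationDelayMs(double imagingDistanceM) {
    // E.g. 10 m / 340 m/s ~= 29.4 ms, matching the ~30 ms cited for Fig. 7a.
    return imagingDistanceM / kSpeedOfSoundMps * 1000.0;
}

double totalDelayMs(double imagingDistanceM, double algorithmDelayMs) {
    // The fixed algorithm processing delay may be ignored when the
    // propagation delay dominates (e.g. pickup ranges of tens of meters).
    return propagationDelayMs(imagingDistanceM) + algorithmDelayMs;
}
```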
- Figure 9 is a schematic diagram of the interaction among Audio Hal, Audio Server and Media Server. It should be noted that in the description of Figure 9, the instances in each module are the execution subjects. In FIG. 9:
- AudioServer obtains the audio stream and delay input by Audio Hal.
- the Record Thread instance of AudioServer periodically reads the audio stream and the delay from the Input Stream instance of Audio Hal.
- here, a reading period of 20 ms is taken as an example for description; that is, the audio frame length is 20 ms (for related concepts, see the calculation method of the audio timestamp above).
- the period length and the audio frame length may also be other values, which are not limited in this application.
- the delay may only include the sound propagation delay, and in another example, the delay may include the sound propagation delay and the algorithm processing delay.
- Audio Hal detects that the zoom factor is greater than the set zoom factor, such as 2 times
- Audio Hal triggers the recording orientation algorithm, that is, the recording orientation algorithm instance processes the audio stream based on the recording orientation algorithm
- the algorithm processing delay will be introduced in the processing process, and the delay input by Audio Hal to AudioServer includes algorithm processing delay and sound propagation delay.
- if Audio Hal detects that the zoom factor is less than or equal to the set zoom factor, Audio Hal does not trigger the recording orientation algorithm, that is, there is no algorithm processing delay.
- the delay input by Audio Hal to AudioServer includes the sound propagation delay.
- the AudioServer obtains the audio timestamp based on the delay.
- the Record Thread instance of the AudioServer can obtain the audio timestamp (in ms) according to the following formula:
- Audio timestamp = N × l − latency (Formula 3)
- where N is the number of readings (that is, the Nth reading period), l is the audio frame length (in ms), and latency is the delay (in ms).
- the audio timestamp is rounded to a multiple of the audio frame duration (for example, 20 ms); that is to say, the result calculated by the Record Thread instance based on Formula (3) needs to be rounded to a multiple of 20 ms.
- For example, if the result calculated based on Formula (3) is 38 ms, the audio timestamp after rounding to a multiple of 20 ms is 40 ms.
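- Formula (3) plus the rounding step can be sketched as follows (illustrative code; the document's example, 38 ms -> 40 ms, is consistent with rounding to the nearest multiple of the frame length):

```cpp
#include <cstdint>
#include <cmath>

// Corrected audio timestamp per Formula (3): N * l - latency, then rounded
// to a multiple of the audio frame length l (20 ms here).
int64_t correctedTimestampMs(int64_t readCount, int64_t frameLengthMs,
                             double latencyMs) {
    double raw = readCount * frameLengthMs - latencyMs;       // e.g. 38.0
    double multiples = raw / static_cast<double>(frameLengthMs);
    return static_cast<int64_t>(std::llround(multiples)) * frameLengthMs; // 40
}
```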
- AudioServer outputs audio stream and audio timestamp to MediaServer.
- the Audio stream and the corresponding audio time stamp are output to MediaServer (specifically, the Audio Record instance).
- the AudioServer (specifically, the Record Thread instance) executes S101 to S103 periodically (for example, with a period of 20 ms); that is to say, the Record Thread instance reads an audio stream every 20 ms, obtains the corresponding audio timestamp, and outputs them to the Audio Record instance.
- the Audio Server outputs the audio timestamps and audio streams to the Media Server, and the instances in the Media Server process the audio streams and timestamps (for the process, refer to the relevant description in FIG. 4, which is not repeated here) to generate an MP4 file; for the correspondence between the video timestamps and the audio timestamps in the MP4 file, refer to FIG. 4.
- the calculation of the delay can also be performed by AudioServer, such as an instance of Record Thread.
- AudioServer can periodically read the audio and the current zoom factor from AudioHal, calculate the sound propagation delay according to the current zoom factor, and then calculate the corrected audio timestamp based on the above-mentioned calculation formula of the audio timestamp.
- the corrected audio timestamp may also be calculated by AudioHal or MediaServer, and the specific method is similar to that of AudioServer, and details are not described here.
- the solution described in the first embodiment is executed under the condition that the zoom factor X remains unchanged.
- During recording, the camera application can change the zoom factor in response to the user's operation behavior, and when the audio timestamp is adjusted based on the delay as described in this application, the change may cause an audio timestamp jitter problem, making the audio discontinuous. The following describes how to handle this timestamp jitter problem.
- the camera application can adjust the zoom factor based on the user's operation behavior, assuming that the adjusted zoom factor is Y.
- if the zoom factor Y is greater than the zoom factor X, that is, the zoom factor increases and the shooting distance (or imaging distance) increases, the current audio timestamp may become less than or equal to the previous audio timestamp.
- Specifically, the current timestamp fails to advance when N × l − (d2 / c × 1000 + p) ≤ (N − 1) × l − (d1 / c × 1000 + p), that is, when (d2 − d1) / c × 1000 ≥ l,
- where d2 is the imaging distance (in meters) corresponding to the zoom factor Y, d1 is the imaging distance (in meters) corresponding to the zoom factor X, N is the current number of readings (which can also be understood as the current reading period), (N − 1) is the previous number of readings (that is, the previous reading period), and p is the algorithm processing delay. Here the delay is the sum of the sound propagation delay and the algorithm processing delay; that is to say, the difference between the current delay (denoted latency2) and the delay of the previous period (denoted latency1) is greater than or equal to the audio frame length, expressed as latency2 − latency1 ≥ audio frame length (20 ms).
- the audio timestamp corresponding to the audio frame read in the current cycle (denoted as recording audio timestamp 2) will be less than or equal to the audio timestamp corresponding to the audio frame read in the previous cycle (denoted as recording audio timestamp 1).
- expressed as: recorded audio timestamp 2 − recorded audio timestamp 1 ≤ 0 ms.
- The recorded audio timestamp 1 and the recorded audio timestamp 2 are only used to better represent the relationship between recorded audio timestamps; this application does not limit the number of audio timestamps. As shown in FIG. 10a, the expected audio timestamps, that is, the audio timestamps written into the MP4 file (referred to as written audio timestamps for short), should be: 0 ms, 20 ms, 40 ms, 60 ms, 80 ms (only the audio timestamps corresponding to 4 audio streams are taken as an example in the figure). That is, the correct audio timestamps are the written audio timestamps in Figure 10a.
- The uncorrected audio timestamp is called the recorded audio timestamp. Referring to FIG. 10a, which shows the comparison between the written audio timestamps and the recorded audio timestamps, the recorded audio timestamp of 20 ms repeats (that is, the recorded audio timestamp 2 equals the recorded audio timestamp 1), and starting from the moment the zoom factor is changed, each subsequent recorded audio timestamp differs from the expected (correct) written audio timestamp by 20 ms. It can also be understood that if the audio frames corresponding to the recorded audio timestamps shown in FIG. 10a are played, the audio played at the 40th ms will be the audio collected at the 20th ms.
- if the zoom factor Y is smaller than the zoom factor X, that is, the zoom factor decreases and the imaging distance decreases,
- the interval between the current recorded audio timestamp and the last recorded audio timestamp may be greater than or equal to 40ms.
- Specifically, the difference between the delay of the previous period (denoted latency1) and the current delay (denoted latency2) is greater than or equal to the audio frame length, expressed as latency1 − latency2 ≥ audio frame length (20 ms).
- In this case, the interval between the current recorded audio timestamp and the previous recorded audio timestamp is greater than or equal to 40 ms (that is, twice the audio frame length, which can also be expressed as twice the reading period length), expressed as recorded audio timestamp 2 − recorded audio timestamp 1 ≥ 40 ms. As shown in Figure 11a, exemplarily, because the reduction of the zoom factor reduces the delay, the audio timestamp that should be 40 ms becomes a recorded audio timestamp of 60 ms after processing, and the audio timestamp that should be 60 ms becomes a recorded audio timestamp of 80 ms after processing; that is, starting from the moment the zoom factor is changed, each subsequent recorded audio timestamp differs from the expected (correct) written audio timestamp by 20 ms. It can also be understood that if the audio frames corresponding to the recorded audio timestamps shown in FIG. 11a are played, the audio played at the 40th ms will be the audio collected at the 60th ms.
- It should be noted that, when the audio timestamp is calculated, it is rounded to a multiple of 20 ms; that is to say, the audio timestamp is a multiple of 20 ms. If latency2 is greater than latency1 and the difference between the two is less than the audio frame length (20 ms), or latency1 is greater than latency2 and the difference between the two is less than the audio frame length, the calculated audio timestamps will not have the jitter problem.
- Sound travels about 6.8 m in 20 ms; that is to say, when the change of the zoom factor makes the shooting distance increase or decrease by 6.8 m or more, the audio timestamp jitter problem will occur.
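- This jitter condition can be checked directly (a sketch under the document's 340 m/s and 20 ms figures; the function name is illustrative):

```cpp
#include <cmath>

// Timestamp jitter occurs when a zoom change shifts the propagation delay by
// at least one frame length: at 340 m/s, sound covers 6.8 m in 20 ms, so an
// imaging-distance change of 6.8 m or more trips the condition.
bool timestampWillJitter(double oldDistanceM, double newDistanceM,
                         double frameLengthMs) {
    double deltaDelayMs = std::fabs(newDistanceM - oldDistanceM) / 340.0 * 1000.0;
    return deltaDelayMs >= frameLengthMs;
}
```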
- AudioHal obtains the adjusted zoom factor Y.
- the camera application may determine the zoom factor of the current shot in response to the detected user operation behavior, and the adjusted zoom factor Y may be greater than the zoom factor X, or may be smaller than the zoom factor X.
- the camera application can output the adjusted zoom factor Y to AudioHal (specifically, an instance of Input Stream).
- AudioHal obtains the audio stream and delay.
- AudioHal (specifically, the Input Stream instance) can obtain the sound propagation delay based on the zoom factor Y and determine the delay as the sum of the sound propagation delay (in ms) and the algorithm processing delay. For the specific acquisition method, refer to the first embodiment; details are not repeated here.
- the AudioServer obtains the audio stream and delay input by the AudioHal.
- the Audio Server (specifically, the Record Thread instance) periodically (for example, every 20 ms) reads the audio stream and the delay from AudioHal (specifically, the Input Stream instance), as shown in Figure 12.
- Here, the case where the delay before the zoom factor adjustment is latency1 and the delay after the zoom factor adjustment is latency2 is taken as an example for description; that is, the Record Thread instance obtains the audio stream and latency2 from the Input Stream instance of AudioHal.
- the AudioServer obtains the recorded audio timestamp 2 based on the delay.
- the AudioServer determines whether the difference between the recorded audio timestamp 2 and the recorded audio timestamp 1 is within the difference range.
- the difference range is greater than 0 and less than twice the reading period duration, expressed as (0, 2l), where l is the reading period duration (that is, the audio frame length).
- If the difference between the recorded audio timestamp 2 and the recorded audio timestamp 1 is greater than 0 and less than 2l, for example, 0 ms < recorded audio timestamp 2 − recorded audio timestamp 1 < 40 ms, then the delay difference between the current period delay and the previous period delay is less than 20 ms, including latency2 − latency1 < 20 ms and latency1 − latency2 < 20 ms; that is, the zoom factor remains unchanged or changes only slightly.
- the Audio Server (specifically, the Record Thread instance) determines that the difference between the recorded audio timestamp 2 and the recorded audio timestamp 1 is within the difference range, and executes S206.
- if the difference between the recorded audio timestamp 2 and the recorded audio timestamp 1 is not within the difference range, that is, recorded audio timestamp 2 − recorded audio timestamp 1 ≤ 0 ms, or recorded audio timestamp 2 − recorded audio timestamp 1 ≥ 40 ms, then S207 is executed.
- the AudioServer outputs the recorded audio timestamp 2 and the audio stream to the Media Server.
- if the AudioServer (specifically, the Record Thread instance) determines in S205 that the difference between the two recorded audio timestamps is within the difference range, the recorded audio timestamp 2 and the corresponding audio stream are input to the Media Server, and the Media Server can write the recorded audio timestamp 2 and the audio frame into the MP4 file.
- once the recorded audio timestamp is written into the MP4 file, it may also be referred to as a written audio timestamp.
- AudioServer judges whether the difference is less than the difference range.
- if the difference is less than the difference range, that is, recorded audio timestamp 2 − recorded audio timestamp 1 ≤ 0 ms, step 208 is performed.
- if recorded audio timestamp 2 − recorded audio timestamp 1 ≥ 40 ms, that is, the difference between the recorded audio timestamp 2 and the recorded audio timestamp 1 is greater than the difference range, the delay corresponding to the current period (latency2) is less than the delay corresponding to the previous period, and the difference between the two is greater than or equal to the audio frame length (20 ms), that is, latency1 − latency2 ≥ 20 ms; this is the scenario where the zoom factor is reduced, and step 209 is performed.
- AudioServer discards the audio stream and recorded audio timestamp 2.
- The AudioServer (specifically, the Record Thread instance) periodically reads the audio stream and the delay from Audio Hal (specifically, the Input Stream instance) and obtains the corresponding recorded audio timestamp. If it is determined in step S205 that recorded audio timestamp 2 − recorded audio timestamp 1 ≤ 0 ms, the Record Thread instance discards the audio timestamp (that is, the recorded audio timestamp 2) and the corresponding audio frame, that is, skips the audio frame, and outputs the audio frame and audio timestamp of the next period to the MediaServer (specifically, the Audio Record instance). The audio timestamps after the jitter processing (also called correction processing) are shown in Figure 10b: the Record Thread instance discards the second, repeated 20 ms audio timestamp (that is, the recorded audio timestamp 2 described above) and its corresponding audio stream, so that the audio timestamps subsequently written into the MP4 file (that is, the written audio timestamps in Figure 10b) advance consecutively.
- in the embodiments of the present application, sound propagates 6.8 m every 20 ms, and the calculation is carried out with a gradient of 6.8 m; that is, after the zoom factor changes, every time the sound pickup distance increases by 6.8 m, the sound delay increases by 20 ms, and correspondingly, for every 20 ms increase of the sound delay, one period of audio frames and their timestamps is discarded.
- after the AudioServer discards the audio frame and its audio timestamp, it repeats S201 and the subsequent steps in the next period.
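- The discard branch can be sketched as follows (illustrative types and names; the real logic lives in the Record Thread instance):

```cpp
#include <cstdint>

// When the zoom factor increases, the latency grows by at least one frame
// length and the newly computed timestamp fails to advance past the previous
// one; the chunk is dropped rather than writing a repeated or backward
// timestamp into the MP4 file.
bool shouldDiscard(int64_t currentTimestampMs, int64_t previousTimestampMs) {
    return currentTimestampMs - previousTimestampMs <= 0;
}
```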
- the AudioServer performs frame insertion compensation.
- the AudioServer (specifically, the Record Thread instance) can perform frame insertion compensation between the current audio frame and the previous audio frame to prevent audio jitter.
- Specifically, the Record Thread instructs AudioHal to calculate the audio timestamp of the inserted frame (referred to as the inserted audio timestamp): AudioHal calculates this audio timestamp from the middle value of the imaging distances corresponding to the zoom factor Y and the zoom factor X. In this period, the AudioServer reads the audio frames and audio timestamps output by AudioHal twice (the second including the inserted audio timestamp), and the inserted audio timestamp is inserted between the two audio timestamps that are 40 ms apart. As shown in FIG. 11b, the audio timestamps corresponding to the audio in the MP4 file are: 0 ms, 20 ms, 40 ms (that is, the inserted audio timestamp), and 60 ms.
- the audio frame corresponding to the inserted audio timestamp may be obtained by fading the previous audio frame in and out.
- the AudioServer outputs the current audio stream and the recorded audio timestamp 2, as well as the inserted audio stream and the inserted audio timestamp correspondingly to the Media Server.
- The AudioServer outputs the audio streams read twice and their corresponding audio timestamps, namely the newly inserted audio timestamp and its corresponding audio stream, and the current audio timestamp (that is, the recorded audio timestamp 2) and its corresponding audio stream, to the Media Server (specifically, the Audio Record instance). The Media Server can write the newly inserted audio timestamp and its corresponding audio frame, as well as the current audio timestamp and its corresponding audio frame, into the MP4 file; in the next period, S201 and the subsequent steps are re-executed.
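- The inserted timestamp can be sketched as follows (an assumption-laden illustration: the midpoint rule follows the text above, while the rounding mirrors Formula (3)):

```cpp
#include <cstdint>
#include <cmath>

// Timestamp for the compensating frame inserted when the zoom factor
// decreases: the latency is recomputed from the midpoint of the old and new
// imaging distances, then applied as in Formula (3).
int64_t insertedTimestampMs(int64_t readCount, int64_t frameLengthMs,
                            double oldDistanceM, double newDistanceM,
                            double algorithmDelayMs) {
    double midDistanceM = (oldDistanceM + newDistanceM) / 2.0;
    double latencyMs = midDistanceM / 340.0 * 1000.0 + algorithmDelayMs;
    double raw = readCount * frameLengthMs - latencyMs;
    // Round to a multiple of the frame length, e.g. landing on the 40 ms slot.
    return static_cast<int64_t>(std::llround(raw / frameLengthMs)) * frameLengthMs;
}
```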
- S204 to S209 in FIG. 12 are all performed based on the judgment result of the difference between the recorded audio timestamp 2 and the recorded audio timestamp 1. In other embodiments, the AudioServer may instead obtain the delay difference between the delay of the current period and the delay of the previous period, and determine the subsequent processing manner based on that delay difference.
- Figure 13 shows another processing method after S203. In Figure 13:
- the AudioServer determines whether the delay difference between the current cycle delay and the previous cycle delay is within a preset range.
- the Audio Server (specifically, the Record Thread instance) determines that the delay difference between the current cycle delay and the previous cycle delay is not within the preset range, and executes S304.
- the Audio Server (specifically, the Record Thread instance) determines that the delay difference between the current cycle delay and the previous cycle delay is within a preset range, and executes S302.
- the AudioServer obtains the recorded audio timestamp 2 based on the delay.
- the AudioServer outputs the recorded audio timestamp 2 and the audio stream to the Media Server.
- if the AudioServer (specifically, the Record Thread instance) determines in S301 that the delay difference between the two delays is within the preset range, the AudioServer obtains the corresponding recorded audio timestamp 2 and inputs the audio timestamp and the audio stream to the Media Server; the Media Server can write the audio timestamp and the audio frame into the MP4 file.
- the AudioServer determines whether the delay of the current cycle is greater than the delay of the previous cycle.
- AudioServer discards the audio stream.
- AudioServer performs frame insertion compensation.
- the AudioServer outputs the current audio stream and the recorded audio timestamp 2, as well as the inserted audio stream and the inserted audio timestamp correspondingly to the Media Server.
- the processes of S305 to S307 are the same as those of S207 to S209, respectively, and are not repeated here.
- In this way, the electronic device can adjust the audio timestamp based on the current zoom factor to offset the delay caused by the distance between the electronic device and the sound source when the sound is directionally collected.
- The electronic device can also dynamically adjust the audio timestamp based on the change of the zoom factor to suppress the timestamp jitter caused by the change of the zoom factor. The video processing method provided in this way can effectively correct the audio delay, so as to realize audio and image synchronization.
- the electronic device includes corresponding hardware and/or software modules for executing each function.
- the present application can be implemented in the form of hardware, or in the form of a combination of hardware and computer software, in conjunction with the algorithm steps of the examples described in the embodiments disclosed herein. Whether a function is performed by hardware or by computer software driving hardware depends on the specific application and design constraints of the technical solution. Those skilled in the art may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of this application.
- FIG. 14 shows a schematic block diagram of an apparatus 200 according to an embodiment of the present application.
- the apparatus 200 may include: a processor 201 , a transceiver/transceiver pin 202 , and optionally, a memory 203 .
- bus 204 includes a power bus, a control bus and a status signal bus in addition to a data bus.
- the various buses are referred to as bus 204 in the figures.
- the memory 203 may be used to store the instructions in the foregoing method embodiments.
- the processor 201 can be used to execute the instructions in the memory 203, to control the receive pin to receive signals, and to control the transmit pin to transmit signals.
- the apparatus 200 may be the electronic device or the chip of the electronic device in the above method embodiments.
- This embodiment also provides a computer storage medium storing computer instructions; when the computer instructions run on the electronic device, the electronic device executes the above related method steps to implement the video processing method in the above embodiments.
- This embodiment also provides a computer program product which, when run on a computer, causes the computer to execute the above related steps to implement the video processing method in the above embodiments.
- The embodiments of the present application also provide an apparatus, which may specifically be a chip, a component, or a module; the apparatus may include a processor and a memory connected to each other, where the memory stores computer-executable instructions. When the apparatus runs, the processor can execute the computer-executable instructions stored in the memory, so that the chip executes the video processing methods in the foregoing method embodiments.
- the electronic device, computer storage medium, computer program product, or chip provided in this embodiment is used to execute the corresponding method provided above; therefore, for the beneficial effects that can be achieved, refer to the beneficial effects of the corresponding method provided above, which are not repeated here.
- the disclosed apparatus and method may be implemented in other manners.
- the apparatus embodiments described above are only illustrative.
- the division of modules or units is only a logical function division. In actual implementation, there may be other division methods.
- multiple units or components may be combined or integrated into another device, or some features may be omitted or not implemented.
- the mutual coupling, direct coupling, or communication connection shown or discussed may be implemented through some interfaces; the indirect coupling or communication connection between devices or units may be in electrical, mechanical, or other forms.
- Units described as separate components may or may not be physically separated, and components shown as units may be one physical unit or multiple physical units, that is, may be located in one place, or may be distributed in multiple different places. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution in this embodiment.
- each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
- the above-mentioned integrated units may be implemented in the form of hardware, or may be implemented in the form of software functional units.
- the integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a readable storage medium.
- the readable storage medium includes several instructions that cause a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to execute all or part of the steps of the methods in the various embodiments of the present application.
- the aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or other media that can store program code.
Claims (27)
- A video processing method, characterized by comprising: detecting a first instruction of a user; displaying a shooting interface in response to the first instruction; obtaining a first zoom factor, and obtaining a first video corresponding to the first zoom factor, wherein the first video comprises first audio and a first image, the first audio corresponds to the first image, the first image comprises a photographed object, the first audio is generated from sound emitted by a sound source, and the sound source is the photographed object, or the distance between the sound source and the photographed object is within a set range; obtaining a first delay corresponding to the first audio, wherein the first delay comprises a first sound propagation delay, or the first delay comprises the first sound propagation delay and a set algorithm processing delay, the first sound propagation delay being the delay caused by transmission of the sound emitted by the sound source from the sound source to an electronic device; determining a first audio timestamp of the first audio based on the first delay; and displaying the first image on the shooting interface, and saving the first image and the correspondence between the first audio and the first audio timestamp.
- The method according to claim 1, characterized in that, after the saving of the first image and the correspondence between the first audio and the first audio timestamp, the method further comprises: playing the first audio and the first image synchronously according to the first audio timestamp.
- The method according to claim 1, characterized in that the obtaining of the first delay corresponding to the first audio comprises: obtaining a first imaging distance corresponding to the first zoom factor according to the first zoom factor and a correspondence between zoom factors and imaging distances; and calculating the first sound propagation delay based on the following formula: first sound propagation delay = d1 / c, where d1 is the first imaging distance and c is the propagation speed of sound in the shooting medium; and the determining of the first audio timestamp of the first audio based on the delay corresponding to the first audio comprises: calculating the first audio timestamp based on the following formula: first audio timestamp = N1 * l - latency1, where latency1 is the first delay, l is the cycle length of a read cycle, the read cycle being the cycle of periodically reading, from the start time of video recording, the audio already captured at the capture point, and N1 is the read cycle corresponding to the first audio, N1 being an integer greater than or equal to 1.
- The method according to claim 3, characterized in that, after the saving of the first image and the correspondence between the first audio and the first audio timestamp, the method further comprises: detecting a second instruction of the user; obtaining a second zoom factor according to the second instruction, and obtaining a second video corresponding to the second zoom factor, wherein the second video comprises second audio and a second image, the second image comprises another photographed object, the second audio is generated from sound emitted by another sound source, and the other sound source is the other photographed object, or the distance between the other sound source and the other photographed object is within the set range, the second zoom factor being different from the first zoom factor; obtaining a second delay corresponding to the second audio, wherein the second delay comprises a second sound propagation delay, or the second delay comprises the second sound propagation delay and the algorithm processing delay, the second sound propagation delay being the delay caused by transmission of the sound emitted by the other sound source from the other sound source to the electronic device; and determining a second audio timestamp of the second audio based on the second delay.
- The method according to claim 5, characterized in that the determining of the second audio timestamp of the second audio based on the second delay comprises: calculating the second audio timestamp based on the following formula: second audio timestamp = N2 * l - latency2, where latency2 is the second delay, N2 is the read cycle corresponding to the second audio, N2 and N1 are adjacent cycles, and N2 is greater than N1.
- The method according to claim 6, characterized in that, after the determining of the second audio timestamp of the second audio, the method comprises: obtaining the difference between the second audio timestamp and the first audio timestamp based on the following formula: difference = second audio timestamp - first audio timestamp; and, if the difference is greater than 0 and less than 2l, displaying the second image on the shooting interface, and saving the second image and the correspondence between the second audio and the second audio timestamp.
- The method according to claim 7, characterized in that, if the difference is less than 0, the second image is displayed on the shooting interface, the second image is saved, and the second audio and the second audio timestamp are discarded.
- The method according to claim 8, characterized in that, after the saving of the second image and the discarding of the second audio and the second audio timestamp, the method further comprises: playing third audio and the second image synchronously according to a third audio timestamp, wherein the third audio timestamp corresponds to the third audio, and the third audio is obtained in the read cycle following the read cycle corresponding to the second audio.
- The method according to claim 10, characterized in that, after the saving of the second image, the correspondence between the second audio and the second audio timestamp, and the correspondence between inserted audio and an inserted audio timestamp, the method further comprises: playing the inserted audio and the second image synchronously according to the inserted audio timestamp.
- The method according to claim 1, characterized in that the obtaining of the first delay corresponding to the first audio comprises: obtaining the first delay corresponding to the first audio when the first zoom factor is greater than a set zoom factor.
- The method according to claim 1, characterized in that the obtaining of the first zoom factor comprises: reading a stored zoom factor last obtained before the end of the previous video recording; or detecting a zoom instruction of the user and obtaining the first zoom factor in response to the zoom instruction; or detecting a mode setting instruction of the user, determining a first zoom mode in response to the mode setting instruction, and obtaining the first zoom factor corresponding to the first zoom mode according to a correspondence between zoom modes and zoom factors.
- An electronic device, characterized by comprising: one or more processors; a memory; and one or more computer programs, wherein the one or more computer programs are stored in the memory and, when executed by the one or more processors, cause the electronic device to perform the following steps: detecting a first instruction of a user; displaying a shooting interface in response to the first instruction; obtaining a first zoom factor, and obtaining a first video corresponding to the first zoom factor, wherein the first video comprises first audio and a first image, the first audio corresponds to the first image, the first image comprises a photographed object, the first audio is generated from sound emitted by a sound source, and the sound source is the photographed object, or the distance between the sound source and the photographed object is within a set range; obtaining a first delay corresponding to the first audio, wherein the first delay comprises a first sound propagation delay, or the first delay comprises the first sound propagation delay and a set algorithm processing delay, the first sound propagation delay being the delay caused by transmission of the sound emitted by the sound source from the sound source to the electronic device; determining a first audio timestamp of the first audio based on the first delay; and displaying the first image on the shooting interface, and saving the first image and the correspondence between the first audio and the first audio timestamp.
- The electronic device according to claim 14, characterized in that the computer programs, when executed by the one or more processors, cause the electronic device to perform the following step: playing the first audio and the first image synchronously according to the first audio timestamp.
- The electronic device according to claim 14, characterized in that the computer programs, when executed by the one or more processors, cause the electronic device to perform the following steps: obtaining a first imaging distance corresponding to the first zoom factor according to the first zoom factor and a correspondence between zoom factors and imaging distances; and calculating the first sound propagation delay based on the following formula: first sound propagation delay = d1 / c, where d1 is the first imaging distance and c is the propagation speed of sound in the shooting medium; the electronic device further performs the following step: calculating the first audio timestamp based on the following formula: first audio timestamp = N1 * l - latency1, where latency1 is the first delay, l is the cycle length of a read cycle, the read cycle being the cycle of periodically reading, from the start time of video recording, the audio already captured at the capture point, and N1 is the read cycle corresponding to the first audio, N1 being an integer greater than or equal to 1.
- The electronic device according to claim 16, characterized in that the computer programs, when executed by the one or more processors, cause the electronic device to perform the following steps: detecting a second instruction of the user; obtaining a second zoom factor according to the second instruction, and obtaining a second video corresponding to the second zoom factor, wherein the second video comprises second audio and a second image, the second image comprises another photographed object, the second audio is generated from sound emitted by another sound source, and the other sound source is the other photographed object, or the distance between the other sound source and the other photographed object is within the set range, the second zoom factor being different from the first zoom factor; obtaining a second delay corresponding to the second audio, wherein the second delay comprises a second sound propagation delay, or the second delay comprises the second sound propagation delay and the algorithm processing delay, the second sound propagation delay being the delay caused by transmission of the sound emitted by the other sound source from the other sound source to the electronic device; and determining a second audio timestamp of the second audio based on the second delay.
- The electronic device according to claim 18, characterized in that the computer programs, when executed by the one or more processors, cause the electronic device to perform the following step: calculating the second audio timestamp based on the following formula: second audio timestamp = N2 * l - latency2, where latency2 is the second delay, N2 is the read cycle corresponding to the second audio, N2 and N1 are adjacent cycles, and N2 is greater than N1.
- The electronic device according to claim 19, characterized in that the computer programs, when executed by the one or more processors, cause the electronic device to perform the following steps: obtaining the difference between the second audio timestamp and the first audio timestamp based on the following formula: difference = second audio timestamp - first audio timestamp; and, if the difference is greater than 0 and less than 2l, displaying the second image on the shooting interface, and saving the second image and the correspondence between the second audio and the second audio timestamp.
- The electronic device according to claim 20, characterized in that the computer programs, when executed by the one or more processors, cause the electronic device to perform the following steps: if the difference is less than 0, displaying the second image on the shooting interface, saving the second image, and discarding the second audio and the second audio timestamp.
- The electronic device according to claim 21, characterized in that the computer programs, when executed by the one or more processors, cause the electronic device to perform the following step: playing third audio and the second image synchronously according to a third audio timestamp, wherein the third audio timestamp corresponds to the third audio, and the third audio is obtained in the read cycle following the read cycle corresponding to the second audio.
- The electronic device according to claim 23, characterized in that the computer programs, when executed by the one or more processors, cause the electronic device to perform the following step: playing the inserted audio and the second image synchronously according to the inserted audio timestamp.
- The electronic device according to claim 14, characterized in that the computer programs, when executed by the one or more processors, cause the electronic device to perform the following step: obtaining the first delay corresponding to the first audio when the first zoom factor is greater than a set zoom factor.
- The electronic device according to claim 14, characterized in that the computer programs, when executed by the one or more processors, cause the electronic device to perform the following steps: reading a stored zoom factor last obtained before the end of the previous video recording; or detecting a zoom instruction of the user and obtaining the first zoom factor in response to the zoom instruction; or detecting a mode setting instruction of the user, determining a first zoom mode in response to the mode setting instruction, and obtaining the first zoom factor corresponding to the first zoom mode according to a correspondence between zoom modes and zoom factors.
- A computer-readable storage medium comprising a computer program, characterized in that, when the computer program runs on an electronic device, the electronic device is caused to execute the video processing method according to any one of claims 1 to 13.
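As a worked illustration of the formulas above (all numbers are assumed for illustration, not taken from the claims): with d1 = 10 m, c = 343 m/s, l = 20 ms, and N1 = 5, the first sound propagation delay is d1 / c = 10 / 343 ≈ 29.2 ms, and the first audio timestamp is N1 * l - latency1 = 5 × 20 - 29.2 = 70.8 ms (taking latency1 as the propagation delay alone). If the zoom then drops so that latency2 = 8 ms in the next read cycle (N2 = 6), the second audio timestamp is 6 × 20 - 8 = 112 ms, and the difference 112 - 70.8 = 41.2 ms exceeds 2l = 40 ms, the case handled by frame-insertion compensation in the description; a difference inside (0, 2l) is saved as-is, and a negative difference causes the second audio to be discarded.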
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP21860225.8A EP4195653A4 (en) | 2020-08-26 | 2021-08-18 | VIDEO PROCESSING METHOD AND ELECTRONIC DEVICE |
US18/173,904 US20230197115A1 (en) | 2020-08-26 | 2023-02-24 | Video Processing Method and Electronic Device |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010883703.9A CN114125258B (zh) | 2020-08-26 | 2020-08-26 | Video processing method and electronic device |
CN202010883703.9 | 2020-08-26 |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US18/173,904 Continuation US20230197115A1 (en) | 2020-08-26 | 2023-02-24 | Video Processing Method and Electronic Device |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022042387A1 (zh) | 2022-03-03 |
Family
ID=80354589
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2021/113153 WO2022042387A1 (zh) | 2020-08-26 | 2021-08-18 | 视频处理方法及电子设备 |
Country Status (4)
Country | Link |
---|---|
US (1) | US20230197115A1 (zh) |
EP (1) | EP4195653A4 (zh) |
CN (2) | CN116437197A (zh) |
WO (1) | WO2022042387A1 (zh) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115942108A (zh) * | 2021-08-12 | 2023-04-07 | Beijing Honor Device Co., Ltd. | Video processing method and electronic device |
CN115022537B (zh) * | 2022-05-24 | 2023-08-29 | Guangdong OPPO Mobile Telecommunications Corp., Ltd. | Video shooting method and apparatus, electronic device, and storage medium |
CN115022536B (zh) * | 2022-05-24 | 2023-10-03 | Guangdong OPPO Mobile Telecommunications Corp., Ltd. | Video shooting method and apparatus, electronic device, and storage medium |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090207277A1 (en) * | 2008-02-20 | 2009-08-20 | Kabushiki Kaisha Toshiba | Video camera and time-lag correction method |
CN106060534A (zh) * | 2016-06-03 | 2016-10-26 | Third Research Institute of the Ministry of Public Security | System and method for audio-video synchronization testing |
CN107135413A (zh) * | 2017-03-20 | 2017-09-05 | Fujian Tianquan Education Technology Co., Ltd. | Audio and video synchronization method and system |
US20180367839A1 (en) * | 2017-06-16 | 2018-12-20 | Oohms Ny Llc | Method and system for synchronization of audio content for a remotely displayed video |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2008048374A (ja) * | 2006-07-21 | 2008-02-28 | Victor Co. of Japan, Ltd. | Video camera apparatus |
CN101567969A (zh) * | 2009-05-21 | 2009-10-28 | Shanghai Jiao Tong University | Intelligent video directing method based on microphone-array sound guidance |
JP2017103542A (ja) * | 2015-11-30 | 2017-06-08 | Ono Sokki Co., Ltd. | Synchronization device, synchronization method, and synchronization program |
US20200092442A1 (en) * | 2016-12-21 | 2020-03-19 | InterDigital CE Patent Holdings | Method and device for synchronizing audio and video when recording using a zoom function |
CN107404599A (zh) * | 2017-07-17 | 2017-11-28 | Goertek Inc. | Audio and video data synchronization method, apparatus, and system |
CN108287924A (zh) * | 2018-02-28 | 2018-07-17 | Fujian Normal University | Locatable video data collection and organized retrieval method |
Non-Patent Citations (1)
Title |
---|
See also references of EP4195653A4 |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115604540A (zh) * | 2022-09-01 | 2023-01-13 | Honor Device Co., Ltd. | Video acquisition method and apparatus |
CN115604540B (zh) * | 2022-09-01 | 2023-11-14 | Honor Device Co., Ltd. | Video acquisition method, electronic device, and medium |
Also Published As
Publication number | Publication date |
---|---|
CN116437197A (zh) | 2023-07-14 |
EP4195653A1 (en) | 2023-06-14 |
EP4195653A4 (en) | 2024-01-03 |
CN114125258A (zh) | 2022-03-01 |
US20230197115A1 (en) | 2023-06-22 |
CN114125258B (zh) | 2023-04-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2022042387A1 (zh) | Video processing method and electronic device | |
US20230116044A1 (en) | Audio processing method and device | |
EP4224831A1 (en) | Image processing method and electronic device | |
CN113572954B (zh) | Video recording method, electronic device, and medium | |
US12096134B2 (en) | Big aperture blurring method based on dual cameras and TOF | |
CN113473229B (zh) | Method for dynamically adjusting a frame-drop threshold and related device | |
KR20140039920A (ko) | Image data processing method and apparatus, and terminal including the same | |
US11750926B2 (en) | Video image stabilization processing method and electronic device | |
US11870941B2 (en) | Audio processing method and electronic device | |
CN108769738B (zh) | Video processing method and apparatus, computer device, and storage medium | |
CN115597706B (zh) | Ambient light detection method, electronic device, and chip system | |
CN113643728A (zh) | Audio recording method, electronic device, medium, and program product | |
WO2024040990A1 (zh) | Photographing method and electronic device | |
CN115048012A (zh) | Data processing method and related apparatus | |
CN113810589A (zh) | Electronic device, video shooting method therefor, and medium | |
CN115176455A (zh) | Power-efficient dynamic electronic image stabilization | |
EP4254927A1 (en) | Photographing method and electronic device | |
CN113573119B (zh) | Timestamp generation method and apparatus for multimedia data | |
CN115689963A (zh) | Image processing method and electronic device | |
WO2024156206A1 (zh) | Display method and electronic device | |
US20230335081A1 (en) | Display Synchronization Method, Electronic Device, and Readable Storage Medium | |
WO2022228259A1 (zh) | Target tracking method and related apparatus | |
CN115904184A (zh) | Data processing method and related apparatus | |
WO2024159950A1 (zh) | Display method and apparatus, electronic device, and storage medium | |
CN113382162B (zh) | Video shooting method and electronic device | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 21860225; Country of ref document: EP; Kind code of ref document: A1 |
WWE | Wipo information: entry into national phase | Ref document number: 2021860225; Country of ref document: EP |
ENP | Entry into the national phase | Ref document number: 2021860225; Country of ref document: EP; Effective date: 20230310 |
ENP | Entry into the national phase | Ref document number: 2021860225; Country of ref document: EP; Effective date: 20230308 |
NENP | Non-entry into the national phase | Ref country code: DE |