WO2020215453A1 - A video recording method and system - Google Patents


Info

Publication number
WO2020215453A1 (application PCT/CN2019/090322)
Authority
WO
WIPO (PCT)
Prior art keywords
video
audio
data
live broadcast
time node
Prior art date
Application number
PCT/CN2019/090322
Other languages
English (en)
French (fr)
Inventor
陈燕鹏
Original Assignee
网宿科技股份有限公司
Priority date
Filing date
Publication date
Application filed by 网宿科技股份有限公司 filed Critical 网宿科技股份有限公司
Priority to EP19800908.6A (published as EP3748972B1)
Priority to US16/717,638 (published as US10951857B2)
Publication of WO2020215453A1

Classifications

    • H04N21/4147 PVR [Personal Video Recorder]
    • H04N5/9202 Transformation of the television signal for recording, the additional signal being a sound signal
    • G11B27/10 Indexing; Addressing; Timing or synchronising; Measuring tape travel
    • G11B27/36 Monitoring, i.e. supervising the progress of recording or reproducing
    • H04N21/2187 Live feed
    • H04N21/43072 Synchronising the rendering of multiple content streams on the same device
    • H04N21/4333 Processing operations in response to a pause request
    • H04N21/4334 Recording operations
    • H04N21/439 Processing of audio elementary streams
    • H04N21/44 Processing of video elementary streams

Definitions

  • This application relates to the field of Internet technology, and in particular to a video recording method and system.
  • Live video is highly time-sensitive, and while watching a live video, users can also communicate with the streamer or other viewers in real time by posting bullet comments (barrage).
  • In view of this, current playback devices can provide a video recording function.
  • For the screen content of the live video, the picture can be captured into video frames by taking screenshots.
  • The advantage of this approach is that it retains additional visual effects in the screen content, such as quiz and gift-sending effects shown during the live broadcast.
  • For the audio content of the live video, the original audio stream can be read. The captured video frames and the read audio stream are then encoded separately and synthesized into a recorded video file.
  • However, video files recorded in this way may have the audio and picture out of sync.
  • The reason is that during a live broadcast the network may freeze, or the streamer may actively pause the broadcast, causing the video picture to pause. In this case, screenshots can still capture the paused picture as it appeared at the time.
  • The audio stream, however, is interrupted, so the video duration covered by the captured video frames cannot match the duration covered by the audio stream. For example, suppose a live video lasts 1 hour in total, 10 minutes of which are paused. The captured video frames then cover the full 1 hour, but the acquired audio stream is only 50 minutes long. When the video frames and the audio stream are later combined into a video file, the video picture and the audio cannot stay synchronized.
  • the purpose of this application is to provide a video recording method and system, which can ensure that the recorded video files maintain audio and video synchronization.
  • One aspect of the present application provides a video recording method.
  • The method includes: reading the audio data stream of the target video in real time, and converting the video picture of the target video into video frame data; monitoring the playback state of the target video; recording the start time node when the playback state indicates that the live broadcast is paused, and the end time node when it indicates that the live broadcast resumes; calculating the amount of data to be inserted based on the two time nodes and inserting audio empty packet data equal to the calculated amount before the audio data stream when the live broadcast resumes; and synthesizing the audio data stream with the inserted audio empty packet data and the video frame data into a recorded video file.
  • another aspect of the present application also provides a video recording system.
  • The system includes: an audio and video data acquisition unit, for reading the audio data stream of the target video in real time and converting the video pictures of the target video into video frame data; a playback status monitoring unit, for monitoring the playback status of the target video, recording the start time node when the playback status indicates that the live broadcast is paused and the end time node when it indicates that the live broadcast resumes; an empty packet data insertion unit, for calculating the amount of data to be inserted based on the start time node and the end time node and inserting audio empty packet data equal to the calculated amount before the audio data stream when the live broadcast resumes; and a recording and synthesis unit, for synthesizing the audio data stream with the inserted audio empty packet data and the video frame data into a recorded video file.
  • the technical solution provided by this application can monitor the playback status of the target video when the audio data stream of the target video is read in real time.
  • When the target video is paused, the start time node can be recorded, and when the target video resumes, the end time node can be recorded.
  • the data amount of the audio empty packet data that needs to be inserted during the pause can be calculated.
  • the corresponding empty audio packet data can be inserted before the audio data stream.
  • In this way, the audio data stream with the inserted audio empty packet data can match the video picture of the target video on the time axis.
  • During the paused period, the player renders the silent audio empty packet data, which ensures that the recorded video maintains audio-picture synchronization.
  • Fig. 1 is a schematic diagram of a video recording method in an embodiment of the present application
  • FIG. 2 is a first schematic diagram of inserting audio null packet data in an embodiment of the present application
  • FIG. 3 is a second schematic diagram of inserting audio null packet data in an embodiment of the present application.
  • Fig. 4 is a schematic diagram of functional modules of a video recording system in an embodiment of the present application.
  • This application provides a video recording method, which can be applied to devices that support video recording.
  • The device can be, for example, an electronic device used by the user, such as a tablet computer, smartphone, desktop computer, smart wearable device, or notebook computer. It can also be the service server of a live-streaming or video-on-demand platform, or a video recording device used in conjunction with a display.
  • the video recording method may include the following multiple steps.
  • S1 Read the audio data stream of the target video in real time, and convert the video picture of the target video into video frame data.
  • the target video may include live video or on-demand video.
  • the audio capture and video image capture can be performed separately through parallel threads in the video recording device. Specifically, when the player in the video recording device is initialized, or when the video recording device receives a video recording instruction, an audio acquisition and editing module and a video acquisition and editing module can be created.
  • the number of audio acquisition and editing modules may be one or more, each audio acquisition and editing module may correspond to one audio track in the target video, and the number of video acquisition and editing modules is usually one.
  • The video recording device can pass the audio parameters of the player that plays the target video to the audio acquisition and editing module, so that the module reads the audio data stream of the target video in real time based on the assigned audio parameters.
  • the audio parameters may include parameters such as audio sampling rate, number of audio channels, and number of audio sampling bits.
  • The audio data stream of the target video may be PCM (Pulse Code Modulation) data, obtained by decoding the original audio data of the target video. After the video recording device receives the PCM data, it can be used directly for playback. Generally speaking, the data volume of PCM data is large.
  • the audio acquisition and editing module can encode the PCM data.
  • An audio encoder can be created through the audio acquisition and editing module, and the audio encoder can be used to encode the read audio data stream of the target video into an audio file of a specified format.
  • the specified format may be specified by the user before the video recording starts, or may be the default format of the video recording device.
  • The specified format may be, for example, the mp3 format or the AAC (Advanced Audio Coding) format.
  • The corresponding library can be enabled according to the encoding type of the audio encoder. For example, for mp3 encoding, the libmp3lame library can be enabled.
  • the audio acquisition and editing module can read the PCM data of the target video, and the audio encoder can encode the read PCM data to generate an audio file in a specified format.
  • the read PCM data and the generated audio file can be stored in the temporary buffer file path.
  • The temporary cache file path may be generated from the system time at which the audio acquisition and editing module was created. In this way, the paths for storing PCM data and audio files can be expressed uniformly by that creation time.
  • three management queues can be set, and the three management queues can respectively correspond to recording start and stop actions, PCM data storage, and audio encoding.
  • At different stages of the recording process, the corresponding management queue can be activated, ensuring that the recording proceeds in order.
  • the video recording device may also create an audio acquisition and editing management module in advance. Then, after one or more audio acquisition and editing modules are created, the created audio acquisition and editing module can be added to the preset audio acquisition and editing management module. In this way, the audio editing management module can start one or more audio editing modules currently managed when the video recording starts, and after the video recording ends, close the one or more audio editing modules, and clear the video recording process Generated temporary files to realize batch management of audio acquisition and editing modules.
  • video acquisition parameters can be set for the video acquisition and editing module.
  • The video acquisition parameters may include the video frame file output path, video resolution, video capture frame rate, video frame pixel format, video frame encoding method, and other parameters.
  • the video resolution can be a specified video resolution. If the video resolution is not specified, the screen size of the video recording device can be directly used as the default video resolution.
  • The video capture frame rate may be a frame rate range. For example, the frame rate of a typical device usually ranges from 10 to 30 frames per second, and with the advancement of technology it can reach 10 to 60 frames per second.
  • the video frame pixel format can be a 32-bit BGRA format. Of course, in practical applications, other video frame pixel formats can be flexibly selected according to requirements, and are not limited to the 32-bit BGRA format.
  • the video frame encoding method can also be various, for example, it can be H.264 encoding format, VP8, VP9 encoding format, HEVC encoding format, and so on.
  • The number of required parameters can vary with the actual application scenario.
  • The video frame file output path, the video capture frame rate, and the video frame encoding method are usually required, while the video resolution is optional. If the video resolution is not specified, it can be considered to match the screen resolution of the video recording device.
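  • As an illustration, the parameter set above can be sketched as follows. This is a hedged example: the parameter names and the fallback helper are invented for this sketch and are not part of the patent.

```python
# Hypothetical video-capture parameter set; names are illustrative only.
capture_params = {
    "output_path": "/tmp/recording_frames.h264",  # video frame file output path (required)
    "resolution": None,        # optional; None means "use the device screen size"
    "frame_rate": (10, 30),    # capture frame-rate range, frames per second (required)
    "pixel_format": "BGRA32",  # 32-bit BGRA, as in the text
    "codec": "H.264",          # could also be VP8, VP9, or HEVC (required)
}

def effective_resolution(params, screen_size):
    # Fall back to the device screen resolution when none is specified.
    return params["resolution"] or screen_size
```

A caller would pass the device's screen size, e.g. `effective_resolution(capture_params, (1920, 1080))`.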
  • the video picture of the target video can be redrawn as a binary stream according to the video acquisition parameters, and then the binary stream can be further encoded into video frame data.
  • The video frame data obtained in this way retains visual effects such as quizzes and gift-sending in the video picture, so that the screen content of the target video can be completely restored.
  • The paused picture can also be recorded, restoring the real scene of the target video during the live broadcast as fully as possible.
  • S3 Monitor the playback status of the target video; when the playback status of the target video indicates that the live broadcast is paused, record the start time node of the pause, and when the playback status indicates that the live broadcast has resumed, record the end time node of the resumption.
  • While the audio acquisition and editing module reads the audio data stream of the target video in real time, it receives PCM data normally as long as the target video plays normally. If the target video is paused, the module stops receiving PCM data; after playback resumes, it continues receiving PCM data. To ensure that the read audio data stream stays synchronized with the video frame data converted by the video acquisition and editing module, audio empty packet data can be inserted into the audio data stream for the period of the live broadcast pause, so that the audio data stream with the inserted empty packet data aligns with the video frame data on the time axis.
  • the playback status of the target video can be monitored.
  • the pause state of the target video can include two types: active pause and passive pause.
  • active pause may mean that the target video is actively paused by the video player.
  • the video player can call the pause interface of the player to actively pause the target video being played.
  • When the pause interface is called, the playback status parameter that characterizes the pause of the live broadcast can be passed to the audio acquisition and editing module.
  • The pause interface can thus serve as a designated interface for monitoring: by monitoring the playback status parameters passed through the designated interface of the player that plays the target video, it can be determined whether the live broadcast of the target video is paused.
  • Passive pause can refer to a live broadcast pause due to network fluctuations or insufficient buffered data, resulting in the player having no data to play.
  • the playback system where the player is located will issue a global broadcast notification that indicates that the live broadcast is paused. In this way, by monitoring the global broadcast notification sent by the playback system, it can be determined whether the live broadcast of the target video is suspended.
  • Through the above monitoring, the audio acquisition and editing module can learn of the paused state and record the start time node of the pause. Later, when the audio acquisition and editing module receives PCM data again, the target video has resumed playing, and the module can record the end time node of the resumption. The period between the start time node and the end time node is thus the period during which the target video was paused.
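  • The pause/resume bookkeeping described above can be sketched as follows. This is a hedged illustration: the patent defines no API, and the clock is injected here so the sketch is testable; real code would pass something like `time.monotonic`.

```python
class PauseTracker:
    """Records the start time node on pause and the end time node on resume."""

    def __init__(self, clock):
        self.clock = clock           # callable returning the current time in seconds
        self.pause_start = None      # start time node of the current pause, if any
        self.last_pause_seconds = 0.0

    def on_live_paused(self):
        # Record the start time node; ignore repeated pause signals.
        if self.pause_start is None:
            self.pause_start = self.clock()

    def on_live_resumed(self):
        # Record the end time node and derive the pause duration T.
        if self.pause_start is not None:
            end_node = self.clock()
            self.last_pause_seconds = end_node - self.pause_start
            self.pause_start = None
```

The resulting `last_pause_seconds` is the duration T used in the data-amount formula of step S5.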
  • S5 Calculate the amount of data to be inserted based on the start time node and the end time node, and insert audio empty packet data equal to the calculated amount before the audio data stream when the live broadcast resumes.
  • the pause duration of the target video can be calculated, and then the amount of data to be inserted can be calculated based on the pause duration and preset audio parameters .
  • the preset audio parameters may be audio parameters such as audio sampling rate, audio sampling bit depth, and audio channel number assigned when the audio acquisition and editing module is created.
  • The amount of data to be inserted can be calculated according to the following formula:
  • D = S × T × (B / 8) × W
  • where D represents the amount of data to be inserted (in bytes), S the audio sampling rate, T the pause duration, B the audio sampling bit depth (in bits), and W the number of audio channels.
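  • Assuming uncompressed PCM with the bit depth given in bits per sample, the calculation can be sketched as:

```python
def null_packet_bytes(sample_rate_hz, pause_seconds, bits_per_sample, channels):
    # D = S * T * (B / 8) * W -- bytes of audio empty packet data to insert,
    # assuming uncompressed PCM and B given in bits per sample.
    return int(sample_rate_hz * pause_seconds * (bits_per_sample // 8) * channels)
```

For the example from the background (a 10-minute pause at 44.1 kHz, 16-bit stereo), this yields roughly 105.8 MB of silent data.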
  • When the live broadcast resumes, audio empty packet data equal to the calculated amount can be inserted before the audio data stream read at that point.
  • As shown in Figure 2, the shaded part is the actual audio data stream, and the blank part is the audio empty packet data inserted while the target video was paused.
  • The audio empty packet data can be parsed and played normally by the playback device, but it is silent during playback, and its playback duration matches the pause duration of the live broadcast, so that the audio data stream with the inserted empty packet data corresponds to the video frame data on the time axis.
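  • For linear PCM, silence is simply zero-valued samples, so inserting the empty packet data amounts to prepending zero bytes. A hedged sketch, assuming a signed linear PCM stream where zero bytes decode to silence:

```python
def insert_silence(resumed_pcm: bytes, fill_bytes: int) -> bytes:
    # For signed linear PCM, silence is all-zero samples, so the empty packet
    # data is simply fill_bytes zero bytes prepended to the resumed stream.
    return b"\x00" * fill_bytes + resumed_pcm
```

`fill_bytes` would be the amount D computed from the formula above.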
  • a global variable used to characterize whether to insert audio null packet data may be set in the system in advance.
  • The global variable can have two values: when the value is the first value, audio empty packet data currently needs to be inserted; when the value is the second value, no insertion is currently needed.
  • When the playback status indicates that the live broadcast is paused, the variable value of the global variable can be set to the first value.
  • Each time the audio acquisition and editing module receives the audio data stream of the target video, it can check the current variable value of the global variable.
  • When the live broadcast resumes, the module checks the variable value again; since it was set to the first value at the moment the live broadcast was paused, the module can determine that audio empty packet data needs to be inserted.
  • The amount of data to be inserted can then be calculated from the start time node and the end time node, and audio empty packet data equal to the calculated amount can be inserted before the audio data stream when the live broadcast resumes.
  • After the insertion, the audio acquisition and editing module can set the variable value of the global variable to the second value. In this way, no audio empty packet data is inserted when the audio data stream of the target video is subsequently read, until the next time the live broadcast is paused and the variable value is set to the first value again.
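  • The flag-driven insertion logic can be sketched as follows. A hedged illustration: the concrete encodings chosen for the first and second values, and the class shape, are invented for this sketch.

```python
FIRST_VALUE, SECOND_VALUE = 1, 0   # illustrative encodings of the two variable values

class AudioCollector:
    def __init__(self):
        self.insert_flag = SECOND_VALUE   # no insertion needed initially
        self.stream = bytearray()

    def on_live_paused(self):
        self.insert_flag = FIRST_VALUE    # mark that empty packet data is pending

    def on_pcm(self, pcm: bytes, fill_bytes: int):
        # Called for each PCM chunk; the flag is checked before appending data.
        if self.insert_flag == FIRST_VALUE:
            self.stream += b"\x00" * fill_bytes   # silence before the resumed data
            self.insert_flag = SECOND_VALUE       # reset until the next pause
        self.stream += pcm
```

The flag guarantees the silence is inserted exactly once per pause, however many chunks follow.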
  • S7 Synthesize the audio data stream with the audio empty packet data inserted and the video frame data into a recorded video file.
  • When the video recording ends, the video recording device may synthesize the audio data stream and the video frame data into a video file, so as to obtain the recorded video file.
  • the video recording device may create a synthesis management module in advance, and the synthesis management module may be used to synthesize the recorded video file.
  • The synthesis management module can read the converted video frame data and generate an AVURLAsset object for it. The AVURLAsset object can then be added to the video track of an AVMutableComposition object. Next, the synthesis management module may use the audio data stream with the inserted audio empty packet data and the AVMutableComposition object as initialization parameters to generate an AVAssetExportSession object. Finally, the AVAssetExportSession export session can be started asynchronously to synthesize and export the recorded video file.
  • When the target video contains multiple audio tracks, the synthesis management module can combine the corresponding audio data streams into the audio data stream of a new audio track. The audio data stream of the new audio track and the AVMutableComposition object can then be used as initialization parameters to generate the AVAssetExportSession object.
  • the composition management module may also manage various mechanisms in the recording process. These mechanisms can include timeout mechanisms, termination mechanisms, and so on. Among them, the timeout mechanism may mean that the audio acquisition and editing module or the video acquisition and editing module exceeds the preset acquisition and editing time. Specifically, since audio editing and video editing are completed in parallel in different threads, the two usually have a completion sequence. For example, audio editing may be completed first due to the small amount of data processed. However, video editing will be completed later due to the excessive amount of data to be processed. In the acquisition and editing process, sometimes due to some uncontrollable reasons, there may be abnormalities in the acquisition and editing, and these abnormalities will increase the time spent in the acquisition and editing process.
  • A timeout duration can be set in the synthesis management module; when one of the audio acquisition and editing module or the video acquisition and editing module completes its encoding, the encoding duration of the other module starts being timed. For example, after the audio module completes audio encoding, the encoding duration of the video module can be timed. If the timed duration reaches the timeout duration, an abnormality may have occurred during encoding; the current video recording process can then be stopped and an abnormality log generated. Later, according to the user's instructions, the acquisition and editing process can be restarted. The purpose of this handling is to stop an excessively long editing process in time and avoid endless waiting.
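  • The timeout mechanism can be sketched as a simple watchdog around a completion event. A hedged illustration: the function and callback names are invented, and a real implementation would run this on its own thread.

```python
import threading

def watch_other_encoder(done: threading.Event, timeout_s: float, on_timeout):
    # Started when the faster module finishes its encoding. `done` should be
    # set by the slower module when its encoding completes; if it is not set
    # within timeout_s, the recording is considered abnormal.
    if not done.wait(timeout_s):
        on_timeout()   # e.g. stop the recording and write an abnormality log
```

`Event.wait` returns False only when the timeout elapses, which is exactly the "encoding duration reached the timeout duration" condition.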
  • the termination mechanism may include normal termination, abnormal termination, and cancellation.
  • A normal end can mean that the video recording finishes without exceeding the preset duration threshold while the available capacity of the device remains greater than or equal to the preset capacity threshold; it can also mean that, while the recording duration has not exceeded the preset duration threshold and the available capacity is still greater than or equal to the preset capacity threshold, an instruction to end the recording is received from the user. In either case, when the video recording ends, the recorded audio and video data can be processed and synthesized into the recorded video file.
  • An abnormal end can mean that the video recording duration exceeds the aforementioned preset duration threshold, or that the available capacity of the device is less than or equal to the aforementioned preset capacity threshold. In this case the recording is forced to end: the current video recording process is stopped, and the recorded audio data stream and video frame data are synthesized into a recorded video file.
  • Cancellation can mean that the user voluntarily abandons the current recording process.
  • If a video recording cancellation instruction is received during the recording process and the recording duration is less than a preset minimum recording duration, the current video recording process can be stopped and the recorded audio data stream and video frame data cleared.
  • This handles the case where the recording instruction was a misoperation by the user: if the cancellation instruction is received shortly after the recording instruction, the recording process is abandoned and the collected data is simply cleared without further processing.
  • If a video recording cancellation instruction is received during the recording process but the recording duration is greater than or equal to the minimum recording duration, the current video recording process is stopped, and the recorded audio data stream and video frame data are still synthesized into a recorded video file.
  • The reason for this handling is that the recording has already reached the minimum recording duration, indicating that the user may have wanted to record only a small part of the target video content. In this case, after the recording process stops, the collected data should still be synthesized into a recorded video file.
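  • The cancellation rule above reduces to a small decision function. A hedged sketch: the return labels are invented for illustration.

```python
def on_recording_finished(duration_s, min_duration_s, cancelled):
    # Cancellation before the minimum recording duration discards the data
    # (treated as a misoperation); in every other case the collected data
    # is synthesized into a recorded video file.
    if cancelled and duration_s < min_duration_s:
        return "discard"
    return "synthesize"
```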
  • In some cases, the playback of the target video may not resume before the video recording ends.
  • If the target video remains paused from the live broadcast pause until the end of the video recording, the time node at the end of the recording can be recorded, and the amount of data to be supplemented can be calculated from the time node at the end of the recording and the start time node of the pause.
  • Audio empty packet data equal to the amount of data to be supplemented can then be filled in after the audio data stream, covering the period from the live broadcast pause to the end of the video recording.
  • this application also provides a video recording system, which includes:
  • An audio and video data acquisition unit for reading the audio data stream of the target video in real time, and converting the video picture of the target video into video frame data;
  • A playback status monitoring unit, used to monitor the playback status of the target video, record the start time node when the playback status indicates that the live broadcast is paused, and record the end time node when the playback status indicates that the live broadcast resumes;
  • the empty packet data insertion unit is configured to calculate the amount of data to be inserted based on the start time node and the end time node, and insert and calculate the data obtained before the audio data stream when the live broadcast resumes Equal amount of audio empty packet data;
  • the recording and synthesizing unit is used to synthesize the audio data stream with the audio empty packet data inserted and the video frame data into a recorded video file.
  • system further includes:
  • the global variable setting unit is used to set the variable value of the global variable used to characterize whether to insert empty audio packet data to the first value when the playback state of the target video indicates that the live broadcast is paused, and when the live broadcast resumes, detect all State the current variable value of the global variable;
  • the empty packet data insertion unit is configured to calculate the data to be inserted based on the start time node and the end time node when the current variable value of the global variable is the first value The amount of data.
  • the empty packet data insertion unit is further configured to, if the target video is in the live broadcast pause state from the live broadcast to the end of the video recording, record the time node when the video recording ends, and Calculate the amount of data to be supplemented according to the time node when the video recording ends and the start time node when the live broadcast is paused; and after the audio data stream when the live broadcast is paused, fill with the data to be supplemented The same amount of audio empty packet data.
  • As can be seen, with the technical solution provided by this application, the playback status of the target video can be monitored while the audio data stream of the target video is read in real time.
  • When the target video is paused, the start time node can be recorded, and when the target video resumes, the end time node can be recorded.
  • From these, the amount of empty audio packet data that needs to be inserted to cover the pause can be calculated.
  • When playback resumes, the corresponding empty audio packet data can be inserted before the audio data stream.
  • In this way, the audio data stream with the inserted empty audio packet data can match the video picture of the target video.
  • While the target video is paused, the audio data stream plays silent empty audio packet data, which ensures that the recorded video keeps its audio and picture synchronized.
  • Each embodiment can be implemented by means of software plus a necessary general-purpose hardware platform, and of course can also be implemented by hardware.
  • The above technical solutions can be embodied in the form of software products, which can be stored in computer-readable storage media, such as a ROM/RAM, magnetic disk, or optical disc, and include a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in each embodiment or in certain parts of the embodiments.
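The finish and cancellation rules excerpted above amount to a small decision procedure. The following Python sketch illustrates it; the function name and the 3-second default are illustrative assumptions, since the application does not fix a concrete minimum recording duration:

```python
def on_recording_stopped(duration_s, cancelled, min_duration_s=3.0):
    """Decide what to do with the captured data when recording stops.

    Per the rules above: a cancellation before the minimum recording
    duration is treated as a misoperation and the data is discarded;
    any other ending synthesizes the captured audio stream and video
    frames into a recorded file. The 3-second default is an assumption.
    """
    if cancelled and duration_s < min_duration_s:
        # Cancellation shortly after start: likely a misoperation,
        # so the collected data is cleared without processing.
        return "discard"
    # Normal finish, forced finish, or cancellation after the
    # minimum duration all keep the recording.
    return "synthesize"
```

A cancellation at one second is dropped, while a cancellation after the minimum duration still produces a file.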

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Television Signal Processing For Recording (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
  • Signal Processing For Digital Recording And Reproducing (AREA)

Abstract

This application discloses a video recording method and system. The method includes: reading the audio data stream of a target video in real time, and converting the video picture of the target video into video frame data (S1); monitoring the playback status of the target video, recording a start time node when the playback status of the target video indicates that the live broadcast is paused, and recording an end time node when the playback status of the target video indicates that the live broadcast has resumed (S3); calculating, based on the start time node and the end time node, an amount of data to be inserted, and inserting, before the audio data stream at the time the live broadcast resumes, an amount of empty audio packet data equal to the calculated amount (S5); and synthesizing the audio data stream with the inserted empty audio packet data and the video frame data into a recorded video file (S7). The technical solution provided by this application can ensure that the recorded video file keeps its audio and picture synchronized.

Description

Video Recording Method and System
Cross-Reference
This application claims priority to Chinese Patent Application No. 201910335713.6, entitled "Video Recording Method and System" and filed on April 24, 2019, which is incorporated herein by reference in its entirety.
Technical Field
This application relates to the field of Internet technology, and in particular to a video recording method and system.
Background
With the continuous development of live video streaming technology, more and more users tend to watch live videos on live streaming platforms. Live videos are highly timely, and while watching a live video, users can also communicate in real time with the streamer or with other users by posting bullet comments.
At present, so that users who missed the live broadcast can watch the live video again, current playback devices can provide a video recording function. Specifically, for the picture content of a live video, the picture content can be captured as video frames by taking screenshots. The benefit of this approach is that visual effects attached to the picture content are preserved; for example, effects such as quiz overlays and gift animations shown during the live broadcast can be retained. For the audio content of the live video, the original audio stream of the live video can be read. The captured video frames and the read original audio stream are then encoded separately and synthesized into a recorded video file.
However, a video file recorded in this way may suffer from the audio and the picture being out of sync. The reason is that, during a live broadcast, the video picture is quite likely to pause because the network stalls or the streamer actively pauses the broadcast. In this situation, taking screenshots can reproduce the paused scene. However, when the live broadcast is paused, the audio stream is interrupted, so the video duration corresponding to the captured video frames cannot match the duration corresponding to the audio stream. For example, suppose a live video lasts one hour in total, of which ten minutes are spent paused. The captured video frames then cover the full hour, but the acquired audio stream is only fifty minutes long. When the video frames and the audio stream are later synthesized into a video file, the picture and the audio will be out of sync.
As can be seen from the above, a video recording method that can guarantee audio-picture synchronization is urgently needed.
Summary of the Invention
The purpose of this application is to provide a video recording method and system that can ensure that the recorded video file keeps its audio and picture synchronized.
To achieve the above purpose, in one aspect this application provides a video recording method, the method including: reading the audio data stream of a target video in real time, and converting the video picture of the target video into video frame data; monitoring the playback status of the target video, recording a start time node when the playback status of the target video indicates that the live broadcast is paused, and recording an end time node when the playback status of the target video indicates that the live broadcast has resumed; calculating, based on the start time node and the end time node, an amount of data to be inserted, and inserting, before the audio data stream at the time the live broadcast resumes, an amount of empty audio packet data equal to the calculated amount; and synthesizing the audio data stream with the inserted empty audio packet data and the video frame data into a recorded video file.
To achieve the above purpose, in another aspect this application further provides a video recording system, the system including: an audio and video data acquisition unit, configured to read the audio data stream of a target video in real time and convert the video picture of the target video into video frame data; a playback status monitoring unit, configured to monitor the playback status of the target video, record a start time node when the playback status of the target video indicates that the live broadcast is paused, and record an end time node when the playback status of the target video indicates that the live broadcast has resumed; an empty packet data insertion unit, configured to calculate, based on the start time node and the end time node, an amount of data to be inserted, and to insert, before the audio data stream at the time the live broadcast resumes, an amount of empty audio packet data equal to the calculated amount; and a recording and synthesis unit, configured to synthesize the audio data stream with the inserted empty audio packet data and the video frame data into a recorded video file.
As can be seen from the above, with the technical solution provided by this application, the playback status of the target video can be monitored while the audio data stream of the target video is read in real time. When the target video is paused, a start time node can be recorded, and when the target video resumes playing, an end time node can be recorded. In this way, the amount of empty audio packet data that needs to be inserted to cover the pause can be calculated from the start time node and the end time node. When the target video resumes playing, the corresponding empty audio packet data can be inserted before the audio data stream. The audio data stream with the inserted empty audio packet data thus matches the video picture of the target video: while the target video is paused, the audio data stream plays silent empty audio packet data, which ensures that the recorded video keeps its audio and picture synchronized.
Brief Description of the Drawings
To explain the technical solutions in the embodiments of this application more clearly, the drawings needed for describing the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of this application; those of ordinary skill in the art can obtain other drawings from these drawings without creative effort.
Fig. 1 is a schematic diagram of the video recording method in an embodiment of this application;
Fig. 2 is a first schematic diagram of inserting empty audio packet data in an embodiment of this application;
Fig. 3 is a second schematic diagram of inserting empty audio packet data in an embodiment of this application;
Fig. 4 is a schematic diagram of the functional modules of the video recording system in an embodiment of this application.
Detailed Description of the Embodiments
To make the purpose, technical solutions, and advantages of this application clearer, the embodiments of this application are described in further detail below with reference to the accompanying drawings.
This application provides a video recording method that can be applied to a device supporting video recording. The device may be, for example, an electronic device used by a user, such as a tablet computer, a smartphone, a desktop computer, a smart wearable device, or a laptop computer; it may also be a business server of a live streaming platform or a video-on-demand platform, or a video recording apparatus used together with a display, and so on.
Referring to Fig. 1, the video recording method may include the following steps.
S1: read the audio data stream of the target video in real time, and convert the video picture of the target video into video frame data.
In this embodiment, the target video may be a live video or an on-demand video. Audio capture and video picture capture can be performed separately by parallel threads in the video recording device. Specifically, when the player in the video recording device is initialized, or when the video recording device receives an instruction to record video, an audio capture module and a video capture module can be created. There may be one or more audio capture modules, each corresponding to one audio track of the target video, while there is usually one video capture module.
In this embodiment, after the audio capture module is created, the video recording device can assign the audio parameters of the player that plays the target video to the audio capture module, so that the audio capture module reads the audio data stream of the target video in real time according to the assigned audio parameters. The audio parameters may include, for example, the audio sample rate, the number of audio channels, and the audio sample bit depth. In practice, the audio data stream of the target video may be PCM (Pulse Code Modulation) data, obtained by decoding the original audio data of the target video. After the video recording device receives the PCM data, the data can be played directly. Generally, the volume of PCM data is rather large; to save space in the recorded video file, the audio capture module can encode the PCM data. Specifically, after the audio capture module is created, an audio encoder can be created through the audio capture module; the audio encoder can be used to encode the read audio data stream of the target video into an audio file of a specified format. The specified format may be designated by the user before recording starts, or may be the default format of the video recording device; in practice, it may be, for example, the mp3 format or the AAC (Advanced Audio Coding) format. After the audio encoder is created, the corresponding library can be enabled according to the encoder's encoding type; for example, for the mp3 encoding type, the libmp3lame library can be enabled.
In this embodiment, the audio capture module can read the PCM data of the target video, and the audio encoder can encode the read PCM data to generate an audio file of the specified format. During recording, the read PCM data and the generated audio file can be stored under a temporary buffer file path. This temporary buffer file path can be generated based on the system time at which the audio capture module was created; in this way, the system time of the module's creation uniformly identifies the temporary buffer file path used to store the PCM data and the audio file.
In practice, after the audio capture module is created, three management queues can be set up, corresponding respectively to recording start and stop actions, PCM data storage, and audio encoding. Different management queues can then be enabled at different recording stages, ensuring that the recording process proceeds in order.
In one embodiment, since there may be more than one audio capture module, the video recording device can also create an audio capture management module in advance in order to manage multiple audio capture modules synchronously and effectively. After one or more audio capture modules are created, the created modules can be added to this preset audio capture management module. The management module can then start the currently managed one or more audio capture modules when video recording begins, shut the modules down after video recording ends, and clear the temporary files generated during recording, thereby managing the audio capture modules in batches.
In this embodiment, after the video capture module is created, video capture parameters can be set for the video capture module; the video capture parameters may include the video frame file output path, the video resolution, the video capture frame rate, the video frame pixel format, the video frame encoding method, and other parameters. In practice, the video resolution may be a designated resolution; if no video resolution is designated, the screen size of the video recording device can be used directly as the default video resolution. In addition, the video capture frame rate may be a frame rate range. For example, the frame rate range of a typical device is usually between 10 and 30 frames per second, and with advances in technology may reach 10 to 60 frames per second. If the video capture frame rate entered by the user falls within this range, the target video can be recorded at the entered rate; if it exceeds the range, the upper or lower limit can be taken by default, so that the video recording process can proceed normally. The video frame pixel format may be the 32-bit BGRA format; of course, in practice other video frame pixel formats can be flexibly chosen as needed, and the format is not limited to 32-bit BGRA. The video frame encoding method may also vary: for example, it may be the H.264 encoding format, the VP8 or VP9 encoding format, the HEVC encoding format, and so on.
It should be noted that the number of required parameters can be changed according to the actual application scenario. For example, during video recording, the video frame file output path, the video capture frame rate, and the video frame encoding method are usually required, whereas the video resolution is optional; if no video resolution is designated, the video resolution can be assumed to match the screen resolution of the video recording device.
In this embodiment, through the video capture module, the video picture of the target video can be redrawn into a binary stream according to the video capture parameters, and the binary stream can then be further encoded into video frame data. Video frame data obtained in this way preserves visual effects in the video picture, such as quiz overlays and gift animations, so the picture content of the target video can be fully reproduced. Moreover, when the picture of the target video pauses, the paused picture is also recorded, thereby reproducing the real scene of the target video during the live broadcast as faithfully as possible.
S3: monitor the playback status of the target video; when the playback status of the target video indicates that the live broadcast is paused, record the start time node of the pause, and when the playback status of the target video indicates that the live broadcast has resumed, record the end time node of the resumption.
In this embodiment, while the audio data stream of the target video is being read in real time, the audio capture module receives PCM data normally as long as the target video is playing normally. If the target video is paused, the audio capture module cannot receive PCM data, and only after the target video resumes playing can the module continue to receive PCM data. To keep the read audio data stream synchronized with the video frame data produced by the video capture module, empty audio packet data can be inserted into the audio data stream for the period during which the live broadcast was paused, so that the audio data stream with the inserted empty audio packet data is aligned with the video frame data on the time axis.
Specifically, during recording of the target video, the playback status of the target video can be monitored. Generally, the paused state of the target video falls into two kinds: an active pause and a passive pause. An active pause means that the target video is paused deliberately by the person playing it; in this case, the player's pause interface is called to actively pause the playing target video. When this pause interface is called, a playback status parameter indicating that the live broadcast is paused can be passed to the audio capture module. The pause interface can therefore serve as the designated interface to listen on: by listening to the playback status parameter passed through the designated interface of the player playing the target video, it can be determined whether the live broadcast of the target video has been paused. A passive pause refers to a pause of the live broadcast caused by network fluctuations or insufficient buffered data leaving the player with no data to play. In this case, the playback system hosting the player issues a global broadcast notification indicating that the live broadcast is paused; by listening for this global broadcast notification, it can be determined whether the live broadcast of the target video has been paused.
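The two pause sources described above, an active pause reported through the player's pause interface and a passive pause announced by a system-wide broadcast notification, can be funneled into a single paused/resumed signal. The sketch below is illustrative only; the class and callback names are assumptions, not platform APIs:

```python
class PlaybackMonitor:
    """Collapse active pauses (pause interface called) and passive
    pauses (system broadcast on buffer underrun) into one stream of
    pause/resume events for the audio capture side."""

    def __init__(self):
        self.paused = False
        self.events = []  # (kind, source, timestamp)

    def _set_paused(self, paused, source, timestamp):
        # Only state transitions are recorded; duplicate
        # notifications from the second source are ignored.
        if paused != self.paused:
            self.paused = paused
            kind = "pause" if paused else "resume"
            self.events.append((kind, source, timestamp))

    # Invoked when the player's pause interface passes a status parameter.
    def on_player_status(self, is_paused, timestamp):
        self._set_paused(is_paused, "player-interface", timestamp)

    # Invoked for the playback system's global broadcast notification.
    def on_system_broadcast(self, is_paused, timestamp):
        self._set_paused(is_paused, "system-broadcast", timestamp)
```

The start and end time nodes of step S3 would be the timestamps of the recorded pause and resume events.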
In this way, once the target video is paused, the audio capture module learns of the paused state. At this point, the audio capture module can record the start time node of the pause. Later, when the audio capture module can again receive PCM data, the target video has resumed playing, and the module can record the end time node of the resumption. The period delimited by the end time node and the start time node is then the period during which the target video was paused.
S5: based on the start time node and the end time node, calculate the amount of data to be inserted, and insert, before the audio data stream at the time the live broadcast resumes, an amount of empty audio packet data equal to the calculated amount.
In this embodiment, the pause duration of the target video can be calculated from the start time node and the end time node, and the amount of data to be inserted can then be calculated from the pause duration and preset audio parameters. The preset audio parameters may be the audio parameters assigned when the audio capture module was created, such as the audio sample rate, the audio sample bit depth, and the number of audio channels. In a practical application example, the amount of data to be inserted can be calculated by the following formula:
D = S × T × B / 8 × W
where D denotes the amount of data to be inserted, S denotes the audio sample rate, T denotes the pause duration, B denotes the audio sample bit depth, and W denotes the number of audio channels.
Referring to Fig. 2, in this embodiment, after the amount of data to be inserted has been calculated, an amount of empty audio packet data equal to the calculated amount can be inserted before the audio data stream read at the time the live broadcast resumes. As shown in Fig. 2, the shaded portion is the actual audio data stream, while the blank portion is the empty audio packet data inserted while the target video was paused. The empty audio packet data can be parsed and played normally by the playback device; it simply plays as silent audio, and its playback duration is the same as the pause duration of the live broadcast, so that the audio data stream with the inserted empty audio packet data and the video frame data correspond consistently on the time axis.
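The calculation of the amount of data to be inserted can be illustrated with a short sketch; the helper name is an assumption of ours, and the byte count follows the formula D = S × T × B / 8 × W above:

```python
def bytes_to_insert(sample_rate, pause_seconds, bit_depth, channels):
    """D = S * T * B / 8 * W: bytes of silent PCM covering a pause
    of pause_seconds at the given sample rate, bit depth, and
    channel count."""
    return int(sample_rate * pause_seconds * (bit_depth // 8) * channels)

# e.g. a 10-minute pause in 44.1 kHz / 16-bit / stereo PCM:
d = bytes_to_insert(44100, 600, 16, 2)  # 44100 * 600 * 2 * 2 bytes

# Zero-valued PCM samples decode as silence, so the filler itself
# can simply be a zeroed buffer of that size:
silence = bytes(bytes_to_insert(8000, 1, 16, 1))  # 1 s, 8 kHz mono
```

Prepending such a buffer to the resumed stream restores the alignment with the continuously captured video frames.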
In one embodiment, so that the audio capture module can correctly insert empty audio packet data into the read audio data stream, a global variable indicating whether empty audio packet data should be inserted can be set in the system in advance. The global variable can take two values: when its value is a first value, empty audio packet data currently needs to be inserted; when its value is a second value, no insertion is currently needed. Thus, when the playback status of the target video indicates that the live broadcast is paused, the value of the global variable can be set to the first value. In addition, each time the audio capture module receives the audio data stream of the target video, it can check the current value of the global variable. Thus, when the target video resumes playing, the audio capture module checks the current value of the global variable. Since the value of the global variable was set to the first value when the live broadcast was previously paused, upon detecting that the current value of the global variable is the first value, the audio capture module can determine that empty audio packet data now needs to be inserted; it can therefore calculate the amount of data to be inserted based on the start time node and the end time node, and insert, before the audio data stream at the time the live broadcast resumes, an amount of empty audio packet data equal to the calculated amount. After inserting the empty audio packet data, the audio capture module can set the value of the global variable to the second value, so that no empty audio packet data is inserted when the audio data stream of the target video is subsequently read, until the next time the live broadcast pauses and the value of the global variable is again set to the first value.
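The global-variable mechanism described above can be sketched as follows; the class and attribute names are illustrative assumptions, with a boolean flag standing in for the first and second values:

```python
class AudioCapture:
    """Sketch of the pause-flag logic: needs_silence plays the role
    of the patent's global variable, set to the 'first value' (True)
    on pause and reset to the 'second value' (False) once silence
    has been inserted before the resumed stream."""

    def __init__(self, sample_rate=44100, bit_depth=16, channels=2):
        self.bytes_per_second = sample_rate * bit_depth // 8 * channels
        self.needs_silence = False   # the global variable
        self.pause_started_at = None
        self.stream = bytearray()

    def on_pause(self, now):
        # Playback status indicates the live broadcast is paused.
        self.needs_silence = True
        self.pause_started_at = now

    def on_pcm(self, pcm, now):
        # Checked on every received chunk: if the flag holds the
        # first value, prepend silence covering the pause first.
        if self.needs_silence:
            gap_seconds = now - self.pause_started_at
            self.stream += bytes(int(gap_seconds * self.bytes_per_second))
            self.needs_silence = False  # back to the second value
        self.stream += pcm
```

With a 1 kHz, 8-bit mono configuration, a two-second pause inserts exactly 2000 bytes of silence between the two received chunks.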
S7: synthesize the audio data stream with the inserted empty audio packet data and the video frame data into a recorded video file.
In this embodiment, after obtaining the audio data stream with the inserted empty audio packet data and the converted video frame data, the video recording device can synthesize the audio data stream and the video frame data into a single video file, thereby obtaining the recorded video file. Specifically, the video recording device can create a synthesis management module in advance, and the synthesis management module can be used to synthesize the recorded video file.
In one application example, the synthesis management module can read the converted video frame data and generate an AVURLAsset object for the video frame data. This AVURLAsset object can then be added to the video track of an AVMutableComposition object. Next, the synthesis management module can use the audio data stream with the inserted empty audio packet data and the AVMutableComposition object as initialization parameters to generate an AVAssetExportSession object. Finally, the AVAssetExportSession export session can be started asynchronously, so that the recorded video file is synthesized and exported.
It should be noted that if the audio data streams recorded during audio recording contain multiple audio tracks, the synthesis management module can merge these audio data streams into a single track, forming the audio data stream of a new track. The audio data stream of the new track and the AVMutableComposition object can then be used as initialization parameters to generate the AVAssetExportSession object.
In one embodiment, the synthesis management module can also manage the various mechanisms of the recording process, including a timeout mechanism and a finish mechanism, among others. The timeout mechanism refers to the audio capture module or the video capture module exceeding a preset capture duration. Specifically, since audio capture and video capture are completed in parallel in different threads, one of the two usually finishes before the other. For example, audio capture may finish first because it processes less data, while video capture finishes slightly later because it has a larger amount of data to process. During capture, anomalies may sometimes occur for uncontrollable reasons, and these anomalies increase the time the capture process takes. In this situation, a timeout duration can be set in the synthesis management module, and when one of the audio capture module and the video capture module has finished audio encoding or video encoding, the encoding time of the other module can be tracked. For example, after the audio encoding module finishes audio encoding, tracking of the video encoding module's encoding time can begin. If the tracked encoding time reaches the timeout duration, an anomaly has probably occurred during encoding; the current video recording process can then be stopped and an exception log generated. Afterwards, the audio and video capture process can be restarted according to the user's instruction. The purpose of this handling is to promptly stop a capture process that is taking too long and avoid waiting endlessly.
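The timeout mechanism can be approximated with a watchdog around the slower encoder. This is an illustrative sketch using Python threads, not the application's implementation; the function and parameter names are assumptions:

```python
import threading

def run_with_encode_timeout(encode_fn, timeout_s, log):
    """Run the remaining encoder's work with a deadline.

    Started once the other encoder has finished, mirroring the
    timeout mechanism above: if encode_fn has not completed within
    timeout_s, the recording pass is aborted and an exception log
    entry is kept. Returns True on completion, False on timeout.
    """
    done = threading.Event()

    def worker():
        encode_fn()
        done.set()

    threading.Thread(target=worker, daemon=True).start()
    if not done.wait(timeout_s):
        # Encoding overran the configured timeout: stop this
        # recording pass and record the anomaly.
        log.append("encode timeout: recording aborted")
        return False
    return True
```

A fast encoder completes normally, while one that outlives the deadline causes the pass to be aborted with a log entry.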
In this embodiment, the finish mechanism can comprise a normal finish, an abnormal finish, and a cancellation. A normal finish means that when video recording ends, the preset duration threshold has not been exceeded and the available capacity of the device is greater than or equal to the preset capacity threshold; it may also mean that, while the video recording duration has not exceeded the preset duration threshold and the available capacity of the device is greater than or equal to the preset capacity threshold, an end-recording instruction entered by the user is received. In this case, when video recording ends, the recorded audio and video data can be processed and synthesized into the recorded video file.
An abnormal finish means that the video recording duration has exceeded the aforementioned preset duration threshold, or that the available capacity of the device is less than or equal to the aforementioned preset capacity threshold. In this case, recording is forced to end; the current video recording process can be stopped, and the already recorded audio data stream and video frame data are synthesized into the recorded video file.
A cancellation means that the user actively abandons the current recording. Generally, if a video recording cancellation instruction is received during recording and the video recording duration is shorter than the aforementioned minimum recording duration, the current video recording process can be stopped and the already recorded audio data stream and video frame data cleared. In this case, the instruction to record the video can be regarded as a user misoperation: since the cancellation instruction arrived shortly after the recording instruction, this recording process can be abandoned, and the already collected data is simply cleared without further processing. If, however, a video recording cancellation instruction is received during recording and the video recording duration is greater than or equal to the minimum recording duration, the current video recording process can be stopped, but the already recorded audio data stream and video frame data must still be synthesized into a recorded video file. The reason for this handling is that a recording whose duration has reached the minimum recording duration indicates that the user may only have wanted to record a small part of the target video; therefore, in this case, after this recording process is stopped, the already collected data needs to be synthesized into the recorded video file.
In one application scenario, the target video may never resume playing from the time it pauses until the video recording ends. In this case, since there is no moment of resumption, empty audio packet data cannot be inserted in the manner described above. In view of this, referring to Fig. 3, in one embodiment, if the target video remains in the paused state from the time the live broadcast pauses until the video recording ends, the time node at which the video recording ends can be recorded, and the amount of data to be supplemented can be calculated from the time node at which the video recording ends and the start time node at which the live broadcast was paused. Then, an amount of empty audio packet data equal to the amount to be supplemented can be filled in after the audio data stream captured up to the pause. In this way, the period after the live broadcast pauses is filled with empty audio packet data until the video recording ends.
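The Fig. 3 case, in which playback never resumes before recording ends, can be sketched as follows; the function name and parameter defaults are illustrative assumptions:

```python
def pad_to_recording_end(stream, pause_start, record_end,
                         sample_rate=44100, bit_depth=16, channels=2):
    """If playback never resumed, append silence covering the span
    from the pause start to the end-of-recording time node, so the
    audio still lines up with the video frames captured throughout."""
    gap_bytes = int((record_end - pause_start)
                    * sample_rate * bit_depth // 8 * channels)
    # Silence goes AFTER the captured stream here, unlike the
    # resume case where it is inserted BEFORE the resumed stream.
    return bytes(stream) + bytes(gap_bytes)
```

For a 100 Hz, 8-bit mono stream that pauses at t=0 and records until t=2, exactly 200 bytes of silence are appended.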
Referring to Fig. 4, this application further provides a video recording system, the system including:
an audio and video data acquisition unit, configured to read the audio data stream of a target video in real time and convert the video picture of the target video into video frame data;
a playback status monitoring unit, configured to monitor the playback status of the target video, record a start time node when the playback status of the target video indicates that the live broadcast is paused, and record an end time node when the playback status of the target video indicates that the live broadcast has resumed;
an empty packet data insertion unit, configured to calculate, based on the start time node and the end time node, an amount of data to be inserted, and to insert, before the audio data stream at the time the live broadcast resumes, an amount of empty audio packet data equal to the calculated amount;
a recording and synthesis unit, configured to synthesize the audio data stream with the inserted empty audio packet data and the video frame data into a recorded video file.
In one embodiment, the system further includes:
a global variable setting unit, configured to set the value of a global variable indicating whether empty audio packet data should be inserted to a first value when the playback status of the target video indicates that the live broadcast is paused, and to check the current value of the global variable when the live broadcast resumes;
correspondingly, the empty packet data insertion unit is configured to calculate the amount of data to be inserted based on the start time node and the end time node when the current value of the global variable is the first value.
In one embodiment, the empty packet data insertion unit is further configured to: if the target video remains in the paused state from the time the live broadcast is paused until the video recording ends, record the time node at which the video recording ends, and calculate the amount of data to be supplemented from the time node at which the video recording ends and the start time node at which the live broadcast was paused; and fill in, after the audio data stream at the time the live broadcast was paused, an amount of empty audio packet data equal to the amount to be supplemented.
As can be seen from the above, with the technical solution provided by this application, the playback status of the target video can be monitored while the audio data stream of the target video is read in real time. When the target video is paused, a start time node can be recorded, and when the target video resumes playing, an end time node can be recorded. In this way, the amount of empty audio packet data that needs to be inserted to cover the pause can be calculated from the start time node and the end time node. When the target video resumes playing, the corresponding empty audio packet data can be inserted before the audio data stream. The audio data stream with the inserted empty audio packet data thus matches the video picture of the target video: while the target video is paused, the audio data stream plays silent empty audio packet data, which ensures that the recorded video keeps its audio and picture synchronized.
From the description of the above embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by means of software plus a necessary general-purpose hardware platform, and of course can also be implemented by hardware. Based on this understanding, the above technical solutions, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product can be stored in a computer-readable storage medium, such as a ROM/RAM, magnetic disk, or optical disc, and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in each embodiment or in certain parts of the embodiments.
The above are only preferred embodiments of this application and are not intended to limit it; any modification, equivalent replacement, improvement, and the like made within the spirit and principles of this application shall be included within the scope of protection of this application.

Claims (16)

  1. A video recording method, the method comprising:
    reading the audio data stream of a target video in real time, and converting the video picture of the target video into video frame data;
    monitoring the playback status of the target video, recording a start time node when the playback status of the target video indicates that the live broadcast is paused, and recording an end time node when the playback status of the target video indicates that the live broadcast has resumed;
    calculating, based on the start time node and the end time node, an amount of data to be inserted, and inserting, before the audio data stream at the time the live broadcast resumes, an amount of empty audio packet data equal to the calculated amount;
    synthesizing the audio data stream with the inserted empty audio packet data and the video frame data into a recorded video file.
  2. The method according to claim 1, wherein monitoring the playback status of the target video comprises:
    listening to a playback status parameter passed through a designated interface of a player playing the target video; or listening for a global broadcast notification issued by a playback system, so as to determine the current playback status of the target video through the playback status parameter or the global broadcast notification.
  3. The method according to claim 1, wherein before calculating, based on the start time node and the end time node, the amount of data to be inserted, the method further comprises:
    when the playback status of the target video indicates that the live broadcast is paused, setting the value of a global variable indicating whether empty audio packet data should be inserted to a first value, and, when the live broadcast resumes, checking the current value of the global variable;
    correspondingly, when the current value of the global variable is the first value, calculating, based on the start time node and the end time node, the amount of data to be inserted, and inserting, before the audio data stream at the time the live broadcast resumes, an amount of empty audio packet data equal to the calculated amount.
  4. The method according to claim 3, wherein after inserting the amount of empty audio packet data equal to the calculated amount, the method further comprises:
    setting the value of the global variable to a second value, the second value indicating that no empty audio packet data currently needs to be inserted.
  5. The method according to claim 1, wherein calculating the amount of data to be inserted comprises:
    calculating a pause duration from the start time node and the end time node, and calculating the amount of data to be inserted from the pause duration and preset audio parameters, wherein the preset audio parameters include an audio sample rate, an audio sample bit depth, and a number of audio channels.
  6. The method according to claim 1, wherein the method further comprises:
    if the target video remains in the paused state from the time the live broadcast is paused until the video recording ends, recording the time node at which the video recording ends, and calculating the amount of data to be supplemented from the time node at which the video recording ends and the start time node at which the live broadcast was paused;
    after the audio data stream at the time the live broadcast was paused, filling in an amount of empty audio packet data equal to the amount to be supplemented.
  7. The method according to claim 1, wherein reading the audio data stream of the target video in real time comprises:
    creating an audio capture module, and assigning the audio parameters of a player playing the target video to the audio capture module, so that the audio capture module reads the audio data stream of the target video in real time according to the assigned audio parameters.
  8. The method according to claim 7, wherein after creating the audio capture module, the method further comprises:
    creating an audio encoder through the audio capture module, the audio encoder being used to encode the read audio data stream of the target video into an audio file of a specified format.
  9. The method according to claim 7, wherein after creating the audio capture module, the method further comprises:
    adding the created audio capture module to a preset audio capture management module, the audio capture management module being used to start the currently managed one or more audio capture modules when video recording starts, to shut down the one or more audio capture modules after video recording ends, and to clear temporary files generated during video recording.
  10. The method according to claim 7, wherein converting the video picture of the target video into video frame data comprises:
    creating a video capture module, and setting video capture parameters for the video capture module, the video capture parameters including at least a video frame file output path, a video capture frame rate, and a video frame encoding method;
    redrawing, through the video capture module, the video picture of the target video into a binary stream according to the video capture parameters, and encoding the binary stream into video frame data.
  11. The method according to claim 10, wherein the method further comprises:
    creating a synthesis management module, and setting a timeout duration in the synthesis management module; when one of the audio capture module and the video capture module has finished audio encoding or video encoding, tracking the encoding time of the other module, and, if the tracked encoding time reaches the timeout duration, stopping the current video recording process and generating an exception log.
  12. The method according to claim 1, wherein the method further comprises:
    if, during video recording, the recording duration is greater than or equal to a preset duration threshold, or the available capacity of the device is less than or equal to a preset capacity threshold, stopping the current video recording process, and synthesizing the already recorded audio data stream and video frame data into a recorded video file.
  13. The method according to claim 1 or 12, wherein the method further comprises:
    if a video recording cancellation instruction is received during video recording and the duration of the video recording is shorter than a minimum recording duration, stopping the current video recording process, and clearing the already recorded audio data stream and video frame data;
    if a video recording cancellation instruction is received during video recording and the duration of the video recording is greater than or equal to the minimum recording duration, stopping the current video recording process, and synthesizing the already recorded audio data stream and video frame data into a recorded video file.
  14. A video recording system, wherein the system comprises:
    an audio and video data acquisition unit, configured to read the audio data stream of a target video in real time and convert the video picture of the target video into video frame data;
    a playback status monitoring unit, configured to monitor the playback status of the target video, record a start time node when the playback status of the target video indicates that the live broadcast is paused, and record an end time node when the playback status of the target video indicates that the live broadcast has resumed;
    an empty packet data insertion unit, configured to calculate, based on the start time node and the end time node, an amount of data to be inserted, and to insert, before the audio data stream at the time the live broadcast resumes, an amount of empty audio packet data equal to the calculated amount;
    a recording and synthesis unit, configured to synthesize the audio data stream with the inserted empty audio packet data and the video frame data into a recorded video file.
  15. The system according to claim 14, wherein the system further comprises:
    a global variable setting unit, configured to set the value of a global variable indicating whether empty audio packet data should be inserted to a first value when the playback status of the target video indicates that the live broadcast is paused, and to check the current value of the global variable when the live broadcast resumes;
    correspondingly, the empty packet data insertion unit is configured to calculate the amount of data to be inserted based on the start time node and the end time node when the current value of the global variable is the first value.
  16. The system according to claim 14, wherein the empty packet data insertion unit is further configured to: if the target video remains in the paused state from the time the live broadcast is paused until the video recording ends, record the time node at which the video recording ends, and calculate the amount of data to be supplemented from the time node at which the video recording ends and the start time node at which the live broadcast was paused; and fill in, after the audio data stream at the time the live broadcast was paused, an amount of empty audio packet data equal to the amount to be supplemented.
PCT/CN2019/090322 2019-04-24 2019-06-06 Video recording method and system WO2020215453A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP19800908.6A EP3748972B1 (en) 2019-04-24 2019-06-06 Video recording method and system
US16/717,638 US10951857B2 (en) 2019-04-24 2019-12-17 Method and system for video recording

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910335713.6 2019-04-24
CN201910335713.6A CN110324643B (zh) Video recording method and system

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/717,638 Continuation US10951857B2 (en) 2019-04-24 2019-12-17 Method and system for video recording

Publications (1)

Publication Number Publication Date
WO2020215453A1 true WO2020215453A1 (zh) 2020-10-29

Family

ID=68112988

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/090322 WO2020215453A1 (zh) 2019-04-24 2019-06-06 一种视频录制方法及系统

Country Status (4)

Country Link
US (1) US10951857B2 (zh)
EP (1) EP3748972B1 (zh)
CN (1) CN110324643B (zh)
WO (1) WO2020215453A1 (zh)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2021131921A (ja) * 2020-02-21 2021-09-09 キヤノン株式会社 音声処理装置及び音声処理装置の制御方法
CN113365139B (zh) * 2020-03-03 2023-05-02 腾讯科技(深圳)有限公司 一种基于iOS系统的视频录制方法、装置以及存储介质
CN111629255B (zh) * 2020-05-20 2022-07-01 广州视源电子科技股份有限公司 音视频录制方法、装置、计算机设备及存储介质
CN111629253A (zh) * 2020-06-11 2020-09-04 网易(杭州)网络有限公司 视频处理方法及装置、计算机可读存储介质、电子设备
CN112218109B (zh) * 2020-09-21 2023-03-24 北京达佳互联信息技术有限公司 多媒体资源获取方法、装置、设备及存储介质
CN114598895B (zh) * 2020-12-04 2023-08-11 腾讯云计算(长沙)有限责任公司 音视频处理方法、装置、设备及计算机可读存储介质
CN113038181B (zh) * 2021-03-15 2021-12-21 中国科学院计算机网络信息中心 Android平台下RTMP音视频推流中启停音频容错方法及系统
CN116052701B (zh) * 2022-07-07 2023-10-20 荣耀终端有限公司 一种音频处理方法及电子设备
CN116506077B (zh) * 2023-06-28 2023-10-20 微网优联科技(成都)有限公司 一种信号处理方法及基于xpon的通信系统

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2007060537A (ja) * 2005-08-26 2007-03-08 Toshiba Corp 放送素材編集装置
CN105657447A (zh) * 2016-01-06 2016-06-08 无锡天脉聚源传媒科技有限公司 一种视频合并方法及装置
CN105791951A (zh) * 2014-12-26 2016-07-20 Tcl海外电子(惠州)有限公司 音视频码流的录制方法和装置
CN107018443A (zh) * 2017-02-16 2017-08-04 乐蜜科技有限公司 视频录制方法、装置和电子设备

Family Cites Families (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH1175157A (ja) * 1997-08-29 1999-03-16 Sony Corp 映像信号及び音声信号の記録装置
JP2004015114A (ja) * 2002-06-03 2004-01-15 Funai Electric Co Ltd デジタル放送記録装置及びそれを備えたデジタル放送システム
US7471337B2 (en) * 2004-06-09 2008-12-30 Lsi Corporation Method of audio-video synchronization
JP4820136B2 (ja) * 2005-09-22 2011-11-24 パナソニック株式会社 映像音声記録装置及び映像音声記録方法
TWI322949B (en) * 2006-03-24 2010-04-01 Quanta Comp Inc Apparatus and method for determining rendering duration of video frame
JP2010226258A (ja) * 2009-03-19 2010-10-07 Fujitsu Ltd 情報取得システム、送信装置、データ捕捉装置、送信方法及びデータ捕捉方法
CN101729908B (zh) * 2009-11-03 2012-06-13 上海大学 一种传输流视音频同步复用方法
US9143820B2 (en) * 2012-03-25 2015-09-22 Mediatek Inc. Method for performing fluent playback control in response to decoding status, and associated apparatus
CN105357590A (zh) * 2014-08-22 2016-02-24 中兴通讯股份有限公司 一种实现终端多媒体广播的方法及装置
CN105959773B (zh) * 2016-04-29 2019-06-18 魔方天空科技(北京)有限公司 多媒体文件的处理方法和装置
CN107155126A (zh) * 2017-03-30 2017-09-12 北京奇艺世纪科技有限公司 一种音视频播放方法及装置
CN108769786B (zh) * 2018-05-25 2020-12-29 网宿科技股份有限公司 一种合成音视频数据流的方法和装置
CN108830551A (zh) * 2018-05-25 2018-11-16 北京小米移动软件有限公司 日程提示方法及装置
CN109600564B (zh) * 2018-08-01 2020-06-02 北京微播视界科技有限公司 用于确定时间戳的方法和装置
CN109600661B (zh) * 2018-08-01 2022-06-28 北京微播视界科技有限公司 用于录制视频的方法和装置
CN109547812A (zh) * 2019-01-22 2019-03-29 广州虎牙信息科技有限公司 一种直播方法、装置、移动终端与存储介质

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2007060537A (ja) * 2005-08-26 2007-03-08 Toshiba Corp 放送素材編集装置
CN105791951A (zh) * 2014-12-26 2016-07-20 Tcl海外电子(惠州)有限公司 音视频码流的录制方法和装置
CN105657447A (zh) * 2016-01-06 2016-06-08 无锡天脉聚源传媒科技有限公司 一种视频合并方法及装置
CN107018443A (zh) * 2017-02-16 2017-08-04 乐蜜科技有限公司 视频录制方法、装置和电子设备

Also Published As

Publication number Publication date
US20200382741A1 (en) 2020-12-03
EP3748972A4 (en) 2020-12-09
CN110324643A (zh) 2019-10-11
CN110324643B (zh) 2021-02-02
EP3748972B1 (en) 2021-12-15
EP3748972A1 (en) 2020-12-09
US10951857B2 (en) 2021-03-16

Similar Documents

Publication Publication Date Title
WO2020215453A1 (zh) 一种视频录制方法及系统
US10911817B2 (en) Information processing system
CN101843099B (zh) 存储视频数据的装置和方法
US12015770B2 (en) Method for encoding video data, device, and storage medium
CN104618786A (zh) 音视频同步方法和装置
CN104410807A (zh) 一种多路视频同步回放方法及装置
CN110300322B (zh) 一种屏幕录制的方法、客户端和终端设备
JP4380585B2 (ja) 映像再生装置
CN113490047A (zh) 一种Android音视频播放方法
CN113207040A (zh) 一种视频远程快速回放的数据处理方法、装置及系统
US9008488B2 (en) Video recording apparatus and camera recorder
CN114257771B (zh) 一种多路音视频的录像回放方法、装置、存储介质和电子设备
CN111836071B (zh) 一种基于云会议的多媒体处理方法、装置及存储介质
US10354695B2 (en) Data recording control device and data recording control method
CN114339284A (zh) 直播延迟的监控方法、设备、存储介质及程序产品
CN111107296B (zh) 音频数据采集方法、装置、电子设备及可读存储介质
CN106792111A (zh) 一种使用ffmpeg接口录制直播网站视频的方法及装置
KR20190101579A (ko) 초고해상도 다채널 영상처리를 위한 재구성 가능한 영상 시스템
JP2014135728A (ja) 映像伝送システム及び映像伝送方法
CN115834943B (zh) 音视频同步方法及装置
CN113038181B (zh) Android平台下RTMP音视频推流中启停音频容错方法及系统
US11197028B2 (en) Recovery during video encoding
JP4350638B2 (ja) 映像記録装置
CN112511891A (zh) 录制文件处理方法、装置、设备及介质
CN117319588A (zh) 一种基于医疗设备的视频存储方法及系统

Legal Events

Date Code Title Description
ENP Entry into the national phase

Ref document number: 2019800908

Country of ref document: EP

Effective date: 20191119

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19800908

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE