WO2022179306A1 - Audio/video playing method and apparatus, and electronic device - Google Patents

Audio/video playing method and apparatus, and electronic device

Info

Publication number
WO2022179306A1
Authority
WO
WIPO (PCT)
Prior art keywords
playback
audio
video image
data
video
Prior art date
Application number
PCT/CN2021/143557
Other languages
French (fr)
Chinese (zh)
Inventor
Luo Cheng (罗诚)
Li Gang (李刚)
Zhou Fan (周凡)
Xiang Yu (向宇)
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd. (华为技术有限公司)
Publication of WO2022179306A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/45 Management operations performed by the client for facilitating the reception of or the interaction with the content or administrating data related to the end-user or to the client device itself, e.g. learning user preferences for recommending movies, resolving scheduling conflicts
    • H04N21/462 Content or additional data management, e.g. creating a master electronic program guide from data received from the Internet and a Head-end, controlling the complexity of a video stream by scaling the resolution or bit-rate based on the client capabilities
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/21 Server components or server architectures
    • H04N21/218 Source of audio or video content, e.g. local disk arrays
    • H04N21/2187 Live feed
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/439 Processing of audio elementary streams
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs

Definitions

  • the present application relates to the technical field of intelligent terminals, and in particular, to an audio and video playback method, device, and electronic device.
  • a common application scenario is the audio and video separation application scenario.
  • Different collection devices are used to collect audio data and video image data respectively, and different playback devices are used to play the audio data and video image data respectively.
  • the mobile phone cooperates with the surrounding large screen, camera, microphone, and speaker to realize the video telephony service with the remote end.
  • the audio data and video image data are respectively transmitted from the data output end to the audio playback end (e.g., a smart speaker) and the video image playback end (e.g., a large-screen TV) for playback.
  • During separate transmission, unfavorable factors such as transmission errors, unstable transmission signal strength, and transmission delay may occur. These unfavorable factors cause playback freezes and playback delays in both audio playback and video image playback, which greatly reduces the user experience.
  • the present application provides an audio and video playback method, apparatus, and electronic device; the present application also provides an audio playback method, apparatus, and electronic device; a video image playback method, apparatus, and electronic device; and a computer-readable storage medium.
  • the present application provides a method for playing audio and video, including:
  • when the current application scenario is an interactive application scenario, a catch-up strategy is configured as the playback strategy for audio playback and/or video image playback, wherein the catch-up strategy includes:
  • adjusting the audio playback and/or the video image playback so that the playback progress of the audio playback and/or the video image playback catches up with the audio and video data output progress, and so that the playback delay of the audio playback and/or the video image playback compared to the audio and video data output is less than or equal to a preset interactive scene playback delay threshold.
  • the audio and video playback scenarios are identified, and the corresponding playback strategy is selected according to the specific application scenarios, which can greatly improve the user experience of audio and video playback.
  • using the catch-up strategy for audio and video playback in interactive application scenarios ensures that the playback delay of audio and video playback meets the application requirements of interactive application scenarios, thereby greatly improving the user experience in interactive application scenarios.
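The core decision of the catch-up strategy described above can be sketched in a few lines. This is a minimal illustration, not the application's implementation; the function names and the millisecond timestamps are assumptions, and the 150 ms value only mirrors the example threshold given later in the document.

```python
# Preset interactive scene playback delay threshold (150 ms is the example
# value used later in this document for the video-call scenario).
INTERACTIVE_SCENE_DELAY_THRESHOLD_MS = 150

def playback_delay_ms(output_ts_ms: int, playing_ts_ms: int) -> int:
    """Playback delay relative to the audio/video data output: the gap between
    the newest timestamp the output end has produced and the timestamp
    currently being played at the playback end."""
    return output_ts_ms - playing_ts_ms

def must_catch_up(output_ts_ms: int, playing_ts_ms: int) -> bool:
    """True when the playback delay exceeds the preset threshold, i.e. the
    catch-up strategy must adjust playback to chase the output progress."""
    return playback_delay_ms(output_ts_ms, playing_ts_ms) > INTERACTIVE_SCENE_DELAY_THRESHOLD_MS

# Output end has produced data up to t=1000 ms while the player is at
# t=800 ms: the 200 ms delay exceeds the 150 ms threshold.
need_skip = must_catch_up(1000, 800)
```

The later bullets describe the adjustment itself (deleting unplayed buffered data); this sketch only captures the trigger condition.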
  • the method further includes:
  • when the current application scenario is a non-interactive application scenario, a smooth playback strategy is configured as the playback strategy for the audio playback and/or the video image playback.
  • the smooth playback strategy is used for audio and video playback in non-interactive application scenarios, which ensures that the smoothness of audio and video playback meets the application requirements of non-interactive application scenarios, thereby greatly improving the user experience in non-interactive application scenarios.
  • configuring the catch-up strategy as the playback strategy for audio playback and/or video image playback includes:
  • configuring the catch-up strategy as the playback strategy for both the audio playback and the video image playback.
  • configuring the catch-up strategy as the playback strategy for audio playback and/or video image playback includes:
  • configuring a synchronization strategy as the playback strategy for the video image playback, wherein the synchronization strategy includes:
  • adjusting the video image playback based on the playback progress of the audio playback, so that the playback progress of the video image playback is synchronized with the playback progress of the audio playback.
  • configuring the catch-up strategy as the playback strategy for audio playback and/or video image playback includes:
  • configuring a synchronization strategy as the playback strategy for the audio playback, wherein the synchronization strategy includes:
  • adjusting the audio playback based on the playback progress of the video image playback, so that the playback progress of the audio playback is synchronized with the playback progress of the video image playback.
  • an audio playback method, comprising:
  • performing the audio playback based on the interactive scene playback delay threshold, and, during the audio playback, adjusting the audio playback so that the playback progress of the audio playback catches up with the audio and video data output progress, and so that the playback delay of the audio playback compared to the audio and video data output is less than or equal to the preset interactive scene playback delay threshold.
  • performing the audio playback based on the interactive scene playback delay threshold includes:
  • adjusting the audio playback so that the playback delay of the audio playback is less than or equal to the interactive scene playback delay threshold.
  • performing the audio playback based on the interactive scene playback delay threshold includes:
  • when the data buffer amount of the unplayed data of the audio playback exceeds the preset data buffer threshold, adjusting the data buffer amount of the unplayed data of the audio playback so that it is less than or equal to the preset data buffer threshold.
  • adjusting the audio playback includes:
  • deleting all or part of the unplayed data in the audio buffer, so that the audio playback skips the deleted unplayed data and the playback progress of the audio playback catches up with the audio and video data output progress.
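The buffer deletion just described can be sketched as follows. This is an illustrative sketch only; the function name, the frame labels, and the deque-based buffer are assumptions, not the application's implementation.

```python
from collections import deque

def skip_stale_frames(buffer: deque, max_buffered_frames: int) -> int:
    """Delete unplayed frames from the front of the audio buffer until at most
    max_buffered_frames remain, so playback skips them and jumps ahead to
    fresher data; returns how many frames were deleted."""
    deleted = 0
    while len(buffer) > max_buffered_frames:
        buffer.popleft()  # the playback loop never sees this frame
        deleted += 1
    return deleted

# Five unplayed audio frames are queued but only two are allowed to remain:
buf = deque(["B21", "B22", "B23", "B24", "B25"])
skip_stale_frames(buf, 2)  # playback resumes from "B24"
```

Dropping from the front (the oldest data) rather than the back is what makes playback "catch up": the freshest frames survive, so the playback position jumps toward the output end's current progress.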
  • the deletion of all or part of the unplayed data in the audio buffer includes:
  • the present application provides a video image playback method, comprising:
  • performing the video image playback based on the interactive scene playback delay threshold, and, during the video image playback, adjusting the video image playback so that the playback progress of the video image playback catches up with the audio and video data output progress, and so that the playback delay of the video image playback compared to the audio and video data output is less than or equal to a preset interactive scene playback delay threshold.
  • performing the video image playback based on the interactive scene playback delay threshold includes:
  • adjusting the video image playback so that the playback delay of the video image playback is less than or equal to the interactive scene playback delay threshold.
  • performing the video image playback based on the interactive scene playback delay threshold includes:
  • adjusting the video image playback, including:
  • deleting all or part of the unplayed data in the video image buffer, so that the video image playback skips the deleted unplayed data and the playback progress of the video image playback catches up with the audio and video data output progress.
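One practical consideration when deleting unplayed video data, sketched below, is an added assumption not stated in the application: compressed video can generally only resume decoding cleanly at a keyframe, so a drop in the video image buffer would typically extend to the next keyframe boundary. The function name and frame representation are illustrative.

```python
def drop_to_next_keyframe(frames: list, drop_at_least: int) -> list:
    """frames: list of (frame_id, is_keyframe) tuples in decode order.
    Remove at least drop_at_least frames from the front, then keep removing
    until the next remaining frame is a keyframe (or the list is exhausted),
    so the decoder can resume without referencing deleted frames."""
    i = drop_at_least
    while i < len(frames) and not frames[i][1]:
        i += 1
    return frames[i:]

# A small group of pictures: keyframes at positions 1 and 4.
gop = [(1, True), (2, False), (3, False), (4, True), (5, False)]
remaining = drop_to_next_keyframe(gop, 2)  # resumes at keyframe 4
```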
  • the present application provides an audio and video playback device, the device comprising:
  • a scene identification module, which is used to identify the current application scenario; and
  • a playback strategy configuration module, configured to configure the catch-up strategy as the playback strategy for audio playback and/or video image playback when the current application scenario is an interactive application scenario, wherein the catch-up strategy includes:
  • adjusting the audio playback and/or the video image playback so that its playback progress catches up with the audio and video data output progress, and so that the playback delay of the audio playback and/or the video image playback compared to the audio and video data output is less than or equal to the preset interactive scene playback delay threshold.
  • the present application provides an audio playback device, the device comprising:
  • a threshold acquisition module, which is used to acquire the preset interactive scene playback delay threshold when the playback strategy of the audio playback is the catch-up strategy; and
  • a playback adjustment module, which is used to adjust the audio playback during the audio playback, so that the playback progress of the audio playback catches up with the audio and video data output progress, and so that the playback delay of the audio playback compared to the audio and video data output is less than or equal to the preset interactive scene playback delay threshold.
  • the present application provides a video image playback device, the device comprising:
  • a threshold acquisition module, which is used to acquire the preset interactive scene playback delay threshold when the playback strategy of the video image playback is the catch-up strategy; and
  • a playback adjustment module, which is used to adjust the video image playback during the video image playback, so that the playback progress of the video image playback catches up with the audio and video data output progress, and so that the playback delay of the video image playback compared to the audio and video data output is less than or equal to the preset interactive scene playback delay threshold.
  • the present application provides an audio and video playback device, the device comprising:
  • a scene identification module, which is used to identify the current application scenario;
  • a playback strategy configuration module, which is used to configure a playback strategy for audio playback and/or video image playback according to the current application scenario, including: when the current application scenario is an interactive application scenario, configuring the catch-up strategy as the playback strategy for the audio playback and/or the video image playback, wherein the catch-up strategy includes: adjusting the audio playback and/or the video image playback so that the playback progress of the audio playback and/or the video image playback catches up with the audio and video data output progress, and so that the playback delay of the audio playback and/or the video image playback compared to the audio and video data output is less than or equal to a preset interactive scene playback delay threshold;
  • a threshold acquisition module, configured to acquire the preset interactive scene playback delay threshold when the playback strategy of the audio playback and/or the video image playback is the catch-up strategy; and
  • a playback adjustment module, which is used to adjust the audio playback and/or the video image playback during the audio playback and/or the video image playback, so that the playback progress of the audio playback and/or the video image playback catches up with the audio and video data output progress, and so that the playback delay of the audio playback and/or the video image playback compared to the audio and video data output is less than or equal to the preset interactive scene playback delay threshold.
  • the application provides an audio and video playback system, the system comprising:
  • an audio and video output device, which is used to output audio data and video image data respectively, and to identify the current application scenario; when the current application scenario is an interactive application scenario, the catch-up strategy is configured as the playback strategy for audio playback and/or video image playback, wherein the catch-up strategy includes: adjusting the audio playback and/or the video image playback so that the playback progress of the audio playback and/or the video image playback catches up with the audio and video data output progress, and so that the playback delay of the audio playback and/or the video image playback compared to the audio and video data output is less than or equal to a preset interactive scene playback delay threshold;
  • An audio playback device configured to receive the audio data, and play the audio data according to a playback strategy of the audio playback configured by the audio and video output device;
  • a video image playback device configured to receive the video image data, and play the video image data according to a playback strategy of the video image playback configured by the audio and video output device.
  • the application provides an audio and video playback system, the system comprising:
  • an audio output device for outputting audio data
  • a video image output device for outputting video image data
  • an audio and video playback device, which is used for:
  • identifying the current application scenario; when the current application scenario is an interactive application scenario, configuring the catch-up strategy as the playback strategy for audio playback and/or video image playback, wherein the catch-up strategy includes: adjusting the audio playback and/or the video image playback so that the playback progress of the audio playback and/or the video image playback catches up with the audio and video data output progress, and so that the playback delay of the audio playback and/or the video image playback compared to the audio and video data output is less than or equal to the preset interactive scene playback delay threshold; and
  • playing the audio data and the video image data according to the playback strategy of the audio playback and the playback strategy of the video image playback.
  • the present application provides an electronic device, the electronic device comprising a memory for storing computer program instructions and a processor for executing the program instructions, wherein, when the computer program instructions are executed by the processor, the electronic device is triggered to perform the method steps described in the first aspect above.
  • the present application provides an electronic device comprising a memory for storing computer program instructions and a processor for executing the program instructions, wherein, when the computer program instructions are executed by the processor, the electronic device is triggered to perform the method steps described in the second aspect above.
  • the present application provides an electronic device comprising a memory for storing computer program instructions and a processor for executing the program instructions, wherein, when the computer program instructions are executed by the processor, the electronic device is triggered to perform the method steps described in the third aspect above.
  • the present application provides a computer-readable storage medium, where a computer program is stored in the computer-readable storage medium, and when it runs on a computer, it causes the computer to execute the methods of the first to third aspects above.
  • FIG. 1 is a schematic diagram of an application scenario in which audio and video data are transmitted respectively;
  • FIG. 2 is a schematic diagram of an application scenario in which audio and video data are transmitted respectively;
  • FIG. 3 is a flowchart of playing audio and video data respectively;
  • FIG. 4 is a schematic diagram of an application scenario in which audio and video data are transmitted respectively;
  • FIG. 5 is a flowchart of an audio and video playback method according to an embodiment of the present application;
  • FIG. 6 is a structural block diagram of an audio and video playback system according to an embodiment of the present application;
  • FIG. 7 is a structural block diagram of an audio and video playback system according to an embodiment of the present application;
  • FIG. 8 is a schematic diagram of an application scenario in which audio and video data are transmitted respectively;
  • FIG. 9 is a flowchart of an audio playback method according to an embodiment of the present application;
  • FIG. 10 is a flowchart of an audio playback method according to an embodiment of the present application;
  • FIG. 11 is a schematic diagram of the AAC audio data decoding process;
  • FIG. 12 is a flowchart of a video image playback method according to an embodiment of the present application;
  • FIG. 13 is a flowchart of a video image playback method according to an embodiment of the present application;
  • FIG. 14 is a flowchart of an audio and video playback method according to an embodiment of the present application;
  • FIG. 15 is a flowchart of an audio and video playback method according to an embodiment of the present application;
  • FIG. 16 is a partial flowchart of an audio and video playback method according to an embodiment of the present application.
  • FIG. 1 shows a schematic diagram of an application scenario of transmitting audio and video data respectively.
  • the drone 11 collects video images of the residential community and transmits the collected video image data to the mobile phone 13;
  • the microphone 12 collects audio for the community (for example, the ambient sound of the area currently being shot by the drone 11, or the on-site commentary of a tour guide about that area) and transmits the collected audio data to the mobile phone 13.
  • the mobile phone 13 plays the audio data collected by the microphone 12 while playing the video image data collected by the drone 11, so that the user of the mobile phone 13 can take a virtual tour of the community.
  • the drone 11 continuously collects video image data (for example, A11, A12, A13, ...) and sends it to the mobile phone 13 in sequence; the microphone 12 continuously collects audio data (for example, B11, B12, B13, ...) synchronized with the video image data and sends it to the mobile phone 13 in sequence.
  • the mobile phone 13 plays video image data and audio data in the order in which the data is received.
  • If the transmission delay of the video image data is stable, the mobile phone 13 continuously receives the video image data as it is sent and performs continuous video image playback; however, if the transmission delay of the video image data is unstable, the mobile phone 13 cannot continuously receive the video image data, and the video image playback on the mobile phone 13 may freeze or skip frames.
  • Similarly, if the transmission delay of the audio data is stable, the mobile phone 13 continuously receives the audio data as it is sent and performs continuous audio playback; however, if the transmission delay of the audio data is unstable, the mobile phone 13 cannot continuously receive the audio data, and the audio playback on the mobile phone 13 may stutter or skip.
  • FIG. 2 shows a schematic diagram of an application scenario of transmitting audio and video data respectively.
  • the drone 21 collects video images of the residential community and transmits the collected video image data to the mobile phone 23; the microphone 22 collects audio for the community (for example, the ambient sound of the area currently being shot by the drone 21, or the on-site commentary of a tour guide) and transmits the collected audio data to the mobile phone 23.
  • the mobile phone 23 transmits video image data to the large-screen TV 25 for playback; at the same time, the mobile phone 23 also transmits audio data to the smart speaker 24 for playback. In this way, the user can take a virtual tour of the community through the cooperation of the large-screen TV 25 and the smart speaker 24 .
  • the mobile phone 23 continuously transmits video image data (eg, A21, A22, A23 . . . ) to the large-screen TV 25 in sequence.
  • the mobile phone 23 continuously sends the audio data (eg, B21, B22, B23 . . . ) synchronized with the video image data to the smart speaker 24 in sequence.
  • the large-screen TV 25 plays the video image data in the order in which the data is received. During this process, if the transmission delay of the video image data is stable, the large-screen TV 25 continuously receives the video image data as the mobile phone 23 sends it and performs continuous video image playback; however, if the transmission delay of the video image data is unstable, the large-screen TV 25 cannot continuously receive the video image data, and the video image playback on the large-screen TV 25 may freeze or skip frames.
  • the smart speaker 24 plays the audio data in the order in which the data is received. If the transmission delay of the audio data is stable, the smart speaker 24 continuously receives the audio data as the mobile phone 23 sends it and performs continuous audio playback; however, if the transmission delay of the audio data is unstable, the smart speaker 24 cannot continuously receive the audio data, and the audio playback on the smart speaker 24 may stutter, skip, and the like.
  • an embodiment of the present application provides a cache-based method for playing audio and video. Specifically, after receiving the audio data and video image data, the receiving device does not play them immediately, but temporarily buffers them, and only plays the buffered data after a certain amount has accumulated.
  • FIG. 3 shows a flow chart of playing audio and video data respectively. As shown in Figure 3, in the process of playing video image data:
  • Step 311: the data output end outputs video image data;
  • Step 312: the playback end receives the video image data;
  • Step 314: the playback end plays the video image data in the buffer according to the buffering order.
  • Step 321: the data output end outputs audio data;
  • Step 322: the playback end receives the audio data;
  • Step 324: the playback end plays the audio data in the buffer according to the buffering order.
  • the data output end for audio playback and the data output end for video image playback may be the same device or different devices; likewise, the playback end for audio playback and the playback end for video image playback may be the same device or different devices.
  • the data buffering amount can be dynamically changed according to the data transmission delay fluctuation; or, a maximum data buffering amount can be set according to the maximum value of the data delay fluctuation.
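The dynamic buffer sizing just mentioned can be sketched as follows. This is an illustrative sketch under assumptions: the function name, the observation window, and the safety margin are not specified by the application.

```python
def target_buffer_ms(recent_delays_ms: list, margin_ms: int = 20) -> int:
    """Size the buffer from the observed fluctuation of transmission delay:
    buffer enough data to absorb the spread between the largest and smallest
    recent delays, plus a small safety margin."""
    jitter_ms = max(recent_delays_ms) - min(recent_delays_ms)
    return jitter_ms + margin_ms

# Transmission delays fluctuating between 40 ms and 90 ms over the recent
# window -> buffer about 70 ms of data (50 ms jitter + 20 ms margin).
size = target_buffer_ms([40, 55, 90, 60])
```

Alternatively, as the bullet above notes, a fixed maximum buffer could be set from the worst-case delay fluctuation instead of recomputing it per window.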
  • because the playback end plays the video image data or audio data in the buffer according to the buffering order, the time at which the playback end receives a given piece of video image data or audio data and the time at which it plays that data may be different.
  • the larger the amount of buffered audio data or video image data, the less interference fluctuations in transmission delay cause during subsequent playback. For example, if all of the audio data and video image data of a certain video are cached before playback begins, then during playback there will be no freezing or frame skipping caused by fluctuations in transmission delay.
  • the smart speaker 24 performs audio playback according to the arrangement order in the cache queue (in the order of B21, B22, B23). The larger the amount of data buffered before playback, the greater the playback delay.
  • this increase in playback delay greatly reduces the user experience.
  • the user of the mobile phone 13 controls the drone according to the drone-captured image displayed on the mobile phone 13. If the time difference between the drone 11 outputting a video image and the mobile phone 13 receiving and playing that video image is too large, the flight attitude that the user infers from the image displayed on the mobile phone 13 deviates from the real-time flight attitude of the drone 11, and the user cannot control the flight attitude smoothly.
  • the user of the mobile phone 23 controls the drone according to the drone-captured image displayed on the large-screen TV 25. Even setting aside the delay between the drone 21 outputting a video image and the mobile phone 23 receiving it, if the time difference between the mobile phone 23 outputting the video image and the large-screen TV 25 receiving and displaying it is too large, the flight attitude that the user infers from the image displayed on the large-screen TV 25 deviates from the real-time flight attitude of the drone 21, and the user cannot control the flight attitude smoothly.
  • FIG. 4 shows a schematic diagram of an application scenario of transmitting audio and video data respectively.
  • user A and user B use their respective mobile phones (mobile phone 43 and mobile phone 44 ) to implement a video call with each other.
  • user B sends the video image of the video call to the large-screen TV 41 for display through the mobile phone 43, and sends the voice of the video call to the smart speaker 42 for playback.
  • as the playback delay between the display of the video image on the large-screen TV 41 and the playback of the voice on the smart speaker 42 increases, the interaction delay between user A and user B also increases, which greatly degrades the user experience of the video call between user A and user B.
  • an embodiment of the present application proposes an audio and video playback method based on a catch-up strategy, which identifies audio and video playback scenarios, and selects a corresponding playback strategy according to specific application scenarios.
  • an embodiment of the present application proposes an audio and video playback method based on a catch-up strategy.
  • the playback delay threshold of the interaction scenario is preset.
  • the interactive scene playback delay threshold is the maximum playback delay, measured from the output device outputting the audio data and/or video image data to the playback end playing that data, that still satisfies the expected user interaction experience.
  • the playback delay thresholds of the interaction scenes between the mobile phone 43 and the large-screen TV 41 and between the mobile phone 43 and the smart speaker 42 are set to be 150 ms.
  • when the application scenario is an interactive application scenario, audio playback and/or video image playback is performed based on the catch-up strategy: the preset interactive scene playback delay threshold is called, and during audio playback and/or video image playback, the audio playback and/or video image playback is adjusted so that its playback delay is less than or equal to the interactive scene playback delay threshold.
  • FIG. 5 is a flowchart of a method for playing audio and video according to an embodiment of the present application.
  • the audio and video playback method is executed by one or more devices in the audio and video playback system (including audio and video data output devices, audio playback terminals, and video image playback terminals). As shown in Figure 5:
  • Step 500 identifying the current application scenario
  • Step 501 judging whether the current application scene is an interactive application scene
  • step 502 is performed;
  • Step 502 calling the catch-up strategy
  • Step 520 using a catch-up strategy to play audio and/or video images, including:
  • Step 521 calling a preset interactive scene playback delay threshold
  • Step 522 adjust audio playback and/or video image playback so that its playback progress catches up with the progress of the audio and video data output, and the playback delay relative to the audio and video data output becomes less than or equal to the interactive scene playback delay threshold.
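Steps 521-522 can be sketched as a simple loop: unplayed frames are dropped until the playback delay relative to the output progress falls back under the threshold. This is a hypothetical illustration; the function name, the frame-dropping adjustment, and the 150 ms constant (taken from the example above) are assumptions, not the patent's API.

```python
INTERACTIVE_DELAY_THRESHOLD_MS = 150  # preset interactive scene playback delay threshold

def catch_up(output_progress_ms, playback_progress_ms, buffered_frames, frame_duration_ms):
    """Drop buffered frames until playback delay <= threshold; return kept frames."""
    delay = output_progress_ms - playback_progress_ms
    while delay > INTERACTIVE_DELAY_THRESHOLD_MS and buffered_frames:
        buffered_frames.pop(0)                # skip the oldest unplayed frame
        playback_progress_ms += frame_duration_ms  # playback progress jumps forward
        delay = output_progress_ms - playback_progress_ms
    return buffered_frames
```

For example, with output at 1000 ms, playback at 600 ms, and 40 ms frames, seven frames are skipped before the 120 ms residual delay satisfies the threshold.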
  • the audio and video playback scenario is identified, and the corresponding playback strategy is selected according to the specific application scenario, which can greatly improve the user experience of audio and video playback.
  • using a catch-up strategy for audio and video playback in interactive application scenarios can ensure that the playback delay meets the application requirements of interactive application scenarios, thereby greatly improving the user experience in such scenarios.
  • FIG. 6 is a structural block diagram of an audio and video playback system according to an embodiment of the present application.
  • an audio output device 601 (e.g., a sound capture device) and a video image output device 602 (e.g., an image capture device) output audio data and video image data to a playback device 603 (e.g., a mobile phone), and the playback device 603 plays the received audio data and video image data synchronously.
  • the user realizes the viewing and listening of the video through the playback device 603 .
  • steps 500, 501 and 502 can be performed by the playback device 603 (e.g., a mobile phone).
  • when the playback device 603 determines that the current application scene is an interactive application scene, it invokes the catch-up strategy (calling the preset interactive scene playback delay threshold) and performs playback based on the catch-up strategy.
  • the mobile phone 13 determines that the current application scenario is an interactive application scenario (real-time control of the flight attitude of the drone 11 ).
  • the mobile phone 13 invokes the preset interactive scene playback delay threshold, implements a catch-up strategy based on the interactive scene playback delay threshold, and plays audio data from the microphone 12 and video image data from the drone 11 based on the catch-up strategy.
  • FIG. 7 is a structural block diagram of an audio and video playback system according to an embodiment of the present application.
  • an audio and video output device 701 (e.g., a mobile phone) synchronously outputs the audio data and video image data of the same video (e.g., the video of a video call); the audio data is output to the audio playback device 702 (e.g., a smart speaker), and the video image data is output to the video image playback device 703 (e.g., a large-screen display device).
  • the audio playing device 702 plays the received audio file
  • the video image playing device 703 plays the received video image file.
  • the user realizes the viewing and listening of the video by watching the video image playing device 703 and listening to the audio playing device 702 .
  • steps 500 , 501 and 502 can be performed by an audio and video output device 701 (eg, a mobile phone).
  • when the audio and video output device 701 determines that the current application scene is an interactive application scene, it invokes the catch-up strategy (including the interactive scene playback delay threshold) and sends it to the audio playback device 702 and/or the video image playback device 703; the audio playback device 702 and/or the video image playback device 703 then implements the catch-up strategy based on the acquired interactive scene playback delay threshold.
  • the mobile phone 23 determines that the current application scenario is an interactive application scenario (real-time control of the flight attitude of the drone 21 ).
  • the mobile phone 23 invokes the preset interactive scene playback delay threshold and sends it to the large-screen TV 25; the large-screen TV 25 implements a catch-up strategy based on the acquired interactive scene playback delay threshold and plays the video image data from the mobile phone 23 based on that strategy.
  • the mobile phone 43 determines that the current application scenario is an interactive application scenario (making a video call with the mobile phone 44).
  • the mobile phone 43 calls the preset interactive scene playback delay threshold, and sends the interactive scene playback delay threshold to the large-screen TV 41 and the smart speaker 42;
  • the large-screen TV 41 implements a catch-up strategy based on the acquired interactive scene playback delay threshold and plays the video image data from the mobile phone 43 based on that strategy;
  • the smart speaker 42 implements a catch-up strategy based on the acquired interactive scene playback delay threshold and plays the audio data from the mobile phone 43 based on that strategy.
  • the user only uses the mobile phone 13 to take a virtual tour of the community along a designated route and does not control the drone 11. Therefore, the playback delay between the audio output by the microphone 12 and its playback on the mobile phone 13, and the playback delay between the video image output by the drone 11 and its playback on the mobile phone 13, will not significantly affect the user experience of the virtual tour.
  • FIG. 8 is a schematic diagram of an application scenario of transmitting audio and video data respectively.
  • the mobile phone 63 is connected to the large-screen TV 61 and the smart speaker 62. When the user uses the mobile phone 63 to watch a video (for example, in the embodiment shown in FIG. 1, a video generated by integrating the video image collected by the drone 11 and the audio collected by the microphone 12), the mobile phone 63 plays the video.
  • the mobile phone 63 does not output audio and video images through its own speaker and screen; instead, it sends the video image of the played video to the large-screen TV 61 for display and sends the audio to the smart speaker 62 for playback.
  • the playback delay between the video image and audio output by the mobile phone 63 and the video image displayed by the large-screen TV 61 and the audio playback by the smart speaker 62 will not significantly affect the user experience of video viewing.
  • an embodiment of the present application proposes an audio and video playback method based on a smooth playback strategy.
  • the data cache value is set according to the data transmission delay fluctuation situation of the application scenario. Specifically, the data cache value matches the maximum value of the data transmission delay fluctuation, or the data cache value is adjusted in real time according to the data transmission delay fluctuation.
  • audio playback and/or video image playback is performed based on the smooth playback strategy.
  • the smooth playback strategy the audio data and video image data are cached and played based on the data cache value to ensure that the audio and video data can be played smoothly when the data transmission delay fluctuates.
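A minimal sketch of sizing the data cache value under the smooth playback strategy, assuming the cache must absorb the worst observed transmission-delay fluctuation. The function name and the jitter estimate are illustrative assumptions, not the patent's method.

```python
def smooth_cache_ms(delay_samples_ms):
    """Cache value matching the maximum observed delay fluctuation (jitter)."""
    base = min(delay_samples_ms)           # steady-state transmission delay
    return max(delay_samples_ms) - base    # worst-case fluctuation to buffer

# e.g. per-packet delays of [40, 55, 48, 90, 42] ms call for 50 ms of cached data
```

Re-running this over a sliding window of recent delay samples would give the real-time adjustment variant mentioned above.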
  • when the current application scenario is not an interactive application scenario (a non-interactive scenario), step 503 is performed;
  • Step 503 calling the smooth playback strategy
  • Step 510 using a smooth playback strategy to play audio and/or video images.
  • step 503 may be performed by a playback device (e.g., a mobile phone).
  • the smooth playback strategy is used to play audio and video in non-interactive application scenarios, which can ensure that the smoothness of audio and video playback meets the application requirements of non-interactive application scenarios, thereby greatly improving the user experience in such scenarios.
  • the mobile phone 13 determines that the current application scenario is a non-interactive application scenario (simple video viewing, without controlling the flight attitude of the drone 11 ).
  • the mobile phone 13 invokes the smooth playback strategy, and plays the audio data from the microphone 12 and the video image data of the drone 11 based on the smooth playback strategy.
  • step 503 can be performed by an audio and video output device (for example, a mobile phone).
  • when the audio and video output device determines that the current application scenario is an interactive application scenario, it invokes the interactive scene playback delay threshold and delivers it to the audio player and/or the video image player; the audio player and/or the video image player implements a catch-up strategy based on the acquired interactive scene playback delay threshold.
  • the mobile phone 63 determines that the current application scenario is a non-interactive application scenario (simple video playback scenario).
  • the mobile phone 63 invokes the smooth playback strategy and delivers the smooth playback strategy to the large-screen TV 61 and the smart speaker 62;
  • the large-screen TV 61 plays the video image data from the mobile phone 63 based on the smooth playback strategy;
  • the smart speaker 62 plays the audio data from the mobile phone 63 based on the smooth playback strategy.
  • the smooth playback strategy can be implemented in various manners.
  • the data cache value is set according to the fluctuation situation of the data transmission delay in the application scenario.
  • the amount of buffered data is monitored: when the current amount of buffered data reaches the data cache value, renegotiation is performed to increase the buffering. The video playback device and/or the audio playback device reports to the central device (the audio and video output device); the central device monitors the change, renegotiates, and issues a new data cache value to the video playback device and/or the audio playback device.
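The renegotiation described above can be sketched as follows: the playback device reports its buffered amount, and when it reaches the current cache value the central (output) device issues a larger one. The class name and the doubling policy are illustrative assumptions.

```python
class CentralDevice:
    """Stand-in for the audio and video output device that owns the cache value."""

    def __init__(self, cache_value):
        self.cache_value = cache_value

    def on_report(self, buffered_amount):
        if buffered_amount >= self.cache_value:  # fluctuation filled the cache
            self.cache_value *= 2                # renegotiate a larger cache value
        return self.cache_value                  # delivered back to the playback device
```

A playback device would call `on_report` periodically and adopt whatever value comes back.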
  • an embodiment of the present application proposes an audio playback method, including: when the playback strategy of audio playback is a catch-up strategy, obtaining a preset interactive scene playback delay threshold; and, during audio playback, adjusting the audio playback so that its playback progress catches up with the progress of the audio and video data output, so that the playback delay of the audio playback relative to the audio and video data output is less than or equal to the preset interactive scene playback delay threshold.
  • the playback delay of the audio playback relative to the audio and video data output is directly monitored; when the playback delay of the audio playback exceeds the interactive scene playback delay threshold, the audio playback is adjusted so that the playback progress of the audio playback terminal, which plays based on the audio cache data, catches up with the audio and video data output progress, reducing the playback delay of the audio playback until it is less than or equal to the interactive scene playback delay threshold.
  • FIG. 9 is a flowchart of an audio playback method according to an embodiment of the present application.
  • the audio player executes the following process as shown in Figure 9:
  • Step 910 monitor the playback delay of audio playback relative to the audio and video data output
  • Step 920 judging whether the playback delay of audio playback exceeds a preset interactive scene playback delay threshold
  • when the playback delay of audio playback exceeds the preset interactive scene playback delay threshold, step 930 is performed;
  • Step 930 adjust the audio playback so that the playback progress of the audio playback terminal, which plays based on the audio buffer data, catches up with the audio and video data output progress, reducing the playback delay of the audio playback until it is less than or equal to the interactive scene playback delay threshold.
  • step 910 can be implemented by adopting various solutions.
  • when outputting audio data, the output device adds a time stamp to the audio data; when playing the audio data, the playback end compares the time stamp in the audio data being played with the current moment to calculate the playback delay.
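The time-stamp scheme above can be sketched in a few lines: the output device stamps each unit of audio data with its output time, and the playback end subtracts that stamp from the current clock. The function names and tuple layout are illustrative assumptions.

```python
def stamp(audio_frame, now_ms):
    """Output device attaches a time stamp to the outgoing audio data."""
    return (now_ms, audio_frame)

def playback_delay_ms(stamped_frame, now_ms):
    """Playback end compares the stamp with the current moment."""
    sent_ms, _payload = stamped_frame
    return now_ms - sent_ms
```

The resulting delay is what steps 910-930 compare against the interactive scene playback delay threshold.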
  • the audio playback terminal plays according to the storage order of the audio buffer data in the audio buffer. If audio data in the audio buffer is moved out of the buffer after being played, then the amount of data buffered in the audio buffer represents the difference between the progress of the data output terminal in outputting audio data and the progress of the audio player in playing audio data. Therefore, in order to simplify the operation process and reduce the difficulty of implementing the solution, in an implementation manner of the catch-up strategy, the buffered amount of audio data is monitored to implement the catch-up strategy.
  • the corresponding buffered data volume threshold is determined according to the preset interactive scene playback delay threshold; when the current audio buffered data volume exceeds the buffered data volume threshold, it can be determined that the audio playback delay exceeds the interactive scene playback delay threshold. For example, assuming that the interactive scene playback delay threshold is 150 ms, the buffered data volume threshold corresponding to the audio can be calculated from the audio playback speed to be 4 frames.
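The conversion from delay threshold to buffered-frame threshold is a simple division by the per-frame playback duration. A minimal sketch, assuming a per-frame duration of 37.5 ms (hypothetical; the actual value depends on the codec and sample rate) to reproduce the 4-frame example above:

```python
def buffered_frames_threshold(delay_threshold_ms, frame_duration_ms):
    """Largest number of buffered frames whose total duration stays under the threshold."""
    return int(delay_threshold_ms // frame_duration_ms)
```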
  • FIG. 10 is a flowchart of an audio playback method according to an embodiment of the present application.
  • the audio player executes the following process as shown in Figure 10:
  • Step 1010 monitor the data buffer amount of unplayed data of audio playback
  • Step 1020 judging whether the audio data buffering amount exceeds a preset audio data buffering threshold, wherein the data buffering threshold is a value determined according to the interactive scene playback delay threshold;
  • when the audio data buffering amount exceeds the preset audio data buffering threshold, step 1030 is performed;
  • Step 1030 Adjust the audio data buffering amount of the unplayed audio data, so that the audio data buffering amount is less than or equal to a preset audio data buffering threshold.
  • in an implementation manner of step 522, when the audio playback end plays based on the audio buffer data, the audio data is not read and played in sequence according to the storage order of the data in the buffer; instead, the audio data is read and played in a skip mode.
  • in an implementation manner of step 522, when the playback progress of the audio player needs to catch up with the progress of the audio and video data output, all or part of the unplayed audio data in the audio player's buffer is deleted, so that playback on the player side skips the deleted data and the audio playback progress of the audio player catches up with the output progress of the audio and video data.
  • the deletion amount of the audio data corresponds to the playback progress of the current audio playback, so that the playback progress of the audio playback terminal satisfies the interactive scene playback delay threshold.
  • the deletion amount of unplayed audio data corresponds to the amount by which the playback delay exceeds the interactive scene playback delay threshold.
  • step 1030 the data in the audio buffer is deleted, so that the audio data buffer volume is less than or equal to the audio buffer data volume threshold.
  • in step 522, those skilled in the art may adopt various schemes to delete the unplayed audio data in the cache. For example, delete the earliest unplayed audio data stored in the audio buffer; for another example, randomly select the data to be deleted from the unplayed audio data in the audio buffer; or select the data to be deleted from the unplayed audio data in the buffer at fixed intervals.
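The three deletion schemes just listed (oldest-first, random, fixed-interval) can be sketched as follows; all names are illustrative assumptions.

```python
import random

def delete_oldest(buffer, n):
    """Drop the n earliest-stored unplayed entries."""
    return buffer[n:]

def delete_random(buffer, n, rng=random.Random(0)):
    """Drop n randomly chosen unplayed entries, preserving playback order."""
    keep = sorted(rng.sample(range(len(buffer)), len(buffer) - n))
    return [buffer[i] for i in keep]

def delete_interval(buffer, step=2):
    """Keep every `step`-th entry, dropping the rest at fixed intervals."""
    return buffer[::step]
```

Oldest-first minimizes bookkeeping, while fixed-interval spreads the loss evenly over the buffered audio.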
  • in an implementation manner of step 522, the waveform and frequency of the audio data in the audio cache are monitored, and when audio data in the cache needs to be deleted, audio data to which the human ear is not sensitive is preferentially deleted.
  • the sound frequency range that a normal person can perceive is 20-20000 Hz, but the human ear is most sensitive to sound in the frequency range of 1000-3000 Hz.
  • the audio frames are selectively dropped according to the sensitivity priority of the human ear.
  • the format of the transmitted audio frame is generally Advanced Audio Coding (AAC).
  • the frequency domain information of the frame data is extracted (usually through a fast Fourier transform), and the sensitivity of this frame of audio to the human ear is analyzed (for example, 3000 Hz is the most sensitive, with the sensitivity represented by 100; 20000 Hz is the least sensitive, with the sensitivity represented by 0).
  • the main control module puts a part of the AAC bit stream (AAC Stream) into the input buffer, and obtains the start of a frame by searching for the synchronization word.
  • decoding then starts according to the syntax described in ISO/IEC 13818-7.
  • the main task of the main control module is to operate the input and output buffers and call other modules to work together.
  • the input and output buffers are provided with interfaces by a digital signal processing (digital signal processing, DSP) control module.
  • the data stored in the output buffer is the decoded Pulse Code Modulation (PCM) data, which represents the amplitude of the sound. It consists of a fixed-length buffer.
  • interrupt processing is called to output the decoded data to the audio digital-to-analog converter (DAC) connected to the I2S interface.
  • FIG. 11 is a schematic diagram showing the decoding process of AAC audio data. As shown in Figure 11, AAC Stream goes through:
  • Step 1101 noiseless decoding (Noiseless Decoding): noiseless coding is Huffman coding, and its function is to further reduce the redundancy of the scale factor and the quantized spectrum, that is, to Huffman-decode the scale factor and quantized spectrum information;
  • Step 1102 Dequantize
  • Step 1103 joint stereo (Joint Stereo): joint stereo performs certain rendering work on the original samples to make the sound better;
  • Step 1104 perceptual noise substitution (PNS).
  • Step 1105 temporal noise shaping (TNS);
  • Step 1106 inverse modified discrete cosine transform (IMDCT);
  • Step 1107 frequency band replication (Spectral Band Replication, SBR);
  • the PCM code stream of the left and right channels is obtained, and then the main control module puts it into the output buffer and outputs it to the sound playback device.
  • the sensitivity distribution of the audio frames in the current audio buffer queue is calculated first, and then it is decided which frames to drop. For example, suppose frames with a sensitivity level below 60 account for 50% of the queue; if half of the audio frames currently need to be discarded, the frames with a sensitivity level below 60 are discarded.
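The sensitivity-priority dropping above can be sketched as follows: each buffered frame's dominant frequency is mapped to a 0-100 human-ear sensitivity score (peaking in the 1000-3000 Hz band, falling toward 20 Hz and 20000 Hz), and the lowest-scoring frames are discarded first. The scoring curve and names are illustrative assumptions, not the patent's exact analysis.

```python
def sensitivity(freq_hz):
    """Map a dominant frequency to a 0-100 human-ear sensitivity score."""
    if 1000 <= freq_hz <= 3000:
        return 100.0                                  # most sensitive band
    if freq_hz < 1000:
        return 100.0 * freq_hz / 1000.0               # falls off toward 20 Hz
    return max(0.0, 100.0 * (20000.0 - freq_hz) / 17000.0)  # falls off toward 20 kHz

def drop_insensitive(frames, drop_ratio):
    """frames: list of (dominant_freq_hz, payload); drop the least sensitive first."""
    indexed = list(enumerate(frames))
    indexed.sort(key=lambda p: sensitivity(p[1][0]))  # least sensitive first
    n_drop = int(len(frames) * drop_ratio)
    kept = sorted(indexed[n_drop:], key=lambda p: p[0])  # restore playback order
    return [f for _, f in kept]
```

Dropping half of a queue whose insensitive frames sit at 50 Hz and 18 kHz leaves only the 1000-3000 Hz frames, matching the 50%-below-60 example above.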
  • the catch-up strategy for video image playback can be implemented in a manner similar to the above-mentioned implementation manner of the catch-up strategy for audio playback.
  • an embodiment of the present application also proposes a video image playback method, including: when the playback strategy of video image playback is a catch-up strategy, acquiring a preset interactive scene playback delay threshold; and, during video image playback, adjusting the video image playback so that its playback progress catches up with the progress of the audio and video data output, so that the playback delay of the video image playback relative to the audio and video data output is less than or equal to the preset interactive scene playback delay threshold.
  • the playback delay of the video image playback relative to the audio and video data output is directly monitored; when the playback delay of the video image playback exceeds the interactive scene playback delay threshold, the video image playback is adjusted so that the playback progress of the video image playback terminal, which plays based on the video image cache data, catches up with the audio and video data output progress, reducing the playback delay of the video image playback until it is less than or equal to the interactive scene playback delay threshold.
  • FIG. 12 is a flowchart of a method for playing a video image according to an embodiment of the present application.
  • the video image player performs the following process as shown in Figure 12:
  • Step 1110 monitor the playback delay of the video image playback relative to the audio and video data output
  • Step 1120 judging whether the playback delay of the video image playback exceeds a preset interactive scene playback delay threshold
  • when the playback delay of the video image playback exceeds the preset interactive scene playback delay threshold, step 1130 is performed;
  • Step 1130 adjust the video image playback so that the playback progress of the video image playback terminal, which plays based on the video image cache data, catches up with the audio and video data output progress, reducing the playback delay of the video image playback until it is less than or equal to the interactive scene playback delay threshold.
  • in step 1110, when outputting video image data, the output device adds a time stamp to the video image data; when playing the video image data, the video image player compares the time stamp in the video image data being played with the current moment to calculate the playback delay.
  • the video image player plays according to the storage order of the video image cache data in the video image cache. If video image data in the cache is moved out of the cache after being played, then the amount of data cached in the video image cache represents the difference between the progress of the data output end in outputting video image data and the progress of the video image playback end in playing video image data. Therefore, in order to simplify the operation process and reduce the difficulty of implementing the solution, in an implementation manner of the catch-up strategy, the buffered amount of video image data is monitored to implement the catch-up strategy.
  • the corresponding video image buffer data volume threshold is determined according to the interactive scene playback delay threshold; when the current video image buffer data volume exceeds the video image buffer data volume threshold, it can be determined that the playback delay exceeds the interactive scene playback delay threshold. For example, assuming that the interactive scene playback delay threshold is 150 ms, the buffered data volume threshold corresponding to the video image can be calculated to be 3 frames.
  • FIG. 13 is a flowchart of a method for playing a video image according to an embodiment of the present application.
  • the video image player performs the following process as shown in Figure 13:
  • Step 1210 monitor the data buffer amount of unplayed data of audio playback and/or video image playback
  • Step 1220 judging whether the video image data buffering amount exceeds a preset data buffering threshold, wherein: the data buffering threshold is a value determined according to the interactive scene playback delay threshold; when the execution device is an audio player, the data buffering threshold is audio data Cache threshold; when the execution device is the video image player, the data cache threshold is the video image data cache threshold;
  • when the data cache amount exceeds the preset data cache threshold, step 1230 is performed;
  • Step 1230 Adjust the data buffering amount of the unplayed video image data, so that the video image data buffering amount is less than or equal to a preset data buffering threshold.
  • step 522 can be implemented by adopting various solutions.
  • for example, the playback speed of video image playback is accelerated (for example, an original playback speed of 10 frames/second is adjusted to 15 frames/second); for another example, when the video image playback end plays based on the video image cache data, the data is not read and played in sequence according to the storage order of the cached data; instead, the data is read and played in a skip mode (for example, for video image buffer data, every other frame is read and played).
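The speed-up variant above admits a simple back-of-the-envelope check: raising the playback rate drains the buffered backlog at the difference between the playback and input rates. A minimal sketch, with the 10 → 15 frames/second figures taken from the example (the function name is an assumption):

```python
def catch_up_seconds(backlog_frames, input_fps, playback_fps):
    """Time to drain a frame backlog by playing faster than frames arrive."""
    drain_rate = playback_fps - input_fps  # backlog frames cleared per second
    if drain_rate <= 0:
        raise ValueError("playback must be faster than input to catch up")
    return backlog_frames / drain_rate
```

For instance, a 10-frame backlog at 10 fps input and 15 fps playback is cleared in 2 seconds.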
  • in an implementation manner of step 522, when the playback progress of the video image player needs to catch up with the progress of the audio and video data output, all or part of the unplayed video image data in the video image player's buffer is deleted, so that playback of the video image player skips the deleted data and the playback progress of the video image player catches up with the audio and video data output progress.
  • the deletion amount of the video image data corresponds to the playback progress of the current video image playback, so that the playback progress of the video image playback terminal satisfies the interactive scene playback delay threshold.
  • the deletion amount of the unplayed video image data corresponds to the value of the video image playback delay exceeding the interactive scene playback delay threshold.
  • in an implementation manner of step 1230, the data in the video image buffer is deleted so that the video image data buffer amount is less than or equal to the video image buffer data amount threshold.
  • in step 522, those skilled in the art may adopt various solutions to delete the unplayed video image data in the video image cache. For example, delete the earliest unplayed video image data stored in the video image cache; for another example, randomly select the data to be deleted from the unplayed video image data in the video image cache; or select the data to be deleted from the unplayed video image data in the cache at fixed intervals.
  • in an implementation manner of step 522, when video image data in the cache needs to be deleted, the video frames to be deleted are selected at fixed intervals.
  • the fluctuation of transmission delay will also cause the audio and video playback to be unsynchronized, thereby reducing the user experience.
  • independent equipment is used to collect and transmit audio data and video image data respectively.
  • the simultaneously captured audio data and video image data are transmitted separately. Due to transmission delay fluctuation, the transmission delays of the audio data and the video data may differ; the audio data and video data received by the player will then arrive out of step, which may lead to out-of-sync playback of the audio data and video data.
  • the microphone 12 when the drone 11 collects the video image data A1, the microphone 12 will collect the audio data B1 that is synchronized with the video image data A1.
  • the microphone 12 transmits the audio data B1 to the mobile phone 13. If the video image data A1 and the audio data B1 are received synchronously on the mobile phone 13, and the mobile phone 13 plays the video image data A1 and the audio data B1 as they are received, then the playback of the video image data A1 and the audio data B1 is synchronized.
  • if the transmission delay of the microphone 12 transmitting the audio data B1 to the mobile phone 13 is greater than the transmission delay of the drone 11 transmitting the video image data A1 to the mobile phone 13, then reception of the video image data A1 and the audio data B1 on the mobile phone 13 is not synchronized: the mobile phone 13 receives the video image data A1 first. If the mobile phone 13 plays the video image data A1 while receiving it, then when playback of the video image data A1 starts, the mobile phone 13 has not yet received the audio data B1; audio and video playback on the mobile phone 13 will therefore not be synchronized, and the user experience of the virtual tour of the community will be greatly reduced.
  • a solution for solving asynchronous playback of audio and video is to integrate the video image data and audio data before playback to generate an audio-and-video-synchronized video file, and to play that video file to achieve audio and video synchronization.
  • the mobile phone 13 receives video image data A1 and audio data B1, integrates them to generate a video file C1, and the mobile phone 13 plays the video file C1 to realize audio and video synchronization.
  • the application (APP) for receiving the video image data of the drone 11 and the application for receiving the audio data of the microphone 12 are applications of different manufacturers.
  • the application that receives the video image data of the drone 11 can realize the simultaneous playback of the video image data
  • the application that receives the audio data of the microphone 12 can realize the simultaneous playback of the audio data.
  • the integration of the audio and video data cannot be easily realized. For example, the video image data and audio data need to be imported into a third-party video production application and integrated after their time tags are synchronized to generate a video file.
  • the user can only perform a virtual tour after the drone 11 and the microphone 12 have completed the audio and video collection, and cannot take a synchronized virtual tour while the drone 11 and the microphone 12 are collecting audio and video.
  • different playback terminals are used to play audio data and video image data respectively.
  • the difference in transmission delay between the transmission of audio data from the mobile phone 43 to the smart speaker 42 and the transmission of video image data from the mobile phone 43 to the large-screen TV 41 may cause audio and video playback to be out of sync.
  • the difference in transmission delay between the transmission of audio data from the mobile phone 63 to the smart speaker 62 and the transmission of video image data from the mobile phone 63 to the large-screen TV 61 may cause audio and video playback to be out of sync.
  • an audio and video synchronization operation is introduced.
  • a catch-up strategy is adopted for both audio playback and video image playback to achieve audio and video synchronization. That is, using the catch-up strategy, the audio playback and the video image playback are adjusted so that the playback progress of the audio playback terminal and of the video image playback terminal catches up with the progress of the audio and video data output, so that the playback delays of audio playback and video image playback relative to the audio and video data output are less than or equal to the interactive scene playback delay threshold.
  • using the catch-up strategy for audio playback and video image playback can control the audio playback delay and the video image playback delay within the interactive scene playback delay threshold. Since playback delay is the delay of data playback relative to data output, and audio output and video image output are synchronized, the difference in playback progress between audio playback and video image playback is also controlled within the interactive scene playback delay threshold, thus realizing the synchronization of audio playback and video image playback.
  • a catch-up strategy is only used for one playback process in audio playback and video image playback, so that the playback delay of the playback process using the catch-up strategy is controlled within the interactive scene playback delay threshold.
  • the synchronization strategy is adopted for playback.
  • the playback process using the synchronization strategy is adjusted so that the playback progress of the playback process using the synchronization strategy is synchronized with the playback progress of the playback process using the catch-up strategy.
  • for audio playback, a catch-up strategy based on the audio and video output progress is adopted; for video image playback, a synchronization strategy based on the audio playback progress is adopted.
  • FIG. 14 is a flowchart of a method for playing audio and video according to an embodiment of the present application.
  • the audio and video playback method is executed by one or more devices in the audio and video playback system (including audio and video data output devices, audio playback terminals, and video image playback terminals). As shown in Figure 14:
  • Step 1300 identifying the current application scenario
  • Step 1301 judging whether the current application scene is an interactive application scene
  • step 1310 When the current application scenario is not an interactive application scenario, perform step 1310;
  • Step 1310 using a smooth playback strategy to play audio and/or video images
  • step 1320 When the current application scenario is an interactive application scenario, perform step 1320;
  • Step 1320 using the catch-up strategy to play audio
  • Step 1330 determine whether the playback progress of the video image is synchronized with the playback progress of the audio
  • step 1340 When the playback progress of the video image is not synchronized with the playback progress of the audio, perform step 1340;
  • Step 1340 Adjust the playback of the video image based on the playback progress of the audio, so that the playback progress of the video image is synchronized with the playback progress of the audio.
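The flow of Figure 14 above can be sketched as follows; this is an illustrative outline under assumed function names, not the embodiment's implementation:

```python
# Sketch of the Figure 14 flow: in an interactive scenario, audio uses the
# catch-up strategy and video image playback is slaved to the audio playback
# progress. Names and the 150 ms threshold are illustrative assumptions.

SYNC_THRESHOLD_MS = 150

def choose_strategies(is_interactive):
    """Steps 1300/1301: pick per-stream playback strategies for the scenario."""
    if not is_interactive:                 # step 1310
        return ("smooth", "smooth")
    return ("catch_up", "sync_to_audio")   # steps 1320 / 1340

def video_needs_adjustment(audio_progress_ms, video_progress_ms):
    """Step 1330: the video playback is adjusted only when its progress
    deviates from the audio progress by more than the sync threshold."""
    return abs(audio_progress_ms - video_progress_ms) > SYNC_THRESHOLD_MS
```

This ordering reflects the text's rationale: adjust the video as rarely as possible, and only when it drifts out of the synchronization window around the audio.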
  • the video image is used to reflect the status of the remote device, while the live sound only plays an auxiliary role; a frozen video image can easily lead to control errors, so it is necessary to give priority to ensuring the smoothness of video image playback and to minimize the number of adjustments to the video image playback progress.
  • for video image playback, a catch-up strategy based on the audio and video output progress is adopted; for audio playback, a synchronization strategy based on the video image playback progress is adopted.
  • FIG. 15 is a flowchart of a method for playing audio and video according to an embodiment of the present application.
  • the audio and video playback method is executed by one or more devices in the audio and video playback system (including audio and video data output devices, audio playback terminals, and video image playback terminals). As shown in Figure 15:
  • Step 1400 identifying the current application scenario
  • Step 1401 judging whether the current application scene is an interactive application scene
  • step 1410 When the current application scenario is not an interactive application scenario, perform step 1410;
  • Step 1410 using a smooth playback strategy to play audio and/or video images
  • step 1420 When the current application scenario is an interactive application scenario, perform step 1420;
  • Step 1420 using a catch-up strategy to play video images
  • Step 1430 determine whether the playback progress of the video image is synchronized with the playback progress of the audio
  • step 1420 When the playback progress of the video image is synchronized with the playback progress of the audio, return to step 1420;
  • step 1440 When the playback progress of the video image is not synchronized with the playback progress of the audio, perform step 1440;
  • Step 1440 Adjust the audio playback based on the playback progress of the video image, so that the playback progress of the audio is synchronized with the playback progress of the video image.
  • a judgment based on the synchronous playback delay threshold is used to determine whether the playback progress of the audio is synchronized with the playback progress of the video image.
  • a synchronous playback delay threshold is preset; it is the maximum difference in playback progress between audio playback and video image playback that still satisfies the expected viewing experience. For example, when the difference between the playback progress of audio and video exceeds 150ms, the user perceives an obvious inconsistency between audio and video playback, so the synchronous playback delay threshold is set to 150ms. When the difference between the playback progress of the audio and the playback progress of the video image exceeds the synchronous playback delay threshold, it is determined that the playback progress of the audio is not synchronized with the playback progress of the video image.
  • the difference in playback progress can be represented by a timestamp.
  • when the corresponding audio data is played at the audio playback end, the playback time (timestamp T1) is recorded; when the video image data B9 is played at the video image playback end, the playback time (timestamp T2) is recorded; the interval between T1 and T2 is the difference in playback progress between audio playback and video image playback.
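A minimal sketch of this timestamp comparison, assuming millisecond timestamps and reusing the 150ms example threshold from the text (function names are illustrative):

```python
# Sketch: the playback progress difference is the interval between the
# recorded playback times T1 (audio) and T2 (video image); the stream pair is
# declared out of sync when it exceeds the synchronous playback delay
# threshold (150 ms in the text's example).

SYNC_PLAYBACK_DELAY_THRESHOLD_MS = 150

def progress_difference_ms(t1_audio_ms, t2_video_ms):
    """Interval between T1 and T2 = audio/video playback progress difference."""
    return abs(t2_video_ms - t1_audio_ms)

def is_out_of_sync(t1_audio_ms, t2_video_ms):
    return progress_difference_ms(t1_audio_ms, t2_video_ms) \
        > SYNC_PLAYBACK_DELAY_THRESHOLD_MS
```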
  • deleting unplayed data in the cache can be used to accelerate the playback progress, or adding transition data to the unplayed data in the cache can be used to delay the playback progress, so that the playback progress of the audio is synchronized with the playback progress of the video image.
  • a transition frame is added between two video image frames (the transition frame can be a duplicate of an adjacent video image frame), so that playback of the transition frame is added to the playback process at the video image playback end, thereby delaying the video image playback progress.
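Both adjustment mechanisms described above (deleting unplayed data to advance progress, inserting a duplicate transition frame to delay it) can be sketched on a simple frame list; the helper names are illustrative assumptions:

```python
# Sketch: adjusting playback progress by editing the queue of unplayed frames.
# Dropping frames advances progress; duplicating an adjacent frame as a
# "transition frame" delays progress by one frame duration.

def accelerate(frames, n_drop):
    """Delete the first n_drop unplayed frames to advance playback progress."""
    return frames[n_drop:]

def delay(frames, index):
    """Insert a duplicate of frames[index] right after itself as a transition
    frame, delaying playback progress by one frame duration."""
    return frames[:index + 1] + [frames[index]] + frames[index + 1:]
```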
  • step 520 audio and video synchronization is implemented based on monitoring the data transmission delay.
  • a playback buffering link is set: after the playback device receives the audio data or video image data, and before that data is played, it is additionally buffered for a certain period of time (playback buffering). The buffering duration is the difference between the audio data transmission delay and the video image data transmission delay, so that this difference is compensated and the audio data and the video image data are played synchronously.
  • the playback buffering link is limited.
  • the duration of cached data cannot exceed the interactive scene playback delay threshold. Since the buffering duration of the playback buffer is the difference between the audio data transmission delay and the video image data transmission delay, when this difference exceeds the interactive scene playback delay threshold, cached data is deleted and new data is added to the cache, thereby implementing the catch-up strategy while maintaining audio and video playback synchronization.
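A hedged sketch of this capped playback buffering: the lower-delay stream buffers the transmission-delay difference, but never more than the threshold, with any excess discarded. The function name and return convention are assumptions for illustration:

```python
# Sketch: playback buffering compensates the transmission-delay difference,
# but the buffered duration is capped at the interactive scene playback delay
# threshold; the excess is discarded (catch-up). Values are illustrative.

INTERACTIVE_DELAY_THRESHOLD_MS = 150

def buffer_and_discard_ms(audio_delay_ms, video_delay_ms):
    """Returns (buffer_ms, discard_ms) applied to the faster (lower-delay)
    stream: buffer up to the threshold, discard the amount beyond it."""
    diff = abs(audio_delay_ms - video_delay_ms)
    buffer_ms = min(diff, INTERACTIVE_DELAY_THRESHOLD_MS)
    discard_ms = max(0, diff - INTERACTIVE_DELAY_THRESHOLD_MS)
    return buffer_ms, discard_ms
```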
  • the user uses the mobile phone 63 to play games.
  • in the game screen projection mode, the mobile phone 63 does not output game audio and game video images through its own speaker and screen, but sends the game video images to the large-screen TV 61 for display and sends the game audio to the smart speaker 62 for playback.
  • the user obtains the game content through the image output of the large-screen TV 61 and the audio output of the smart speaker 62, and performs corresponding game operations on the mobile phone. Since there is real-time interaction between the user and the mobile phone 63, the game screen projection application scenario is an interactive application scenario. Specifically, after the user activates the game screen projection mode, the mobile phone 63, the large-screen TV 61 and the smart speaker 62 execute the following processes:
  • after the mobile phone 63 recognizes the game screen projection scene, it establishes connections with the large-screen TV 61 and the smart speaker 62 respectively;
  • the large-screen TV 61 reports a network delay of 100ms to the mobile phone 63, and the smart speaker 62 reports a network delay of 200ms to the mobile phone 63;
  • the mobile phone 63 sends the catch-up strategy to the large-screen TV 61 and the smart speaker 62 respectively (the interactive scene playback delay threshold is 150ms); the mobile phone 63 also sends the network delay data of the large-screen TV 61 to the smart speaker 62, and sends the network delay data of the smart speaker 62 to the large-screen TV 61;
  • the mobile phone 63 starts to send video image frames to the large-screen TV 61 and audio frames to the smart speaker 62;
  • the smart speaker 62 learns that the network delay of the large-screen TV 61 is 100ms, which is less than its own network delay of 200ms, so it directly plays the audio frame;
  • the large-screen TV 61 knows that the network delay of the smart speaker 62 is 200ms, so it buffers the video image frames for 100ms (less than the interactive scene playback delay threshold of 150ms) before decoding and playback, thereby synchronizing with the audio frame playback of the smart speaker 62.
  • the network delay of the smart speaker 62 then deteriorates further to 300ms, while the network delay of the large-screen TV 61 remains 100ms;
  • the smart speaker 62 reports a network delay of 300ms to the mobile phone 63, and the large-screen TV 61 reports a network delay of 100ms to the mobile phone 63;
  • the mobile phone 63 notifies the large-screen TV 61 and the smart speaker 62 of the updated network delay data respectively;
  • the large-screen TV 61 learns that the network delay of the smart speaker 62 has been updated to 300ms. By calculation, the video image data would need to be buffered for 200ms to synchronize with the audio playback, which exceeds the 150ms threshold. Therefore, the large-screen TV 61 discards the first 50ms of the 100ms of video image data buffered previously, then buffers a further 100ms of video image data (150ms in total), and decodes and plays the video image data, so as to keep the video image playback synchronized with the audio playback.
  • if the network of the large-screen TV 61 is interrupted for 50ms and then recovers, since the large-screen TV 61 currently holds a 150ms video buffer, video image data remains available for playback during the 50ms interruption and continues to be played synchronously with the audio data of the smart speaker 62.
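The numbers in this example can be checked with a short calculation (a sketch reusing the 150ms threshold from the text; the variable names are illustrative):

```python
# Worked check of the screen-projection example: the large-screen TV buffers
# video by the delay difference, capped at the 150 ms threshold.
threshold = 150
tv_delay, speaker_delay = 100, 200

initial_buffer = speaker_delay - tv_delay   # TV initially buffers 100 ms
speaker_delay = 300                         # speaker delay deteriorates
required = speaker_delay - tv_delay         # 200 ms would now be needed
discard = required - threshold              # 50 ms of old video is dropped
final_buffer = threshold                    # buffer is capped at 150 ms
survives_flash = final_buffer >= 50         # a 50 ms network flash is covered
```

The 150ms of buffered video is also what lets playback ride out the 50ms network interruption in the last bullet.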
  • step 520 the transmission delay of the audio playback device and the video image playback device is periodically refreshed, and playback buffering is performed periodically based on the data transmission delay to synchronize audio playback and video image playback.
  • the synchronization effect is improved by combining the audio and video synchronization scheme based on data transmission delay with other audio and video synchronization schemes.
  • a timestamp synchronization operation is performed periodically: the timestamps of the currently playing audio data and video image data are compared to determine the difference in playback progress, and audio playback or video image playback is adjusted so that the playback progress difference between them is controlled within the synchronous playback delay threshold, achieving audio and video playback synchronization.
  • the audio and video playback scheme based on data transmission delay is used to synchronize audio and video playback.
  • the data transmission delays of audio data and video image data are regularly refreshed, and playback buffering is performed according to the difference between the transmission delays to achieve audio and video playback synchronization.
  • after receiving audio data or video image data, the playback device does not play it immediately, but first buffers a certain amount of data to cope with fluctuations in transmission delay.
  • the playback device extracts data from the audio buffer or video image buffer, so that the extracted data enters the playback link.
  • a playback buffer link is added in the playback link. Specifically, after the audio data or the video image data is extracted from the audio buffer or the video image buffer, before the audio data or the video image data is played, the audio data or the video image data are additionally buffered for a certain period of time (playback buffer).
  • the buffering time is the difference between the audio data transmission delay and the video image data transmission delay, so that this difference is compensated and the audio data and the video image data are played synchronously.
  • FIG. 16 shows a partial flowchart of an audio and video playback method according to an embodiment of the present application.
  • the following process shown in Figure 16 is performed to realize audio and video playback synchronization:
  • Step 1510 Before the transmission of audio data and video image data, the initial transmission delays of audio data transmission and video image transmission are obtained; the data type with the higher transmission delay is the first type of data, and the data type with the lower transmission delay is the second type of data (when the transmission delays are the same, the first type and the second type are assigned arbitrarily); the initial data transmission delay of the first type of data is the first delay, and the initial data transmission delay of the second type of data is the second delay;
  • Step 1511 determine whether the first delay difference is less than or equal to the interactive scene playback delay threshold, wherein the first delay difference is the difference between the first delay and the second delay;
  • Step 1512 when the first delay difference is less than or equal to the interactive scene playback delay threshold, transmit the first type of data and the second type of data; the first type of data is played directly after entering the playback link, and the second type of data is buffered for the first delay difference after entering the playback link and then played;
  • Step 1513 when the first delay difference is greater than the interactive scene playback delay threshold, this means that in the current application scenario the real-time interaction requirements cannot be met if audio and video synchronization is to be ensured; therefore, an excessive-output-delay prompt is issued to remind the user that the current data transmission link cannot meet the real-time interaction requirements.
  • since the second type of data is buffered before being played, this is equivalent to the second type of data waiting for the first type of data, so it can be ensured that the first type of data and the second type of data are played synchronously.
  • step 1512 After step 1512 is performed, the following steps are also performed:
  • Step 1520 obtaining the current transmission delay of audio data transmission and video image transmission, taking the current data transmission delay of the first type of data as the third delay, and the current data transmission delay of the second type of data as the fourth delay;
  • Step 1521 determine whether the current transmission delay of the first type of data is higher than that of the second type of data
  • Step 1531 when the current transmission delay of the first type of data is higher than that of the second type of data, determine whether the second delay difference is less than or equal to the interactive scene playback delay threshold, where the second delay difference is the difference between the third delay and the fourth delay;
  • Step 1532 when the second delay difference is less than or equal to the interactive scene playback delay threshold, the first type of data is played directly after entering the playback link, and the second type of data is buffered for the second delay difference after entering the playback link and then played;
  • Step 1533 when the second delay difference is greater than the interactive scene playback delay threshold, the first type of data is played directly after entering the playback link; some of the cached second type of data is deleted, where the deletion amount is the amount by which the second delay difference exceeds the interactive scene playback delay threshold; the buffered amount of the second type of data is then increased, and playback of the second type of data starts when its buffered amount reaches the interactive scene playback delay threshold.
  • after step 1521, the following steps are also executed:
  • Step 1541 when the current transmission delay of the second type of data is higher than that of the first type of data, determine whether the third delay difference is less than or equal to the interactive scene playback delay threshold, where the third delay difference is the difference between the fourth delay and the third delay;
  • Step 1542 when the third delay difference is less than or equal to the interactive scene playback delay threshold, the second type of data is played directly after entering the playback link, and the first type of data is buffered for the third delay difference after entering the playback link and then played;
  • Step 1543 when the third delay difference is greater than the interactive scene playback delay threshold, the second type of data is played directly after entering the playback link; some of the cached first type of data is deleted, where the deletion amount is the amount by which the third delay difference exceeds the interactive scene playback delay threshold; the buffered amount of the first type of data is then increased, and playback of the first type of data starts when its buffered amount reaches the interactive scene playback delay threshold.
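Steps 1510 to 1543 can be sketched as follows; the dictionary keys, return conventions, and the 150ms threshold are illustrative assumptions, not the claimed implementation:

```python
# Sketch of the Figure 16 logic: the higher-delay stream ("first type") plays
# immediately; the lower-delay stream ("second type") is buffered by the delay
# difference, capped at the interactive scene playback delay threshold.

THRESHOLD_MS = 150

def initial_plan(audio_delay_ms, video_delay_ms):
    """Steps 1510-1513: classify the streams and decide the buffer for the
    second type of data, or report that real-time interaction is impossible."""
    if audio_delay_ms >= video_delay_ms:
        first, second = "audio", "video"
        diff = audio_delay_ms - video_delay_ms
    else:
        first, second = "video", "audio"
        diff = video_delay_ms - audio_delay_ms
    if diff > THRESHOLD_MS:        # step 1513: prompt the user
        return {"error": "link cannot meet real-time interaction"}
    return {"first": first, "second": second, "buffer_ms": diff}  # step 1512

def updated_buffer(diff_ms):
    """Steps 1531-1533 (and the symmetric 1541-1543): target buffer for the
    lower-delay stream after a delay refresh; returns (buffer_ms, delete_ms),
    where delete_ms is the cached data dropped when the difference exceeds
    the threshold."""
    if diff_ms <= THRESHOLD_MS:
        return diff_ms, 0                            # steps 1532 / 1542
    return THRESHOLD_MS, diff_ms - THRESHOLD_MS      # steps 1533 / 1543
```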
  • an audio and video synchronization operation is introduced based on a smooth playback strategy.
  • a synchronization strategy is used to play a playback process of audio playback and video image playback.
  • the playback process using the synchronization strategy is adjusted so that its playback progress is synchronized with the playback progress of the playback process using the smooth playback strategy.
  • a smooth playback strategy is adopted for audio playback, and a synchronization strategy based on audio playback progress is adopted for video image playback.
  • a smooth playback strategy is adopted for video image playback, and a synchronization strategy based on video image playback progress is adopted for audio playback.
  • a smooth playback strategy is adopted for audio playback and video image playback.
  • a synchronization strategy based on audio playback progress is adopted on the basis of smooth playback.
  • a smooth playback strategy is adopted for audio playback and video image playback.
  • a synchronization strategy based on video image playback progress is adopted on the basis of smooth playback.
  • step 510 those skilled in the art can implement the synchronization strategy by adopting various implementation manners.
  • the method based on the synchronous playback delay threshold is used to judge whether the playback progress of the audio is synchronized with the playback progress of the video image (for example, the difference in playback progress is reflected by timestamps); during playback, audio playback or video image playback is adjusted so that the playback progress difference between them is controlled within the synchronous playback delay threshold, achieving audio and video playback synchronization while satisfying the real-time requirements of audio playback and video image playback.
  • the playback progress can be accelerated by deleting unplayed data in the cache, or delayed by adding transition data to the unplayed data in the cache, so that the playback progress of the audio is synchronized with the playback progress of the video image.
  • audio and video synchronization is implemented based on monitoring the data transmission delay.
  • PLD: Programmable Logic Device; FPGA: Field Programmable Gate Array; HDL: Hardware Description Language. Examples of hardware description languages include ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), HDCal, JHDL, Lava, Lola, MyHDL, PALASM, RHDL, VHDL (Very-High-Speed Integrated Circuit Hardware Description Language), and Verilog.
  • the controller may be implemented in any suitable manner. For example, the controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, application-specific integrated circuits (ASICs), programmable logic controllers, or embedded microcontrollers. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. A memory controller may also be implemented as part of the control logic of a memory.
  • in addition to implementing the controller in the form of pure computer-readable program code, the method steps can be logically programmed so that the controller realizes the same functions in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Therefore, such a controller can be regarded as a hardware component, and the devices included in it for realizing various functions can also be regarded as structures within the hardware component; or the means for implementing various functions can even be regarded as both software modules implementing the method and structures within the hardware component.
  • an embodiment of the present application proposes an audio and video playback device, which is installed in the audio and video output device 701 as shown in FIG. 7 , and the device includes:
  • a scene identification module which is used to identify the current application scene
  • a playback strategy configuration module, which is used to configure the catch-up strategy as the playback strategy of audio playback and/or video image playback when the current application scenario is an interactive application scenario, wherein the catch-up strategy includes: adjusting playback so that the playback progress catches up with the progress of the audio and video data output, and the playback delay relative to the audio and video data output is less than or equal to the preset interactive scene playback delay threshold.
  • the playback strategy configuration module is also used for: when the current application scenario is a non-interactive application scenario, configuring the smooth playback strategy as the playback strategy of audio playback and/or video image playback.
  • an embodiment of the present application proposes an audio playback device.
  • the device is installed in the audio playback device 702 as shown in FIG. 7 , and the audio playback device includes:
  • a threshold acquisition module which is used to acquire a preset interactive scene playback delay threshold when the playback strategy of audio playback is a catch-up strategy
  • the first playback adjustment module is used to adjust the audio playback during audio playback when the playback strategy of the audio playback is the catch-up strategy, so that the playback progress of the audio playback catches up with the progress of the audio and video data output, and the playback delay of the audio playback relative to the audio and video data output is less than or equal to the preset interactive scene playback delay threshold.
  • the audio playback device also includes:
  • the second playback adjustment module is configured to adjust the audio playback based on the smooth playback strategy during the audio playback when the playback strategy of the audio playback is the smooth playback strategy.
  • an embodiment of the present application proposes a video image playback device.
  • the device is installed in the video image playback device 703 as shown in FIG. 7 , and the video image playback device includes:
  • a threshold acquisition module which is used to acquire a preset interactive scene playback delay threshold when the playback strategy for video image playback is a catch-up strategy
  • the first playback adjustment module is used to adjust the video image playback during video image playback when the playback strategy of the video image playback is the catch-up strategy, so that the playback progress of the video image playback catches up with the progress of the audio and video data output, and the playback delay of the video image playback relative to the audio and video data output is less than or equal to the preset interactive scene playback delay threshold.
  • the video image playback device also includes:
  • the second playback adjustment module is configured to adjust the playback of the video image based on the smooth playback strategy during the process of playing the video image when the playback strategy of the video image playback is the smooth playback strategy.
  • an embodiment of the present application proposes an audio and video playback apparatus, which is installed in the playback device 603 as shown in FIG. 6 , and the apparatus includes:
  • a scene identification module which is used to identify the current application scene
  • a playback strategy configuration module, which is used to configure the catch-up strategy as the playback strategy for audio playback and/or video image playback when the current application scenario is an interactive application scenario, and to configure the smooth playback strategy as the playback strategy for audio playback and/or video image playback when the current application scenario is a non-interactive application scenario;
  • a threshold acquisition module which is used to acquire a preset interactive scene playback delay threshold when the playback strategy of audio playback and/or video image playback is a catch-up strategy
  • the first playback adjustment module, which is used to adjust audio playback and/or video image playback when the playback strategy of audio playback and/or video image playback is the catch-up strategy, so that the playback progress of audio playback and/or video image playback catches up with the progress of the audio and video data output, and the playback delay of audio playback and/or video image playback relative to the audio and video data output is less than or equal to the preset interactive scene playback delay threshold;
  • the second playback adjustment module is configured to adjust audio playback and/or video image playback based on the smooth playback strategy when the playback strategy for audio playback and/or video image playback is a smooth playback strategy.
  • for the convenience of description, when describing the above apparatuses, the functions are divided into various modules/units; the division of modules/units is only a logical function division, and the functions of the modules/units may be implemented in one or more pieces of software and/or hardware.
  • the apparatuses proposed in the embodiments of the present application may be fully or partially integrated into a physical entity during actual implementation, or may be physically separated.
  • these modules can all be implemented in the form of software calling through processing elements; they can also all be implemented in hardware; some modules can also be implemented in the form of software calling through processing elements, and some modules can be implemented in hardware.
  • the detection module may be a separately established processing element, or may be integrated in a certain chip of the electronic device.
  • the implementation of other modules is similar.
  • all or part of these modules can be integrated together, and can also be implemented independently.
  • each step of the above-mentioned method or each of the above-mentioned modules can be completed by an integrated logic circuit of hardware in the processor element or an instruction in the form of software.
  • the above modules may be one or more integrated circuits configured to implement the above methods, for example: one or more application-specific integrated circuits (ASICs), one or more digital signal processors (DSPs), or one or more field-programmable gate arrays (FPGAs), etc.
  • these modules can be integrated together and implemented in the form of a system-on-a-chip (SoC).
  • An embodiment of the present application also proposes an electronic device (audio and video data output device). The electronic device includes a memory for storing computer program instructions and a processor for executing the program instructions, wherein, when the computer program instructions are executed by the processor, the electronic device is triggered to execute the steps of the audio and video playback method described in the embodiments of the present application.
  • An embodiment of the present application also proposes an electronic device (audio playback device). The electronic device includes a memory for storing computer program instructions and a processor for executing the program instructions, wherein, when the computer program instructions are executed by the processor, the electronic device is triggered to execute the steps of the audio playback method described in the embodiments of the present application.
  • An embodiment of the present application also proposes an electronic device (video image playback device). The electronic device includes a memory for storing computer program instructions and a processor for executing the program instructions, wherein, when the computer program instructions are executed by the processor, the electronic device is triggered to execute the steps of the video image playback method described in the embodiments of the present application.
  • An embodiment of the present application also proposes an electronic device (audio and video playback device). The electronic device includes a memory for storing computer program instructions and a processor for executing the program instructions, wherein, when the computer program instructions are executed by the processor, the electronic device is triggered to execute the steps of the audio and video playback method described in the embodiments of the present application.
  • the above one or more computer programs are stored in the above memory, and the one or more computer programs include instructions; when the instructions are executed by the above device, the device is caused to execute the method steps described in the embodiments of the present application.
  • The processor of the electronic device may be a system on chip (SoC); the processor may include a central processing unit (CPU) and may further include other types of processors.
  • the processor of the electronic device may be a PWM control chip.
  • The involved processor may include, for example, a CPU, a microcontroller, or a digital signal processor (DSP), and may also include a GPU, an embedded neural-network processing unit (NPU), and an image signal processor (ISP). The processor may also include necessary hardware accelerators or logic processing hardware circuits, such as an ASIC, or one or more integrated circuits for controlling the execution of the programs of the technical solution of the present application. Furthermore, the processor may have the function of operating one or more software programs, which may be stored in a storage medium.
  • The memory of the electronic device may be a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a random access memory (RAM) or another type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, etc.), a magnetic disk storage medium or other magnetic storage device, or any other computer-readable medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.
  • A processor may be combined with a memory to form a processing device; more commonly, the processor and the memory are independent components.
  • the processor is used to execute program codes stored in the memory to implement the method described in the embodiment of the present application.
  • the memory can also be integrated in the processor, or be independent of the processor.
  • the device, apparatus, module or unit described in the embodiments of the present application may be specifically implemented by a computer chip or entity, or implemented by a product having a certain function.
  • The embodiments of the present application may be provided as a method, an apparatus, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product implemented on one or more computer-usable storage media having computer-usable program code embodied therein.
  • If any function is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium.
  • The technical solution of the present application, in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product.
  • The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present application.
  • An embodiment of the present application further provides a computer-readable storage medium, where a computer program is stored in the computer-readable storage medium; when it runs on a computer, it causes the computer to execute the method provided by the embodiments of the present application.
  • An embodiment of the present application further provides a computer program product, where the computer program product includes a computer program that, when running on a computer, causes the computer to execute the method provided by the embodiment of the present application.
  • These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
  • These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus that implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
  • "At least one of a, b, and c" may represent: a; b; c; a and b; a and c; b and c; or a, b, and c, where each of a, b, and c may be singular or plural.
  • The terms "comprising", "including", or any other variations thereof are intended to cover non-exclusive inclusion, so that a process, method, commodity, or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, commodity, or device.
  • An element qualified by the phrase "comprising a…" does not preclude the presence of additional identical elements in the process, method, commodity, or device that includes the element.
  • the application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer.
  • program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types.
  • the application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network.
  • program modules may be located in both local and remote computer storage media including storage devices.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

Provided are an audio/video playing method and apparatus, and an electronic device. The audio/video playing method comprises: recognizing the current application scenario; and according to the current application scenario, configuring a playing policy for audio playing and/or video image playing, including: when the current application scenario is an interactive application scenario, configuring a catch-up policy as the playing policy for audio playing and/or video image playing. According to the method of the embodiments of the present application, an audio/video playing scenario is recognized, and a corresponding playing policy is selected according to a specific application scenario, such that the user experience of audio/video playing can be greatly improved.

Description

一种音视频播放方法、装置和电子设备 Audio and video playback method, apparatus, and electronic device 技术领域 Technical Field
本申请涉及智能终端技术领域,特别涉及一种音视频播放方法、装置和电子设备。The present application relates to the technical field of intelligent terminals, and in particular, to an audio and video playback method, device, and electronic device.
背景技术Background technique
随着分布式技术的发展，以智能手机为中心的分布式跨设备业务越来越多。在目前的分布式跨设备业务，一种比较常见的应用场景是音视频分离应用场景，采用不同的采集设备分别采集音频数据和视频图像数据，采用不同的播放设备分别播放音频数据和视频图像数据。例如，手机与周边的大屏、摄像头、麦克风、音箱配合实现与远端进行视频电话业务。With the development of distributed technology, there are more and more distributed cross-device services centered on smartphones. In current distributed cross-device services, a relatively common application scenario is the audio/video separation scenario, in which different capture devices are used to collect audio data and video image data respectively, and different playback devices are used to play the audio data and the video image data respectively. For example, a mobile phone cooperates with a surrounding large screen, camera, microphone, and speaker to implement a video call service with a remote end.
在音视频分离应用场景中,音频数据与视频图像数据由数据输出端分别传输到音频播放端(例如,智能音箱)以及视频图像播放端(例如,大屏电视)后播放。在音频数据与视频图像数据传输环节中,存在传输错误、传输信号强度不稳、传输延迟等不利因素,这些不利因素会导致音频播放以及视频图像播放环节中出现播放卡顿、播放延迟等情况,从而大大降低用户体验。In the audio and video separation application scenario, the audio data and video image data are respectively transmitted from the data output end to the audio playback end (eg, a smart speaker) and the video image playback end (eg, a large-screen TV) for playback. In the transmission of audio data and video image data, there are unfavorable factors such as transmission errors, unstable transmission signal strength, transmission delay, etc. These unfavorable factors will cause playback freezes and playback delays in audio playback and video image playback. This greatly reduces the user experience.
发明内容SUMMARY OF THE INVENTION
针对现有技术下音视频分离应用场景中用户体验不佳的问题,本申请提供了一种音视频播放方法、装置和电子设备;本申请还提供了一种音频播放方法、装置和电子设备;本申请还提供了一种视频图像播放方法、装置和电子设备;本申请还提供一种计算机可读存储介质。Aiming at the problem of poor user experience in the application scenario of audio and video separation in the prior art, the present application provides an audio and video playback method, device and electronic device; the present application also provides an audio playback method, device and electronic device; The present application also provides a video image playback method, apparatus and electronic device; the present application also provides a computer-readable storage medium.
本申请实施例采用下述技术方案:The embodiment of the present application adopts the following technical solutions:
第一方面,本申请提供一种音视频播放方法,包括:In a first aspect, the present application provides a method for playing audio and video, including:
识别当前应用场景;Identify the current application scenario;
根据所述当前应用场景配置音频播放和/或视频图像播放的播放策略,包括:Configure a playback strategy for audio playback and/or video image playback according to the current application scenario, including:
在所述当前应用场景为交互类应用场景时,配置追赶策略为音频播放和/或视频图像播放的播放策略,其中,所述追赶策略包括:When the current application scenario is an interactive application scenario, the chasing strategy is configured as a playback strategy for audio playback and/or video image playback, wherein the chasing strategy includes:
调整所述音频播放和/或所述视频图像播放,令所述音频播放和/或所述视频图像播放的播放进度追赶音视频数据输出的进度,以使得所述音频播放和/或所述视频图像播放相较于音视频数据输出的播放延迟小于等于预设的交互场景播放延迟阈值。Adjust the audio playback and/or the video image playback, so that the playback progress of the audio playback and/or the video image playback catches up with the audio and video data output progress, so that the audio playback and/or the video playback progress The playback delay of the image playback compared to the audio and video data output is less than or equal to the preset interactive scene playback delay threshold.
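The scenario-dependent policy configuration of the first aspect can be illustrated with a brief sketch. The scenario names and the 100 ms threshold below are hypothetical illustrations for explanation only, not values taken from the claims:

```python
# Hypothetical sketch of the first aspect: pick a playback policy per scenario.
# The scenario names and the 100 ms threshold are illustrative assumptions.
INTERACTIVE_SCENARIOS = {"video_call", "screen_projection", "cloud_gaming"}
INTERACTIVE_DELAY_THRESHOLD_MS = 100  # preset interactive-scene playback delay threshold

def configure_playback_policy(scenario: str) -> dict:
    """Return the playback policy for audio/video playback in this scenario."""
    if scenario in INTERACTIVE_SCENARIOS:
        # Catch-up policy: keep the playback delay at or under the threshold.
        return {"policy": "catch_up", "max_delay_ms": INTERACTIVE_DELAY_THRESHOLD_MS}
    # Non-interactive scenarios favor smoothness over low latency.
    return {"policy": "smooth"}
```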
根据本申请实施例的方法,针对音视频播放场景进行识别,根据具体的应用场景选择对应的播放策略,可以大大提高音视频播放的用户体验。According to the method of the embodiment of the present application, the audio and video playback scenarios are identified, and the corresponding playback strategy is selected according to the specific application scenarios, which can greatly improve the user experience of audio and video playback.
进一步的,根据本申请实施例的方法,针对交互类应用场景采用追赶策略进行音视频播放,可以确保音视频播放的播放延迟满足交互类应用场景的应用需求,从而大大提高交互类应用场景的用户体验。Further, according to the method of the embodiment of the present application, using a catch-up strategy for audio and video playback in interactive application scenarios can ensure that the playback delay of audio and video playback meets the application requirements of interactive application scenarios, thereby greatly improving the user experience of interactive application scenarios. experience.
在上述第一方面的一种可行的应用方案中,所述方法还包括:In a feasible application scheme of the above-mentioned first aspect, the method further includes:
在所述当前应用场景为非交互类应用场景时,配置平滑播放策略为所述音频播放和/或所述视频图像播放的播放策略。When the current application scenario is a non-interactive application scenario, a smooth playback strategy is configured as a playback strategy for the audio playback and/or the video image playback.
根据本申请实施例的方法，针对非交互类应用场景采用平滑播放策略进行音视频播放，可以确保音视频播放的流畅度满足非交互类应用场景的应用需求，从而大大提高非交互类应用场景的用户体验。According to the method of the embodiment of the present application, the smooth playback strategy is used for audio and video playback in non-interactive application scenarios, which can ensure that the smoothness of audio and video playback meets the application requirements of non-interactive application scenarios, thereby greatly improving the user experience of non-interactive application scenarios.
在上述第一方面的一种可行的应用方案中,所述配置追赶策略为音频播放和/或视频图像播放的播放策略,包括:In a feasible application solution of the above first aspect, the configuration catch-up strategy is a playback strategy for audio playback and/or video image playback, including:
配置所述追赶策略为所述音频播放以及所述视频图像播放的播放策略。The chasing strategy is configured as a playing strategy of the audio playing and the video image playing.
在上述第一方面的一种可行的应用方案中,所述配置追赶策略为音频播放和/或视频图像播放的播放策略,包括:In a feasible application solution of the above first aspect, the configuration catch-up strategy is a playback strategy for audio playback and/or video image playback, including:
配置所述追赶策略为所述音频播放的播放策略;Configuring the chasing strategy to be the playback strategy of the audio playback;
配置同步策略为所述视频图像播放的播放策略,其中,所述同步策略包括:The configuration synchronization strategy is the playback strategy of the video image playback, wherein the synchronization strategy includes:
以所述音频播放的播放进度为基准,对所述视频图像播放进行调整,使得所述视频图像播放的播放进度与所述音频播放的播放进度同步。Based on the playback progress of the audio playback, the video image playback is adjusted so that the playback progress of the video image playback is synchronized with the playback progress of the audio playback.
在上述第一方面的一种可行的应用方案中,所述配置追赶策略为音频播放和/或视频图像播放的播放策略,包括:In a feasible application solution of the above first aspect, the configuration catch-up strategy is a playback strategy for audio playback and/or video image playback, including:
配置所述追赶策略为所述视频图像播放的播放策略;Configuring the chasing strategy to be a playback strategy of the video image playback;
配置同步策略为所述音频播放的播放策略,其中,所述同步策略包括:The configuration synchronization strategy is the playback strategy of the audio playback, wherein the synchronization strategy includes:
以所述视频图像播放的播放进度为基准,对所述音频播放进行调整,使得所述音频播放的播放进度与所述视频图像播放的播放进度同步。The audio playback is adjusted based on the playback progress of the video image playback, so that the playback progress of the audio playback is synchronized with the playback progress of the video image playback.
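The synchronization strategy above can be sketched as follows; the video playback progress serves as the reference, and the audio playback is adjusted toward it. The 40 ms tolerance and the action names are illustrative assumptions, not limitations of the method:

```python
# Hypothetical sketch of the synchronization strategy: the video playback
# progress is the reference, and the audio playback is adjusted toward it.
# The 40 ms tolerance and the action names are illustrative assumptions.
def sync_adjustment(audio_pos_ms: float, video_pos_ms: float,
                    tolerance_ms: float = 40.0) -> str:
    """Decide how to adjust audio playback relative to the reference (video)."""
    drift = audio_pos_ms - video_pos_ms
    if abs(drift) <= tolerance_ms:
        return "play_normally"   # close enough: no adjustment needed
    if drift < 0:
        return "catch_up_audio"  # audio lags the video: skip ahead or speed up
    return "hold_audio"          # audio is ahead: pause briefly or slow down
```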
第二方面,本申请提供一种音频播放方法,包括:In a second aspect, the present application provides an audio playback method, comprising:
当音频播放的播放策略为追赶策略时,获取预设的交互场景播放延迟阈值;When the playback strategy of audio playback is the catch-up strategy, obtain the preset interactive scene playback delay threshold;
基于所述交互场景播放延迟阈值进行所述音频播放，在所述音频播放的执行过程中，调整所述音频播放，令所述音频播放的播放进度追赶音视频数据输出的进度，以使得所述音频播放相较于音视频数据输出的播放延迟小于等于预设的交互场景播放延迟阈值。The audio playback is performed based on the interactive scene playback delay threshold; during the execution of the audio playback, the audio playback is adjusted so that the playback progress of the audio playback catches up with the progress of the audio and video data output, so that the playback delay of the audio playback relative to the audio and video data output is less than or equal to the preset interactive scene playback delay threshold.
在上述第二方面的一种可行的应用方案中,所述基于所述交互场景播放延迟阈值进行所述音频播放,包括:In a feasible application solution of the second aspect, the audio playback based on the interactive scene playback delay threshold includes:
监控所述音频播放相对于音视频数据输出的播放延迟;Monitoring the playback delay of the audio playback relative to the audio and video data output;
当所述音频播放的播放延迟超出所述交互场景播放延迟阈值时,调整所述音频播放,以使得所述音频播放的播放延迟小于等于所述交互场景播放延迟阈值。When the playback delay of the audio playback exceeds the interactive scene playback delay threshold, the audio playback is adjusted so that the playback delay of the audio playback is less than or equal to the interactive scene playback delay threshold.
在上述第二方面的一种可行的应用方案中,所述基于所述交互场景播放延迟阈值进行所述音频播放,包括:In a feasible application solution of the second aspect, the audio playback based on the interactive scene playback delay threshold includes:
监控所述音频播放的未播放数据的数据缓存量;Monitor the data buffer amount of the unplayed data of the audio playback;
当所述音频播放的未播放数据的数据缓存量超出预设的数据缓存阈值时，调整所述音频播放的未播放数据的数据缓存量，使得所述音频播放的未播放数据的数据缓存量小于等于预设的数据缓存阈值。When the data cache amount of the unplayed data of the audio playback exceeds the preset data cache threshold, the data cache amount of the unplayed data of the audio playback is adjusted so that it is less than or equal to the preset data cache threshold.
在上述第二方面的一种可行的应用方案中,所述调整所述音频播放,包括:In a feasible application solution of the above second aspect, the adjusting the audio playback includes:
删除音频缓存中的全部或部分的未播放数据,以使得所述音频播放跳过被删除的未播放数据,从而使得所述音频播放的播放进度追赶音视频数据输出的进度。All or part of the unplayed data in the audio buffer is deleted, so that the audio playback skips the deleted unplayed data, so that the playback progress of the audio playback catches up with the output progress of the audio and video data.
在上述第二方面的一种可行的应用方案中,所述删除音频缓存中的全部或部分的未播放数据,包括:In a feasible application solution of the above second aspect, the deletion of all or part of the unplayed data in the audio buffer includes:
监控音频缓存中音频数据的波形、频率，在需要删除音频缓存中的未播放数据时，优先删除对于人耳不敏感的音频数据。Monitor the waveform and frequency of the audio data in the audio buffer; when unplayed data in the audio buffer needs to be deleted, preferentially delete the audio data to which the human ear is insensitive.
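A minimal sketch of this audio catch-up adjustment might drop the quietest buffered segments first, using segment energy as a crude proxy for "data to which the ear is least sensitive" (the criterion described above also examines waveform and frequency content); the segment representation below is an assumption for illustration:

```python
# Hypothetical sketch of the audio catch-up adjustment: when the unplayed
# buffer exceeds the threshold, drop the quietest segments first. Segment
# energy is a crude stand-in for "audio the human ear is least sensitive to".
def trim_audio_buffer(segments, max_buffered_ms):
    """segments: list of (duration_ms, rms_energy) tuples in playback order.
    Returns the segments kept after dropping quiet ones to fit the budget."""
    total = sum(duration for duration, _ in segments)
    # Visit segments quietest-first and mark them for deletion as needed.
    order = sorted(range(len(segments)), key=lambda i: segments[i][1])
    dropped = set()
    for i in order:
        if total <= max_buffered_ms:
            break
        dropped.add(i)
        total -= segments[i][0]
    return [seg for i, seg in enumerate(segments) if i not in dropped]
```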
第三方面,本申请提供一种视频图像播放方法,包括:In a third aspect, the present application provides a video image playback method, comprising:
当视频图像播放的播放策略为追赶策略时,获取预设的交互场景播放延迟阈值;When the playback strategy of video image playback is the catch-up strategy, obtain the preset interactive scene playback delay threshold;
基于所述交互场景播放延迟阈值进行所述视频图像播放，在所述视频图像播放的执行过程中，调整所述视频图像播放，令所述视频图像播放的播放进度追赶音视频数据输出的进度，以使得所述视频图像播放相较于音视频数据输出的播放延迟小于等于预设的交互场景播放延迟阈值。The video image playback is performed based on the interactive scene playback delay threshold; during the execution of the video image playback, the video image playback is adjusted so that the playback progress of the video image playback catches up with the progress of the audio and video data output, so that the playback delay of the video image playback relative to the audio and video data output is less than or equal to the preset interactive scene playback delay threshold.
在上述第三方面的一种可行的应用方案中,所述基于所述交互场景播放延迟阈值进行所述视频图像播放,包括:In a feasible application solution of the third aspect, the video image playback based on the interactive scene playback delay threshold includes:
监控所述视频图像播放相对于音视频数据输出的播放延迟;Monitoring the playback delay of the video image playback relative to the audio and video data output;
当所述视频图像播放的播放延迟超出所述交互场景播放延迟阈值时,调整所述视频图像播放,以使得所述视频图像播放的播放延迟小于等于所述交互场景播放延迟阈值。When the playback delay of the video image playback exceeds the interactive scene playback delay threshold, the video image playback is adjusted so that the playback delay of the video image playback is less than or equal to the interactive scene playback delay threshold.
在上述第三方面的一种可行的应用方案中,所述基于所述交互场景播放延迟阈值进行所述视频图像播放,包括:In a feasible application solution of the third aspect, the video image playback based on the interactive scene playback delay threshold includes:
监控所述视频图像播放的未播放数据的数据缓存量;Monitoring the data buffer amount of the unplayed data played by the video image;
当所述视频图像播放的未播放数据的数据缓存量超出预设的数据缓存阈值时，调整所述视频图像播放的未播放数据的数据缓存量，使得所述视频图像播放的未播放数据的数据缓存量小于等于预设的数据缓存阈值。When the data cache amount of the unplayed data of the video image playback exceeds the preset data cache threshold, the data cache amount of the unplayed data of the video image playback is adjusted so that it is less than or equal to the preset data cache threshold.
在上述第三方面的一种可行的应用方案中,所述调整所述视频图像播放,包括:In a feasible application solution of the above third aspect, the adjusting the video image playback includes:
删除视频图像缓存中的全部或部分的未播放数据,以使得所述视频图像播放跳过被删除的未播放数据,从而使得所述视频图像播放的播放进度追赶音视频数据输出的进度。All or part of the unplayed data in the video image buffer is deleted, so that the video image playback skips the deleted unplayed data, so that the playback progress of the video image playback catches up with the audio and video data output progress.
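A minimal sketch of this video catch-up adjustment is given below, under the assumption that key frames are kept so later frames can still be decoded (the method itself only requires deleting all or part of the unplayed data); the frame representation and the default frame period are illustrative:

```python
# Hypothetical sketch of the video catch-up adjustment: skip buffered,
# unplayed frames until the playback delay falls under the threshold.
# Keeping key frames is an assumption here (dropping a key frame would
# break decoding of the frames that depend on it).
def frames_to_play(buffered, delay_ms, threshold_ms, frame_ms=33.3):
    """buffered: list of dicts like {"pts": int, "key": bool}, in order.
    Returns the frames kept after skipping enough non-key frames."""
    kept = list(buffered)
    while delay_ms > threshold_ms:
        # Find the earliest droppable (non-key) frame; stop if none remain.
        idx = next((i for i, f in enumerate(kept) if not f["key"]), None)
        if idx is None:
            break
        del kept[idx]
        delay_ms -= frame_ms  # each dropped frame claws back one frame period
    return kept
```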
第四方面,本申请提供一种音视频播放装置,装置包括:In a fourth aspect, the present application provides an audio and video playback device, the device comprising:
场景识别模块,其用于识别当前应用场景;a scene identification module, which is used to identify the current application scene;
播放策略配置模块,其用于在所述当前应用场景为交互类应用场景时,配置追赶策略为音频播放和/或视频图像播放的播放策略,其中,所述追赶策略包括:A playback strategy configuration module, configured to configure the chasing strategy as a playback strategy for audio playback and/or video image playback when the current application scenario is an interactive application scenario, wherein the chasing strategy includes:
调整所述音频播放和/或所述视频图像播放,令所述音频播放和/或所述视频图像播放的播放进度追赶音视频数据输出的进度,以使得所述音频播放和/或所述视频图像播放相较于音视频数据输出的播放延迟小于等于预设的交互场景播放延迟阈值。Adjust the audio playback and/or the video image playback, so that the playback progress of the audio playback and/or the video image playback catches up with the audio and video data output progress, so that the audio playback and/or the video playback progress The playback delay of the image playback compared to the audio and video data output is less than or equal to the preset interactive scene playback delay threshold.
第五方面,本申请提供一种音频播放装置,装置包括:In a fifth aspect, the present application provides an audio playback device, the device comprising:
阈值获取模块,其用于当音频播放的播放策略为追赶策略时,获取预设的交互场景播放延迟阈值;a threshold acquisition module, which is used to acquire a preset interactive scene playback delay threshold when the playback strategy of audio playback is a catch-up strategy;
播放调整模块，其用于在进行所述音频播放的过程中调整所述音频播放，令所述音频播放的播放进度追赶音视频数据输出的进度，以使得所述音频播放相较于音视频数据输出的播放延迟小于等于预设的交互场景播放延迟阈值。a playback adjustment module, configured to adjust the audio playback during the audio playback, so that the playback progress of the audio playback catches up with the progress of the audio and video data output, so that the playback delay of the audio playback relative to the audio and video data output is less than or equal to the preset interactive scene playback delay threshold.
第六方面,本申请提供一种视频图像播放装置,装置包括:In a sixth aspect, the present application provides a video image playback device, the device comprising:
阈值获取模块，其用于当视频图像播放的播放策略为追赶策略时，获取预设的交互场景播放延迟阈值；a threshold acquisition module, which is used to acquire a preset interactive scene playback delay threshold when the playback strategy of video image playback is a catch-up strategy;
播放调整模块，其用于在进行所述视频图像播放的过程中调整所述视频图像播放，令所述视频图像播放的播放进度追赶音视频数据输出的进度，以使得所述视频图像播放相较于音视频数据输出的播放延迟小于等于预设的交互场景播放延迟阈值。a playback adjustment module, configured to adjust the video image playback during the video image playback, so that the playback progress of the video image playback catches up with the progress of the audio and video data output, so that the playback delay of the video image playback relative to the audio and video data output is less than or equal to the preset interactive scene playback delay threshold.
第七方面,本申请提供一种音视频播放装置,装置包括:In a seventh aspect, the present application provides an audio and video playback device, the device comprising:
场景识别模块,其用于识别当前应用场景;a scene identification module, which is used to identify the current application scene;
播放策略配置模块，其用于根据所述当前应用场景配置音频播放和/或视频图像播放的播放策略，包括：在所述当前应用场景为交互类应用场景时，配置追赶策略为音频播放和/或视频图像播放的播放策略，其中，所述追赶策略包括：调整所述音频播放和/或所述视频图像播放，令所述音频播放和/或所述视频图像播放的播放进度追赶音视频数据输出的进度，以使得所述音频播放和/或所述视频图像播放相较于音视频数据输出的播放延迟小于等于预设的交互场景播放延迟阈值；a playback strategy configuration module, configured to configure a playback strategy for audio playback and/or video image playback according to the current application scenario, including: when the current application scenario is an interactive application scenario, configuring the catch-up strategy as the playback strategy for audio playback and/or video image playback, wherein the catch-up strategy includes: adjusting the audio playback and/or the video image playback so that the playback progress of the audio playback and/or the video image playback catches up with the progress of the audio and video data output, so that the playback delay of the audio playback and/or the video image playback relative to the audio and video data output is less than or equal to a preset interactive scene playback delay threshold;
阈值获取模块,其用于当所述音频播放和/或所述视频图像播放的播放策略为追赶策略时,获取预设的交互场景播放延迟阈值;a threshold acquisition module, configured to acquire a preset interactive scene playback delay threshold when the playback strategy of the audio playback and/or the video image playback is a catch-up strategy;
播放调整模块，其用于在进行所述音频播放和/或所述视频图像播放的过程中调整所述音频播放和/或所述视频图像播放，令所述音频播放和/或所述视频图像播放的播放进度追赶音视频数据输出的进度，以使得所述音频播放和/或所述视频图像播放相较于音视频数据输出的播放延迟小于等于预设的交互场景播放延迟阈值。a playback adjustment module, configured to adjust the audio playback and/or the video image playback during the audio playback and/or the video image playback, so that the playback progress of the audio playback and/or the video image playback catches up with the progress of the audio and video data output, so that the playback delay of the audio playback and/or the video image playback relative to the audio and video data output is less than or equal to a preset interactive scene playback delay threshold.
第八方面,本申请提供一种音视频播放系统,系统包括:In an eighth aspect, the application provides an audio and video playback system, the system comprising:
音视频输出装置，其用于分别输出音频数据以及视频数据；以及，识别当前应用场景，在所述当前应用场景为交互类应用场景时，配置追赶策略为音频播放和/或视频图像播放的播放策略，其中，所述追赶策略包括：调整所述音频播放和/或所述视频图像播放，令所述音频播放和/或所述视频图像播放的播放进度追赶音视频数据输出的进度，以使得所述音频播放和/或所述视频图像播放相较于音视频数据输出的播放延迟小于等于预设的交互场景播放延迟阈值；an audio and video output device, configured to output audio data and video data respectively, and to identify the current application scenario and, when the current application scenario is an interactive application scenario, configure the catch-up strategy as the playback strategy for audio playback and/or video image playback, wherein the catch-up strategy includes: adjusting the audio playback and/or the video image playback so that the playback progress of the audio playback and/or the video image playback catches up with the progress of the audio and video data output, so that the playback delay of the audio playback and/or the video image playback relative to the audio and video data output is less than or equal to a preset interactive scene playback delay threshold;
音频播放装置,其用于接收所述音频数据,按照所述音视频输出装置配置的所述音频播放的播放策略,播放所述音频数据;An audio playback device, configured to receive the audio data, and play the audio data according to a playback strategy of the audio playback configured by the audio and video output device;
视频图像播放装置,其用于接收所述视频图像数据,按照所述音视频输出装置配置的所述视频图像播放的播放策略,播放所述视频图像数据。A video image playback device, configured to receive the video image data, and play the video image data according to a playback strategy of the video image playback configured by the audio and video output device.
第九方面,本申请提供一种音视频播放系统,系统包括:In a ninth aspect, the application provides an audio and video playback system, the system comprising:
音频输出装置,其用于输出音频数据;an audio output device for outputting audio data;
视频图像输出装置,其用于输出视频图像数据;a video image output device for outputting video image data;
音视频播放装置,其用于:Audio and video playback device, which is used for:
接收所述音频数据以及所述视频图像数据;receiving the audio data and the video image data;
识别当前应用场景，在所述当前应用场景为交互类应用场景时，配置追赶策略为音频播放和/或视频图像播放的播放策略，其中，所述追赶策略包括：调整所述音频播放和/或所述视频图像播放，令所述音频播放和/或所述视频图像播放的播放进度追赶音视频数据输出的进度，以使得所述音频播放和/或所述视频图像播放相较于音视频数据输出的播放延迟小于等于预设的交互场景播放延迟阈值；identify the current application scenario and, when the current application scenario is an interactive application scenario, configure the catch-up strategy as the playback strategy for audio playback and/or video image playback, wherein the catch-up strategy includes: adjusting the audio playback and/or the video image playback so that the playback progress of the audio playback and/or the video image playback catches up with the progress of the audio and video data output, so that the playback delay of the audio playback and/or the video image playback relative to the audio and video data output is less than or equal to a preset interactive scene playback delay threshold;
按照所述音频播放的播放策略以及所述视频图像播放的播放策略,播放所述音频数据以及所述视频图像数据。The audio data and the video image data are played according to the playback strategy of the audio playback and the playback strategy of the video image playback.
第十方面,本申请提供一种电子设备,所述电子设备包括用于存储计算机程序指令的存储器和用于执行程序指令的处理器,其中,当该计算机程序指令被该处理器执行时,触发所述电子设备执行如上述第一方面所述的方法步骤。In a tenth aspect, the present application provides an electronic device, the electronic device comprising a memory for storing computer program instructions and a processor for executing the program instructions, wherein when the computer program instructions are executed by the processor, triggering The electronic device performs the method steps as described in the first aspect above.
第十一方面,本申请提供一种电子设备,所述电子设备包括用于存储计算机程序指令的存储器和用于执行程序指令的处理器,其中,当该计算机程序指令被该处理器执行时,触发所述电子设备执行如上述第二方面所述的方法步骤。In an eleventh aspect, the present application provides an electronic device comprising a memory for storing computer program instructions and a processor for executing the program instructions, wherein when the computer program instructions are executed by the processor, The electronic device is triggered to perform the method steps described in the second aspect above.
第十二方面,本申请提供一种电子设备,所述电子设备包括用于存储计算机程序指令的存储器和用于执行程序指令的处理器,其中,当该计算机程序指令被该处理器执行时,触发所述电子设备执行如上述第三方面所述的方法步骤。In a twelfth aspect, the present application provides an electronic device comprising a memory for storing computer program instructions and a processor for executing the program instructions, wherein when the computer program instructions are executed by the processor, The electronic device is triggered to perform the method steps described in the third aspect above.
第十三方面，本申请提供一种计算机可读存储介质，所述计算机可读存储介质中存储有计算机程序，当其在计算机上运行时，使得计算机执行如上述第一~第三方面所述的方法。In a thirteenth aspect, the present application provides a computer-readable storage medium, where a computer program is stored in the computer-readable storage medium; when it runs on a computer, it causes the computer to execute the methods described in the first to third aspects above.
Brief Description of the Drawings
Fig. 1 is a schematic diagram of an application scenario in which audio data and video data are transmitted separately;
Fig. 2 is a schematic diagram of an application scenario in which audio data and video data are transmitted separately;
Fig. 3 is a flowchart of playing audio data and video data separately;
Fig. 4 is a schematic diagram of an application scenario in which audio data and video data are transmitted separately;
Fig. 5 is a flowchart of an audio/video playing method according to an embodiment of the present application;
Fig. 6 is a structural block diagram of an audio/video playing system according to an embodiment of the present application;
Fig. 7 is a structural block diagram of an audio/video playing system according to an embodiment of the present application;
Fig. 8 is a schematic diagram of an application scenario in which audio data and video data are transmitted separately;
Fig. 9 is a flowchart of an audio playing method according to an embodiment of the present application;
Fig. 10 is a flowchart of an audio playing method according to an embodiment of the present application;
Fig. 11 is a schematic diagram of an AAC audio data decoding process;
Fig. 12 is a flowchart of a video image playing method according to an embodiment of the present application;
Fig. 13 is a flowchart of a video image playing method according to an embodiment of the present application;
Fig. 14 is a flowchart of an audio/video playing method according to an embodiment of the present application;
Fig. 15 is a flowchart of an audio/video playing method according to an embodiment of the present application;
Fig. 16 is a partial flowchart of an audio/video playing method according to an embodiment of the present application.
Detailed Description of the Embodiments
To make the objectives, technical solutions, and advantages of the present application clearer, the technical solutions of the present application are described clearly and completely below with reference to specific embodiments of the present application and the corresponding drawings. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort shall fall within the protection scope of the present application.
The terms used in the embodiments of the present application are only intended to explain specific embodiments of the present application, and are not intended to limit the present application.
In application scenarios where audio and video are separated, the main adverse factor in the audio/video data transmission link is fluctuation of the transmission delay. Fluctuation of the transmission delay causes stuttering during audio/video playback, degrading the user experience. For example, Fig. 1 is a schematic diagram of an application scenario in which audio data and video data are transmitted separately. As shown in Fig. 1, a drone 11 captures video images of the buildings in a residential community and transmits the captured video image data to a mobile phone 13; a microphone 12 captures audio for the community (for example, the current ambient sound of the area filmed by the drone 11, or a tour guide's live commentary on that area) and transmits the captured audio data to the mobile phone 13. The mobile phone 13 plays the audio data captured by the microphone 12 while playing the video image data captured by the drone 11, so that the user of the mobile phone 13 can take an immersive virtual tour of the community.
Generally, the drone 11 continuously captures video image data (for example, A11, A12, A13, ...) and sends it to the mobile phone 13 in sequence; the microphone 12 continuously captures audio data synchronized with the video image data (for example, B11, B12, B13, ...) and sends it to the mobile phone 13 in sequence.
The mobile phone 13 plays the video image data and the audio data in the order in which the data is received. In this process, if the transmission delay of the video image data is stable, the mobile phone 13 receives the video image data continuously while the drone 11 sends it continuously, and the video images are therefore played continuously; however, if the transmission delay of the video image data is unstable, the mobile phone 13 cannot receive the video image data continuously, and video image playback on the mobile phone 13 will stutter or skip frames. Likewise, if the transmission delay of the audio data is stable, the mobile phone 13 receives the audio data continuously while the microphone 12 sends it continuously, and the audio is played continuously; however, if the transmission delay of the audio data is unstable, the mobile phone 13 cannot receive the audio data continuously, and audio playback on the mobile phone 13 will stutter or skip.
For another example, Fig. 2 is a schematic diagram of an application scenario in which audio data and video data are transmitted separately. As shown in Fig. 2, a drone 21 captures video images of the buildings in a residential community and transmits the captured video image data to a mobile phone 23; a microphone 22 captures audio for the community (for example, the current ambient sound of the area filmed by the drone 21, or a tour guide's live commentary on that area) and transmits the captured audio data to the mobile phone 23. The mobile phone 23 transmits the video image data to a large-screen television 25 for playback; at the same time, the mobile phone 23 also transmits the audio data to a smart speaker 24 for playback. In this way, the user can take an immersive virtual tour of the community through the cooperation of the large-screen television 25 and the smart speaker 24.
Generally, the mobile phone 23 continuously sends the video image data (for example, A21, A22, A23, ...) to the large-screen television 25 in sequence, and continuously sends the audio data synchronized with the video image data (for example, B21, B22, B23, ...) to the smart speaker 24 in sequence.
The large-screen television 25 plays the video image data in the order in which the data is received. In this process, if the transmission delay of the video image data is stable, the large-screen television 25 receives the video image data continuously while the mobile phone 23 sends it continuously, and the video images are played continuously; however, if the transmission delay of the video image data is unstable, the large-screen television 25 cannot receive the video image data continuously, and video image playback on the large-screen television 25 will stutter or skip frames.
Likewise, the smart speaker 24 plays the audio data in the order in which the data is received. If the transmission delay of the audio data is stable, the smart speaker 24 receives the audio data continuously while the mobile phone 23 sends it continuously, and the audio is played continuously; however, if the transmission delay of the audio data is unstable, the smart speaker 24 cannot receive the audio data continuously, and audio playback on the smart speaker 24 will stutter or skip.
To address the problems caused by unstable data transmission delay, an embodiment of the present application provides a cache-based audio/video playing method. Specifically, after receiving audio data and video data, the receiving device does not play them immediately, but buffers them temporarily; after a certain amount of data has been buffered, the data in the buffer is played. Fig. 3 is a flowchart of playing audio data and video data separately. As shown in Fig. 3, in the process of playing video image data:
Step 311: the data output end outputs video image data;
Step 312: the playback end receives the video image data;
Step 313: the playback end buffers the video image data;
Step 314: the playback end plays the video image data in the buffer according to the buffering order.
In the process of playing audio data:
Step 321: the data output end outputs audio data;
Step 322: the playback end receives the audio data;
Step 323: the playback end buffers the audio data;
Step 324: the playback end plays the audio data in the buffer according to the buffering order.
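The buffered-playback flow of steps 311-314 and 321-324 can be sketched as a FIFO player that starts playing only after a pre-buffer threshold is reached. This is an illustrative sketch only; the class name, parameter names, and threshold value are assumptions, not part of the application.

```python
from collections import deque

class BufferedPlayer:
    """Sketch of the cache-based playback end: frames are buffered on
    receipt and played in buffering (FIFO) order, not on arrival."""

    def __init__(self, prebuffer_frames=3):
        self.buffer = deque()
        self.prebuffer_frames = prebuffer_frames  # data amount to buffer before playback starts
        self.started = False

    def receive(self, frame):
        # Steps 312/313 and 322/323: receive a frame, then buffer it.
        self.buffer.append(frame)

    def play_next(self):
        # Steps 314/324: play from the head of the buffer, preserving order.
        if not self.started:
            if len(self.buffer) < self.prebuffer_frames:
                return None  # still pre-buffering, nothing is played yet
            self.started = True
        return self.buffer.popleft() if self.buffer else None
```

With `prebuffer_frames=2`, the first received frame is held back; once a second frame arrives, playback begins from the oldest buffered frame, so a late-arriving frame can be absorbed without interrupting playback.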
In the above flow, the data output end for audio playback and the data output end for video image playback may be the same device or different devices; likewise, the playback end for audio playback and the playback end for video image playback may be the same device or different devices. When buffering audio data and/or video image data, the buffered data amount may be changed dynamically according to the fluctuation of the data transmission delay; alternatively, a maximum buffered data amount may be set according to the maximum fluctuation of the data transmission delay.
In the embodiment shown in Fig. 3, since the playback end plays the video image data or audio data in the buffer according to the buffering order, the time at which the playback end receives a given piece of video image data or audio data and the time at which it plays that data may be different. Theoretically, the larger the amount of audio data or video image data buffered before playback begins, the smaller the interference caused by fluctuation of the transmission delay during subsequent playback. For example, if all the audio data and video image data of a video are buffered before playback begins, no stuttering or frame skipping caused by transmission delay fluctuation will occur during playback.
However, since received data is first buffered, what is played is previously buffered data rather than the data just received. Therefore, there is a playback delay between the moment data is output by the data output end and the moment that data is played. For example, in the application scenario of the embodiment shown in Fig. 2, after the audio data B23 is output by the mobile phone 23, it is received and buffered by the smart speaker 24. When the audio data B23 is buffered, other previously buffered audio data that has not yet been played (for example, B21 and B22) still exists in the buffer queue. The smart speaker 24 plays audio in the order of the buffer queue (B21, B22, then B23). The larger the amount of data buffered before playback, the greater the playback delay.
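The relationship just described reduces to simple arithmetic: every frame queued ahead of a given frame adds one frame duration of playback delay. A minimal sketch, assuming a fixed 40 ms frame duration (an illustrative value, not from the application):

```python
FRAME_MS = 40  # assumed duration of one buffered frame, in milliseconds

def playback_delay_ms(backlog_frames):
    """Extra playback delay caused by buffering: a frame such as B23 must
    wait behind every frame buffered before it (B21, B22, ...), each of
    which takes one frame duration to play."""
    return backlog_frames * FRAME_MS
```

For instance, with B21 and B22 already queued, B23 waits behind 2 frames, so the buffering alone adds 80 ms of playback delay under this assumed frame duration.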
In some application scenarios that require real-time interaction, however, an increase in playback delay greatly degrades the user experience. For example, in the embodiment shown in Fig. 1, suppose the flight attitude of the drone 11 is controlled by the user of the mobile phone 13 according to the drone-captured images displayed on the mobile phone 13. If the time difference between the drone 11 outputting a video image and the mobile phone 13 receiving and playing that video image is too large, the drone flight attitude the user infers from the images displayed on the mobile phone 13 deviates from the real-time flight attitude of the drone 11, and the user cannot control the flight attitude smoothly.
For another example, in the embodiment shown in Fig. 2, suppose the flight attitude of the drone 21 is controlled by the user of the mobile phone 23 according to the drone-captured images displayed on the large-screen television 25. Even ignoring the delay between the drone 21 outputting a video image and the mobile phone 23 receiving it, if the time difference between the mobile phone 23 outputting the video image and the large-screen television 25 receiving and displaying it is too large, the drone flight attitude the user infers from the images displayed on the large-screen television 25 also deviates from the real-time flight attitude of the drone 21, and the user cannot control the flight attitude smoothly.
For another example, Fig. 4 is a schematic diagram of an application scenario in which audio data and video data are transmitted separately. As shown in Fig. 4, user A and user B use their respective mobile phones (a mobile phone 43 and a mobile phone 44) to make a video call with each other. To improve the video call experience, user B uses the mobile phone 43 to send the video images of the call to a large-screen television 41 for display, and to send the voice of the call to a smart speaker 42 for playback. If the playback delay between the mobile phone 43 outputting the video images to the large-screen television 41 and the voice to the smart speaker 42, and the large-screen television 41 displaying the video images and the smart speaker 42 playing the voice, increases, the interaction delay between user A and user B increases accordingly, which greatly affects the user experience of the video call between user A and user B.
Therefore, an embodiment of the present application proposes an audio/video playing method based on a catch-up strategy: the audio/video playing scenario is identified, and a corresponding playing strategy is selected according to the specific application scenario.
Further, for application scenarios that require real-time interaction (interactive application scenarios), an embodiment of the present application proposes an audio/video playing method based on a catch-up strategy. Specifically, an interactive-scenario playback delay threshold is preset according to the real-time interaction requirement of the application scenario. The interactive-scenario playback delay threshold is the maximum playback delay, between the output device outputting audio data and/or video image data and the playback end playing that data, that still satisfies the expected user interaction experience. For example, in the application scenario shown in Fig. 4, at user B's side, as long as the added lag of the video images and voice does not exceed 150 ms, the user experience of the video call between user A and user B is not affected. Therefore, the interactive-scenario playback delay threshold between the mobile phone 43 and the large-screen television 41, and between the mobile phone 43 and the smart speaker 42, is set to 150 ms.
When the application scenario is an interactive application scenario, audio playback and/or video image playback is performed based on the catch-up strategy. In the catch-up strategy, the preset interactive-scenario playback delay threshold is invoked, and during audio playback and/or video image playback, the playback is adjusted so that the playback delay of the audio playback and/or video image playback is less than or equal to the interactive-scenario playback delay threshold.
Fig. 5 is a flowchart of an audio/video playing method according to an embodiment of the present application. The audio/video playing method is performed by one or more devices in an audio/video playing system (comprising an audio/video data output device, an audio playback end, and a video image playback end). As shown in Fig. 5:
Step 500: identify the current application scenario;
Step 501: determine whether the current application scenario is an interactive application scenario;
when the current application scenario is an interactive application scenario, perform step 502;
Step 502: invoke the catch-up strategy;
Step 520: perform audio playback and/or video image playback using the catch-up strategy, comprising:
Step 521: invoke the preset interactive-scenario playback delay threshold;
Step 522: adjust the audio playback and/or video image playback so that the playback progress of the audio playback and/or video image playback catches up with the progress of the audio/video data output, such that the playback delay of the audio playback and/or video image playback relative to the audio/video data output is less than or equal to the interactive-scenario playback delay threshold.
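As one possible illustration of step 522 (the application does not prescribe a concrete mechanism), a playback end could drop frames from the head of its buffer whenever the backlog implies a playback delay above the interactive-scenario playback delay threshold, so playback catches up with the output progress. The function name and the fixed frame duration are assumptions:

```python
FRAME_MS = 40  # assumed duration of one buffered frame, in milliseconds

def catch_up(buffer, threshold_ms):
    """Step 522 sketch: discard the oldest buffered frames until the
    playback delay implied by the backlog (backlog size * frame duration)
    is at or below the interactive-scenario playback delay threshold.
    Returns the frames that were dropped to catch up."""
    dropped = []
    while len(buffer) * FRAME_MS > threshold_ms:
        dropped.append(buffer.pop(0))  # skip the oldest frame
    return dropped
```

With the 150 ms threshold from the example above and a backlog of 6 frames (an implied 240 ms delay under the assumed frame duration), the three oldest frames would be skipped, leaving a 120 ms backlog within the threshold.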
According to the method of this embodiment of the present application, the audio/video playing scenario is identified and a corresponding playing strategy is selected according to the specific application scenario, which can greatly improve the user experience of audio/video playback.
Further, according to the method of this embodiment of the present application, using the catch-up strategy for audio/video playback in interactive application scenarios ensures that the playback delay of audio/video playback meets the application requirements of interactive application scenarios, thereby greatly improving the user experience in interactive application scenarios.
Specifically, an embodiment of the present application further proposes an audio/video playing system. Fig. 6 is a structural block diagram of an audio/video playing system according to an embodiment of the present application. As shown in Fig. 6, an audio output device 601 (for example, a sound capture device) outputs audio data, and a video image output device 602 (for example, an image capture device) outputs video image data. The audio data and the video image data are output to a playback device 603 (for example, a mobile phone), and the playback device 603 plays the received audio data and video image data synchronously. The user watches and listens to the video through the playback device 603.
In a practical application scenario, for the multiple-output-device (capture-device), single-playback-device scenario shown in Fig. 6, steps 500, 501, and 502 may be performed by the playback device 603 (for example, a mobile phone). When the playback device determines that the current application scenario is an interactive application scenario, it invokes the catch-up strategy (invoking the preset interactive-scenario playback delay threshold) and performs playback based on the catch-up strategy.
For example, in the application scenario shown in Fig. 1, the mobile phone 13 determines that the current application scenario is an interactive application scenario (real-time control of the flight attitude of the drone 11). The mobile phone 13 invokes the preset interactive-scenario playback delay threshold, implements the catch-up strategy based on that threshold, and plays the audio data from the microphone 12 and the video image data from the drone 11 based on the catch-up strategy.
Specifically, an embodiment of the present application further proposes an audio/video playing system. Fig. 7 is a structural block diagram of an audio/video playing system according to an embodiment of the present application. As shown in Fig. 7, an audio/video output device 701 (for example, a mobile phone) synchronously outputs the audio data and video image data of the same video (for example, the video of a video call), wherein the audio data is output to an audio playback device 702 (for example, a smart speaker) and the video image data is output to a video image playback device 703 (for example, a large-screen display device). The audio playback device 702 plays the received audio, and the video image playback device 703 plays the received video images. The user watches and listens to the video by watching the video image playback device 703 and listening to the audio playback device 702.
In a practical application scenario, for the single-output-device, multiple-playback-device scenario shown in Fig. 7, steps 500, 501, and 502 may be performed by the audio/video output device 701 (for example, a mobile phone). When the audio/video output device 701 determines that the current application scenario is an interactive application scenario, it invokes the catch-up strategy (including the interactive-scenario playback delay threshold) and delivers the catch-up strategy (including the threshold) to the audio playback device 702 and/or the video image playback device 703; the audio playback device 702 and/or the video image playback device 703 then implements the catch-up strategy based on the acquired interactive-scenario playback delay threshold.
The following embodiments are mainly described based on the single-output-device, multiple-playback-device audio/video playing system shown in Fig. 7. It should be noted that the steps of the following embodiments may also be applied to the multiple-output-device (capture-device), single-playback-device audio/video playing system shown in Fig. 6.
For example, in the application scenario shown in Fig. 2, the mobile phone 23 determines that the current application scenario is an interactive application scenario (real-time control of the flight attitude of the drone 21). The mobile phone 23 invokes the preset interactive-scenario playback delay threshold and delivers it to the large-screen television 25; the large-screen television 25 implements the catch-up strategy based on the acquired interactive-scenario playback delay threshold, and plays the video image data from the mobile phone 23 based on the catch-up strategy.
For another example, in the application scenario shown in Fig. 4, the mobile phone 43 determines that the current application scenario is an interactive application scenario (a video call with the mobile phone 44). The mobile phone 43 invokes the preset interactive-scenario playback delay threshold and delivers it to the large-screen television 41 and the smart speaker 42; the large-screen television 41 implements the catch-up strategy based on the acquired threshold and plays the video image data from the mobile phone 43 based on the catch-up strategy; the smart speaker 42 implements the catch-up strategy based on the acquired threshold and plays the audio data from the mobile phone 43 based on the catch-up strategy.
Further, in some application scenarios without real-time interaction requirements, the effect of playback delay on the user experience is not obvious, and the playback delay does not need to be kept at a low level.
For example, in the application scenario shown in Fig. 1, the user only uses the mobile phone 13 to take a virtual tour of the community along a designated route and does not control the drone 11. Therefore, the playback delay between the microphone 12 outputting audio and the mobile phone 13 playing it, and the playback delay between the drone 11 outputting video images and the mobile phone 13 playing them, do not noticeably affect the user experience of the virtual tour.
For another example, Fig. 8 is a schematic diagram of an application scenario in which audio data and video data are transmitted separately. As shown in Fig. 8, a mobile phone 63 is connected to a large-screen television 61 and a smart speaker 62. Suppose the user uses the mobile phone 63 to watch a video (for example, in the embodiment shown in Fig. 1, a video generated by integrating the video images captured by the drone 11 and the audio captured by the microphone 12). When playing the video, to optimize the playback effect, the mobile phone 63 does not output the audio and video images through its own speaker and screen; instead, it sends the video images of the video to the large-screen television 61 for display and sends the audio of the video to the smart speaker 62 for playback. The playback delay between the mobile phone 63 outputting the video images and audio, and the large-screen television 61 displaying the video images and the smart speaker 62 playing the audio, does not noticeably affect the user experience of watching the video.
Therefore, for application scenarios without real-time interaction requirements (non-interactive application scenarios), an embodiment of the present application proposes an audio/video playing method based on a smooth playback strategy.
Specifically, a data cache value is set according to the fluctuation of the data transmission delay in the application scenario: the data cache value matches the maximum fluctuation of the data transmission delay, or the data cache value is adjusted in real time according to the fluctuation of the data transmission delay. When the application scenario is a non-interactive application scenario, audio playback and/or video image playback is performed based on the smooth playback strategy. In the smooth playback strategy, the audio data and the video image data are buffered and played based on the data cache value, to ensure that the audio and video data can be played smoothly when the data transmission delay fluctuates.
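One way to derive a data cache value matched to the maximum delay fluctuation, sketched purely for illustration (the formula, helper name, and frame duration are assumptions, not from the application), is to size the buffer to cover the range of recently observed per-frame transmission delays:

```python
FRAME_MS = 40  # assumed duration of one buffered frame, in milliseconds

def cache_value_frames(recent_delays_ms):
    """Choose a buffer depth (in frames) matching the maximum observed
    fluctuation of the transmission delay: buffering this much data lets
    playback ride out the worst delay spike without stuttering."""
    jitter_ms = max(recent_delays_ms) - min(recent_delays_ms)
    # Round up so the buffer covers the whole fluctuation range.
    return -(-jitter_ms // FRAME_MS)
```

For delays fluctuating between 60 ms and 210 ms, the fluctuation range is 150 ms, giving a cache value of 4 frames under the assumed frame duration; recomputing this over a sliding window corresponds to the real-time adjustment variant described above.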
As shown in Fig. 5:
when the current application scenario is not an interactive application scenario (a non-interactive scenario), perform step 503;
Step 503: invoke the smooth playback strategy;
Step 510: perform audio playback and/or video image playback using the smooth playback strategy.
In a practical application scenario, for the multiple-output-device (capture-device), single-playback-device scenario, step 503 may be performed by the playback device (for example, a mobile phone). When the playback device determines that the current application scenario is a non-interactive application scenario, it invokes the smooth playback strategy and performs playback based on the smooth playback strategy.
According to the method of this embodiment, the smooth playback strategy is used for audio/video playback in non-interactive application scenarios, which ensures that the smoothness of audio/video playback meets the requirements of such scenarios and thus greatly improves the user experience in non-interactive application scenarios.
For example, in the application scenario shown in Figure 1, the mobile phone 13 determines that the current scenario is non-interactive (pure video viewing, without controlling the flight attitude of the drone 11). The mobile phone 13 invokes the smooth playback strategy and, based on it, plays the audio data from the microphone 12 and the video image data from the drone 11.
In practice, for application scenarios with a single output device and multiple playback devices, step 503 may be performed by the audio/video output device (for example, a mobile phone): when the output device determines that the current application scenario is non-interactive, it invokes the smooth playback strategy and delivers the corresponding data cache value to the audio playback end and/or the video image playback end; the audio playback end and/or the video image playback end then implements the smooth playback strategy based on the received data cache value.
For example, in the application scenario shown in Figure 8, the mobile phone 63 determines that the current scenario is non-interactive (a pure video playback scenario). The mobile phone 63 invokes the smooth playback strategy and delivers it to the large-screen TV 61 and the smart speaker 62; the large-screen TV 61 plays the video image data from the mobile phone 63 based on the smooth playback strategy, and the smart speaker 62 plays the audio data from the mobile phone 63 based on the smooth playback strategy.
In practice, those skilled in the art may implement the smooth playback strategy in many different ways. For example, in one implementation, the data cache value is set according to the data-transmission-delay fluctuation of the application scenario. During playback, the amount of buffered data is monitored to implement the smooth playback strategy; when the current amount of buffered data reaches the data cache value, a larger buffer is renegotiated. For example, when the amount of buffered video data reaches the data cache value for video data, the video playback device reports to the central device (the audio/video output device); the central device detects the change, renegotiates, and issues a new data cache value to the video playback device and the audio playback device.
In practice, those skilled in the art may likewise implement the catch-up strategy in many different ways.
Taking audio playback as an example, an embodiment of the present application proposes an audio playback method, including: when the playback strategy for audio playback is the catch-up strategy, obtaining a preset interactive-scene playback delay threshold; and performing audio playback based on that threshold, adjusting the playback during execution so that the audio playback progress catches up with the progress of the audio/video data output, such that the playback delay of the audio playback relative to the audio/video data output is less than or equal to the preset interactive-scene playback delay threshold.
Specifically, in one implementation of the catch-up strategy for audio playback, the playback delay of the audio playback relative to the audio/video data output is monitored directly during playback; when the delay exceeds the interactive-scene playback delay threshold, the audio playback is adjusted so that the progress of playback from the audio buffer catches up with the progress of the audio/video data output, reducing the playback delay until it is less than or equal to the threshold.
Figure 9 is a flowchart of an audio playback method according to an embodiment of the present application. In one implementation of step 522, the audio playback end executes the following process shown in Figure 9:
Step 910: monitor the playback delay of the audio playback relative to the audio/video data output;
Step 920: determine whether the playback delay of the audio playback exceeds the preset interactive-scene playback delay threshold;
if it does not, return to step 910;
if it does, perform step 930;
Step 930: adjust the audio playback so that the progress of playback from the audio buffer catches up with the progress of the audio/video data output, reducing the playback delay until it is less than or equal to the interactive-scene playback delay threshold.
In practice, those skilled in the art may implement step 910 in many different ways. For example, the output device adds a timestamp to the audio data it outputs; when playing the audio data, the playback end compares the timestamp in the data currently being played with the current time, and thereby calculates the playback delay.
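The timestamp comparison described above can be sketched as follows. This assumes the sender's and player's clocks are synchronized; the function name and millisecond units are illustrative, not from the original disclosure.

```python
import time

def playback_delay_ms(frame_timestamp_ms: float) -> float:
    """Delay between when a frame was output (sender-side timestamp
    embedded in the data) and when it is actually being played
    (current time on the playback end)."""
    now_ms = time.time() * 1000.0
    return now_ms - frame_timestamp_ms
```

The result is then compared against the interactive-scene playback delay threshold (step 920) to decide whether catch-up is needed.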
Further, the audio playback end plays audio data in the order in which it is stored in the audio buffer. If audio data is removed from the buffer once it has been played, then the amount of data remaining in the buffer represents the gap between the progress of the data output end in outputting audio data and the progress of the playback end in playing it. Therefore, to simplify the procedure and reduce implementation difficulty, one implementation of the catch-up strategy monitors the amount of buffered audio data.
Specifically, a corresponding buffered-data-amount threshold is determined from the preset interactive-scene playback delay threshold; when the current amount of buffered audio data exceeds that threshold, it can be determined that the audio playback delay exceeds the interactive-scene playback delay threshold. For example, assuming the interactive-scene playback delay threshold is 150 ms, the buffered-data threshold for audio can be calculated from the audio playback speed to be 4 frames.
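The conversion from a delay threshold to a buffered-frame threshold can be sketched as below. The 150 ms / 4 frame figure in the text implies a frame duration of roughly 37.5 ms; the actual duration depends on the codec's sample rate and samples per frame, so the numbers here are illustrative assumptions.

```python
import math

def buffer_frame_threshold(delay_threshold_ms: float,
                           frame_duration_ms: float) -> int:
    # Largest number of buffered frames whose total playback time
    # still fits within the allowed playback delay.
    return math.floor(delay_threshold_ms / frame_duration_ms)
```

The same calculation applies to the video case later in the text (e.g., 150 ms at a 50 ms frame interval gives a 3-frame threshold).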
Figure 10 is a flowchart of an audio playback method according to an embodiment of the present application. In one implementation of step 522, the audio playback end executes the following process shown in Figure 10:
Step 1010: monitor the amount of buffered unplayed audio data;
Step 1020: determine whether the amount of buffered audio data exceeds the preset audio-data buffer threshold, where the buffer threshold is a value determined from the interactive-scene playback delay threshold;
if it does not, return to step 1010;
if it does, perform step 1030;
Step 1030: adjust the amount of buffered unplayed audio data so that it is less than or equal to the preset audio-data buffer threshold.
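Steps 1010–1030 can be sketched in miniature as follows, using one of the deletion schemes the text lists later (dropping the oldest unplayed data first). The data structure and names are illustrative assumptions.

```python
from collections import deque

def enforce_buffer_threshold(buffer: deque, threshold_frames: int) -> None:
    # When the number of buffered, unplayed frames exceeds the
    # threshold (step 1020), drop the oldest frames until the buffer
    # is back within the threshold (step 1030).
    while len(buffer) > threshold_frames:
        buffer.popleft()  # discard the oldest unplayed frame
```

In a real player this check would run periodically alongside playback (step 1010), rather than once.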
In practice, those skilled in the art may implement step 522 in many different ways. For example, when the audio playback end plays from the audio buffer, instead of reading the audio data in the order in which it is stored, it may read the data in a skipping fashion.
Further, to simplify the procedure and reduce implementation difficulty, in one embodiment of step 522, when the playback progress of the audio playback end needs to catch up with the progress of the audio/video data output, all or part of the unplayed audio data in the playback end's buffer is deleted, so that playback skips the deleted data and the audio playback progress catches up with the output progress. The amount of audio data deleted corresponds to the current playback progress, so that the playback progress of the audio playback end satisfies the interactive-scene playback delay threshold.
For example, in one implementation of step 930, the amount of unplayed audio data deleted corresponds to the amount by which the playback delay exceeds the interactive-scene playback delay threshold.
As another example, in one embodiment of step 1030, data in the audio buffer is deleted so that the amount of buffered audio data is less than or equal to the audio buffered-data threshold.
Further, in the concrete implementation of step 522, those skilled in the art may delete unplayed audio data from the buffer in many different ways: for example, deleting the unplayed audio data that was stored earliest; or randomly selecting data to delete from the unplayed audio data in the buffer; or selecting data to delete from the unplayed audio data at fixed intervals.
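The third scheme above (fixed-interval selection) can be sketched as follows; spreading the dropped frames evenly across the buffer avoids deleting a long contiguous stretch. The function name is an illustrative assumption.

```python
def frames_to_drop(num_buffered: int, num_to_drop: int) -> list:
    # Pick indices of frames to delete at (approximately) fixed
    # intervals across the buffer, spreading the loss evenly.
    if num_to_drop <= 0:
        return []
    step = num_buffered / num_to_drop
    return [int(i * step) for i in range(num_to_drop)]
```

The same selection scheme is reused later in the text for video frames, where even spacing also reduces the visible frame-skipping effect.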
Further, to preserve as much of the information carried by the audio as possible and avoid information loss caused by deleting buffered audio data, one embodiment of step 522 monitors the waveform and frequency of the audio data in the buffer and, when buffered audio data must be deleted, preferentially deletes the audio data to which the human ear is least sensitive.
For example, the audible frequency range for a normal person is 20–20,000 Hz, but the human ear is most sensitive to sounds in the 1,000–3,000 Hz range. To minimize the distortion caused by dropping audio frames during synchronization, frames are selectively dropped according to how sensitive the human ear is to them.
In distributed scenarios, transmitted audio frames are generally in the Advanced Audio Coding (AAC) format. During decoding of the AAC audio data, the frequency-domain information of each frame is extracted (generally via a fast Fourier transform), and the frame's sensitivity to the human ear is analyzed (for example, 3,000 Hz is the most sensitive, denoted by a sensitivity of 100; 20,000 Hz is the least sensitive, denoted by 0).
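A rough sketch of this per-frame sensitivity scoring is given below. It finds the dominant frequency of a decoded PCM frame via an FFT and maps it to a 0–100 scale that peaks in the 1–3 kHz band; the linear fall-off outside that band is a made-up illustration, not a psychoacoustic model, and real implementations would weight the whole spectrum rather than just the dominant bin.

```python
import numpy as np

def frame_sensitivity(pcm: np.ndarray, sample_rate: int) -> float:
    """Score one audio frame by how sensitive the ear is to its
    dominant frequency: 100 inside the 1-3 kHz band, falling
    linearly toward 0 at the edges of the audible range."""
    spectrum = np.abs(np.fft.rfft(pcm))
    freqs = np.fft.rfftfreq(len(pcm), d=1.0 / sample_rate)
    dominant = freqs[np.argmax(spectrum)]
    if 1000.0 <= dominant <= 3000.0:
        return 100.0
    if dominant < 1000.0:
        return max(0.0, 100.0 * (dominant - 20.0) / (1000.0 - 20.0))
    return max(0.0, 100.0 * (20000.0 - dominant) / (20000.0 - 3000.0))
```

The scores computed this way feed the distribution-based dropping described further below.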
Specifically, after the main control module for audio processing starts running, it places a portion of the AAC bit stream (AAC Stream) into the input buffer and finds the start of a frame by searching for the sync word; once found, decoding begins according to the syntax described in ISO/IEC 13818-7. The main task of the main control module is to operate the input and output buffers and to call the other modules to work together. Both buffers are provided with interfaces by a digital signal processing (DSP) control module. The data stored in the output buffer is the decoded pulse-code modulation (PCM) data, which represents the amplitude of the sound. The output buffer consists of a fixed-length buffer; the head pointer is obtained by calling an interface function of the DSP control module, and after the output buffer has been filled, interrupt handling is invoked to output to the audio converter chip connected to the I2S interface (a stereo audio digital-to-analog converter (DAC) with a DirectDrive headphone amplifier), which outputs analog sound.
Figure 11 is a schematic diagram of the AAC audio decoding process. As shown in Figure 11, the AAC stream goes through:
Step 1101: noiseless decoding. Noiseless coding is Huffman coding; its purpose is to further reduce the redundancy of the scale factors and the quantized spectrum, i.e., the scale factors and the quantized spectral information are Huffman-coded;
Step 1102: dequantization;
Step 1103: joint stereo, a rendering of the original samples that makes the sound more pleasant;
Step 1104: perceptual noise substitution (PNS);
Step 1105: temporal noise shaping (TNS);
Step 1106: inverse modified discrete cosine transform (IMDCT);
Step 1107: spectral band replication (SBR);
After that, the PCM streams of the left and right channels are obtained, and the main control module places them in the output buffer for output to the sound playback device.
When buffered audio data needs to be deleted, the sensitivity distribution of the audio frames currently in the audio buffer queue is first computed, and then the frames to drop are determined. For example, if frames with a sensitivity below 60 account for 50% of the queue, and half of the audio frames currently need to be discarded, then the frames with a sensitivity below 60 are discarded.
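The distribution-based dropping described above can be sketched as follows: given per-frame sensitivity scores, discard the requested fraction of frames starting from the least sensitive ones, preserving the playback order of the survivors. Names and data layout are illustrative assumptions.

```python
def drop_least_sensitive(frames, drop_fraction):
    # frames: list of (sensitivity, frame_payload) pairs in buffer order.
    # Drop the `drop_fraction` of frames the ear is least sensitive to,
    # keeping the survivors in their original order.
    n_drop = int(len(frames) * drop_fraction)
    order = sorted(range(len(frames)), key=lambda i: frames[i][0])
    doomed = set(order[:n_drop])
    return [f for i, f in enumerate(frames) if i not in doomed]
```

With the example from the text (half the frames scoring below 60, half of the buffer to be dropped), exactly the sub-60 frames are removed.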
Further, the catch-up strategy for video image playback may be implemented in a manner similar to the catch-up strategy for audio playback described above. For example, an embodiment of the present application further proposes a video image playback method, including: when the playback strategy for video image playback is the catch-up strategy, obtaining a preset interactive-scene playback delay threshold; and performing video image playback based on that threshold, adjusting the playback during execution so that the video playback progress catches up with the progress of the audio/video data output, such that the playback delay of the video image playback relative to the audio/video data output is less than or equal to the preset interactive-scene playback delay threshold.
Specifically, in one implementation of the catch-up strategy for video playback, the playback delay of the video image playback relative to the audio/video data output is monitored directly during playback; when the delay exceeds the interactive-scene playback delay threshold, the video image playback is adjusted so that the progress of playback from the video image buffer catches up with the progress of the audio/video data output, reducing the playback delay until it is less than or equal to the threshold.
Figure 12 is a flowchart of a video image playback method according to an embodiment of the present application. In one implementation of step 522, the video image playback end executes the following process shown in Figure 12:
Step 1110: monitor the playback delay of the video image playback relative to the audio/video data output;
Step 1120: determine whether the playback delay of the video image playback exceeds the preset interactive-scene playback delay threshold;
if it does not, return to step 1110;
if it does, perform step 1130;
Step 1130: adjust the video image playback so that the progress of playback from the video image buffer catches up with the progress of the audio/video data output, reducing the playback delay until it is less than or equal to the interactive-scene playback delay threshold.
In practice, those skilled in the art may implement step 1110 in many different ways. For example, the output device adds a timestamp to the video image data it outputs; when playing the video image data, the playback end compares the timestamp in the data currently being played with the current time, and thereby calculates the playback delay.
Further, the video image playback end plays video image data in the order in which it is stored in the video image buffer. If video image data is removed from the buffer once it has been played, then the amount of data remaining in the buffer represents the gap between the progress of the data output end in outputting video image data and the progress of the playback end in playing it. Therefore, to simplify the procedure and reduce implementation difficulty, one implementation of the catch-up strategy monitors the amount of buffered video image data.
Specifically, a corresponding video-image buffered-data threshold is determined from the preset interactive-scene playback delay threshold; when the current amount of buffered video image data exceeds that threshold, it can be determined that the playback delay exceeds the interactive-scene playback delay threshold. For example, assuming the interactive-scene playback delay threshold is 150 ms, the buffered-data threshold for video images can be calculated from the video playback speed to be 3 frames.
Figure 13 is a flowchart of a video image playback method according to an embodiment of the present application. In one implementation of step 522, the video image playback end executes the following process shown in Figure 13:
Step 1210: monitor the amount of buffered unplayed data for audio playback and/or video image playback;
Step 1220: determine whether the amount of buffered video image data exceeds the preset data buffer threshold, where the data buffer threshold is a value determined from the interactive-scene playback delay threshold: when the executing device is the audio playback end, the data buffer threshold is the audio-data buffer threshold; when the executing device is the video image playback end, the data buffer threshold is the video-image-data buffer threshold;
if it does not, return to step 1210;
if it does, perform step 1230;
Step 1230: adjust the amount of buffered unplayed video image data so that it is less than or equal to the preset data buffer threshold.
In practice, those skilled in the art may implement step 522 in many different ways. For example, the playback speed of the video image playback may be increased (for example, from 10 frames per second to 15 frames per second); as another example, when the video image playback end plays from the video image buffer, instead of reading the data in the order in which it is stored, it may read the data in a skipping fashion (for example, reading every other frame of the buffered video image data).
Further, to simplify the procedure and reduce implementation difficulty, in one embodiment of step 522, when the playback progress of the video image playback end needs to catch up with the progress of the audio/video data output, all or part of the unplayed video image data in the playback end's buffer is deleted, so that playback skips the deleted data and the playback progress of the video image playback end catches up with the output progress. The amount of video image data deleted corresponds to the current playback progress, so that the playback progress of the video image playback end satisfies the interactive-scene playback delay threshold.
For example, in one implementation of step 1130, the amount of unplayed video image data deleted corresponds to the amount by which the video playback delay exceeds the interactive-scene playback delay threshold.
As another example, in one embodiment of step 1230, data in the video image buffer is deleted so that the amount of buffered video image data is less than or equal to the video-image buffered-data threshold.
Further, in the concrete implementation of step 522, those skilled in the art may delete unplayed video image data from the video image buffer in many different ways: for example, deleting the unplayed video image data that was stored earliest; or randomly selecting data to delete from the unplayed video image data in the buffer; or selecting data to delete from the unplayed video image data at fixed intervals.
Further, to ensure smooth video playback and avoid a visible frame-skipping effect as much as possible, in one embodiment of step 522, when buffered video image data needs to be deleted, the video frames to delete are selected at fixed intervals.
Further, in application scenarios where audio and video are separated, transmission-delay fluctuation can also cause audio and video playback to fall out of sync, degrading the user experience. For example, in the audio/video capture stage, independent devices may separately capture and transmit the audio data and the video image data. The synchronously captured audio data and video image data are transmitted separately. Because the transmission delays fluctuate, the delays for the audio data and the video data may differ; the audio and video data received at the playback end are then out of sync, which can make their playback out of sync as well.
For example, in the application scenario shown in Figure 1, when the drone 11 captures video image data A1, the microphone 12 captures audio data B1 synchronized with A1. The drone 11 transmits A1 to the mobile phone 13 while the microphone 12 transmits B1 to the mobile phone 13. If A1 and B1 are received synchronously on the mobile phone 13, and the phone plays them as they arrive, the playback of A1 and B1 is synchronized.
However, if the delay in transmitting audio data B1 from the microphone 12 to the mobile phone 13 is greater than the delay in transmitting video image data A1 from the drone 11, then A1 and B1 arrive at the phone out of sync: the phone receives A1 first. If the phone plays A1 as it arrives, then when it starts playing A1 it has not yet received B1; the audio and video playback on the phone ends up out of sync, and the user experience of the virtual neighborhood tour is greatly degraded.
针对图1所示的应用场景,一种解决音视频播放不同步的方案是,在播放视频图像数据以及音频数据之前,对视频图像数据以及音频数据进行整合,生成音视频同步的视频文件,播放该视频文件以实现音视频同步。例如,在图1所示实施例中,手机13在接收到视频图像数据A1以及音频数据B1,将其整合,生成视频文件C1,手机13播放视频文件 C1以实现音视频同步。For the application scenario shown in FIG. 1 , a solution for solving the asynchronous playback of audio and video is to integrate the video image data and audio data before playing the video image data and the audio data to generate a video file with audio and video synchronization, and play the audio and video files. The video file to achieve audio and video synchronization. For example, in the embodiment shown in FIG. 1 , the mobile phone 13 receives video image data A1 and audio data B1, integrates them to generate a video file C1, and the mobile phone 13 plays the video file C1 to realize audio and video synchronization.
然而,在某些应用场景中,无法实现音视频数据的整合。However, in some application scenarios, the integration of audio and video data cannot be achieved.
For example, in the embodiment shown in FIG. 1, on the mobile phone 13, the application (APP) that receives the video image data from the drone 11 and the application that receives the audio data from the microphone 12 come from different vendors. The former can play video image data while receiving it, and the latter can play audio data while receiving it. However, the two applications cannot easily merge their audio and video data: for example, the video image data and the audio data would have to be imported into a third-party video production application and merged after aligning their time tags before a video file could be generated. If playback only happens after such merging, the user can take the virtual tour only after the drone 11 and the microphone 12 have finished capturing, rather than taking a synchronized virtual tour while the drone 11 and the microphone 12 are still capturing audio and video.
Further, in some application scenarios, different playback terminals are used to play the audio data and the video image data separately in order to optimize the playback effect.
For example, in the embodiment shown in FIG. 2, the difference in transmission delay between the audio data transmission from the microphone 22 to the mobile phone 23 and the video image data transmission from the drone 21 to the mobile phone 23, as well as the difference in transmission delay between the audio data transmission from the mobile phone 23 to the smart speaker 24 and the video image data transmission from the mobile phone 23 to the large-screen TV 25, may each cause audio and video playback to fall out of sync. Although audio and video could be synchronized on the mobile phone 23 by merging the audio data of the microphone 22 with the video image data of the drone 21, the smart speaker 24 and the large-screen TV 25 are separate playback terminals, so their playback cannot be synchronized by merging the audio and video data.
As another example, in the embodiment shown in FIG. 4, the difference in transmission delay between the audio data transmission from the mobile phone 43 to the smart speaker 42 and the video image data transmission from the mobile phone 43 to the large-screen TV 41 may cause audio and video playback to fall out of sync.
As another example, in the embodiment shown in FIG. 8, the difference in transmission delay between the audio data transmission from the mobile phone 63 to the smart speaker 62 and the video image data transmission from the mobile phone 63 to the large-screen TV 61 may cause audio and video playback to fall out of sync.
Therefore, to address out-of-sync playback in application scenarios where audio and video are played separately, in an embodiment of the present application, for interactive application scenarios, an audio and video synchronization operation is introduced on top of the catch-up strategy for audio and video playback.
Specifically, in one implementation of step 520, the catch-up strategy is applied to both audio playback and video image playback to achieve audio and video synchronization. That is, audio playback and video image playback are adjusted so that the playback progress of the audio playback terminal and of the video image playback terminal catches up with the progress of the audio and video data output, such that the playback delay of both audio playback and video image playback relative to the audio and video data output is less than or equal to the interactive-scene playback delay threshold.
Applying the catch-up strategy to both audio playback and video image playback keeps the audio playback delay and the video image playback delay within the interactive-scene playback delay threshold. Since playback delay is the time between data output and data playback, and the audio output and the video image output are synchronized, the difference in playback progress between audio playback and video image playback is thereby also kept within the interactive-scene playback delay threshold, achieving synchronization between audio playback and video image playback.
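The catch-up strategy just described can be sketched as follows. This is a minimal illustration, not the claimed implementation; the representation of buffered data as a list of presentation timestamps in milliseconds, and the default 150 ms threshold, are assumptions drawn from the examples later in this description.

```python
def apply_catch_up(buffered_pts_ms, output_pts_ms, threshold_ms=150):
    """Discard un-played frames that lag the audio/video data output progress
    by more than the interactive-scene playback delay threshold, so that
    playback progress catches up with the data output progress."""
    return [pts for pts in buffered_pts_ms
            if output_pts_ms - pts <= threshold_ms]
```

For example, with the output progress at 300 ms, any buffered frame more than 150 ms behind the output is dropped, so playback resumes within the threshold.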
Further, in another implementation of step 520, the catch-up strategy is applied to only one of the two playback processes (audio playback or video image playback), so that the playback delay of the process using the catch-up strategy is kept within the interactive-scene playback delay threshold. The other playback process uses a synchronization strategy: taking the playback progress of the process using the catch-up strategy as the reference, the process using the synchronization strategy is adjusted so that its playback progress is synchronized with that of the process using the catch-up strategy.
In some application scenarios, the smoothness of audio playback must take priority, and the number of adjustments to the audio playback progress should be minimized. For example, in everyday video call scenarios, speech is the main carrier of the exchanged information: a stutter in the video image usually does not lose information, whereas a stutter in the speech usually does. It is therefore necessary to prioritize smooth speech playback and minimize the number of adjustments to the speech playback progress.
Therefore, in one embodiment, audio playback uses a catch-up strategy based on the audio and video output progress, while video image playback uses a synchronization strategy based on the audio playback progress.
FIG. 14 is a flowchart of an audio and video playback method according to an embodiment of the present application. The method is executed by one or more devices in the audio and video playback system (comprising the audio and video data output device, the audio playback terminal, and the video image playback terminal). As shown in FIG. 14:
Step 1300: identify the current application scenario;
Step 1301: determine whether the current application scenario is an interactive application scenario;
if the current application scenario is not an interactive application scenario, perform step 1310;
Step 1310: play audio and/or video images using the smooth playback strategy;
if the current application scenario is an interactive application scenario, perform step 1320;
Step 1320: play audio using the catch-up strategy;
Step 1330: determine whether the playback progress of the video images is synchronized with the playback progress of the audio;
if the playback progress of the video images is synchronized with that of the audio, return to step 1320;
if the playback progress of the video images is not synchronized with that of the audio, perform step 1340;
Step 1340: taking the playback progress of the audio as the reference, adjust video image playback so that the playback progress of the video images is synchronized with that of the audio.
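The decision logic of steps 1330 and 1340 can be sketched as follows. This is a hypothetical sketch: the function name and the 150 ms threshold are illustrative assumptions, and the returned labels "skip" and "pad" stand for the deletion and transition-data adjustment mechanisms described later in this description.

```python
def video_adjustment(video_pts_ms, audio_pts_ms, sync_threshold_ms=150):
    """Compare video playback progress against the audio master clock
    (step 1330) and choose the adjustment of step 1340."""
    diff = audio_pts_ms - video_pts_ms  # > 0 means video lags the audio
    if abs(diff) <= sync_threshold_ms:
        return "in_sync"  # keep playing as-is (return to step 1320)
    # video lags: drop un-played frames; video leads: insert transition data
    return "skip" if diff > 0 else "pad"
```

Because the audio keeps playing under the catch-up strategy, only the video stream is ever adjusted, which matches the priority on smooth audio described above.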
In some application scenarios, the smoothness of video image playback must take priority, and the number of adjustments to the video image playback progress should be minimized. For example, when a remote device is controlled in real time through a video image feed, the video images reflect the state of the remote device while the live sound plays only an auxiliary role; a stutter in the video images can easily cause control errors. It is therefore necessary to prioritize smooth video image playback and minimize the number of adjustments to the video image playback progress.
Therefore, in one embodiment, video image playback uses a catch-up strategy based on the audio and video output progress, while audio playback uses a synchronization strategy based on the video image playback progress.
FIG. 15 is a flowchart of an audio and video playback method according to an embodiment of the present application. The method is executed by one or more devices in the audio and video playback system (comprising the audio and video data output device, the audio playback terminal, and the video image playback terminal). As shown in FIG. 15:
Step 1400: identify the current application scenario;
Step 1401: determine whether the current application scenario is an interactive application scenario;
if the current application scenario is not an interactive application scenario, perform step 1410;
Step 1410: play audio and/or video images using the smooth playback strategy;
if the current application scenario is an interactive application scenario, perform step 1420;
Step 1420: play video images using the catch-up strategy;
Step 1430: determine whether the playback progress of the video images is synchronized with the playback progress of the audio;
if the playback progress of the video images is synchronized with that of the audio, return to step 1420;
if the playback progress of the video images is not synchronized with that of the audio, perform step 1440;
Step 1440: taking the playback progress of the video images as the reference, adjust audio playback so that the playback progress of the audio is synchronized with that of the video images.
Further, in the embodiments shown in FIG. 14 or FIG. 15, those skilled in the art may adopt a variety of implementations to determine whether the playback progress of the audio is synchronized with that of the video images, and to adjust audio playback or video image playback so that the two are synchronized.
For example, in an embodiment of the present application, whether the audio playback progress is synchronized with the video image playback progress is judged against a synchronous playback delay threshold. Specifically, a synchronous playback delay threshold is preset as the maximum difference in playback progress between audio playback and video playback that still meets the expected viewing experience. For example, when the difference between the audio and video playback progress exceeds 150 ms, the user perceives a noticeable mismatch between audio and video, so the synchronous playback delay threshold is set to 150 ms. When the difference between the audio playback progress and the video image playback progress exceeds the synchronous playback delay threshold, the two are judged to be out of sync.
Specifically, in one application scenario, the difference in playback progress can be expressed with timestamps. For example, after the output device outputs audio data A9 and video image data B9, the audio playback terminal records the playback time (timestamp T1) when it plays audio data A9, and the video image playback terminal records the playback time (timestamp T2) when it plays video image data B9; the interval between T1 and T2 is the difference in playback progress between audio playback and video image playback.
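The timestamp comparison above reduces to one check. The sketch below is illustrative: T1 and T2 are the recorded playback times, in milliseconds, of corresponding output data (audio A9 and video image B9).

```python
def out_of_sync(t1_ms, t2_ms, sync_threshold_ms=150):
    """Audio and video playback are judged out of sync when the playback
    times of corresponding data differ by more than the synchronous
    playback delay threshold."""
    return abs(t1_ms - t2_ms) > sync_threshold_ms
```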
Further, during playback, audio playback or video image playback is adjusted so that the difference in playback progress between them is kept within the synchronous playback delay threshold, thereby achieving audio and video synchronization while preserving the real-time nature of audio playback and video image playback.
Further, in the specific implementation of step 1340 or step 1440, the playback progress can be advanced faster by deleting un-played data from the buffer, or its advance can be slowed by adding transition data to the un-played data in the buffer, so that the audio playback progress becomes synchronized with the video image playback progress.
For example, for video image data, a transition frame is inserted between two video image frames (the transition frame may be a copy of an adjacent video image frame), so that the video image playback terminal plays the transition frame as part of its playback flow, thereby slowing the advance of the video image playback progress.
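The two adjustment mechanisms, deleting un-played data to speed up and inserting transition data to slow down, can be sketched together. The list-of-timestamps representation and the 40 ms frame duration are illustrative assumptions, not part of the claimed method.

```python
def adjust_progress(buffered_frames, diff_ms, frame_ms=40):
    """diff_ms > 0: this stream lags, so delete un-played frames to advance
    playback faster. diff_ms < 0: this stream leads, so duplicate the next
    frame as transition frames to slow the advance of playback progress."""
    if diff_ms > 0:
        drop = min(diff_ms // frame_ms, len(buffered_frames))
        return buffered_frames[drop:]
    if buffered_frames:
        dup = (-diff_ms) // frame_ms
        # the transition frames are copies of the adjacent (next) frame
        return [buffered_frames[0]] * dup + buffered_frames
    return buffered_frames
```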
Further, in one implementation of step 520, audio and video synchronization is achieved by monitoring the data transmission delay. Specifically, a playback buffering stage is added: after the playback device receives the audio data or video image data, and before that data is played, it is buffered for an additional duration (the playback buffer) equal to the difference between the audio data transmission delay and the video image data transmission delay. This compensates for the difference between the two transmission delays and ensures that the audio data and the video image data are played synchronously.
Further, since data buffered in the playback buffering stage before being played counts toward the playback delay, one implementation of step 520 limits the buffering duration in the playback buffering stage to at most the interactive-scene playback delay threshold. Because the buffering duration equals the difference between the audio data transmission delay and the video image data transmission delay, when that difference exceeds the interactive-scene playback delay threshold, part of the already-buffered data is deleted and new data is buffered in its place, thereby implementing the catch-up strategy while keeping audio and video playback synchronized.
Taking a specific application scenario as an example, as shown in FIG. 8, suppose a user plays a game on the mobile phone 63. To enhance the experience, the user plays in game screen-casting mode. In this mode, the mobile phone 63 does not output the game audio and game video images through its own speaker and screen; instead, it sends the game video images to the large-screen TV 61 for display and the game audio to the smart speaker 62 for playback. The user follows the game through the image output of the large-screen TV 61 and the audio output of the smart speaker 62, and performs the corresponding game operations on the phone. Since there is real-time interaction between the user and the mobile phone 63, the game screen-casting scenario is an interactive application scenario. Specifically, after the user activates game screen-casting mode, the mobile phone 63, the large-screen TV 61, and the smart speaker 62 execute the following flow:
after recognizing the game screen-casting scenario, the mobile phone 63 establishes connections with the large-screen TV 61 and the smart speaker 62 respectively;
the large-screen TV 61 reports a network delay of 100 ms to the mobile phone 63, and the smart speaker 62 reports a network delay of 200 ms;
the mobile phone 63 issues the catch-up strategy (with an interactive-scene playback delay threshold of 150 ms) to the large-screen TV 61 and the smart speaker 62 respectively; at the same time, the mobile phone 63 sends the network delay data of the large-screen TV 61 to the smart speaker 62 and the network delay data of the smart speaker 62 to the large-screen TV 61;
the mobile phone 63 starts sending video image frames to the large-screen TV 61 and audio frames to the smart speaker 62;
the smart speaker 62 learns that the network delay of the large-screen TV 61 is 100 ms, less than its own 200 ms, so it plays the audio frames directly;
the large-screen TV 61 learns that the network delay of the smart speaker 62 is 200 ms, so it buffers the video image frames for 100 ms (below the 150 ms interactive-scene playback delay threshold) before decoding and playing them, thereby staying synchronized with the audio frame playback of the smart speaker 62.
Further, suppose that at the next moment the network delay of the smart speaker 62 deteriorates to 300 ms while that of the large-screen TV 61 remains 100 ms. The smart speaker 62 reports a network delay of 300 ms to the mobile phone 63 and the large-screen TV 61 reports 100 ms; the mobile phone 63 notifies the large-screen TV 61 and the smart speaker 62 of the updated network delay data respectively.
The large-screen TV 61 updates the network delay of the smart speaker 62 to 300 ms and calculates that it would need to buffer 200 ms to stay synchronized with audio playback, which exceeds the 150 ms threshold. It therefore discards the first 50 ms of the 100 ms of video image data it has already buffered, buffers another 100 ms of video image data (150 ms in total), and then decodes and plays the video image data, keeping video image playback synchronized with audio playback.
Further, if the network of the large-screen TV 61 drops for 50 ms and then recovers, the TV's current 150 ms video buffer means that, during the 50 ms outage, video image data remains available for playback in sync with the audio data of the smart speaker 62.
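The buffering arithmetic of this screen-casting example can be traced with a small sketch. The helper below is illustrative; the 150 ms threshold and the delay figures come from the example above.

```python
THRESHOLD_MS = 150  # interactive-scene playback delay threshold

def tv_buffer_plan(speaker_delay_ms, tv_delay_ms):
    """Return (buffer_ms, drop_ms) for the large-screen TV: buffer the
    video by the delay difference, capped at the threshold; any excess
    must be dropped from the video data already buffered."""
    need = speaker_delay_ms - tv_delay_ms
    return min(need, THRESHOLD_MS), max(0, need - THRESHOLD_MS)
```

At first the speaker reports 200 ms and the TV 100 ms, so the TV buffers 100 ms and drops nothing; when the speaker's delay worsens to 300 ms, the TV caps its buffer at 150 ms and drops 50 ms of already-buffered video, exactly as in the trace above.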
Further, in real application scenarios, the data transmission delay fluctuates continuously, but it cannot be monitored without interruption. Therefore, in one implementation of step 520, the transmission delays of the audio playback device and the video image playback device are refreshed periodically, so that playback buffering based on the data transmission delay is performed at regular intervals to synchronize audio playback and video image playback.
Further, in one implementation of step 520, the delay-based audio and video synchronization scheme is combined with other synchronization schemes to improve the synchronization effect. For example, in an application scenario where audio and video are played separately, a timestamp synchronization operation is executed periodically: the timestamps of the audio data and the video image data currently being played are compared to determine the difference in playback progress, and audio playback or video image playback is adjusted so that the difference is kept within the synchronous playback delay threshold, achieving audio and video synchronization.
Between two timestamp synchronization operations, the delay-based scheme is used: the data transmission delays of the audio data and the video image data are refreshed periodically, and playback buffering is performed according to the difference between the transmission delays to keep audio and video playback synchronized.
Further, to ensure smooth playback, the playback device does not play audio data or video image data immediately upon reception, but buffers a certain amount of data to absorb fluctuations in the transmission delay. When playing audio data or video image data, the playback device extracts data from the audio buffer or the video image buffer and feeds the extracted data into the playback stage. In one implementation of step 520, on top of the catch-up strategy for audio playback and/or video image playback, a playback buffering stage is added to the playback flow. Specifically, after audio data or video image data is extracted from the audio buffer or video image buffer, and before it is played, it is buffered for an additional duration (the playback buffer) equal to the difference between the audio data transmission delay and the video image data transmission delay, thereby compensating for that difference and ensuring that the audio data and the video image data are played synchronously.
Specifically, FIG. 16 is a partial flowchart of an audio and video playback method according to an embodiment of the present application. In one implementation of step 520, the following flow shown in FIG. 16 is executed to synchronize audio and video playback:
Step 1510: before transmitting the audio data and the video data, obtain the initial transmission delays of the audio data transmission and the video image transmission; the data type with the higher transmission delay is designated the first-class data and the data type with the lower transmission delay the second-class data; the initial transmission delay of the first-class data is the first delay and that of the second-class data is the second delay (when the two delays are equal, the first-class and second-class designations are assigned arbitrarily);
Step 1511: determine whether the first delay difference is less than or equal to the interactive-scene playback delay threshold, where the first delay difference is the difference between the first delay and the second delay;
Step 1512: when the first delay difference is less than or equal to the interactive-scene playback delay threshold, transmit the first-class data and the second-class data; the first-class data is played directly once it enters the playback stage, while the second-class data is buffered for the first delay difference after entering the playback stage and then played;
Step 1513: when the first delay difference is greater than the interactive-scene playback delay threshold, then in the current application scenario the real-time interaction requirement cannot be met if audio and video synchronization is to be guaranteed; therefore, an excessive-delay notice is output to remind the user that the current data transmission link cannot meet the real-time interaction requirement.
In the above flow, because the second-class data is buffered before being played, the second-class data effectively waits for the first-class data, which ensures that the first-class data and the second-class data are played synchronously.
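Steps 1510 through 1513 can be sketched as follows. The function is hypothetical, and an exception stands in for the excessive-delay notice of step 1513.

```python
def initial_buffer_plan(audio_delay_ms, video_delay_ms, threshold_ms=150):
    """Step 1510: classify the slower stream as first-class data (played
    directly) and the faster one as second-class data. Steps 1511-1513:
    return the buffering duration for the second-class data (the first
    delay difference), or signal that real-time interaction is impossible."""
    first_delay = max(audio_delay_ms, video_delay_ms)   # first-class data
    second_delay = min(audio_delay_ms, video_delay_ms)  # second-class data
    diff = first_delay - second_delay                   # first delay difference
    if diff > threshold_ms:
        # step 1513: the link cannot support synchronized real-time playback
        raise RuntimeError("transmission delay too high for real-time interaction")
    return diff  # step 1512: buffer the second-class data by this amount
```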
After step 1512 is performed, the following steps are also performed:
Step 1520: obtain the current transmission delays of the audio data transmission and the video image transmission; the current transmission delay of the first-class data is the third delay and that of the second-class data is the fourth delay;
Step 1521: determine whether the current transmission delay of the first-class data is higher than that of the second-class data;
Step 1531: if the current transmission delay of the first-class data is higher than that of the second-class data, determine whether the second delay difference is less than or equal to the interactive-scene playback delay threshold, where the second delay difference is the difference between the third delay and the fourth delay;
Step 1532: when the second delay difference is less than or equal to the interactive-scene playback delay threshold, play the first-class data directly once it enters the playback stage, and buffer the second-class data for the second delay difference after it enters the playback stage before playing it;
Step 1533: when the second delay difference is greater than the interactive-scene playback delay threshold, play the first-class data directly once it enters the playback stage; delete part of the buffered second-class data, the amount deleted being the amount by which the second delay difference exceeds the interactive-scene playback delay threshold; and increase the buffered amount of second-class data, starting to play the second-class data when its buffered amount reaches the interactive-scene playback delay threshold.
Further, after step 1521 is executed, the following steps are also performed:
Step 1541: When the current transmission delay of the second type of data is higher than that of the first type of data, determine whether the third delay difference is less than or equal to the interactive-scene playback delay threshold, where the third delay difference is the difference between the fourth delay and the third delay.
Step 1542: When the third delay difference is less than or equal to the interactive-scene playback delay threshold, play the second type of data directly once it enters the playback stage, and buffer the first type of data for a duration equal to the third delay difference after it enters the playback stage before playing it.
Step 1543: When the third delay difference is greater than the interactive-scene playback delay threshold, play the second type of data directly once it enters the playback stage; delete part of the buffered first type of data, the amount deleted being the amount by which the third delay difference exceeds the interactive-scene playback delay threshold; and increase the buffered amount of the first type of data, starting playback of the first type of data once its buffered amount reaches the interactive-scene playback delay threshold.
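Taken together, steps 1520 through 1543 form one symmetric decision procedure: the stream with the higher transmission delay plays as soon as it reaches the playback stage, while the other stream is buffered by the delay difference, capped at the interactive-scene playback delay threshold. The following is a minimal sketch of that procedure; the function name, dictionary keys, and millisecond units are illustrative assumptions, not part of the application:

```python
def plan_alignment(delay_a_ms: int, delay_b_ms: int, threshold_ms: int) -> dict:
    """Decide how to align two streams given their current transmission delays.

    The higher-delay ("late") stream plays directly once it enters the
    playback stage; the lower-delay ("early") stream is buffered so the two
    line up, but never by more than the interactive-scene playback delay
    threshold. Any excess difference is removed from the early stream's cache.
    """
    if delay_a_ms >= delay_b_ms:
        late, early, diff = "a", "b", delay_a_ms - delay_b_ms
    else:
        late, early, diff = "b", "a", delay_b_ms - delay_a_ms

    if diff <= threshold_ms:
        # Steps 1532 / 1542: buffer the early stream for the full difference.
        return {"play_directly": late, "buffer": early,
                "buffer_ms": diff, "drop_ms": 0}
    # Steps 1533 / 1543: cap the buffering at the threshold, dropping the
    # portion of the cached early-stream data that exceeds it.
    return {"play_directly": late, "buffer": early,
            "buffer_ms": threshold_ms, "drop_ms": diff - threshold_ms}
```

For example, with a 120 ms delay on stream a, a 40 ms delay on stream b, and a 50 ms threshold, the sketch plays a directly, drops 30 ms of b's cached data, and starts playing b once 50 ms of it is buffered.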
Further, in an embodiment of the present application, for non-interactive application scenarios, an audio-video synchronization operation is introduced on top of the smooth playback strategy.
Specifically, in one implementation of step 510, a synchronization strategy is applied to one of the two playback processes, audio playback or video image playback. Under the synchronization strategy, the playback process using it is adjusted, with the playback progress of the other process (which does not use the synchronization strategy) as the reference, so that the playback progress of the process using the synchronization strategy stays synchronized with the playback progress of the process using the smooth playback strategy.
For example, in one embodiment, a smooth playback strategy is adopted for audio playback, and a synchronization strategy based on the audio playback progress is adopted for video image playback. In another embodiment, a smooth playback strategy is adopted for video image playback, and a synchronization strategy based on the video image playback progress is adopted for audio playback. In yet another embodiment, a smooth playback strategy is adopted for both audio playback and video image playback; further, for video image playback, a synchronization strategy based on the audio playback progress is applied on top of smooth playback. In yet another embodiment, a smooth playback strategy is adopted for both audio playback and video image playback; further, for audio playback, a synchronization strategy based on the video image playback progress is applied on top of smooth playback.
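The four example embodiments can be summarized as a mapping from embodiment to the pair (audio strategy, video image strategy). The labels below are illustrative shorthand, not terms from the application:

```python
# Each entry: (audio strategy, video image strategy).
# "smooth+sync(audio)" denotes smooth playback combined with a synchronization
# strategy keyed to the audio playback progress; all labels are assumed names.
STRATEGY_CONFIGS = {
    "embodiment_1": ("smooth", "sync(audio)"),
    "embodiment_2": ("sync(video)", "smooth"),
    "embodiment_3": ("smooth", "smooth+sync(audio)"),
    "embodiment_4": ("smooth+sync(video)", "smooth"),
}
```

In each configuration, exactly one process serves as the progress reference while the other is adjusted toward it.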
Further, in the specific implementation of step 510, those skilled in the art may implement the synchronization strategy in a variety of ways.
For example, a synchronization playback delay threshold may be used to determine whether the audio playback progress and the video image playback progress are synchronized (for example, the progress difference may be expressed through timestamps). During playback, the audio playback or the video image playback is adjusted so that the progress difference between them is kept within the synchronization playback delay threshold, thereby achieving audio-video synchronization while preserving the real-time performance of both. Specifically, playback progress can be advanced faster by deleting unplayed data from the buffer, or its advance can be slowed by inserting transition data into the unplayed data in the buffer, so that the audio playback progress and the video image playback progress become synchronized.
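The timestamp-based adjustment described above can be sketched as follows. The frame representation, the 40 ms frame duration, the threshold value, and the function name are all assumptions made for illustration:

```python
from collections import deque

SYNC_THRESHOLD_MS = 80  # synchronization playback delay threshold (assumed value)
FRAME_MS = 40           # assumed duration of one buffered video frame

def resync(video_buffer: deque, video_pts_ms: int, audio_pts_ms: int) -> str:
    """Keep the video playback progress within the threshold of the audio's.

    video_buffer holds unplayed frames, consumed in FIFO order; the pts
    arguments are the timestamps of the frames currently being played.
    """
    lag_ms = audio_pts_ms - video_pts_ms
    if abs(lag_ms) <= SYNC_THRESHOLD_MS:
        return "in_sync"
    if lag_ms > 0:
        # Video is behind the audio: delete unplayed frames from the buffer
        # to advance the video playback progress.
        for _ in range(min(len(video_buffer), lag_ms // FRAME_MS)):
            video_buffer.popleft()
        return "dropped_frames"
    # Video is ahead of the audio: insert transition frames (here, repeats of
    # the next frame) to slow the advance of the video playback progress.
    for _ in range((-lag_ms) // FRAME_MS):
        video_buffer.appendleft(video_buffer[0] if video_buffer else None)
    return "inserted_transition"
```

The same adjustment could equally be applied to the audio buffer instead, with the video playback progress as the reference.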
As another example, audio-video synchronization may be implemented based on monitoring the data transmission delay.
It can be understood that some or all of the steps or operations in the above embodiments are merely examples; the embodiments of the present application may also perform other operations or variations of the described operations. In addition, the steps may be performed in an order different from that presented in the above embodiments, and it may not be necessary to perform all of the operations in the above embodiments.
Further, in the 1990s, an improvement to a technology could clearly be distinguished as either a hardware improvement (for example, an improvement to circuit structures such as diodes, transistors, and switches) or a software improvement (an improvement to a method flow). With the development of technology, however, many of today's method-flow improvements can be regarded as direct improvements to hardware circuit structures. Designers almost always obtain the corresponding hardware circuit structure by programming the improved method flow into a hardware circuit. Therefore, it cannot be said that an improvement to a method flow cannot be implemented with hardware entity modules. For example, a programmable logic device (PLD), such as a field-programmable gate array (FPGA), is an integrated circuit whose logic functions are determined by the user's programming of the device. A designer "integrates" a digital device onto a single PLD by programming it, without asking a chip manufacturer to design and fabricate a dedicated integrated-circuit chip. Moreover, today, instead of fabricating integrated-circuit chips by hand, this programming is mostly implemented with "logic compiler" software, which is similar to the software compilers used in program development; the original code to be compiled must likewise be written in a specific programming language, called a hardware description language (HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); the most commonly used at present are VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog. Those skilled in the art will also appreciate that a hardware circuit implementing a logical method flow can easily be obtained simply by applying slight logic programming to the method flow in one of the above hardware description languages and programming it into an integrated circuit.
The controller may be implemented in any suitable manner. For example, the controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (for example, software or firmware) executable by the (micro)processor, logic gates, switches, an application-specific integrated circuit (ASIC), a programmable logic controller, or an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320; a memory controller may also be implemented as part of the control logic of a memory. Those skilled in the art also know that, in addition to implementing the controller purely as computer-readable program code, the method steps can be logically programmed so that the controller achieves the same functions in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller can therefore be regarded as a hardware component, and the devices included in it for implementing various functions can also be regarded as structures within the hardware component. The devices for implementing various functions can even be regarded as both software modules implementing a method and structures within a hardware component.
Therefore, based on the audio/video playback method proposed in the embodiments of the present application, an embodiment of the present application proposes an audio/video playback apparatus, which is installed in the audio/video output device 701 shown in FIG. 7. The apparatus includes:
a scene recognition module, configured to recognize the current application scenario; and
a playback strategy configuration module, configured to, when the current application scenario is an interactive application scenario, configure a catch-up strategy as the playback strategy for audio playback and/or video image playback, where the catch-up strategy includes:
adjusting the audio playback and/or the video image playback so that the playback progress of the audio playback and/or the video image playback catches up with the progress of the audio/video data output, such that the playback delay of the audio playback and/or the video image playback relative to the audio/video data output is less than or equal to a preset interactive-scene playback delay threshold.
Further, the playback strategy configuration module is also configured to:
when the current application scenario is a non-interactive application scenario, configure a smooth playback strategy as the playback strategy for audio playback and/or video image playback.
Based on the audio playback method proposed in the embodiments of the present application, an embodiment of the present application proposes an audio playback apparatus, which is installed in the audio playback device 702 shown in FIG. 7. The audio playback apparatus includes:
a threshold acquisition module, configured to acquire the preset interactive-scene playback delay threshold when the playback strategy for audio playback is the catch-up strategy; and
a first playback adjustment module, configured to, when the playback strategy for audio playback is the catch-up strategy, adjust the audio playback during playback so that the playback progress of the audio playback catches up with the progress of the audio/video data output, such that the playback delay of the audio playback relative to the audio/video data output is less than or equal to the preset interactive-scene playback delay threshold.
Further, the audio playback apparatus also includes:
a second playback adjustment module, configured to adjust the audio playback based on the smooth playback strategy during playback when the playback strategy for audio playback is the smooth playback strategy.
Based on the video image playback method proposed in the embodiments of the present application, an embodiment of the present application proposes a video image playback apparatus, which is installed in the video image playback device 703 shown in FIG. 7. The video image playback apparatus includes:
a threshold acquisition module, configured to acquire the preset interactive-scene playback delay threshold when the playback strategy for video image playback is the catch-up strategy; and
a first playback adjustment module, configured to, when the playback strategy for video image playback is the catch-up strategy, adjust the video image playback during playback so that the playback progress of the video image playback catches up with the progress of the audio/video data output, such that the playback delay of the video image playback relative to the audio/video data output is less than or equal to the preset interactive-scene playback delay threshold.
Further, the video image playback apparatus also includes:
a second playback adjustment module, configured to adjust the video image playback based on the smooth playback strategy during playback when the playback strategy for video image playback is the smooth playback strategy.
Further, an embodiment of the present application proposes an audio/video image playback apparatus, which is installed in the playback device 603 shown in FIG. 6. The apparatus includes:
a scene recognition module, configured to recognize the current application scenario;
a playback strategy configuration module, configured to configure the catch-up strategy as the playback strategy for audio playback and/or video image playback when the current application scenario is an interactive application scenario, and to configure the smooth playback strategy as the playback strategy for audio playback and/or video image playback when the current application scenario is a non-interactive application scenario;
a threshold acquisition module, configured to acquire the preset interactive-scene playback delay threshold when the playback strategy for audio playback and/or video image playback is the catch-up strategy;
a first playback adjustment module, configured to, when the playback strategy for audio playback and/or video image playback is the catch-up strategy, adjust the audio playback and/or the video image playback so that its playback progress catches up with the progress of the audio/video data output, such that the playback delay of the audio playback and/or the video image playback relative to the audio/video data output is less than or equal to the preset interactive-scene playback delay threshold; and
a second playback adjustment module, configured to adjust the audio playback and/or the video image playback based on the smooth playback strategy when the playback strategy for audio playback and/or video image playback is the smooth playback strategy.
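The module decomposition of this combined playback apparatus could be sketched as the following skeleton. The class, method names, scene-classification rule, and threshold value are all hypothetical; the division into modules is purely logical, and the functions may be realized in any mixture of software and hardware:

```python
class PlaybackApparatus:
    """Logical module decomposition of the audio/video image playback apparatus."""

    INTERACTIVE_THRESHOLD_MS = 100  # preset interactive-scene threshold (assumed)

    def recognize_scene(self, app_info: dict) -> str:
        # Scene recognition module: classify the current application scenario.
        return "interactive" if app_info.get("needs_input") else "non_interactive"

    def configure_strategy(self, scene: str) -> str:
        # Playback strategy configuration module: catch-up for interactive
        # scenarios, smooth playback otherwise.
        return "catch_up" if scene == "interactive" else "smooth"

    def adjust_playback(self, strategy: str, playback_delay_ms: int) -> str:
        if strategy == "catch_up":
            # First playback adjustment module: advance the playback progress
            # until the delay relative to the data output is within threshold.
            if playback_delay_ms > self.INTERACTIVE_THRESHOLD_MS:
                return "advance_progress"
            return "play"
        # Second playback adjustment module: follow the smooth playback strategy.
        return "smooth_play"
```

A caller would chain the modules: recognize the scene, configure the strategy, then adjust each playback process accordingly.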
In the description of the embodiments of the present application, for convenience of description, the apparatus is described in terms of functions divided into various modules/units. The division of the modules/units is merely a division of logical functions; when implementing the embodiments of the present application, the functions of the modules/units may be realized in one or more pieces of software and/or hardware.
Specifically, in actual implementation, the apparatuses proposed in the embodiments of the present application may be fully or partially integrated into one physical entity, or may be physically separate. These modules may all be implemented in the form of software invoked by a processing element; they may all be implemented in hardware; or some modules may be implemented as software invoked by a processing element while others are implemented in hardware. For example, the detection module may be a separately established processing element, or may be integrated into a chip of the electronic device; the other modules are implemented similarly. In addition, all or some of these modules may be integrated together or implemented independently. During implementation, each step of the above method, or each of the above modules, may be completed by an integrated logic circuit of hardware in a processor element or by instructions in the form of software.
For example, the above modules may be one or more integrated circuits configured to implement the above methods, such as one or more application-specific integrated circuits (ASICs), one or more digital signal processors (DSPs), or one or more field-programmable gate arrays (FPGAs). As another example, these modules may be integrated together and implemented in the form of a system-on-a-chip (SOC).
An embodiment of the present application further proposes an electronic device (an audio/video data output device). The electronic device includes a memory for storing computer program instructions and a processor for executing the program instructions, where the computer program instructions, when executed by the processor, trigger the electronic device to perform the steps of the audio/video playback method described in the embodiments of the present application.
An embodiment of the present application further proposes an electronic device (an audio playback device). The electronic device includes a memory for storing computer program instructions and a processor for executing the program instructions, where the computer program instructions, when executed by the processor, trigger the electronic device to perform the steps of the audio playback method described in the embodiments of the present application.
An embodiment of the present application further proposes an electronic device (a video image playback device). The electronic device includes a memory for storing computer program instructions and a processor for executing the program instructions, where the computer program instructions, when executed by the processor, trigger the electronic device to perform the steps of the video image playback method described in the embodiments of the present application.
An embodiment of the present application further proposes an electronic device (an audio/video playback device). The electronic device includes a memory for storing computer program instructions and a processor for executing the program instructions, where the computer program instructions, when executed by the processor, trigger the electronic device to perform the steps of the audio/video playback method described in the embodiments of the present application.
Specifically, in an embodiment of the present application, the one or more computer programs are stored in the memory, and the one or more computer programs include instructions that, when executed by the device, cause the device to perform the method steps described in the embodiments of the present application.
Specifically, in an embodiment of the present application, the processor of the electronic device may be a system-on-a-chip (SOC); the processor may include a central processing unit (CPU) and may further include other types of processors. Specifically, in an embodiment of the present application, the processor of the electronic device may be a PWM control chip.
Specifically, in an embodiment of the present application, the processor involved may include, for example, a CPU, a DSP, a microcontroller, or a digital signal processor, and may further include a GPU, an embedded neural-network processing unit (NPU), and an image signal processor (ISP). The processor may further include a necessary hardware accelerator or logic processing hardware circuit, such as an ASIC, or one or more integrated circuits for controlling the execution of the programs of the technical solutions of the present application. In addition, the processor may have the function of operating one or more software programs, which may be stored in a storage medium.
Specifically, in an embodiment of the present application, the memory of the electronic device may be a read-only memory (ROM) or another type of static storage device capable of storing static information and instructions, a random access memory (RAM) or another type of dynamic storage device capable of storing information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage, optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, and the like), magnetic disk storage media or other magnetic storage devices, or any other computer-readable medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.
Specifically, in an embodiment of the present application, the processor and the memory may be combined into a single processing device, though more commonly they are components independent of each other; the processor executes the program code stored in the memory to implement the methods described in the embodiments of the present application. In a specific implementation, the memory may also be integrated into the processor, or may be independent of the processor.
Further, the devices, apparatuses, modules, or units described in the embodiments of the present application may specifically be implemented by a computer chip or entity, or by a product having a certain function.
Those skilled in the art should understand that the embodiments of the present application may be provided as a method, an apparatus, or a computer program product. Therefore, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media containing computer-usable program code.
In the several embodiments provided in the present application, if any function is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on such an understanding, the technical solution of the present application, in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present application.
Specifically, an embodiment of the present application further provides a computer-readable storage medium storing a computer program that, when run on a computer, causes the computer to perform the method provided by the embodiments of the present application.
An embodiment of the present application further provides a computer program product. The computer program product includes a computer program that, when run on a computer, causes the computer to perform the method provided by the embodiments of the present application.
The embodiments of the present application are described with reference to flowcharts and/or block diagrams of the methods, devices (apparatuses), and computer program products according to the embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus that implements the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device thereby provide steps for implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
It should also be noted that, in the embodiments of the present application, "at least one" means one or more, and "a plurality of" means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" may indicate that A exists alone, that A and B both exist, or that B exists alone, where A and B may be singular or plural. The character "/" generally indicates an "or" relationship between the associated objects. "At least one of the following" and similar expressions refer to any combination of the listed items, including any combination of single items or plural items. For example, "at least one of a, b, and c" may represent a, b, c, a and b, a and c, b and c, or a and b and c, where each of a, b, and c may be single or multiple.
In the embodiments of the present application, the terms "comprise", "include", and any variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the existence of additional identical elements in the process, method, article, or device that includes the element.
The present application may be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types. The present application may also be practiced in distributed computing environments in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage devices.
The embodiments in the present application are described in a progressive manner; for identical or similar parts between the embodiments, reference may be made to one another, and each embodiment focuses on its differences from the other embodiments. In particular, since the apparatus embodiments are substantially similar to the method embodiments, they are described relatively simply; for relevant parts, reference may be made to the description of the method embodiments.
本领域普通技术人员可以意识到,本申请实施例中描述的各单元及算法步骤,能够以电子硬件、计算机软件和电子硬件的结合来实现。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本申请的范围。Those of ordinary skill in the art can realize that each unit and algorithm steps described in the embodiments of the present application can be implemented by a combination of electronic hardware, computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. Skilled artisans may implement the described functionality using different methods for each particular application, but such implementations should not be considered beyond the scope of this application.
所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,上述描述的装置、装置和单元的具体工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。Those skilled in the art can clearly understand that, for the convenience and brevity of description, for the specific working process of the above-described devices, devices and units, reference may be made to the corresponding processes in the foregoing method embodiments, which will not be repeated here.
以上所述,仅为本申请的具体实施方式,任何熟悉本技术领域的技术人员在本申请揭露的技术范围内,可轻易想到变化或替换,都应涵盖在本申请的保护范围之内。本申请的保护范围应以所述权利要求的保护范围为准。The above are only specific embodiments of the present application. Any person skilled in the art can easily think of changes or substitutions within the technical scope disclosed in the present application, which should be covered by the protection scope of the present application. The protection scope of the present application shall be subject to the protection scope of the claims.

Claims (24)

  1. An audio/video playing method, characterized in that the method comprises:
    identifying a current application scenario;
    configuring, according to the current application scenario, a playback strategy for audio playback and/or video image playback, comprising:
    when the current application scenario is an interactive application scenario, configuring a catch-up strategy as the playback strategy for audio playback and/or video image playback, wherein the catch-up strategy comprises:
    adjusting the audio playback and/or the video image playback so that the playback progress of the audio playback and/or the video image playback catches up with the progress of the audio/video data output, such that the playback delay of the audio playback and/or the video image playback relative to the audio/video data output is less than or equal to a preset interactive-scene playback delay threshold.
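By way of non-limiting illustration (not part of the claimed subject matter), the catch-up adjustment of claim 1 can be sketched as follows; the function name and the threshold value are hypothetical, and positions are expressed in milliseconds:

```python
# Illustrative sketch of the catch-up strategy: if playback lags the audio/video
# data output progress by more than the interactive-scene delay threshold, jump
# the playback position forward until the residual lag equals the threshold.

INTERACTIVE_DELAY_THRESHOLD_MS = 60  # hypothetical preset threshold

def catch_up(output_progress_ms: int, playback_progress_ms: int) -> int:
    """Return the adjusted playback position in milliseconds."""
    delay = output_progress_ms - playback_progress_ms
    if delay > INTERACTIVE_DELAY_THRESHOLD_MS:
        # Skip ahead so the remaining delay is exactly the threshold.
        playback_progress_ms = output_progress_ms - INTERACTIVE_DELAY_THRESHOLD_MS
    return playback_progress_ms
```

A playback position lagging 100 ms behind the output would thus be moved forward to a 60 ms lag, while a position already within the threshold is left untouched.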
  2. The method according to claim 1, characterized in that the method further comprises:
    when the current application scenario is a non-interactive application scenario, configuring a smooth playback strategy as the playback strategy for the audio playback and/or the video image playback.
  3. The method according to claim 1 or 2, characterized in that configuring the catch-up strategy as the playback strategy for audio playback and/or video image playback comprises:
    configuring the catch-up strategy as the playback strategy for both the audio playback and the video image playback.
  4. The method according to claim 1 or 2, characterized in that configuring the catch-up strategy as the playback strategy for audio playback and/or video image playback comprises:
    configuring the catch-up strategy as the playback strategy for the audio playback; and
    configuring a synchronization strategy as the playback strategy for the video image playback, wherein the synchronization strategy comprises:
    adjusting the video image playback with the playback progress of the audio playback as the reference, so that the playback progress of the video image playback is synchronized with the playback progress of the audio playback.
  5. The method according to claim 1 or 2, characterized in that configuring the catch-up strategy as the playback strategy for audio playback and/or video image playback comprises:
    configuring the catch-up strategy as the playback strategy for the video image playback; and
    configuring a synchronization strategy as the playback strategy for the audio playback, wherein the synchronization strategy comprises:
    adjusting the audio playback with the playback progress of the video image playback as the reference, so that the playback progress of the audio playback is synchronized with the playback progress of the video image playback.
  6. An audio playing method, characterized in that the method comprises:
    when the playback strategy for audio playback is a catch-up strategy, obtaining a preset interactive-scene playback delay threshold; and
    performing the audio playback based on the interactive-scene playback delay threshold, and, during execution of the audio playback, adjusting the audio playback so that the playback progress of the audio playback catches up with the progress of the audio/video data output, such that the playback delay of the audio playback relative to the audio/video data output is less than or equal to the preset interactive-scene playback delay threshold.
  7. The method according to claim 6, characterized in that performing the audio playback based on the interactive-scene playback delay threshold comprises:
    monitoring the playback delay of the audio playback relative to the audio/video data output; and
    when the playback delay of the audio playback exceeds the interactive-scene playback delay threshold, adjusting the audio playback so that the playback delay of the audio playback is less than or equal to the interactive-scene playback delay threshold.
  8. The method according to claim 6, characterized in that performing the audio playback based on the interactive-scene playback delay threshold comprises:
    monitoring the amount of buffered unplayed data of the audio playback; and
    when the amount of buffered unplayed data of the audio playback exceeds a preset data buffer threshold, adjusting the amount of buffered unplayed data of the audio playback so that it is less than or equal to the preset data buffer threshold.
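The buffer-backlog variant in claim 8 can be illustrated (non-limiting, with a hypothetical threshold and identifiers) as trimming the oldest unplayed buffers once the backlog exceeds the preset limit:

```python
# Illustrative backlog trimming: discard the oldest unplayed buffers until the
# number of buffered, unplayed entries is within the preset threshold, which
# implicitly moves the playback point closer to the live output progress.
from collections import deque

BUFFER_THRESHOLD = 3  # hypothetical maximum number of unplayed buffers

def trim_unplayed(buffers: deque) -> deque:
    """Drop the oldest unplayed buffers until the backlog is within the threshold."""
    while len(buffers) > BUFFER_THRESHOLD:
        buffers.popleft()  # oldest data is skipped, not played
    return buffers
```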
  9. The method according to any one of claims 6 to 8, characterized in that adjusting the audio playback comprises:
    deleting all or part of the unplayed data in an audio buffer, so that the audio playback skips the deleted unplayed data and the playback progress of the audio playback thereby catches up with the progress of the audio/video data output.
  10. The method according to claim 9, characterized in that deleting all or part of the unplayed data in the audio buffer comprises:
    monitoring the waveform and frequency of the audio data in the audio buffer, and, when unplayed data in the audio buffer needs to be deleted, preferentially deleting audio data to which the human ear is insensitive.
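As a non-limiting illustration of claim 10: when unplayed audio must be discarded, the frames the ear is least likely to notice are dropped first. In this hypothetical sketch, "insensitive" is approximated as lowest peak amplitude (near-silent frames); a real implementation could also weight by frequency content, as the claim suggests:

```python
# Illustrative selection of drop candidates: rank buffered frames by peak
# amplitude and return the indices of the quietest n frames, which are the
# ones the human ear is least likely to miss.

def frames_to_drop(frames: list[list[int]], n: int) -> list[int]:
    """Return the indices of the n quietest frames (drop candidates)."""
    peaks = [(max(abs(sample) for sample in frame), i)
             for i, frame in enumerate(frames)]
    peaks.sort()  # quietest frames first
    return sorted(i for _, i in peaks[:n])
```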
  11. A video image playing method, characterized in that the method comprises:
    when the playback strategy for video image playback is a catch-up strategy, obtaining a preset interactive-scene playback delay threshold; and
    performing the video image playback based on the interactive-scene playback delay threshold, and, during execution of the video image playback, adjusting the video image playback so that the playback progress of the video image playback catches up with the progress of the audio/video data output, such that the playback delay of the video image playback relative to the audio/video data output is less than or equal to the preset interactive-scene playback delay threshold.
  12. The method according to claim 11, characterized in that performing the video image playback based on the interactive-scene playback delay threshold comprises:
    monitoring the playback delay of the video image playback relative to the audio/video data output; and
    when the playback delay of the video image playback exceeds the interactive-scene playback delay threshold, adjusting the video image playback so that the playback delay of the video image playback is less than or equal to the interactive-scene playback delay threshold.
  13. The method according to claim 11, characterized in that performing the video image playback based on the interactive-scene playback delay threshold comprises:
    monitoring the amount of buffered unplayed data of the video image playback; and
    when the amount of buffered unplayed data of the video image playback exceeds a preset data buffer threshold, adjusting the amount of buffered unplayed data of the video image playback so that it is less than or equal to the preset data buffer threshold.
  14. The method according to any one of claims 11 to 13, characterized in that adjusting the video image playback comprises:
    deleting all or part of the unplayed data in a video image buffer, so that the video image playback skips the deleted unplayed data and the playback progress of the video image playback thereby catches up with the progress of the audio/video data output.
  15. An audio/video playing apparatus, characterized in that the apparatus comprises:
    a scenario identification module, configured to identify a current application scenario; and
    a playback strategy configuration module, configured to configure, according to the current application scenario, a playback strategy for audio playback and/or video image playback, comprising:
    when the current application scenario is an interactive application scenario, configuring a catch-up strategy as the playback strategy for audio playback and/or video image playback, wherein the catch-up strategy comprises:
    adjusting the audio playback and/or the video image playback so that the playback progress of the audio playback and/or the video image playback catches up with the progress of the audio/video data output, such that the playback delay of the audio playback and/or the video image playback relative to the audio/video data output is less than or equal to a preset interactive-scene playback delay threshold.
  16. An audio playing apparatus, characterized in that the apparatus comprises:
    a threshold obtaining module, configured to obtain a preset interactive-scene playback delay threshold when the playback strategy for audio playback is a catch-up strategy; and
    a playback adjustment module, configured to adjust the audio playback during the audio playback so that the playback progress of the audio playback catches up with the progress of the audio/video data output, such that the playback delay of the audio playback relative to the audio/video data output is less than or equal to the preset interactive-scene playback delay threshold.
  17. A video image playing apparatus, characterized in that the apparatus comprises:
    a threshold obtaining module, configured to obtain a preset interactive-scene playback delay threshold when the playback strategy for video image playback is a catch-up strategy; and
    a playback adjustment module, configured to adjust the video image playback during the video image playback so that the playback progress of the video image playback catches up with the progress of the audio/video data output, such that the playback delay of the video image playback relative to the audio/video data output is less than or equal to the preset interactive-scene playback delay threshold.
  18. An audio/video playing apparatus, characterized in that the apparatus comprises:
    a scenario identification module, configured to identify a current application scenario;
    a playback strategy configuration module, configured to configure, according to the current application scenario, a playback strategy for audio playback and/or video image playback, comprising: when the current application scenario is an interactive application scenario, configuring a catch-up strategy as the playback strategy for audio playback and/or video image playback, wherein the catch-up strategy comprises: adjusting the audio playback and/or the video image playback so that the playback progress of the audio playback and/or the video image playback catches up with the progress of the audio/video data output, such that the playback delay of the audio playback and/or the video image playback relative to the audio/video data output is less than or equal to a preset interactive-scene playback delay threshold;
    a threshold obtaining module, configured to obtain the preset interactive-scene playback delay threshold when the playback strategy for the audio playback and/or the video image playback is the catch-up strategy; and
    a playback adjustment module, configured to adjust the audio playback and/or the video image playback during the audio playback and/or the video image playback, so that the playback progress of the audio playback and/or the video image playback catches up with the progress of the audio/video data output, such that the playback delay of the audio playback and/or the video image playback relative to the audio/video data output is less than or equal to the preset interactive-scene playback delay threshold.
  19. An audio/video playing system, characterized in that the system comprises:
    an audio/video output apparatus, configured to output audio data and video data separately, and to identify a current application scenario and, when the current application scenario is an interactive application scenario, configure a catch-up strategy as the playback strategy for audio playback and/or video image playback, wherein the catch-up strategy comprises: adjusting the audio playback and/or the video image playback so that the playback progress of the audio playback and/or the video image playback catches up with the progress of the audio/video data output, such that the playback delay of the audio playback and/or the video image playback relative to the audio/video data output is less than or equal to a preset interactive-scene playback delay threshold;
    an audio playing apparatus, configured to receive the audio data and play the audio data according to the playback strategy for the audio playback configured by the audio/video output apparatus; and
    a video image playing apparatus, configured to receive the video image data and play the video image data according to the playback strategy for the video image playback configured by the audio/video output apparatus.
  20. An audio/video playing system, characterized in that the system comprises:
    an audio output apparatus, configured to output audio data;
    a video image output apparatus, configured to output video image data; and
    an audio/video playing apparatus, configured to:
    receive the audio data and the video image data;
    identify a current application scenario and, when the current application scenario is an interactive application scenario, configure a catch-up strategy as the playback strategy for audio playback and/or video image playback, wherein the catch-up strategy comprises: adjusting the audio playback and/or the video image playback so that the playback progress of the audio playback and/or the video image playback catches up with the progress of the audio/video data output, such that the playback delay of the audio playback and/or the video image playback relative to the audio/video data output is less than or equal to a preset interactive-scene playback delay threshold; and
    play the audio data and the video image data according to the playback strategy for the audio playback and the playback strategy for the video image playback.
  21. An electronic device, characterized in that the electronic device comprises a memory configured to store computer program instructions and a processor configured to execute the program instructions, wherein, when the computer program instructions are executed by the processor, the electronic device is triggered to perform the method steps according to any one of claims 1 to 5.
  22. An electronic device, characterized in that the electronic device comprises a memory configured to store computer program instructions and a processor configured to execute the program instructions, wherein, when the computer program instructions are executed by the processor, the electronic device is triggered to perform the method steps according to any one of claims 6 to 10.
  23. An electronic device, characterized in that the electronic device comprises a memory configured to store computer program instructions and a processor configured to execute the program instructions, wherein, when the computer program instructions are executed by the processor, the electronic device is triggered to perform the method steps according to any one of claims 11 to 14.
  24. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when run on a computer, causes the computer to perform the method according to any one of claims 1 to 14.
PCT/CN2021/143557 2021-02-26 2021-12-31 Audio/video playing method and apparatus, and electronic device WO2022179306A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110217016.8A CN114979783B (en) 2021-02-26 2021-02-26 Audio and video playing method and device and electronic equipment
CN202110217016.8 2021-02-26

Publications (1)

Publication Number Publication Date
WO2022179306A1 true WO2022179306A1 (en) 2022-09-01

Family

ID=82972806

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/143557 WO2022179306A1 (en) 2021-02-26 2021-12-31 Audio/video playing method and apparatus, and electronic device

Country Status (2)

Country Link
CN (1) CN114979783B (en)
WO (1) WO2022179306A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116170622A (en) * 2023-02-21 2023-05-26 阿波罗智联(北京)科技有限公司 Audio and video playing method, device, equipment and medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080012985A1 (en) * 2006-07-12 2008-01-17 Quanta Computer Inc. System and method for synchronizing video frames and audio frames
CN106131655A (en) * 2016-05-19 2016-11-16 安徽四创电子股份有限公司 A kind of player method based on real-time video and smooth catch-up playback method
CN106790576A (en) * 2016-12-27 2017-05-31 深圳市汇龙建通实业有限公司 A kind of interactive desktop synchronization
CN107277558A (en) * 2017-06-19 2017-10-20 网宿科技股份有限公司 A kind of player client for realizing that live video is synchronous, system and method
CN107483972A (en) * 2017-07-24 2017-12-15 平安科技(深圳)有限公司 Live processing method, storage medium and a kind of mobile terminal of a kind of audio frequency and video
CN111372138A (en) * 2018-12-26 2020-07-03 杭州登虹科技有限公司 Live broadcast low-delay technical scheme of player end

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109144463B (en) * 2018-08-14 2020-08-25 Oppo广东移动通信有限公司 Transmission control method and device and electronic equipment


Also Published As

Publication number Publication date
CN114979783A (en) 2022-08-30
CN114979783B (en) 2024-04-09

Similar Documents

Publication Publication Date Title
JP5957760B2 (en) Video / audio processor
US7822050B2 (en) Buffering, pausing and condensing a live phone call
JP5026167B2 (en) Stream transmission server and stream transmission system
CN106686438B (en) method, device and system for synchronously playing audio images across equipment
US8244897B2 (en) Content reproduction apparatus, content reproduction method, and program
US9621949B2 (en) Method and apparatus for reducing latency in multi-media system
CN113286184B (en) Lip synchronization method for respectively playing audio and video on different devices
WO2020125153A1 (en) Smooth network video playback control method based on streaming media technology
KR20080014843A (en) Method and system for improving interactive media response systems using visual cues
US10582258B2 (en) Method and system of rendering late or early audio-video frames
JPWO2006082787A1 (en) Recording / reproducing apparatus, recording / reproducing method, recording medium storing recording / reproducing program, and integrated circuit used in recording / reproducing apparatus
JP2003114845A (en) Media conversion method and media conversion device
CN101710997A (en) MPEG-2 (Moving Picture Experts Group-2) system based method and system for realizing video and audio synchronization
MX2011005782A (en) Audio/video data play control method and apparatus.
JP2002033771A (en) Media data processor
US20130166769A1 (en) Receiving device, screen frame transmission system and method
WO2022179306A1 (en) Audio/video playing method and apparatus, and electronic device
KR100490403B1 (en) Method for controlling buffering of audio stream and apparatus thereof
CN111352605A (en) Audio playing and sending method and device
JP2007095163A (en) Multimedia encoded data separating and transmitting apparatus
CN113613221B (en) TWS master device, TWS slave device, audio device and system
CN114242067A (en) Speech recognition method, apparatus, device and storage medium
US20070248170A1 (en) Transmitting Apparatus, Receiving Apparatus, and Reproducing Apparatus
WO2016104178A1 (en) Signal processing device, signal processing method, and program
JP5325059B2 (en) Video / audio synchronized playback device, video / audio synchronized processing device, video / audio synchronized playback program

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21927730

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21927730

Country of ref document: EP

Kind code of ref document: A1