CN117956210A - Audio and video synchronization adjustment method and related equipment

Publication number: CN117956210A
Application number: CN202211340666.2A
Authority: CN (China)
Prior art keywords: audio, video, picture, equipment, target
Inventors: 郑磊 (Zheng Lei), 胡敏 (Hu Min)
Assignee: Huawei Technologies Co., Ltd.
Original language: Chinese (zh)
Legal status: Pending
Abstract

The application discloses an audio-video synchronization adjustment method and related equipment, applied to a first device. In the process of playing a target video, the first device plays the picture frames of the target video itself and plays the audio of the target video through a second device. The method includes: caching first audio information of the target video played by the first device, where the first audio information includes M first audio sampling points corresponding to the picture frames played by the first device within a preset period; collecting second audio information of the target video played by the second device, where the second audio information includes M second audio sampling points corresponding to the audio played by the second device within the preset period; determining and updating an audio-video delay parameter based on the first audio information and the second audio information; and adjusting the display time of the picture frames of the target video based on the updated audio-video delay parameter. By adopting the method and the device, the consistency of audio-video synchronization can be improved, thereby improving the user experience.

Description

Audio and video synchronization adjustment method and related equipment
Technical Field
The application relates to the technical field of electronic devices, and in particular to an audio-video synchronization adjustment method and related equipment.
Background
When an existing display device (such as a smart screen) is connected to a Bluetooth audio playback device (such as a speaker), the audio stream undergoes audio compression coding before transmission, is then transmitted over the Bluetooth protocol, and is finally processed on the audio device through multiple steps such as audio data decoding and playback. This causes the audio and video to fall out of synchronization. For example, in a scenario where a Bluetooth speaker is used with a smart screen, when a video is played on the smart screen, the delay makes the mismatch between the picture displayed on the smart screen and the audio played by the Bluetooth speaker clearly perceptible: there is an obvious difference between the mouth movements of a person in the picture and the content actually heard, which degrades the user experience.
Therefore, an audio-video synchronization adjustment method and related equipment that can improve the consistency of audio-video synchronization when a peripheral device (such as a Bluetooth audio playback device) is used to play audio are urgently needed.
Disclosure of Invention
The technical problem to be solved by the embodiments of the application is to provide an audio-video synchronization adjustment method and related equipment that can improve the consistency of audio-video synchronization when a peripheral device is used to play audio, thereby improving the user experience.
A first aspect provides an audio-video synchronization adjustment method, applied to a first device, where the first device and a second device have established a wireless connection, and, in the process of playing a target video, the first device plays the picture frames of the target video itself and plays the audio of the target video through the second device. The method includes: caching first audio information of the target video played by the first device, where the first audio information includes M first audio sampling points corresponding to the picture frames played by the first device within a preset period, M is an integer greater than 0, and the first device has started its audio-video synchronization function; collecting second audio information of the target video played by the second device, where the second audio information includes M second audio sampling points corresponding to the audio played by the second device within the preset period; determining and updating an audio-video delay parameter based on the first audio information and the second audio information; and adjusting the display time of the picture frames of the target video based on the updated audio-video delay parameter.
In the embodiments of the application, during video playback the first device can buffer the audio information corresponding to the picture frames being played, and can simultaneously collect, through a sound pickup device, the audio information played by the second device, so that the current audio-video delay parameter can be determined from the buffered and collected audio information. The first device can then adjust the current picture display time of the video based on this parameter. Because the first device dynamically adjusts the audio-video delay parameter according to the actual situation (that is, based on the buffered and collected audio information), the deviation between the generated audio-video delay parameter and the actual audio-video delay is small, which improves audio-video synchronization consistency and the user experience.
In some embodiments, the determining and updating the audio-video delay parameter based on the first audio information and the second audio information includes: if the similarity between a first audio waveform and a second audio waveform is greater than or equal to a preset value, matching the M first audio sampling points with the M second audio sampling points, and determining and updating the audio-video delay parameter based on the matching result; where one first audio sampling point matches one second audio sampling point, the first audio waveform is generated based on the first audio information, and the second audio waveform is generated based on the second audio information.
In the embodiments of the application, the first device can match the buffered audio frames with the collected audio frames based on the similarity between the audio waveform of the buffered audio information and the audio waveform of the collected audio information, so as to determine the current audio-video delay parameter. The first device can then adjust the current picture display time of the video based on this parameter; because the deviation between the determined delay and the actual audio-video delay is small, audio-video synchronization consistency and the user experience are improved.
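The patent does not disclose a concrete matching algorithm; the following is a minimal sketch of one common way to realize this step, assuming 16-bit PCM samples and using normalized cross-correlation as the similarity measure (the function name, threshold, and search range are illustrative assumptions, not the patent's implementation):

```java
/**
 * Estimate the lag (in samples) of the captured audio relative to the
 * buffered reference audio by normalized cross-correlation.
 * Returns -1 if no lag reaches the similarity threshold.
 */
static int estimateLagSamples(short[] buffered, short[] captured,
                              int maxLagSamples, double minSimilarity) {
    double bestScore = -1.0;
    int bestLag = -1;
    for (int lag = 0; lag <= maxLagSamples; lag++) {
        double dot = 0, energyRef = 0, energyCap = 0;
        int n = Math.min(buffered.length, captured.length - lag);
        for (int i = 0; i < n; i++) {
            double a = buffered[i];
            double b = captured[i + lag];
            dot += a * b;
            energyRef += a * a;
            energyCap += b * b;
        }
        // Normalized similarity in [-1, 1]; 1 means identical waveform shape.
        double score = dot / (Math.sqrt(energyRef * energyCap) + 1e-9);
        if (score > bestScore) {
            bestScore = score;
            bestLag = lag;
        }
    }
    return (bestScore >= minSimilarity) ? bestLag : -1;
}
```

With a sampling rate of f_s Hz, a lag of k samples corresponds to an additional delay of k·1000/f_s milliseconds between the matched sampling points.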
In some embodiments, the determining and updating the audio-video delay parameter based on the matching result includes: determining a first timestamp and a second timestamp, where the first timestamp is the time corresponding to a target sampling point, the target sampling point is any one of the M first audio sampling points, and the second timestamp is the time corresponding to the second audio sampling point matched with the target sampling point; and determining and updating the audio-video delay parameter based on the difference between the second timestamp and the first timestamp.
In the embodiments of the application, after a buffered audio frame has been matched with a collected audio frame, the timestamp of the buffered audio frame can be understood as the picture display time on the first device, and the timestamp of the matched collected audio frame can be understood as the time when the second device played that audio; the current audio-video delay parameter can then be determined from the difference between the two. The first device can further adjust the current picture display time of the video based on this parameter; because the deviation from the actual audio-video delay is small, audio-video synchronization consistency and the user experience are improved.
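In other words, if a matched pair of sampling points has buffered timestamp t (the first timestamp) and collection timestamp t' (the second timestamp), the updated delay is simply their difference. A trivial helper, with illustrative names only:

```java
/** Audio-video delay parameter from one matched sampling-point pair (ms). */
static long delayMs(long firstTimestampMs, long secondTimestampMs) {
    // Second timestamp (collected, t') minus first timestamp (buffered, t):
    // how far the heard audio lags behind the displayed picture.
    return secondTimestampMs - firstTimestampMs;
}
```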
In some embodiments, the method further includes: receiving, when the first device and the second device establish a connection, an initial audio-video delay parameter sent by the second device; and, when the target video initially starts playing, adjusting the display time of the picture frames of the target video based on the initial audio-video delay parameter.
In the embodiments of the application, when the first device (the video display device) and the second device (the audio playback device) establish a connection, the second device can send an initial audio-video delay parameter to the first device. Because different audio playback devices differ in hardware, the initial audio-video delay of each audio playback device can be configured and stored locally on that device. When an audio playback device establishes a connection with a video display device, it may send this initial delay to the video display device. When the video display device starts to play a video, it can first adjust the picture display time based on the initial audio-video delay, so that the picture it displays stays as consistent as possible with the sound played by the audio playback device.
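A minimal sketch of how the source device might hold this state, assuming the initial value reported by the sink is later overwritten by measured values (all names are illustrative, not from the patent):

```java
/** Holds the current audio-video delay estimate on the source device. */
class AvSyncState {
    private volatile long delayMs; // current audio-video delay parameter

    /** Called when the sink (second device) reports its configured delay. */
    void onSinkConnected(long initialDelayMs) { delayMs = initialDelayMs; }

    /** Called whenever a new delay is measured during playback. */
    void onMeasuredDelay(long measuredDelayMs) { delayMs = measuredDelayMs; }

    /** Display time for a picture frame: shift its presentation by the delay. */
    long displayTimeMs(long framePtsMs) { return framePtsMs + delayMs; }
}
```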
In some embodiments, starting the audio-video synchronization function of the first device includes either of the following manners: starting the audio-video synchronization function of the first device according to a preset period; or starting the audio-video synchronization function of the first device according to a user instruction.
In the embodiments of the application, the audio-video synchronization function of the first device can be started according to a preset period, for example every 5 minutes, or according to a user operation instruction, so that during video playback the first device can learn the audio playback state of the second device and thereby determine and update the current audio-video delay parameter. The first device can then adjust the current picture display time of the video based on this parameter; because the deviation from the actual audio-video delay is small, audio-video synchronization consistency and the user experience are improved.
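One plausible way to realize the periodic trigger, assuming a runSyncMeasurement() hook that performs the buffering, collection, and matching described above (the hook name and the 5-minute period are assumptions):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class SyncScheduler {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    /** Start the audio-video synchronization function every 5 minutes. */
    void start(Runnable runSyncMeasurement) {
        scheduler.scheduleAtFixedRate(runSyncMeasurement, 5, 5, TimeUnit.MINUTES);
    }

    void stop() { scheduler.shutdownNow(); }
}
```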
In some embodiments, starting the audio-video synchronization function of the first device includes: the audio-video synchronization function is started automatically when, or after, the first device determines that the picture frames of the target video are being played through the first device and the audio of the target video is being played through the second device.
In the embodiments of the application, when the first device and the second device are wirelessly connected, and the first device determines that it is displaying the picture frames of the target video while (or after which) the second device plays the audio of the target video, the audio-video synchronization function of the first device can be started automatically, so that during video playback the first device can learn the audio playback state of the second device and thereby determine and update the current audio-video delay parameter. The first device can then adjust the current picture display time of the video based on this parameter; because the deviation from the actual audio-video delay is small, audio-video synchronization consistency and the user experience are improved.
A second aspect provides an electronic device, where a first device and a second device have established a wireless connection, and, in the process of playing a target video, the first device plays the picture frames of the target video itself and plays the audio of the target video through the second device. The device includes:
a media playing module, configured to cache first audio information of the target video played by the first device, where the first audio information includes M first audio sampling points corresponding to the picture frames played by the first device within a preset period, M is an integer greater than 0, and the first device has started its audio-video synchronization function;
a pickup module, configured to collect second audio information of the target video played by the second device, where the second audio information includes M second audio sampling points corresponding to the audio played by the second device within the preset period;
the media playing module being further configured to determine and update an audio-video delay parameter based on the first audio information and the second audio information;
the media playing module being further configured to adjust the display time of the picture frames of the target video based on the updated audio-video delay parameter.
In some embodiments, the media playing module is specifically configured to: if the similarity between a first audio waveform and a second audio waveform is greater than or equal to a preset value, match the M first audio sampling points with the M second audio sampling points, and determine and update the audio-video delay parameter based on the matching result; where one first audio sampling point matches one second audio sampling point, the first audio waveform is generated based on the first audio information, and the second audio waveform is generated based on the second audio information.
In some embodiments, the media playing module is specifically configured to: determine a first timestamp and a second timestamp, where the first timestamp is the time corresponding to a target sampling point, the target sampling point is any one of the M first audio sampling points, and the second timestamp is the time corresponding to the second audio sampling point matched with the target sampling point; and determine and update the audio-video delay parameter based on the difference between the second timestamp and the first timestamp.
In some embodiments, the apparatus further includes: a Bluetooth connection module, configured to receive, when the first device and the second device establish a connection, an initial audio-video delay parameter sent by the second device; the media playing module is further configured to adjust the display time of the picture frames of the target video based on the initial audio-video delay parameter when the target video initially starts playing.
In some embodiments, the media playing module is further configured to: start the audio-video synchronization function of the first device according to a preset period; or start the audio-video synchronization function of the first device according to a user instruction.
In some embodiments, the media playing module is further configured to: start the audio-video synchronization function automatically when, or after, the first device determines that the picture frames of the target video are being played through the first device and the audio of the target video is being played through the second device.
A third aspect provides an electronic device including one or more memories and one or more processors, where the memories are configured to store a program, and the processor is configured to invoke the program to cause the electronic device to perform the method of any one of the first aspects above.
A fourth aspect provides a computer storage medium storing a program that, when executed by a processor, implements the method of any one of the first aspects above.
A fifth aspect provides a chip system including a processor, configured to support an electronic device in implementing the functions referred to in the first aspect above, for example, generating or processing information referred to in the audio-video synchronization adjustment method described above.
In one possible design, the chip system further includes a memory for holding the program instructions and data necessary for the electronic device. The chip system may consist of chips, or may include chips and other discrete devices.
A sixth aspect provides a program product characterized in that the program product comprises instructions which, when executed by an electronic device, cause the electronic device to perform the method of any of the above first aspects.
Drawings
Fig. 1A is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Fig. 1B is a software structural block diagram of an electronic device according to an embodiment of the present application.
Fig. 2 is a schematic diagram of a video playing system according to an embodiment of the present invention.
FIG. 3A is a diagram of a user interface for intelligent screen application management according to an embodiment of the present application.
Fig. 3B is a schematic diagram of a user interface for video playing according to an embodiment of the present application.
Fig. 3C is a schematic diagram of an internal structure of an apparatus according to an embodiment of the present invention.
Fig. 3D is a schematic diagram of a video playing process according to an embodiment of the present invention.
Fig. 3E is a schematic diagram of switching playing software according to an embodiment of the present invention.
Fig. 3F is a schematic diagram of another video playing system according to an embodiment of the present invention.
Fig. 4 is a flowchart of a method for adjusting synchronization of audio and video according to an embodiment of the present application.
Fig. 5A is a schematic diagram of a target video according to an embodiment of the present invention.
Fig. 5B is a schematic diagram of audio-visual delay according to an embodiment of the present invention.
Fig. 5C is a schematic diagram of another audio-visual delay according to an embodiment of the present invention.
Fig. 5D is a schematic diagram of matching a buffered audio frame with an acquired audio frame according to an embodiment of the present application.
Fig. 6 is a flowchart of another method for adjusting synchronization of audio and video according to an embodiment of the present application.
Fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Embodiments of the present application will be described below with reference to the accompanying drawings in the embodiments of the present application.
The terms first, second, third and the like in the description and in the claims and in the drawings are used for distinguishing between different objects and not for describing a particular sequential order. Furthermore, the terms "comprise" and "have," as well as any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those listed steps or elements but may include other steps or elements not listed or inherent to such process, method, article, or apparatus.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments.
As used in this specification, the terms "component," "module," "system," and the like are intended to refer to a computer-related entity, either hardware, firmware, a combination of hardware and software, or software in execution. For example, a component may be, but is not limited to being, a process running on a processor, an object, an executable, a thread of execution, a program, and/or a computer. By way of illustration, both an application running on a computing device and the computing device can be a component. One or more components may reside within a process and/or thread of execution and a component may be localized on one computer and/or distributed between 2 or more computers. Furthermore, these components can execute from various computer readable media having various data structures stored thereon. The components may communicate by way of local and/or remote processes such as in accordance with a signal having one or more data packets (e.g., data from two components interacting with one another in a local system, distributed system, and/or across a network such as the internet with other systems by way of the signal).
The term "User Interface (UI)" in the description and claims of the present application and in the drawings is a media interface for interaction and information exchange between an application program or an operating system and a user, which enables conversion between an internal form of information and a form acceptable to the user. The user interface of the application program is source code written in a specific computer language such as java, extensible markup language (extensible markup language, XML) and the like, the interface source code is analyzed and rendered on the electronic equipment, and finally the interface source code is presented as content which can be identified by a user, such as a control of pictures, words, buttons and the like. Controls (controls), also known as parts (widgets), are basic elements of a user interface, typical controls being a toolbar (toolbar), menu bar (menu bar), text box (text box), button (button), scroll bar (scrollbar), picture and text. The properties and content of the controls in the interface are defined by labels or nodes, such as XML specifying the controls contained in the interface by nodes < Textview >, < ImgView >, < VideoView >, etc. One node corresponds to a control or attribute in the interface, and the node is rendered into visual content for a user after being analyzed and rendered. In addition, many applications, such as the interface of a hybrid application (hybrid application), typically include web pages. A web page, also referred to as a page, is understood to be a special control embedded in an application program interface, and is source code written in a specific computer language, such as hypertext markup language (hyper text markup language, GTML), cascading style sheets (CASCADING STYLE SHEETS, CSS), java script (JavaScript, JS), etc., and the web page source code may be loaded and displayed as user-recognizable content by a browser or web page display component similar to the browser function. The specific content contained in a web page is also defined by tags or nodes in the web page source code, such as GTML defines elements and attributes of the web page by < p >, < img >, < video >, < canvas >.
A commonly used presentation form of a user interface is a graphical user interface (graphic user interface, GUI), which refers to a graphically displayed user interface that is related to computer operations. It may be an interface element such as an icon, a window, a control, etc. displayed in a display screen of the electronic device, where the control may include a visual interface element such as an icon, a button, a menu, a tab, a text box, a dialog box, a status bar, a navigation bar, a Widget, etc.
First, an exemplary electronic device provided in the following embodiments of the present application will be described.
Referring to fig. 1A, fig. 1A is a schematic structural diagram of an electronic device according to an embodiment of the present application, wherein the electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, an antenna, a wireless communication module 130, an audio module 140, a speaker 140A, a microphone 140B, a sensor module 150, a key 160, a display 170, and the like. The sensor module 150 may include a pressure sensor 150A, among other things.
The processor 110 may include one or more processing units, such as: the processor 110 may include an application processor (application processor, AP), a modem processor, a graphics processing unit (graphics processing unit, GPU), an image signal processor (image signal processor, ISP), a controller, a memory, a video codec, and a digital signal processor (digital signal processor, DSP). The different processing units may be separate devices or may be integrated in one or more processors.
The controller may be a neural hub and a command center of the electronic device 100, among others. The controller can generate operation control signals according to the instruction operation codes and the time sequence signals to finish the control of instruction fetching and instruction execution.
A memory may also be provided in the processor 110 for storing instructions and data. In some embodiments, the memory in the processor 110 is a cache memory. The memory may hold instructions or data that the processor 110 has just used or recycled. If the processor 110 needs to reuse the instruction or data, it can be called directly from the memory. Repeated accesses are avoided and the latency of the processor 110 is reduced, thereby improving the efficiency of the system.
In some embodiments, the processor 110 may include one or more interfaces. The interfaces may include an inter-integrated circuit (inter-integrated circuit, I2C) interface, an inter-integrated circuit sound (inter-integrated circuit sound, I2S) interface, a pulse code modulation (pulse code modulation, PCM) interface, a universal asynchronous receiver/transmitter (universal asynchronous receiver/transmitter, UART) interface, a general-purpose input/output (general-purpose input/output, GPIO) interface, and/or a universal serial bus (universal serial bus, USB) interface, etc.
The wireless communication module 130 may provide solutions for wireless communication applied to the electronic device 100, including at least one of wireless local area networks (wireless local area networks, WLAN) (e.g., a wireless fidelity (wireless fidelity, Wi-Fi) network), Bluetooth (bluetooth, BT), global navigation satellite system (global navigation satellite system, GNSS), frequency modulation (frequency modulation, FM), near field communication (near field communication, NFC), and infrared (infrared, IR). The wireless communication module 130 may be one or more devices integrating at least one communication processing module. The wireless communication module 130 receives electromagnetic waves via an antenna, modulates and filters the electromagnetic wave signals, and transmits the processed signals to the processor 110. The wireless communication module 130 may also receive a signal to be transmitted from the processor 110, frequency modulate and amplify it, and convert it to electromagnetic waves for radiation via the antenna.
The electronic device 100 implements display functions through a GPU, a display screen 170, an application processor, and the like. The GPU is a microprocessor for image processing, and is connected to the display 170 and the application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. Processor 110 may include one or more GPUs that execute program instructions to generate or change display information.
The display 170 is used to display images, videos, and the like.
Video codecs are used to compress or decompress digital video. The electronic device 100 may support one or more video codecs. In this way, the electronic device 100 may play or record video in a variety of encoding formats, such as: moving picture experts group (moving picture experts group, MPEG)-1, MPEG-2, MPEG-3, MPEG-4, etc.
The external memory interface 120 may be used to connect an external memory card, such as a Micro SD card, to enable expansion of the memory capabilities of the electronic device 100. The external memory card communicates with the processor 110 through an external memory interface 120 to implement data storage functions.
The internal memory 121 may be used to store computer executable program code including instructions. The processor 110 executes various functional applications of the electronic device 100 and data processing by executing instructions stored in the internal memory 121. The internal memory 121 may include a storage program area and a storage data area. The storage program area may store an application program (such as a sound playing function, an image playing function, etc.) required for at least one function of the operating system, etc. The storage data area may store data created during use of the electronic device 100 (e.g., audio data, video data, etc.), and so on. In addition, the internal memory 121 may include a high-speed random access memory, and may also include a nonvolatile memory, such as at least one magnetic disk storage device, a flash memory device, and the like.
The electronic device 100 may implement audio functions through an audio module 140, a speaker 140A, a microphone 140B, an application processor, and the like. Such as music playing, recording, etc.
The audio module 140 is used to convert digital audio information into an analog audio data output and also to convert an analog audio input into a digital audio signal. The audio module 140 may also be used to encode and decode audio signals. In some embodiments, the audio module 140 may be disposed in the processor 110, or some functional modules of the audio module 140 may be disposed in the processor 110.
It should be understood that the illustrated structure of the embodiment of the present application does not constitute a specific limitation on the electronic device 100. In other embodiments of the application, electronic device 100 may include more or fewer components than shown, or certain components may be combined, or certain components may be split, or different arrangements of components.
The software system of the electronic device 100 may employ a layered architecture, an event driven architecture, a microkernel architecture, a microservice architecture, or a cloud architecture. In the embodiment of the application, taking an Android system with a layered architecture as an example, a software structure of the electronic device 100 is illustrated. Referring to fig. 1B, fig. 1B is a software structure block diagram of an electronic device according to an embodiment of the present application.
The layered architecture divides the software into several layers, each with a clear role and division of labor. The layers communicate with each other through software interfaces. In some embodiments, the Android system is divided into three layers: from top to bottom, an application layer, an Android runtime (Android runtime) and system library layer, and a kernel layer.
The application layer may include a series of application packages.
As shown in FIG. 1B, the application package may include video, gallery, calendar, WLAN, bluetooth, music, etc. applications.
The Android runtime includes a core library and a virtual machine. The Android runtime is responsible for scheduling and management of the Android system.
The core library consists of two parts: one part is the functions that the Java language needs to invoke, and the other part is the core library of Android.
The application layer runs in the virtual machine. The virtual machine executes the Java files of the application layer as binary files. The virtual machine is used to perform functions such as object lifecycle management, stack management, thread management, security and exception management, and garbage collection.
The system library may include a plurality of functional modules. For example: surface manager (surface manager), media Libraries (Media Libraries), three-dimensional graphics processing Libraries (e.g., openGL ES), 2D graphics engines (e.g., SGL), etc.
The surface manager is used to manage the display subsystem and provides a fusion of 2D and 3D layers for multiple applications.
The media libraries support playback and recording of a variety of commonly used audio and video formats, as well as still image files, etc. The media libraries may support a variety of audio and video encoding formats, such as MPEG-4, H.264, MP3, etc.
The three-dimensional graphic processing library is used for realizing three-dimensional graphic drawing, image rendering, synthesis, layer processing and the like.
The 2D graphics engine is a drawing engine for 2D drawing.
The kernel layer is a layer between hardware and software. The kernel layer contains at least display drivers and audio drivers.
It should be understood that the software architecture diagram illustrated in the embodiments of the present application is not intended to be limiting in any way with respect to the software architecture diagram of the electronic device 100.
The following describes a video playing system architecture provided by the embodiments of the present application in conjunction with the above-mentioned electronic device 100 and the software architecture of the above-mentioned terminal.
Referring to fig. 2, fig. 2 is a schematic diagram of a video playing system according to an embodiment of the application. The video playing system 200 may include at least two devices; two are illustrated in fig. 2. One is a first device, which may be understood as a video display device (e.g., the smart screen 201 in fig. 2), and the other is a second device, which may be understood as an audio playback device (e.g., the speaker 202 in fig. 2). The second device can be connected to the first device as an audio playback peripheral, for example over Bluetooth. When a video is played on the first device, the video pictures can be displayed on the first device while the audio in the video is played through the second device, which can improve the user's viewing experience. However, after the first device is connected to the second device, the audio stream undergoes audio compression coding before transmission, is then transmitted over the Bluetooth protocol, and is finally processed on the audio device through multiple steps such as audio data decoding and playback, causing the audio and video to fall out of synchronization and degrading the user experience. The application therefore provides an audio-video synchronization adjustment method that can improve the consistency of audio-video synchronization when a peripheral device (such as a Bluetooth speaker) is used to play audio; this method is described in detail later and is not repeated here.
It should be noted that, the audio-video synchronization adjustment method provided by the embodiment of the present application may be executed by the first device.
For example, the first device in the embodiments of the application may be a television (such as a smart screen), a mobile phone, a tablet computer, a smart watch, a desktop computer, a laptop computer, a handheld computer, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a personal digital assistant (PDA), an augmented reality (AR)/virtual reality (VR) device, or the like; in the embodiments of the application, the first device may be any terminal having a screen display function. It should be noted that the first device may include, but is not limited to, all or part of the structure and functions of the electronic device 100 described above.
By way of example, the second device in the embodiments of the application may be a speaker, a wireless headset, a watch, a mobile phone, a desktop computer, a laptop computer, a handheld computer, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a personal digital assistant (PDA), an augmented reality (AR)/virtual reality (VR) device, or the like.
It should also be noted that the devices in the video playback system 200 may establish a connection by wireless means, and the wireless connection technology may include, but is not limited to, Bluetooth (BT), wireless local area networks (WLAN) (e.g., a wireless fidelity (Wi-Fi) network), near field communication (NFC), and infrared (IR). In some embodiments, the first device establishes a connection with the second device through Bluetooth.
It should be understood that the video playing system 200 in fig. 2 is merely some exemplary implementations provided by the embodiments of the present invention, and the video playing system in the embodiments of the present invention includes, but is not limited to, the above implementations.
An application scenario and a User Interface (UI) embodiment under the application scenario according to the embodiments of the present application are described below.
Scene: the intelligent screen is matched with an application scene of the loudspeaker box to play the video. (Smart screen can be understood as the first device in FIG. 2 above and speaker can be understood as the second device in FIG. 2 above)
Referring to fig. 3A, fig. 3A is a schematic diagram of a user interface for smart screen application management according to an embodiment of the application. The user interface 300 may be understood as the application-management interface of the smart screen 201. The user interface 300 includes Huawei Video, iQIYI, and Youku. Without limitation, the user interface 300 may also include other applications, such as music software (for example, QQ Music) or the utility applications built into the smart screen 201 system, such as a calculator, calendar, and settings.
Referring to fig. 3B, fig. 3B is a schematic diagram of a user interface for video playing according to an embodiment of the application. The user interface 301 may be understood as the video playback interface shown after the smart screen 201 enters the video playback application. The smart screen 201 may receive a user operation (e.g., a touch operation on Huawei Video in the user interface 300 of fig. 3A) and, in response to the operation, enter Huawei Video and display the user interface 301 on the display screen (which may be understood as the display screen 170 in fig. 1A). The user interface 301 includes a display area 3011 and a display area 3012. The display area 3011 may be used to display the video picture, and the display area 3012 may be used to display information about the video, such as its title, episode selection, and synopsis. Without limitation, the user interface 301 may include only the display area 3011, or may include more display areas; this is not specifically limited here.
In this scenario, when playing a video using the smart screen 201, the user may first establish a connection with the speaker 202 (e.g., via Bluetooth). The user may then select video playing software, such as Huawei Video, in the user interface 300 of the smart screen 201. The smart screen 201 then displays the user interface 301 through the display 170; when playing the video, it may display the video picture in the display area 3011 of the user interface 301 and play the audio of the video through the peripheral speaker 202. Because the smart screen 201 compresses and encodes the audio stream before transmitting it to the speaker 202, then transmits it over the Bluetooth protocol, after which the audio device performs multiple processing steps such as audio data decoding and playback, the audio and video fall out of synchronization.
In some embodiments, when the audio output peripheral (i.e., the speaker 202) connects over Bluetooth, it may report a preset audio-video delay parameter to the display device (i.e., the smart screen 201), so that the smart screen 201 can adjust the picture display time of the video (which may be understood as the start display time of each picture frame) based on that parameter, thereby improving audio-video synchronization consistency. However, because Bluetooth protocol transmission is unstable and the audio decoding delay keeps changing during long playback, the actual audio-video delay deviates from the initially reported parameter, the picture and sound fall out of synchronization, and the user experience suffers.
To address this technical problem, the embodiments of the application provide a scheme for dynamically adjusting the audio-video delay: the microphone of the source device (i.e., the smart screen 201) collects the audio played by the speaker 202 in real time, and the audio-video delay is adjusted dynamically accordingly, improving the user experience.
For example, when the user selects video playing software (e.g., Huawei Video) on the user interface 300, the display 170 of the smart screen 201 displays the user interface 301; the video picture can be played in the display area 3011 of the user interface 301, and the audio of the video through the speaker 202.
Referring to fig. 3C, fig. 3C is a schematic diagram of an internal structure of devices according to an embodiment of the application, illustrated with the smart screen 201 and the speaker 202 as an example. The smart screen 201 may include a media playing module, a sound pickup module, a Bluetooth data sending module, and a picture display module; the speaker 202 may include a Bluetooth data receiving module and an audio playing module.
In the video playing process, the smart screen 201 can send the audio data of the video to the speaker 202 through the Bluetooth data sending module; the speaker 202 receives the audio data through the Bluetooth data receiving module and decodes and plays it through the audio playing module. The media playing module in the smart screen 201 may periodically buffer audio frames from the audio/video source, and may periodically collect the audio played by the speaker 202 through the pickup module to obtain collected audio frames. A delay processing module within the media playing module may then periodically match the buffered audio frames with the collected audio frames and determine the target audio-video delay parameter based on the matched frames. The picture display module in the smart screen 201 may then adjust the picture display time of the video based on the updated audio-video delay parameter (i.e., the target audio-video delay parameter). With the audio-video synchronization adjustment method provided by the embodiments of the application, the audio played by the speaker 202 can be collected periodically and matched against the audio buffered by the smart screen 201 to determine the target audio-video delay parameter. Compared with the foregoing embodiment, the target audio-video delay parameter can be adjusted dynamically according to the actual situation, which reduces the deviation between the actual audio-video delay and the target audio-video delay and improves audio-video synchronization consistency and the user experience.
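For illustration only, one measurement cycle of the delay processing module might combine the two buffered/captured windows as below. This sketch reuses the estimateLagSamples() function sketched earlier; the window-start timestamps, sample rate, and 0.8 similarity threshold are assumptions, not the patent's values:

```java
/**
 * One measurement cycle of the delay processing module: returns the target
 * audio-video delay in milliseconds, or -1 if the windows did not match.
 */
static long measureDelayMs(short[] buffered, long bufferedStartMs,
                           short[] captured, long capturedStartMs,
                           int sampleRateHz) {
    // Search lags up to half the capture window; require 0.8 similarity.
    int lag = estimateLagSamples(buffered, captured, captured.length / 2, 0.8);
    if (lag < 0) return -1; // waveforms too dissimilar; keep the old parameter
    long lagMs = lag * 1000L / sampleRateHz;
    // Delay = (time the matched audio was heard) - (time its picture played).
    return (capturedStartMs + lagMs) - bufferedStartMs;
}
```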
In some embodiments, the smart screen 201 plays a video in an application (e.g., plays a video in Huawei Video). When the video starts to play, the speaker 202 may first send an initial audio-video delay parameter to the smart screen 201, and the smart screen 201 may first adjust the picture display time of the video based on that initial parameter. During video playback, the smart screen 201 may collect the audio frames of the speaker 202 through the microphone and match them against the buffered audio frames to determine the target audio-video delay parameter. The smart screen 201 can then adjust the picture display time of the video based on the updated parameter (i.e., the target audio-video delay parameter); because the deviation between the target delay and the actual delay is small, audio-video synchronization consistency is improved. It should be noted that the target audio-video delay parameter may be updated multiple times while the smart screen 201 operates.
For example, as shown in fig. 3D, fig. 3D is a schematic diagram of a video playing process according to an embodiment of the application, where (a) in fig. 3D can be understood as playing a video in Huawei Video based on the initial delay parameter. Illustratively, when the video has played for fifty seconds (a point that may be set as required), the smart screen 201 may collect an audio frame of the speaker 202 through the microphone and match it against its own buffered audio frames to determine the target audio-video delay parameter. Then, as shown in (b) of fig. 3D, when the video has played for one minute, the picture display time of the video can be adjusted based on the previously acquired target audio-video delay parameter, so that while watching the video the user perceives the picture displayed by the smart screen 201 as synchronized with the sound played by the speaker 202, improving the viewing experience.
In some embodiments, the smart screen 201 switches video playing software (e.g., from Huawei Video to iQIYI). Referring to fig. 3E, fig. 3E is a schematic diagram of switching playing software according to an embodiment of the application. Part (a) of fig. 3E is the application-management interface of the smart screen 201 (i.e., the user interface 300; see the description of fig. 3A above, not repeated here). The user can first exit the user interface 301 and re-enter the user interface 300, and then select new playing software, such as iQIYI.
Referring to part (b) of fig. 3E, the user interface 302 may be understood as the video playback interface shown after the smart screen 201 enters the new video playback application. The smart screen 201 may receive a user operation on the user interface 300 of fig. 3E (e.g., a touch operation on iQIYI) and, in response, enter iQIYI and display the user interface 302 on the display screen (which may be understood as the display screen 170 in fig. 1A). The user interface 302 includes a display area 3021 and a display area 3022. The display area 3021 may be used to display the video picture, and the display area 3022 may be used to display information about the video, such as its title, episode selection, and synopsis. Without limitation, the user interface 302 may include only the display area 3021, or more display areas; this is not specifically limited here.
Because different application software responds differently to the audio-video delay parameter, when the application software is switched the smart screen 201 may adjust the picture display time of the video based on the most recently updated audio-video delay parameter (such as the parameter last updated while playing video in Huawei Video), or based on the initial audio-video delay parameter reported by the speaker 202 when Bluetooth was connected; this is not specifically limited here. Later, while the new application software plays video, the microphone of the smart screen 201 periodically collects the audio frames of the speaker 202 and matches them against the buffered audio frames to determine the target audio-video delay parameter under that application software, and this parameter can be updated regularly, so that when different application software plays video the picture displayed by the smart screen 201 stays synchronized with the sound played by the speaker 202, improving the viewing experience.
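The per-application behavior just described suggests keeping one delay value per player application and falling back to the sink's initial value until a measurement exists. A minimal sketch under that assumption (the class and method names are illustrative; the patent does not specify how this state is stored):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Per-application audio-video delay parameters, since each player
 *  application responds differently to the delay parameter. */
class PerAppDelayStore {
    private final Map<String, Long> delayMsByApp = new ConcurrentHashMap<>();
    private final long initialDelayMs; // reported by the sink on connection

    PerAppDelayStore(long initialDelayMs) { this.initialDelayMs = initialDelayMs; }

    /** Record a newly measured delay for the currently playing application. */
    void update(String appPackage, long measuredDelayMs) {
        delayMsByApp.put(appPackage, measuredDelayMs);
    }

    /** Fall back to the sink's initial value until the app has a measurement. */
    long delayMsFor(String appPackage) {
        return delayMsByApp.getOrDefault(appPackage, initialDelayMs);
    }
}
```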
In some embodiments, referring to fig. 3F, fig. 3F is a schematic diagram of another video playing system according to an embodiment of the application, in which the smart screen 201 may disconnect from the speaker 202 and establish a connection with another audio playback device 203 (e.g., via Bluetooth). When the smart screen 201 connects to the other audio playback device 203, that device may likewise send an initial audio-video delay parameter to the smart screen 201, and the smart screen 201 may adjust the picture display time of the video based on it. During video playback, the smart screen 201 may capture the audio frames of the audio playback device 203 through the microphone and match them against the buffered audio frames to determine the target audio-video delay parameter. The smart screen 201 may then adjust the picture display time of the video (which may be understood as the start display time of each picture frame) based on the updated parameter (i.e., the target audio-video delay parameter); because the deviation between the target delay and the actual delay is small, audio-video synchronization consistency is improved.
It should be understood that the above application scenario and the user interface embodiments under the application scenario are only some exemplary implementations provided by the embodiments of the present invention, and the application scenario and the user interface embodiments under the application scenario in the embodiments of the present invention include, but are not limited to, the above implementation manners.
Based on the application scenario provided in fig. 3A to 3F and the UI embodiment under the application scenario, the method for adjusting synchronization of audio and video provided in the embodiment of the present application is described next, and the method may be applied to the electronic device described in fig. 1A.
Referring to fig. 4, fig. 4 is a flowchart illustrating a method for adjusting synchronization of audio and video according to an embodiment of the present application. The following further describes the audio-video synchronization adjustment method provided by the embodiment of the present application in detail with reference to the system architecture in fig. 2 and fig. 4. It should be noted that the first device in the video playing system 200 may be the electronic device described in fig. 1A, and may be used as an execution main body of the audio-video synchronization adjustment method provided in the embodiment of the present application. The following description will be made with the first device as an execution subject. The method may include the following steps S401-S409, as shown in fig. 4, described in detail below:
Step S401: the first device establishes a connection with the second device.
In some embodiments, the first device may establish a bluetooth connection with the second device through a bluetooth connection module.
Specifically, the first device may be a video display device, such as the smart screen 201 described above, and the second device may be an audio playback device, such as the speaker 202 described above. The first device and the second device may also establish a connection by other wireless means; the wireless connection technology may include, but is not limited to, Bluetooth (BT), wireless local area networks (WLAN) (e.g., a wireless fidelity (Wi-Fi) network), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), and infrared (IR).
In some embodiments, when the first device establishes a connection with the second device (the audio playback device), the second device may send an initial audio-video delay parameter to the first device (the video display device). Because different audio playback devices differ in hardware, the initial audio-video delay of each audio playback device can be configured and stored locally on that device. When the audio playback device and the video display device establish a connection, the audio playback device can send the initial audio-video delay parameter to the video display device, so that when the video starts to play, the picture displayed by the video display device stays as consistent as possible with the sound played by the audio playback device.
It should be noted that, step S401 may be performed by the application layer in fig. 1B.
Step S402: the first device starts playing of the video source.
Specifically, a target video (i.e., a video source) to be played may be selected on the first device and played through the media playing module. The target video may consist of picture frames and audio, with the picture frames corresponding to the audio. For example, as shown in fig. 5A, fig. 5A is a schematic diagram of a target video according to an embodiment of the application: the target video may consist of a plurality of picture frames (such as picture frame 1, picture frame 2, ..., picture frame m; only some of the picture frames of the target video are shown in the figure) and audio, and the audio may include a plurality of audio frames, where an audio frame may be understood as a sampling point of the audio, such as audio frame 1, audio frame 2, ..., audio frame m.
In some embodiments, referring to fig. 5B, fig. 5B is a schematic diagram of audio-video delay according to an embodiment of the application. After the first device determines the target video, it may send the audio data corresponding to the target video to the second device. When the first device starts playing the target video, it may display the picture frames of the target video on the display screen 170 through the media playing module, for example displaying them in sequence (the first device displays picture frame 1, picture frame 2, ..., picture frame m one by one on the display screen 170 through the media playing module), while the audio of the target video is decoded and played through the audio playing module of the second device. However, because the audio data is delayed during encoding and transmission, and the second device needs time to decode it, the picture displayed by the first device and the sound played by the second device are not synchronized; that is, s' - s ≠ 0, where s' denotes the time at which the second device plays the audio and s denotes the time at which the first device displays the picture. Therefore, when the first device and the second device establish a connection, the second device first sends an initial audio-video delay parameter Δs_initial (which can be expressed as a preset delay between the audio played by the second device and the picture displayed by the first device) to the first device; the first device can then delay each picture frame by Δs_initial seconds before displaying it on the display screen 170, improving audio-video synchronization consistency and realizing synchronized playback of picture and sound.
It should be noted that step S402 may be performed by the application layer in fig. 1B.
Step S403: the first device turns on the periodic audio buffering function.
Specifically, during playback of the target video, the transmission instability of the Bluetooth protocol and the continuously changing audio decoding delay cause the actual audio-video delay to drift away from the initial audio-video delay parameter over a long playback session, so the picture and the sound fall out of sync. To address this technical problem, please refer to fig. 5C, another audio-video delay schematic diagram provided by the embodiment of the present invention. In the figure, the first device may buffer, through the media playing module, the audio frames corresponding to the picture frames played within a preset period (for example, 1 second); these may be understood as buffered audio sampling points obtained at a preset sampling frequency. The first device also records the timestamp of each frame (which may be called the buffer timestamp t, where t represents the time corresponding to a buffered audio frame); for example, m buffered audio frames correspond to t1, t2, …, tm, with one buffer timestamp per buffered audio frame.
It should be noted that the buffering may be performed periodically, for example every 5 minutes. Meanwhile, to relieve storage pressure, only the audio frames within a preset duration, such as 1 s of audio, may be buffered each time.
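A minimal Python sketch of such a bounded buffer is shown below; the 16 kHz sampling frequency, the class name, and the field names are illustrative assumptions, not values prescribed by the embodiment.

    from collections import deque

    SAMPLE_RATE = 16_000  # assumed preset sampling frequency, in Hz
    BUFFER_SECONDS = 1.0  # buffer only 1 s per cycle to limit memory use

    class ReferenceAudioBuffer:
        """Keeps the most recent second of source audio with buffer timestamps."""

        def __init__(self):
            maxlen = int(SAMPLE_RATE * BUFFER_SECONDS)
            self.samples = deque(maxlen=maxlen)     # buffered audio frames
            self.timestamps = deque(maxlen=maxlen)  # buffer timestamp t per frame

        def push(self, sample: float, t: float) -> None:
            # Old entries fall out automatically once maxlen is reached.
            self.samples.append(sample)
            self.timestamps.append(t)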
In some embodiments, the periodic audio buffering function and the periodic audio collection function may be turned on after the first device turns on the audio-visual synchronization function.
In some embodiments, the audio-video synchronization function of the first device may be started according to a preset period, for example every 5 minutes; it may also be started according to an operation instruction of the user; alternatively, when the first device and the second device are wirelessly connected and the first device determines that it is displaying the picture frames of the target video while the second device plays the audio of the target video at the same time or afterwards, the audio-video synchronization function of the first device may be started automatically. In this way, the first device can acquire the audio playing situation of the second device during video playback, and determine and update the current audio-video delay parameter. It should be noted that step S403 may be performed by the application layer in fig. 1B.
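For the periodic trigger, a simple self-re-arming timer suffices. In the sketch below, the 5-minute period and the run_sync_cycle callback (standing in for steps S403-S409) are hypothetical.

    import threading

    SYNC_PERIOD_S = 300.0  # assumed preset period: one sync cycle every 5 minutes

    def start_periodic_sync(run_sync_cycle) -> threading.Timer:
        """Schedules one synchronization cycle, then re-arms for the next."""
        def tick():
            run_sync_cycle()                     # buffer, collect, match, adjust
            start_periodic_sync(run_sync_cycle)  # re-arm for the next period

        timer = threading.Timer(SYNC_PERIOD_S, tick)
        timer.daemon = True  # do not keep the process alive just for the timer
        timer.start()
        return timer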
Step S404: the first device turns on the periodic audio acquisition function.
Specifically, to address the above technical problem, while the first device buffers audio it also needs to periodically collect, through the pickup module, the audio played by the second device within a preset period (for example, 1 second). The collection may use the same preset sampling frequency as the buffered audio, yielding a plurality of collected audio frames, and the first device then determines and records the timestamp of each frame (which may be called the collection timestamp t', where t' represents the time corresponding to a collected audio frame); for example, the m collected audio frames in fig. 5C correspond to t'1, t'2, …, t'm, with one collection timestamp per collected audio frame.
It should be noted that, to ensure that collected audio frames can be matched with the buffered audio frames, collection may start ahead of and finish after the buffering window; for example, collection may begin 2 s before the buffered audio frames and continue for 2 s after them.
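The widened capture window can be expressed as follows; the function name and the 2 s margin are illustrative.

    def capture_window(buffer_start_s: float, buffer_len_s: float = 1.0,
                       margin_s: float = 2.0) -> tuple[float, float]:
        """Returns the (start, end) of the microphone capture window.

        Capturing margin_s seconds on both sides of the buffered segment makes
        it likely that the delayed playback falls inside the captured audio.
        """
        return buffer_start_s - margin_s, buffer_start_s + buffer_len_s + margin_s

    print(capture_window(100.0))  # (98.0, 103.0): 5 s of capture for 1 s of reference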
It should be noted that step S404 may be performed by the application layer in fig. 1B.
Step S405: the pickup module in the first device may send the collected audio to the media play module.
Specifically, step S405 may be performed by the application layer in fig. 1B described above.
Step S406: the first device performs audio matching on the collected audio and the cached audio.
Specifically, the buffered audio may consist of buffered audio frames sampled at a preset sampling frequency within a preset time, and the collected audio may consist of collected audio frames sampled at the same preset sampling frequency. The buffered audio frames carry buffered audio information (e.g., a buffered audio waveform), and the collected audio frames carry collected audio information (e.g., a collected audio waveform). Assume the first device buffers 1 second of audio (i.e., m buffered audio frames within 1 second, where m is an integer greater than 0); to ensure that collected audio frames matching the buffered frames can be obtained, the first device may collect k seconds of audio (i.e., k × m collected audio frames within k seconds, with m collected audio frames per second, where k is an integer greater than 1). Further, the buffered audio waveform corresponding to the m buffered audio frames may be compared with the collected audio waveform corresponding to the m collected audio frames collected in the first second; if the waveform similarity is greater than or equal to a preset value, the m buffered audio frames match those m collected audio frames, with one buffered audio frame corresponding to one collected audio frame. If the similarity is smaller than the preset value, the collected audio waveform of the m collected audio frames from the next second is compared with the buffered audio waveform, and so on, until the m buffered audio frames are matched and the matching stops. For example, fig. 5D is a schematic diagram of matching buffered audio frames with collected audio frames. In the figure, the audio waveform of m collected audio frames is first compared with the audio waveform of the m buffered audio frames; if the similarity exceeds the preset value, the m collected audio frames match the m buffered audio frames, one collected audio frame to one buffered audio frame. For example, the buffered audio frame at time t1 matches the collected audio frame at time t'1, the buffered audio frame at time t2 matches the collected audio frame at time t'2, …, and the buffered audio frame at time tm matches the collected audio frame at time t'm. It should be emphasized that, in the embodiments of the present application, t denotes the timestamp of a buffered audio frame (e.g., t1 is the time of the first buffered audio frame, t2 the time of the second, etc.), and t' denotes the timestamp of a collected audio frame (e.g., t'1 is the time of the first collected audio frame, t'2 the time of the second, etc.).
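As an illustration of this matching step, the sketch below slides the m buffered frames across the collected frames and uses normalized cross-correlation as one possible waveform-similarity measure. The embodiment only requires that the similarity reach a preset value and does not prescribe a metric; sliding frame by frame (rather than second by second, as in the example above) is a finer-grained variant.

    import numpy as np

    def match_offset(buffered: np.ndarray, captured: np.ndarray,
                     threshold: float = 0.8):
        """Returns the frame offset where `buffered` matches `captured`, or None.

        Uses normalized cross-correlation, yielding a similarity in [-1, 1].
        """
        m = len(buffered)
        ref = (buffered - buffered.mean()) / (buffered.std() + 1e-12)
        for offset in range(len(captured) - m + 1):
            win = captured[offset:offset + m]
            win = (win - win.mean()) / (win.std() + 1e-12)
            similarity = float(np.dot(ref, win)) / m
            if similarity >= threshold:
                return offset  # buffered frame i matches captured frame offset + i
        return None            # no window reached the preset similarity value

If an offset is found, each buffer timestamp t(i) pairs with the collection timestamp t'(offset + i), giving exactly the one-to-one frame matching described above.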
It should be noted that step S406 may be performed by the application layer in fig. 1B.
Step S407: the first device performs delay calculation and determines the target audio-video delay parameter.
Specifically, the target audio-video delay parameter may include the duration of the first device's audio data compression, the duration of transmitting the audio data from the display device (i.e., the first device) to the audio playing device (i.e., the second device), and the duration of audio decoding on the audio playing device. The first device may perform the delay calculation through the media playing module based on the audio frames buffered in step S403 and the audio frames collected in step S404, so as to determine the target audio-video delay parameter.
In some embodiments, the difference between the timestamp of a collected audio frame and the timestamp of the buffered audio frame that matches it is used as the target audio-video delay parameter.
In some embodiments, the target audio-video delay parameter is obtained by taking the difference between the timestamp of a collected audio frame and the timestamp of the buffered audio frame that matches it, and then subtracting the delay the first device incurs in determining the timestamp of the collected audio frame.
Specifically, after the first device invokes the microphone through the pickup module to collect audio frames, it must timestamp each collected frame; accounting for this step when determining the target audio-video delay parameter eliminates the interference caused by the delay between the microphone hardware and the upper-layer timestamping.
Assume the timestamp of a target buffered audio frame is t1, and the timestamp of the collected audio frame matched with it is t'1. The timestamp t1 of the target buffered audio frame can also be understood as the display time of the corresponding picture frame (the time at which the picture frame starts to be displayed), while the time at which the second device actually played the audio is t'1 minus the delay from the microphone hardware to the upper-layer timestamping. Hence Δt_target = t'1 − (delay from microphone hardware to upper-layer timestamping) − t1, where Δt_target expresses the delay between the audio currently played by the second device and the picture displayed by the first device.
It should be noted that the delay for the first device to determine the timestamp of a collected audio frame depends on the first device itself (e.g., on the processing speed of the smart screen 201) and is generally a constant value.
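Numerically the computation is a single subtraction; in the sketch below the example timestamps and the 15 ms microphone-pipeline delay are made-up values.

    def target_delay_s(t_capture: float, t_buffer: float,
                       mic_pipeline_delay_s: float) -> float:
        """Δt_target = t' − (delay from mic hardware to timestamping) − t."""
        return t_capture - mic_pipeline_delay_s - t_buffer

    # Buffered frame stamped at t1 = 10.000 s, matching collected frame stamped
    # at t'1 = 10.155 s, 15 ms timestamping delay: current A/V delay ≈ 0.14 s.
    print(target_delay_s(10.155, 10.000, 0.015))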
It should be noted that step S407 may be performed by the application layer in fig. 1B.
Step S408: the media playing module of the first device sends the target audio-video delay parameter to the picture display module.
Specifically, step S408 may be performed by the application layer in fig. 1B described above.
Step S409: the first device may display a picture on the display screen via the picture display module based on the target audio-visual delay parameter.
Specifically, the media playing module of the first device may update the target audio-video delay parameter to the picture display module of the first device, so that the picture display module can display the current picture according to the newly calculated delay parameter. For example, the first device would originally display a picture frame at time n, and in theory the second device would play the corresponding audio at time n as well; however, because the audio data is delayed during encoding and transmission, and the second device needs time to decode it, the second device actually plays the corresponding audio at time n'. With the audio-video synchronization adjustment method provided by the embodiment of the application, the current audio-video delay parameter Δt_target between the first device and the second device can be calculated; it expresses the delay between the audio currently played by the second device and the picture displayed by the first device, i.e., n' − n. The first device can then adjust the display time of the video's picture frames based on this parameter, for example delaying each picture frame by Δt_target seconds, so that a frame originally displayed at time n is deferred to time n + Δt_target. This improves audio-video synchronization consistency and thereby the user experience.
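Continuing the hypothetical FrameScheduler sketch from step S402, applying the update amounts to shifting each frame's display time by the freshly computed delay; all values below are illustrative.

    frame_due_s = 11.0     # n: the instant the frame would originally display
    delta_t_target = 0.14  # Δt_target from step S407, in seconds
    display_s = frame_due_s + delta_t_target
    print(display_s)       # 11.14: the frame is deferred to n + Δt_target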
It should be noted that step S409 may be performed by the application layer in fig. 1B.
In the embodiment of the application, in the video playing process, the first device can periodically buffer the audio frames based on the audio and video source, and can periodically collect the audio played by the second device through the microphone so as to obtain the collected audio frames. Further, the buffered audio frames and the captured audio frames may be periodically matched and a target audio-to-video delay parameter determined based on the matched audio frames. The picture display module in the first device may then periodically adjust the picture display instant of the video based on the updated audio-visual delay parameter (i.e., the target audio-visual delay parameter).
In summary, with the audio-video synchronization adjustment method provided by the embodiment of the application, the audio played by the second device can be periodically collected and matched against the audio buffered by the first device to determine the target audio-video delay parameter. Compared with embodiments that rely only on the initial delay parameter, the target audio-video delay parameter can be adjusted dynamically according to actual conditions, reducing the deviation between the actual audio-video delay and the target audio-video delay and improving audio-video synchronization consistency.
Referring to fig. 6, fig. 6 is a flowchart of another audio-video synchronization adjustment method provided by an embodiment of the present application. The method is applied to a first device connected to a second device; during playback of a target video, the first device plays the picture frames of the target video through the first device and plays the audio of the target video through the second device. The method includes steps S501-S504. The first device may be the first device (i.e., the video display device) in fig. 4, and the second device may be the second device (i.e., the audio playing device) in fig. 4. The details are as follows:
Step S501: the first device caches first audio information of the target video played by the first device.
Specifically, the first audio information includes M first audio sampling points corresponding to a picture frame played by the first device in a preset period, M is an integer greater than 0, and the first device has started a sound-picture synchronization function of the first device.
For the specific embodiment of buffering audio by the first device, refer to the description of step S403, which is not repeated here.
It should be noted that, before step S501 is executed, the first device may establish a connection with the second device; refer to the detailed description of step S401 above, which is not repeated here.
It should be noted that, before step S501 is executed, playback of the target video may be started on the first device; refer to the detailed description of step S402 above, which is not repeated here.
Step S502: the first device collects second audio information of the target video played by the second device.
Specifically, the second audio information includes M second audio sampling points corresponding to the audio played by the second device in the preset period. A second audio sampling point may be understood as a collected audio frame as described above.
For the specific embodiment of collecting audio by the first device, refer to the description of step S404, which is not repeated here.
Step S503: the first device determines and updates a sound-to-picture delay parameter based on the first audio information and the second audio information.
In some embodiments, the first device determines and updates a sound-to-picture delay parameter based on the first audio information and the second audio information, including: if the similarity of the first audio waveform and the second audio waveform is greater than or equal to a preset value, the first device matches the M first audio sampling points with the M second audio sampling points, and determines and updates the sound-to-picture delay parameter based on the matching result; wherein one of the first audio sampling points matches one of the second audio sampling points, the first audio waveform being generated based on the first audio information, and the second audio waveform being generated based on the second audio information.
For the specific embodiment of determining the audio-visual delay parameter by the first device based on the first audio information and the second audio information, refer to the descriptions of step S406 and step S407, which are not repeated here.
In some embodiments, the first device determines and updates the audio-visual delay parameter based on the matching result, including: the first device determines a first time stamp and a second time stamp, wherein the first time stamp is a time corresponding to a target sampling point, the target sampling point is any one of the M first audio sampling points, and the second time stamp is a time corresponding to the second audio sampling point matched with the target sampling point; the first device determines and updates the audio-visual delay parameter based on the difference between the second timestamp and the first timestamp.
For the specific embodiment of determining the audio-visual delay parameter by the first device, refer to the description of step S407, which is not repeated here.
Step S504: the first device adjusts the display time of the picture frames of the target video based on the updated audio-visual delay parameter.
For a specific embodiment of adjusting the display time of the picture by the first device based on the audio-visual delay parameter, reference may be made to the description of step S408 and step S409, which are not described herein.
In some embodiments, the method further comprises: when the first device establishes a connection with the second device, the first device receives an initial audio-visual delay parameter sent by the second device; and when the target video initially starts playing, the first device adjusts the display time of the picture frames of the target video based on the initial audio-visual delay parameter.
For the specific embodiment of adjusting the picture display time by the first device based on the initial audio-visual delay parameter, refer to the description of step S402, which is not repeated here.
In some embodiments, the starting of the audio-visual synchronization function of the first device includes any one of the following manners: starting the audio-visual synchronization function of the first device according to a preset period; or starting the audio-visual synchronization function of the first device according to a user instruction; or, when the first device determines that it is playing the picture frames of the target video while the second device plays the audio of the target video at the same time or afterwards, starting the audio-visual synchronization function automatically.
For the specific embodiment of starting the sound and picture synchronization function by the first device, refer to the descriptions related to fig. 3D, fig. 3E, and fig. 3F, and the description of step S403, which are not repeated here.
In the embodiment of the application, in the video playing process, the first device can periodically buffer the audio frames based on the audio and video source, and can periodically collect the audio played by the second device through the microphone so as to obtain the collected audio frames. Further, the buffered audio frames and the captured audio frames may be periodically matched and a target audio-to-video delay parameter determined based on the matched audio frames. The picture display module in the first device may then periodically adjust the picture display instant of the video based on the updated audio-visual delay parameter (i.e., the target audio-visual delay parameter).
In summary, with the audio-video synchronization adjustment method provided by the embodiment of the application, the audio played by the second device can be periodically collected and matched against the audio buffered by the first device to determine the target audio-video delay parameter. Compared with embodiments that rely only on the initial delay parameter, the target audio-video delay parameter can be adjusted dynamically according to actual conditions, reducing the deviation between the actual audio-video delay and the target audio-video delay and improving audio-video synchronization consistency.
The foregoing has detailed the methods of the embodiments of the present invention; the related devices of the embodiments of the present invention are provided below.
Referring to fig. 7, fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application. A first device 60 is connected to a second device; during playback of a target video, the first device 60 plays the picture frames of the target video through the first device 60 and plays the audio of the target video through the second device. The first device 60 may include a media playing module 601, a pickup module 602, and a Bluetooth connection module 603. Each module is described in detail below.
The media playing module 601 is configured to buffer first audio information of the target video played by the first device 60, where the first audio information includes M first audio sampling points corresponding to the picture frames played by the first device 60 in a preset period, M is an integer greater than 0, and the first device 60 has started the sound and picture synchronization function of the first device 60;
The pickup module 602 is configured to collect second audio information of the target video played by the second device, where the second audio information includes M second audio sampling points corresponding to audio played by the second device in the preset period;
the media playing module 601 is further configured to determine and update a sound-to-picture delay parameter based on the first audio information and the second audio information;
The media playing module 601 is further configured to adjust a display time of a frame of the target video based on the updated audio-visual delay parameter.
In some embodiments, the media playing module 601 is specifically configured to: if the similarity of the first audio waveform and the second audio waveform is greater than or equal to a preset value, matching the M first audio sampling points with the M second audio sampling points, and determining and updating the audio-video delay parameters based on a matching result; wherein one of the first audio sampling points matches one of the second audio sampling points, the first audio waveform being generated based on the first audio information, and the second audio waveform being generated based on the second audio information.
In some embodiments, the media playing module 601 is specifically configured to: determining a first time stamp and a second time stamp, wherein the first time stamp is a time corresponding to a target sampling point, the target sampling point is any one of the M first audio sampling points, and the second time stamp is a time corresponding to the second audio sampling point matched with the target sampling point; and determining and updating the audio-visual delay parameter based on the difference value between the second time stamp and the first time stamp.
In some embodiments, the apparatus further comprises: a Bluetooth connection module 603, configured to receive an initial audio-video delay parameter sent by the second device when the first device 60 establishes a connection with the second device; the media playing module 601 is further configured to adjust the display time of the picture frames of the target video based on the initial audio-video delay parameter when the target video initially starts playing.
In some embodiments, the media playing module 601 is further configured to: start the audio-visual synchronization function of the first device 60 according to a preset period; or start the audio-visual synchronization function of the first device 60 according to a user instruction; or, when the first device 60 determines that the picture frames of the target video are played through the first device 60 while the audio of the target video is played through the second device at the same time or afterwards, start the audio-visual synchronization function automatically.
It should be noted that, for the functions of the functional units in the first device 60 described in the embodiment of the present invention, refer to the related descriptions of steps S501 to S504 performed by the first device in the method embodiment of fig. 6, which are not repeated here.
An embodiment of the application provides a computer storage medium storing a program that, when executed by a processor, implements any one of the audio-visual delay adjustment methods described above.
An embodiment of the application provides an electronic device including a processor configured to support the electronic device in implementing the corresponding functions of any one of the audio-visual delay adjustment methods described above. The electronic device may also include a memory coupled to the processor, which holds the program instructions and data necessary for the electronic device. The electronic device may also include a communication interface for communicating with other devices or communication networks.
The application provides a chip system which comprises a processor and is used for supporting an electronic device to realize the related functions, such as generating or processing information related to the audio-visual time delay adjustment method. In one possible design, the chip system further includes a memory to hold the necessary program instructions and data for the electronic device. The chip system can be composed of chips, and can also comprise chips and other discrete devices.
An embodiment of the application provides a program product, where the program product comprises instructions that, when executed by an electronic device, cause the electronic device to perform an audio-visual delay adjustment method as described above.
An embodiment of the application provides an electronic device comprising one or more memories and one or more processors, where the memories are configured to store a program, and the processor is configured to invoke the program to cause the electronic device to perform the audio-visual delay adjustment method described above.
In the foregoing embodiments, the descriptions of the embodiments are emphasized, and for parts of one embodiment that are not described in detail, reference may be made to related descriptions of other embodiments.
It should be noted that, for simplicity of description, the foregoing method embodiments are all described as a series of acts, but it should be understood by those skilled in the art that the present application is not limited by the order of acts described, as some steps may be performed in other orders or concurrently in accordance with the present application. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and modules referred to are not necessarily required for the present application.
In the several embodiments provided by the present application, it should be understood that the disclosed apparatus may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative; for instance, the division into units is merely a division by logical function, and other divisions are possible in actual implementation, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, devices, or units, and may be electrical or take other forms.
The units described above as separate components may or may not be physically separate, and components shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
If the integrated units described above are implemented in the form of software functional units and sold or used as stand-alone products, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like, and in particular may be a processor in a computer device) to perform all or part of the steps of the methods of the various embodiments of the present application. The aforementioned storage medium may include various media capable of storing program code, such as a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a read-only memory (ROM), or a random access memory (RAM).
The above embodiments are only for illustrating the technical solution of the present application, and not for limiting the same; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims (10)

1. An audio-video synchronization adjustment method, characterized in that the method is applied to a first device, the first device and a second device are in wireless connection, and in the process of playing a target video, the first device plays the picture frames of the target video through the first device and plays the audio of the target video through the second device, the method comprising:
Caching first audio information of the target video played by the first device, wherein the first audio information comprises M first audio sampling points corresponding to picture frames played by the first device in a preset period, M is an integer greater than 0, and the first device starts a sound and picture synchronization function of the first device;
Collecting second audio information of the target video played by the second device, wherein the second audio information comprises M second audio sampling points corresponding to audio played by the second device in the preset period;
Determining and updating sound-to-picture delay parameters based on the first audio information and the second audio information;
and adjusting the display time of the picture frame of the target video based on the updated audio-visual time delay parameter.
2. The method of claim 1, wherein the determining and updating the sound-to-picture delay parameter based on the first audio information and the second audio information comprises:
If the similarity of the first audio waveform and the second audio waveform is greater than or equal to a preset value, matching the M first audio sampling points with the M second audio sampling points, and determining and updating the audio-video delay parameters based on a matching result; wherein one of the first audio sampling points matches one of the second audio sampling points, the first audio waveform being generated based on the first audio information, and the second audio waveform being generated based on the second audio information.
3. The method of claim 2, wherein the determining and updating the sound-to-picture delay parameter based on the matching result comprises:
Determining a first time stamp and a second time stamp, wherein the first time stamp is a time corresponding to a target sampling point, the target sampling point is any one of the M first audio sampling points, and the second time stamp is a time corresponding to the second audio sampling point matched with the target sampling point;
and determining and updating the audio-visual delay parameter based on the difference value between the second time stamp and the first time stamp.
4. A method according to any one of claims 1-3, wherein the method further comprises:
When the first device and the second device establish a connection, receiving an initial audio-visual delay parameter sent by the second device;
and when the target video is initially played, adjusting the display time of the picture frame for playing the target video based on the initial audio-visual delay parameter.
5. The method of any one of claims 1-4, wherein the turning on the sound and picture synchronization function of the first device includes:
and starting the audio-video synchronization function of the first device according to a preset period.
6. The method of any one of claims 1-4, wherein the turning on the sound and picture synchronization function of the first device includes:
and starting the audio-video synchronization function of the first device according to a user operation.
7. The method of any one of claims 1-4, wherein the turning on the sound and picture synchronization function of the first device includes:
when the first device determines that the picture frames of the target video are played through the first device and the audio of the target video is played through the second device at the same time or afterwards, automatically starting the audio-video synchronization function.
8. An electronic device, comprising a display screen, a memory, and one or more processors, wherein the memory is configured to store a program, and the processor is configured to invoke the program to cause the electronic device to perform the method of any one of claims 1-7.
9. A computer storage medium comprising computer instructions which, when run on an electronic device, cause the electronic device to perform the method of any one of claims 1 to 7.
10. A program product, characterized in that the program product, when run on an electronic device, causes the electronic device to perform the method of any of claims 1 to 7.