CN113965662A - Audio and video output device and audio and video delay calibration method and related components thereof - Google Patents

Info

Publication number
CN113965662A
Authority
CN
China
Prior art keywords
video
audio
signal
delay
audio signal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111249455.3A
Other languages
Chinese (zh)
Inventor
张�杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Goertek Techology Co Ltd
Original Assignee
Goertek Techology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Goertek Techology Co Ltd
Priority to CN202111249455.3A
Publication of CN113965662A

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00 Details of television systems
    • H04N5/04 Synchronising
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/04 Time compression or expansion
    • G10L21/055 Time compression or expansion for synchronising with other signals, e.g. video signals
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00 Details of television systems
    • H04N5/04 Synchronising
    • H04N5/06 Generation of synchronising signals

Abstract

The application discloses an audio and video output device, an audio and video delay calibration method, and related components thereof. The audio and video delay calibration method comprises the following steps: generating a first video signal pulse and a first audio signal pulse with aligned rising edges; sending the first video signal pulse to a video encoding and decoding device so that the invisible light signal corresponding to the first video signal pulse is displayed; sending the first audio signal pulse to an audio signal processing device so that the inaudible frequency band sound signal corresponding to the first audio signal pulse is played; determining the delay duration between the inaudible frequency band sound signal and the invisible light signal based on the video encoding and decoding device and the audio signal processing device; and completing audio and video delay calibration based on the delay duration. The scheme of the application can effectively eliminate audio and video delay and keep the video picture and the audio synchronized, and the calibration itself does not affect the user's experience.

Description

Audio and video output device and audio and video delay calibration method and related components thereof
Technical Field
The invention relates to the technical field of electronic equipment, in particular to audio and video output equipment, an audio and video delay calibration method thereof and related components.
Background
When a mobile phone or a VR (Virtual Reality) product is connected to a Bluetooth headset, or when a VR product is used on its own, and the user plays audio and video at the same time, the audio may lag behind the video. The video picture and the audio are then out of sync, which seriously degrades the user experience.
This usually happens because the video picture is processed faster than the audio signal. The audio signal is typically transmitted to the audio playing device over Bluetooth, which requires the chain "audio encoding and compression - Bluetooth transmission - Bluetooth reception at the device - audio decoding - audio signal processing"; the delay of this chain is limited by the encoding and decoding mode and by the processing capability of the chip. In addition, the headphone product performs DSP (Digital Signal Processing) on the audio signal, which introduces further delay.
In summary, how to effectively eliminate the audio/video delay and avoid the phenomenon that the video picture and the audio are not synchronized is a technical problem that needs to be solved urgently by those skilled in the art at present.
Disclosure of Invention
The invention aims to provide audio and video output equipment, an audio and video delay calibration method and related components thereof, so as to effectively eliminate audio and video delay and avoid the phenomenon that a video picture is not synchronous with audio.
In order to solve the technical problems, the invention provides the following technical scheme:
an audio and video delay calibration method comprises the following steps:
generating first video signal pulses and first audio signal pulses with aligned rising edges;
sending the first video signal pulse to a video coding and decoding device so as to display the invisible light signal corresponding to the first video signal pulse;
sending the first audio signal pulse to an audio signal processing device so as to play the inaudible frequency band sound signal corresponding to the first audio signal pulse;
determining a delay time between the inaudible frequency band sound signal and the invisible light signal based on the video encoding and decoding device and the audio signal processing device;
and completing audio and video delay calibration based on the delay time.
Preferably, the determining the time delay between the inaudible band sound signal and the invisible light signal based on the video encoding and decoding device and the audio signal processing device includes:
after the video coding and decoding device finishes processing the first video signal pulse, detecting 1 st to Nth rising edges of the first video signal pulse by using the video coding and decoding device, and recording N detection moments of the 1 st to Nth rising edges of the first video signal pulse by using the video coding and decoding device; n is a positive integer;
after the audio signal processing device finishes processing the first audio signal pulse, detecting 1 st to Nth rising edges of the first audio signal pulse by using the audio signal processing device, and recording N detection moments of detecting the 1 st to Nth rising edges of the first audio signal pulse by using the audio signal processing device;
and determining the difference between the recorded detection time of each rising edge of the first audio signal pulse and the detection time of the corresponding rising edge of the first video signal pulse, and taking the average of the N differences as the delay duration between the inaudible frequency band sound signal and the invisible light signal.
Preferably, the determining the time delay between the inaudible band sound signal and the invisible light signal based on the video encoding and decoding device and the audio signal processing device includes:
after the invisible light signal corresponding to the first video signal pulse is displayed, detecting 1 st to Kth rising edges of the invisible light signal by using an image recognition device, and recording K detection times of detecting the 1 st to Kth rising edges of the invisible light signal by using the image recognition device; k is a positive integer;
after the inaudible frequency band sound signal corresponding to the first audio signal pulse is played, detecting the 1st to Kth rising edges of the inaudible frequency band sound signal by using a sound recognition device, and recording, by the sound recognition device, the K detection times at which the 1st to Kth rising edges of the inaudible frequency band sound signal are detected;
receiving K detection moments fed back by the image recognition device through the video coding and decoding device, and receiving K detection moments fed back by the sound recognition device through the audio signal processing device;
and determining the difference between the recorded detection time of each rising edge of the invisible light signal and the detection time of the corresponding rising edge of the inaudible frequency band sound signal, and taking the average of the K differences as the delay duration between the inaudible frequency band sound signal and the invisible light signal.
Preferably, the image recognition device is a camera, and the sound recognition device is a microphone.
Preferably, the generating the first video signal pulse and the first audio signal pulse with aligned rising edges includes:
every first duration, a first video signal pulse and a first audio signal pulse are generated with aligned rising edges.
Preferably, the completing the audio/video delay calibration based on the delay duration includes:
judging whether the delay time length is a numerical value representing that the audio lags behind the video or not;
if so, configuring the delay time length into a video signal delay value to finish audio and video delay calibration.
Preferably, the method further comprises the following steps:
and when the delay time length is judged to be a numerical value representing that the video lags behind the audio, configuring the delay time length as an audio signal delay value, or configuring the delay time length as a video signal advance value, so as to finish audio and video delay calibration.
An audio-video delay calibration system comprising:
video encoding and decoding device, audio signal processing device;
a master controller to: generating first video signal pulses and first audio signal pulses with aligned rising edges; sending the first video signal pulse to the video encoding and decoding device to display the invisible light signal corresponding to the first video signal pulse; sending the first audio signal pulse to the audio signal processing device to play the inaudible frequency band sound signal corresponding to the first audio signal pulse; determining a delay time between the inaudible frequency band sound signal and the invisible light signal based on the video encoding and decoding device and the audio signal processing device; and completing audio and video delay calibration based on the delay time.
An audio and video output device comprises the audio and video delay calibration system.
A computer-readable storage medium, having a computer program stored thereon, which, when being executed by a processor, carries out the steps of the audio-video delay calibration method according to any one of the preceding claims.
By applying the technical scheme provided by the embodiments of the invention, a standard audio and video file is used for testing, and the delay duration obtained from the test is used to complete the audio and video delay calibration. Specifically, a first video signal pulse and a first audio signal pulse with aligned rising edges are generated; because the rising edges are aligned, the two pulses are generated simultaneously. The first video signal pulse is then sent to the video encoding and decoding device so that the corresponding invisible light signal is displayed, and the first audio signal pulse is sent to the audio signal processing device so that the corresponding inaudible frequency band sound signal is played. Based on the video encoding and decoding device and the audio signal processing device, the delay duration between the inaudible frequency band sound signal and the invisible light signal can then be determined. This delay duration is what causes the video picture and the audio to be out of sync, so the audio and video delay calibration can be completed based on it. The scheme of the application can therefore effectively eliminate audio and video delay and keep the video picture and the audio synchronized.
Moreover, the first video signal pulse is displayed as an invisible light signal and the first audio signal pulse is played as an inaudible frequency band sound signal. Human eyes cannot perceive the invisible light signal and human ears cannot hear the inaudible frequency band sound signal, so the scheme of the application can perform audio and video delay calibration at any time without interfering with the user's normal use of the product, which effectively improves the user experience.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a flowchart of an implementation of an audio/video delay calibration method according to the present invention;
Fig. 2 is a schematic structural diagram of an audio/video delay calibration system according to the present invention;
Fig. 3 is a schematic structural diagram of an audio/video delay calibration system in a specific embodiment of the present invention.
Detailed Description
The core of the invention is to provide an audio and video delay calibration method, which can effectively eliminate audio and video delay, avoid the phenomenon that a video picture is not synchronous with audio, and cannot influence the use experience of a user.
In order that those skilled in the art will better understand the disclosure, the invention will be described in further detail with reference to the accompanying drawings and specific embodiments. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, fig. 1 is a flowchart illustrating an implementation of an audio/video delay calibration method according to the present invention, where the audio/video delay calibration method may include the following steps:
Step S101: generating a first video signal pulse and a first audio signal pulse with aligned rising edges.
Specifically, the main controller may generate the first video signal pulse and the first audio signal pulse. The main controller may be, for example, the main control chip of a mobile phone or the VR main control chip 301 in fig. 3; it can be chosen according to the actual situation without affecting the implementation of the invention.
The first video signal pulse is subsequently used to display the invisible light signal, and the first audio signal pulse is subsequently used to play the inaudible frequency band sound signal.
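As an illustration of what "aligned rising edges" means in step S101, the following minimal sketch builds two sampled pulse trains on a shared timeline. The function names, the sample-count parameters and the numeric values are assumptions made for this example and do not come from the application; the pulse widths deliberately differ to show that only the rising edges need to coincide.

```python
def pulse_train(num_pulses, period_n, offset_n, width_n):
    """Return a 0/1 sample list with one pulse per period.

    Every pulse rises at k * period_n + offset_n and stays high for
    width_n samples (all quantities are counted in samples).
    """
    if offset_n < 1 or width_n < 1 or offset_n + width_n >= period_n:
        raise ValueError("each pulse must start after a low sample and end before the next period")
    samples = [0] * (num_pulses * period_n)
    for k in range(num_pulses):
        start = k * period_n + offset_n
        samples[start:start + width_n] = [1] * width_n
    return samples


def rising_edges(samples):
    """Indices at which the signal goes from 0 to 1."""
    return [i for i in range(1, len(samples)) if samples[i - 1] == 0 and samples[i] == 1]


# Same period and offset for both trains, so the rising edges coincide
# even though the pulse widths differ (illustrative values only).
video_pulses = pulse_train(num_pulses=5, period_n=100, offset_n=10, width_n=20)
audio_pulses = pulse_train(num_pulses=5, period_n=100, offset_n=10, width_n=5)
```

Because both trains use the same period and offset, any delay later observed between the displayed and the played signal is introduced downstream of the main controller rather than at generation time.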
Step S102: sending the first video signal pulse to a video encoding and decoding device so that the invisible light signal corresponding to the first video signal pulse is displayed.
After generating the first video signal pulse, the main controller sends it to the video encoding and decoding device; once the video encoding and decoding device has processed the pulse, it can control the display to show the invisible light signal corresponding to the first video signal pulse.
In the scheme of the application, the display content corresponding to the first video signal pulse is an invisible light signal because human eyes cannot perceive it. Performing the audio/video delay calibration therefore does not interfere with the user's normal use of the product: for example, calibration can run in real time while the user watches a program, without affecting the content being watched.
The specific type of the invisible light signal can be set and adjusted according to actual needs; for example, ultraviolet light with a wavelength below 380 nm can be selected.
Step S103: sending the first audio signal pulse to an audio signal processing device so that the inaudible frequency band sound signal corresponding to the first audio signal pulse is played.
After generating the first audio signal pulse, the main controller sends it to the audio signal processing device; once the audio signal processing device has processed the pulse, it can control the speaker to play the inaudible frequency band sound signal corresponding to the first audio signal pulse.
The playing content corresponding to the first audio signal pulse is an inaudible frequency band sound signal, that is, a high-frequency sound signal that human ears cannot hear.
The human ear perceives sound at frequencies between 20 Hz and 20000 Hz, and infrasound is potentially harmful. The inaudible frequency band sound signal is therefore usually chosen as a high-frequency sound signal above 20000 Hz; the specific frequency can be set and adjusted as required.
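One possible way to turn a pulse into a sound signal above 20000 Hz is to gate a high-frequency carrier with the pulse envelope. The sketch below is only an assumption about how this could be done: the application does not specify the waveform, and the 21 kHz carrier, the 48 kHz sample rate and the function name `ultrasonic_burst` are hypothetical choices made for this example.

```python
import math


def ultrasonic_burst(envelope, carrier_hz, sample_rate_hz, amplitude=0.5):
    """Gate a sine carrier with a 0/1 pulse envelope.

    The carrier must lie above the 20 kHz audible limit and below the
    Nyquist frequency of the chosen sample rate.
    """
    if carrier_hz <= 20_000:
        raise ValueError("carrier must be above the 20 kHz audible limit")
    if carrier_hz >= sample_rate_hz / 2:
        raise ValueError("carrier must be below the Nyquist frequency")
    return [
        amplitude * gate * math.sin(2 * math.pi * carrier_hz * n / sample_rate_hz)
        for n, gate in enumerate(envelope)
    ]


# 1 ms of silence followed by a 2 ms burst at 48 kHz (illustrative values).
envelope = [0] * 48 + [1] * 96
burst = ultrasonic_burst(envelope, carrier_hz=21_000, sample_rate_hz=48_000)
```

The rising edge of the envelope marks the instant at which the burst starts; whether a particular speaker and microphone can reproduce and capture such a carrier is outside the scope of this sketch.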
Step S104: determining the delay duration between the inaudible frequency band sound signal and the invisible light signal based on the video encoding and decoding device and the audio signal processing device.
Because the first video signal pulse and the first audio signal pulse are generated with aligned rising edges, the delay duration between the resulting inaudible frequency band sound signal and the invisible light signal reflects the audio and video delay of the equipment. The delay duration can be determined based on the video encoding and decoding device and the audio signal processing device, and the audio and video delay calibration can then be completed according to it.
The specific manner of determining the delay duration based on the video encoding and decoding device and the audio signal processing device may be set according to actual needs. For example, in a specific embodiment of the present invention, step S104 may specifically include:
step one: after the invisible light signal corresponding to the first video signal pulse is displayed, detecting the 1st to Kth rising edges of the invisible light signal by using an image recognition device, and recording, by the image recognition device, the K detection times of the 1st to Kth rising edges of the invisible light signal, where K is a positive integer;
step two: after the inaudible frequency band sound signal corresponding to the first audio signal pulse is played, detecting the 1st to Kth rising edges of the inaudible frequency band sound signal by using a sound recognition device, and recording, by the sound recognition device, the K detection times of the 1st to Kth rising edges of the inaudible frequency band sound signal;
step three: receiving the K detection times fed back by the image recognition device through the video encoding and decoding device, and receiving the K detection times fed back by the sound recognition device through the audio signal processing device;
step four: determining the difference between the recorded detection time of each rising edge of the invisible light signal and the detection time of the corresponding rising edge of the inaudible frequency band sound signal, and taking the average of the K differences as the delay duration between the inaudible frequency band sound signal and the invisible light signal.
In this embodiment, after the invisible light signal corresponding to the first video signal pulse is displayed, the 1 st to K th rising edges of the invisible light signal are detected by the image recognition device. Similarly, after the reproduction of the inaudible band audio signal corresponding to the first audio signal pulse, the 1 st to K th rising edges of the inaudible band audio signal are detected by the audio recognition device.
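The detection of the 1st to Kth rising edges can be sketched as a threshold crossing on a sampled intensity or envelope signal. The application does not say how the image recognition device or the sound recognition device detects an edge, so the threshold rule, the function name `detect_rising_edges` and the sample values below are all assumptions for illustration; for the sound signal, the input is assumed to be an already extracted amplitude envelope rather than the raw high-frequency waveform.

```python
def detect_rising_edges(samples, sample_rate_hz, threshold, max_edges):
    """Return the times (in seconds) of the first max_edges upward threshold crossings."""
    times = []
    for i in range(1, len(samples)):
        # An edge is counted when the signal moves from below the
        # threshold to at or above it between two consecutive samples.
        if samples[i - 1] < threshold <= samples[i]:
            times.append(i / sample_rate_hz)
            if len(times) == max_edges:
                break
    return times


# Hypothetical light-intensity samples captured at 1000 samples per second.
light = [0.0, 0.0, 0.9, 0.9, 0.1, 0.0, 0.8, 0.9, 0.0]
light_edge_times = detect_rising_edges(light, sample_rate_hz=1000, threshold=0.5, max_edges=2)
```

The same routine could be applied to the microphone envelope to obtain the K sound detection times, provided both devices stamp their samples against a common clock.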
For example, in the embodiment of fig. 3, the VR device is connected to a bluetooth headset, in fig. 3, the video codec device 302 is connected to the VR main control chip 301, and the audio signal processing device 304 may be a headset DSP, and is connected to the VR main control chip 301 through a headset bluetooth chip 303. The audio signal processing device 304 controls the speaker 305 to output an audio signal, and the video codec device 302 controls the display 306 to display an image. In fig. 3, the image recognition device 307 may be a camera of the VR device, and the sound recognition device 308 may be an FB MIC or an in-ear MIC of the VR device.
The image recognition device records the detection time of each of the 1st to Kth rising edges of the invisible light signal, obtaining K detection times, and feeds this information back to the video encoding and decoding device. Taking K = 5 as an example, the detection times of the 1st to Kth rising edges of the invisible light signal fed back by the image recognition device can be denoted T1, T2, T3, T4 and T5 in order.
The sound recognition device records the detection time of each of the 1st to Kth rising edges of the inaudible frequency band sound signal, obtaining K detection times, and feeds this information back to the audio signal processing device. Taking K = 5 as an example, the detection times of the 1st to Kth rising edges of the inaudible frequency band sound signal fed back by the sound recognition device can be denoted t1, t2, t3, t4 and t5 in order.
After the video encoding and decoding device and the audio signal processing device obtain their respective feedback data, the difference between the recorded detection time of each rising edge of the invisible light signal and the detection time of the corresponding rising edge of the inaudible frequency band sound signal can be determined, and the average of the K differences is taken as the delay duration between the inaudible frequency band sound signal and the invisible light signal. For example, when K = 5, the delay duration is delay = ((t1 - T1) + (t2 - T2) + (t3 - T3) + (t4 - T4) + (t5 - T5)) / 5.
In other cases, K may take other values. When K = 1, the delay duration is t1 - T1; that is, the difference between the detection time of the rising edge of the inaudible frequency band sound signal and the detection time of the rising edge of the invisible light signal is used directly as the delay duration. In practical applications, because averaging helps reduce error, K is usually set to a positive integer greater than or equal to 2; the same applies to N in the following embodiments.
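The averaging over K rising edges can be sketched as follows. The sketch assumes that each difference is taken as the sound detection time minus the light detection time, so that a positive result means the audio lags the video, which is the sign interpretation used for step S105; the function name `average_delay` and the timestamps are hypothetical.

```python
def average_delay(light_times, sound_times):
    """Mean of (sound time - light time) over paired rising edges, in seconds."""
    if not light_times or len(light_times) != len(sound_times):
        raise ValueError("need the same non-zero number of light and sound detection times")
    differences = [t - T for T, t in zip(light_times, sound_times)]
    return sum(differences) / len(differences)


# Hypothetical detection times for K = 5 (seconds on a shared clock).
T = [0.10, 0.20, 0.30, 0.40, 0.50]  # invisible light signal
t = [0.16, 0.26, 0.36, 0.46, 0.56]  # inaudible frequency band sound signal
delay = average_delay(T, t)         # about 0.06 s, i.e. the audio lags

# With K = 1 the average reduces to a single difference.
single = average_delay([0.10], [0.16])
```

Averaging only helps if the k-th light edge and the k-th sound edge really belong to the same generated pulse, so the pairing has to be preserved when the detection times are fed back.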
In this embodiment of the present application, after the invisible light signal corresponding to the first video signal pulse is displayed, the 1 st to K-th rising edges of the invisible light signal are detected by the image recognition device. After the inaudible frequency band sound signal corresponding to the first audio signal pulse is played, the 1 st to the Kth rising edges of the inaudible frequency band sound signal are detected by the sound recognition device, so that the determined delay time is accurate, and the audio/video delay condition can be accurately reflected.
In practical applications, the image recognition device may be a camera and the sound recognition device may be a microphone. Both are common and easy to implement, and many products already have a camera and/or a microphone, which helps reduce the implementation cost of the scheme of the present application. In other cases, the image recognition device and the sound recognition device may be of other types, provided the object of the present application can be achieved.
In one embodiment of the present invention, step S104 may include:
step one: after the video encoding and decoding device finishes processing the first video signal pulse, detecting the 1st to Nth rising edges of the first video signal pulse by using the video encoding and decoding device, and recording, by the video encoding and decoding device, the N detection times of the 1st to Nth rising edges of the first video signal pulse, where N is a positive integer;
step two: after the audio signal processing device finishes processing the first audio signal pulse, detecting 1 st to Nth rising edges of the first audio signal pulse by using the audio signal processing device, and recording N detection moments of the 1 st to Nth rising edges of the first audio signal pulse by using the audio signal processing device;
step three: and determining the difference value between the detection time of the rising edge of each recorded first audio signal pulse and the detection time of the corresponding rising edge of the first video signal pulse, and taking the average value of the determined N difference values as the delay time length between the determined inaudible frequency band sound signal and the determined invisible light signal.
In this embodiment, instead of detecting the rising edge after the invisible light signal is displayed, the video codec device may detect the 1 st to nth rising edges of the first video signal pulse after the video codec device completes the processing of the first video signal pulse.
Similarly, instead of waiting for the inaudible band sound signal to be played and then detecting the rising edge, the audio signal processing apparatus may be used to detect the 1 st to nth rising edges of the first audio signal pulse after the audio signal processing apparatus completes the processing of the first audio signal pulse.
Compared with the embodiment that uses the image recognition device and the sound recognition device, this embodiment needs neither, so its cost is lower. However, because the rising edges are detected by the video encoding and decoding device and the audio signal processing device themselves, the determined delay duration is slightly less accurate. After the video encoding and decoding device finishes processing the first video signal pulse, some time still passes before the corresponding invisible light signal is displayed. Similarly, after the audio signal processing device finishes processing the first audio signal pulse, some time still passes before the corresponding inaudible frequency band sound signal is played. These two intervals may differ slightly, and this embodiment does not capture that difference. Since the difference is usually within the allowable range, this embodiment can be used to reduce implementation cost, particularly when the product itself has no image recognition device or sound recognition device.
In other cases, step S104 may be implemented in other ways. For example, the video encoding and decoding device may detect each rising edge and record each detection time as soon as it receives the first video signal pulse, and correspondingly the audio signal processing device may detect each rising edge and record each detection time as soon as it receives the first audio signal pulse. As another example, each rising edge of the first video signal pulse may be detected, and each detection time recorded, at a preset intermediate stage after the video encoding and decoding device receives the pulse and before it finishes processing it. The approach can be chosen according to actual needs, provided the delay duration can be effectively determined.
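The accuracy trade-off can be made concrete with a small calculation. In the sketch below, the delay measured when processing finishes is compared with the delay at the display and the speaker; the difference between the two is exactly the difference between the two remaining output intervals. All function names and numeric values are hypothetical and serve only to illustrate the arithmetic.

```python
def processing_side_delay(video_done_s, audio_done_s):
    """Delay seen when each device finishes processing its pulse."""
    return audio_done_s - video_done_s


def end_to_end_delay(video_done_s, audio_done_s, video_output_s, audio_output_s):
    """Delay seen at the display and the speaker, including the output intervals."""
    return (audio_done_s + audio_output_s) - (video_done_s + video_output_s)


# Hypothetical timings in seconds.
video_done, audio_done = 0.020, 0.180      # processing finished
video_output, audio_output = 0.008, 0.012  # remaining time until display / playback

measured = processing_side_delay(video_done, audio_done)
actual = end_to_end_delay(video_done, audio_done, video_output, audio_output)
error = actual - measured                  # equals audio_output - video_output
```

When the two output intervals are equal the error vanishes, which is why this embodiment is acceptable whenever their difference stays within the allowable range.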
Step S105: completing audio and video delay calibration based on the delay duration.
Based on the delay duration, the video encoding and decoding device and/or the audio signal processing device can be adjusted to complete the audio and video delay calibration. Normally the audio lags behind the video, so the calibration can be achieved by delaying the video signal by the delay duration, or by delaying both the video signal and the audio signal. For example, if the absolute value of the determined delay duration is 0.6 s, the audio is taken to lag the video by 0.6 s, and the video signal can be delayed by 0.6 s. Alternatively, the video signal may be delayed by 1 s and the audio signal by 0.4 s, which gives the same net correction of 0.6 s.
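The two numeric examples above can be checked with a short calculation of the offset that remains after the configured delays are applied. The function name `residual_audio_lag` is hypothetical, and the sketch assumes that a positive measured delay means the audio lags the video.

```python
def residual_audio_lag(measured_delay_s, video_delay_s, audio_delay_s=0.0):
    """Audio lag (seconds) that remains after delaying video and/or audio.

    Delaying the video reduces the audio lag; delaying the audio increases it.
    """
    return measured_delay_s + audio_delay_s - video_delay_s


# Audio measured 0.6 s behind video.
only_video = residual_audio_lag(0.6, video_delay_s=0.6)                     # video delayed by 0.6 s
both = residual_audio_lag(0.6, video_delay_s=1.0, audio_delay_s=0.4)        # video 1 s, audio 0.4 s
```

Both configurations leave no residual lag, since only the difference between the video delay and the audio delay matters for synchronization.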
In an embodiment of the present invention, step S105 may specifically include:
judging whether the delay time length is a numerical value representing that the audio lags behind the video or not;
if yes, configuring the delay time length into a video signal delay value so as to finish audio and video delay calibration.
In this embodiment, it is determined whether the delay time is a value indicating that the audio lags behind the video, so as to avoid a situation of a calibration failure caused by processing according to a default audio lag behind the video when the video lags behind the audio.
The specific manner of judging whether the delay duration is a value representing that the audio lags behind the video can be set and adjusted according to actual needs. In general, given how the delay duration is calculated, it is enough to check whether the calculated value is positive or negative. For example, in the above embodiment, with delay = ((t1 - T1) + (t2 - T2) + (t3 - T3) + (t4 - T4) + (t5 - T5)) / 5, a delay greater than 0 indicates that the audio lags behind the video, and a delay less than 0 indicates that the video lags behind the audio. If the delay is exactly 0, there is no audio/video delay and no calibration is needed.
In this embodiment, after the delay time is determined to be a value indicating that the audio lags behind the video, the delay time is directly configured as a video signal delay value, so that the audio and video delay calibration can be completed. In addition, if it is determined that the delay time duration is a value indicating that the video lags behind the audio, it may be caused by a detection error or the like, and may not be processed.
Further, in an embodiment of the present invention, the method may further include:
and when the delay time length is judged to be a numerical value representing that the video lags behind the audio, configuring the delay time length into an audio signal delay value, or configuring the delay time length into a video signal advance value, so as to finish audio and video delay calibration.
This embodiment takes into account that in some situations the delay duration can be determined relatively accurately, so that a delay duration indicating that the video lags behind the audio means that the video does in fact lag behind the audio at that moment. The embodiment therefore also performs audio and video delay calibration for this situation.
For example, in one case the calculated delay duration is -0.6 s, indicating that the video lags the audio by 0.6 s. The delay duration may therefore be configured as an audio signal delay value, that is, the audio signal is delayed by 0.6 s. Alternatively, the delay duration may be configured as a video signal advance value, that is, the video signal is advanced by 0.6 s. Of course, in practical applications, if the video signal cannot be advanced, or cannot be advanced by as much as the calculated delay duration, the scheme of delaying the audio signal must be selected.
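The sign-based decision described in these embodiments can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the `Calibration` record, the `plan_calibration` function, and the `max_video_advance_s` parameter (how far the video path is assumed to be able to advance) are all hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Calibration:
    """Hypothetical record of the adjustment to apply, in seconds."""
    video_delay_s: float = 0.0
    audio_delay_s: float = 0.0
    video_advance_s: float = 0.0


def plan_calibration(delay_s: float, max_video_advance_s: float = 0.0) -> Calibration:
    """Choose an adjustment from the signed delay duration.

    delay_s > 0: audio lags video, so delay the video by delay_s.
    delay_s < 0: video lags audio, so advance the video if the assumed
                 limit allows it, otherwise delay the audio.
    delay_s == 0: no calibration is needed.
    """
    if delay_s > 0:
        return Calibration(video_delay_s=delay_s)
    if delay_s < 0:
        lag_s = -delay_s
        if lag_s <= max_video_advance_s:
            return Calibration(video_advance_s=lag_s)
        return Calibration(audio_delay_s=lag_s)
    return Calibration()


print(plan_calibration(0.6))    # audio lags: delay the video
print(plan_calibration(-0.6))   # video lags, no advance possible: delay the audio
```

Falling back to an audio delay when the advance limit is exceeded mirrors the remark that delaying the audio must be selected when the video signal cannot be advanced far enough.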
In a specific embodiment of the present invention, step S101 may specifically include:
every first duration, a first video signal pulse and a first audio signal pulse are generated with aligned rising edges.
This embodiment takes into account that performing the audio and video delay calibration according to the scheme of the present application does not affect the user experience. Therefore, in this embodiment, a first video signal pulse and a first audio signal pulse with aligned rising edges are generated every first duration, that is, the audio and video delay calibration is performed once every first duration. This improves the real-time performance of the calibration and thus helps ensure a good user experience.
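As a rough illustration of generating the pulse pair once every first duration, the sketch below lists the rising-edge times of successive calibration rounds. The function name `aligned_pulse_times`, the integer-millisecond time base, and the 5000 ms interval are assumptions made for the example only.

```python
from typing import Iterator, Tuple


def aligned_pulse_times(first_duration_ms: int, rounds: int) -> Iterator[Tuple[int, int]]:
    """Yield (video_rising_edge_ms, audio_rising_edge_ms) for each round.

    Both pulses of a round share the same rising-edge time, and successive
    rounds are spaced first_duration_ms apart. (Hypothetical helper.)
    """
    for k in range(rounds):
        edge_ms = k * first_duration_ms
        yield (edge_ms, edge_ms)


# Three calibration rounds, one every 5000 ms (an arbitrary example value).
for video_ms, audio_ms in aligned_pulse_times(5000, 3):
    print(video_ms, audio_ms)
```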
By applying the technical scheme provided by the embodiment of the invention, the standard audio and video file is used for testing, so that the delay time is obtained to finish the audio and video delay calibration. Specifically, the first video signal pulse and the first audio signal pulse with aligned rising edges are generated, and due to the aligned rising edges, the first video signal pulse and the first audio signal pulse are generated simultaneously. And then, the first video signal pulse is sent to the video coding and decoding device, so that the display of the invisible light signal corresponding to the first video signal pulse can be carried out, and the first audio signal pulse is sent to the audio signal processing device, so that the playing of the sound signal of the inaudible frequency band corresponding to the first audio signal pulse can be carried out. Then, based on the video encoding and decoding device and the audio signal processing device, the time delay duration between the sound signal of the inaudible frequency band and the invisible light signal can be determined, and the time delay duration is also the reason for the asynchronization of the video picture and the audio, so that based on the time delay duration, the audio and video time delay calibration can be completed. The scheme of the application can effectively eliminate audio and video delay and avoid the phenomenon that video pictures and audio are not synchronous. 
Moreover, the first video signal pulse and the first audio signal pulse are utilized, the invisible light signal is displayed, the inaudible frequency band sound signal is played, the invisible light signal cannot be perceived by human eyes, and the inaudible frequency band sound signal cannot be heard by human ears, so that the scheme of the application can carry out audio and video delay calibration under any condition, the influence on the normal product using process of a user cannot be caused, and the use experience of the user is effectively improved.
Corresponding to the above method embodiment, an embodiment of the present invention further provides an audio and video delay calibration system; the description of the system below and the description of the method above may be read with reference to each other.
Referring to FIG. 2, which is a schematic structural diagram of an audio and video delay calibration system according to the present invention, the system includes:
video encoding and decoding device 201, audio signal processing device 202;
a main controller 203 for: generating first video signal pulses and first audio signal pulses with aligned rising edges; sending the first video signal pulse to a video encoding and decoding device to display the invisible light signal corresponding to the first video signal pulse; sending the first audio signal pulse to an audio signal processing device to play the inaudible frequency band sound signal corresponding to the first audio signal pulse; determining the time delay between the inaudible frequency band sound signal and the invisible light signal based on the video encoding and decoding device and the audio signal processing device; and completing audio and video delay calibration based on the delay time.
In an embodiment of the present invention, the main controller 203 determines a time delay duration between the inaudible band sound signal and the invisible light signal based on the video encoding and decoding apparatus and the audio signal processing apparatus, and is specifically configured to:
after the video coding and decoding device finishes processing the first video signal pulse, detecting the 1st to Nth rising edges of the first video signal pulse by using the video coding and decoding device, and recording the N detection times of the 1st to Nth rising edges of the first video signal pulse by using the video coding and decoding device, where N is a positive integer;
after the audio signal processing device finishes processing the first audio signal pulse, detecting the 1st to Nth rising edges of the first audio signal pulse by using the audio signal processing device, and recording the N detection times of the 1st to Nth rising edges of the first audio signal pulse by using the audio signal processing device;
and determining the difference value between the detection time of the rising edge of each recorded first audio signal pulse and the detection time of the corresponding rising edge of the first video signal pulse, and taking the average value of the determined N difference values as the delay time length between the determined inaudible frequency band sound signal and the determined invisible light signal.
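The averaging step can be sketched in a few lines. The function name `mean_delay_ms` and the sample timestamps are hypothetical; the sign convention (audio detection time minus video detection time, so that a positive result means the audio lags) follows the description of the delay duration given earlier.

```python
from typing import Sequence


def mean_delay_ms(audio_times_ms: Sequence[float],
                  video_times_ms: Sequence[float]) -> float:
    """Average the per-edge differences (audio minus video).

    The i-th entries of both sequences must refer to the same rising edge.
    (Hypothetical helper, not part of the application.)
    """
    if len(audio_times_ms) != len(video_times_ms) or not audio_times_ms:
        raise ValueError("need N matching detection times with N >= 1")
    diffs = [a - v for a, v in zip(audio_times_ms, video_times_ms)]
    return sum(diffs) / len(diffs)


# N = 5 rising edges; the timestamps are made-up example values.
video = [100, 1100, 2100, 3100, 4100]
audio = [690, 1710, 2700, 3705, 4695]
print(mean_delay_ms(audio, video))
```

Averaging over several edges reduces the influence of a single noisy detection on the configured delay value.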
In an embodiment of the present invention, the main controller 203 determines a time delay duration between the inaudible band sound signal and the invisible light signal based on the video encoding and decoding apparatus and the audio signal processing apparatus, and is specifically configured to:
after the invisible light signal corresponding to the first video signal pulse is displayed, detecting the 1st to Kth rising edges of the invisible light signal by using an image recognition device, and recording the K detection times of the 1st to Kth rising edges of the invisible light signal by using the image recognition device, where K is a positive integer;
after the inaudible frequency band sound signal corresponding to the first audio signal pulse is played, detecting the 1st to Kth rising edges of the inaudible frequency band sound signal by using a sound recognition device, and recording the K detection times of the 1st to Kth rising edges of the inaudible frequency band sound signal by using the sound recognition device;
receiving K detection moments fed back by the image recognition device through the video coding and decoding device, and receiving K detection moments fed back by the sound recognition device through the audio signal processing device;
and determining the difference value between the detection time of the rising edge of each recorded invisible light signal and the detection time of the corresponding rising edge of the sound signal of the inaudible frequency band, and taking the average value of the determined K difference values as the delay time length between the determined sound signal of the inaudible frequency band and the invisible light signal.
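One simple way a recognition device could turn its captured signal into rising-edge detection times is threshold crossing, sketched below. This is an assumption for illustration only: the application does not specify the detection algorithm, and the function name `rising_edge_times`, the fixed threshold, and the premise that the samples already represent an intensity (light) or envelope (sound) value are all hypothetical.

```python
from typing import List, Sequence


def rising_edge_times(samples: Sequence[float],
                      sample_rate_hz: float,
                      threshold: float) -> List[float]:
    """Return the times (in seconds) at which the signal crosses the
    threshold from below, taken as the rising-edge detection times.

    (Hypothetical helper; assumes uniformly sampled intensity values.)
    """
    times = []
    for i in range(1, len(samples)):
        if samples[i - 1] < threshold <= samples[i]:
            times.append(i / sample_rate_hz)
    return times


# Made-up capture at 10 samples per second with two pulses.
capture = [0.0, 0.0, 1.0, 1.0, 0.0, 1.0]
print(rising_edge_times(capture, sample_rate_hz=10, threshold=0.5))
```

The same routine could be applied to both the camera and the microphone captures, after which the K pairs of detection times are differenced and averaged as described above.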
In one embodiment of the invention, the image recognition device is a camera and the sound recognition device is a microphone.
In one embodiment of the present invention, the main controller 203 generates the first video signal pulse and the first audio signal pulse with aligned rising edges, and is specifically configured to:
every first duration, a first video signal pulse and a first audio signal pulse are generated with aligned rising edges.
In a specific embodiment of the present invention, the main controller 203 completes audio/video delay calibration based on the delay duration, and is specifically configured to:
judging whether the delay time length is a numerical value representing that the audio lags behind the video or not;
if yes, configuring the delay time length into a video signal delay value so as to finish audio and video delay calibration.
In one embodiment of the present invention, the main controller 203 is further configured to:
and when the delay time length is judged to be a numerical value representing that the video lags behind the audio, configuring the delay time length into an audio signal delay value, or configuring the delay time length into a video signal advance value, so as to finish audio and video delay calibration.
Corresponding to the above method and system embodiments, the present invention further provides an audio/video output device and a computer readable storage medium, where the audio/video output device may include the audio/video delay calibration system in any of the above embodiments. The computer readable storage medium has stored thereon a computer program, which when executed by a processor implements the steps of the audio/video delay calibration method as in any of the above embodiments. A computer-readable storage medium as referred to herein may include Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
It is further noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention. The principle and the implementation of the present invention are explained in the present application by using specific examples, and the above description of the embodiments is only used to help understanding the technical solution and the core idea of the present invention. It should be noted that, for those skilled in the art, it is possible to make various improvements and modifications to the present invention without departing from the principle of the present invention, and those improvements and modifications also fall within the scope of the claims of the present invention.

Claims (10)

1. An audio and video delay calibration method is characterized by comprising the following steps:
generating first video signal pulses and first audio signal pulses with aligned rising edges;
sending the first video signal pulse to a video coding and decoding device so as to display the invisible light signal corresponding to the first video signal pulse;
sending the first audio signal pulse to an audio signal processing device so as to play the inaudible frequency band sound signal corresponding to the first audio signal pulse;
determining a delay time between the inaudible frequency band sound signal and the invisible light signal based on the video encoding and decoding device and the audio signal processing device;
and completing audio and video delay calibration based on the delay time.
2. The audio/video delay calibration method according to claim 1, wherein the determining a delay time duration between the inaudible band sound signal and the invisible light signal based on the video codec device and the audio signal processing device includes:
after the video coding and decoding device finishes processing the first video signal pulse, detecting the 1st to Nth rising edges of the first video signal pulse by using the video coding and decoding device, and recording the N detection times of the 1st to Nth rising edges of the first video signal pulse by using the video coding and decoding device, where N is a positive integer;
after the audio signal processing device finishes processing the first audio signal pulse, detecting the 1st to Nth rising edges of the first audio signal pulse by using the audio signal processing device, and recording the N detection times of the 1st to Nth rising edges of the first audio signal pulse by using the audio signal processing device;
and determining the difference value between the detection time of the rising edge of each recorded first audio signal pulse and the detection time of the corresponding rising edge of the first video signal pulse, and taking the average value of the determined N difference values as the determined delay time length between the inaudible frequency band sound signal and the invisible light signal.
3. The audio/video delay calibration method according to claim 1, wherein the determining a delay time duration between the inaudible band sound signal and the invisible light signal based on the video codec device and the audio signal processing device includes:
after the invisible light signal corresponding to the first video signal pulse is displayed, detecting the 1st to Kth rising edges of the invisible light signal by using an image recognition device, and recording the K detection times of the 1st to Kth rising edges of the invisible light signal by using the image recognition device, where K is a positive integer;
after the inaudible frequency band sound signal corresponding to the first audio signal pulse is played, detecting the 1st to Kth rising edges of the inaudible frequency band sound signal by using a sound recognition device, and recording the K detection times of the 1st to Kth rising edges of the inaudible frequency band sound signal by using the sound recognition device;
receiving K detection moments fed back by the image recognition device through the video coding and decoding device, and receiving K detection moments fed back by the sound recognition device through the audio signal processing device;
and determining the difference value between the detection time of the rising edge of each recorded invisible light signal and the detection time of the corresponding rising edge of the inaudible frequency band sound signal, and taking the average value of the determined K difference values as the determined delay time length between the inaudible frequency band sound signal and the invisible light signal.
4. The audio-video delay calibration method according to claim 3, wherein the image recognition device is a camera, and the sound recognition device is a microphone.
5. The audio-video delay calibration method of claim 1, wherein said generating first video signal pulses and first audio signal pulses with aligned rising edges comprises:
every first duration, a first video signal pulse and a first audio signal pulse are generated with aligned rising edges.
6. The audio/video delay calibration method according to claim 1, wherein the completing audio/video delay calibration based on the delay duration includes:
judging whether the delay time length is a numerical value representing that the audio lags behind the video or not;
if so, configuring the delay time length into a video signal delay value to finish audio and video delay calibration.
7. The audio-video delay calibration method of claim 6, further comprising:
and when the delay time length is judged to be a numerical value representing that the video lags behind the audio, configuring the delay time length as an audio signal delay value, or configuring the delay time length as a video signal advance value, so as to finish audio and video delay calibration.
8. An audio-video delay calibration system, comprising:
video encoding and decoding device, audio signal processing device;
a master controller to: generating first video signal pulses and first audio signal pulses with aligned rising edges; sending the first video signal pulse to the video encoding and decoding device to display the invisible light signal corresponding to the first video signal pulse; sending the first audio signal pulse to the audio signal processing device to play the inaudible frequency band sound signal corresponding to the first audio signal pulse; determining a delay time between the inaudible frequency band sound signal and the invisible light signal based on the video encoding and decoding device and the audio signal processing device; and completing audio and video delay calibration based on the delay time.
9. An audiovisual output device characterized in that it comprises an audiovisual delay calibration system as claimed in claim 8.
10. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, which computer program, when being executed by a processor, carries out the steps of the audio-video delay calibration method according to any one of claims 1 to 7.
CN202111249455.3A 2021-10-26 2021-10-26 Audio and video output device and audio and video delay calibration method and related components thereof Pending CN113965662A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111249455.3A CN113965662A (en) 2021-10-26 2021-10-26 Audio and video output device and audio and video delay calibration method and related components thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111249455.3A CN113965662A (en) 2021-10-26 2021-10-26 Audio and video output device and audio and video delay calibration method and related components thereof

Publications (1)

Publication Number Publication Date
CN113965662A true CN113965662A (en) 2022-01-21

Family

ID=79467117

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111249455.3A Pending CN113965662A (en) 2021-10-26 2021-10-26 Audio and video output device and audio and video delay calibration method and related components thereof

Country Status (1)

Country Link
CN (1) CN113965662A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN87104106A (en) * 1986-05-14 1988-03-09 无线电电信及技术公司 Interactive television and data transmission system
US6130718A (en) * 1996-05-14 2000-10-10 Thomson Licensing S.A. Method and device for correcting errors in the opening of time windows for recovering data with horizontal synchronization pulses
CN1547846A (en) * 2002-05-02 2004-11-17 索尼株式会社 Video signal processing device and method, recording medium, and program
CN101742357A (en) * 2009-12-29 2010-06-16 北京牡丹电子集团有限责任公司 Method for measuring audio/video synchronous error of digital television device
CN103219029A (en) * 2013-03-25 2013-07-24 广东欧珀移动通信有限公司 Method and system for automatically adjusting synchronization of audio and video
CN104581202A (en) * 2013-10-25 2015-04-29 腾讯科技(北京)有限公司 Audio and video synchronization method and system, encoding device and decoding device
EP2892241A1 (en) * 2014-01-07 2015-07-08 Samsung Electronics Co., Ltd Audio/visual device and control method thereof

Similar Documents

Publication Publication Date Title
EP3383064B1 (en) Echo cancellation method and system
JP5957760B2 (en) Video / audio processor
US7957549B2 (en) Acoustic apparatus and method of controlling an acoustic apparatus
JP2007533189A (en) Video / audio synchronization
CN104967960A (en) Voice data processing method, and voice data processing method and system in game live broadcasting
US9986362B2 (en) Information processing method and electronic device
US9756437B2 (en) System and method for transmitting environmental acoustical information in digital audio signals
US20220322010A1 (en) Rendering audio over multiple speakers with multiple activation criteria
US9502047B2 (en) Talker collisions in an auditory scene
WO2015117343A1 (en) Method and system for improving tone quality of voice, and mobile terminal
WO2016127699A1 (en) Method and device for adjusting reference signal
WO2010091555A1 (en) Stereo encoding method and device
CN112637732A (en) Display device and audio signal playing method
WO2023029829A1 (en) Audio processing method and apparatus, user terminal, and computer readable medium
CN111402910B (en) Method and equipment for eliminating echo
WO2023070848A1 (en) Electronic device and far-field noise elimination self-calibration method and system therefor
US20140205104A1 (en) Information processing apparatus, information processing method, and program
US10523171B2 (en) Method for dynamic sound equalization
CN111787464B (en) Information processing method and device, electronic equipment and storage medium
US20100296662A1 (en) Sound signal processing device and method
CN113965662A (en) Audio and video output device and audio and video delay calibration method and related components thereof
CN108806677B (en) Audio processing device and audio processing method
CN104062830A (en) Projector and method for setting the projector
KR20020028918A (en) Audio system
TW201830229A (en) Calibration method and computer readable recording medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination