KR102311386B1

KR102311386B1 - Method and Apparatus for Lip Synchronization Control of Audio/Visual Signals

Info

Publication number: KR102311386B1
Application number: KR1020200032559A
Authority: KR
Inventors: 한아람; 김대성; 조다영; 함경선
Original assignee: 주식회사 엘지유플러스
Priority date: 2020-03-17
Filing date: 2020-03-17
Publication date: 2021-10-08
Also published as: KR20210115993A

Abstract

A lip-sync control method is provided. The method may include: decoding a compressed video signal to provide a decompressed video signal; causing the decompressed video signal to be output to a TV set together with a compressed audio signal; decoding the compressed audio signal to provide a decompressed audio signal; sampling the decompressed audio signal to provide reference audio signal samples; providing loopback audio signal samples from the sound output of the TV set; and controlling the timing at which the compressed video signal is decoded, or the timing at which the decompressed video signal is output to the TV set, based on a comparison of the reference audio signal samples with the loopback audio signal samples.

Description

Method and Apparatus for Lip Synchronization Control of Audio/Visual Signals

The present invention relates to AV (audio/visual) signal processing technology.

With recent advances in multimedia technology, demand for immersive audio technology that reproduces realistic sound is growing rapidly. In response, 3D sound systems capable of delivering realistic audio effects are being actively researched. One 3D sound system to emerge from this research is Dolby Atmos. Dolby Atmos is a 3D sound system that immerses the audience by presenting movie scenes with realistic, spatial surround sound, maximizing their realism and impact. It can be described as a 360° stereophonic sound system based on the concept of audio objects, mastered so that each object can be positioned independently in three-dimensional space. Worldwide, about 1,300 titles have been mastered in Dolby Atmos to date, and at Netflix about three quarters of newly produced content is mastered in Atmos.

An object of the present invention is to correct the time difference between audio and video during content playback so that the two can be synchronized.

The problems to be solved by the present invention are not limited to those mentioned above, and other problems not mentioned will be clearly understood by those skilled in the art from the following description.

In one aspect, a lip-sync control method is provided. The method may include: decoding a compressed video signal to provide a decompressed video signal; causing the decompressed video signal to be output to a TV set together with a compressed audio signal; decoding the compressed audio signal to provide a decompressed audio signal; sampling the decompressed audio signal to provide reference audio signal samples; providing loopback audio signal samples from the sound output of the TV set; and controlling the timing at which the compressed video signal is decoded, or the timing at which the decompressed video signal is output to the TV set, based on a comparison of the reference audio signal samples with the loopback audio signal samples.

In one embodiment, providing the loopback audio signal samples from the sound output of the TV set includes converting the sound output of the TV set into an electrical signal and A/D-converting the electrical signal to provide the loopback audio signal samples.

In one embodiment, A/D-converting the electrical signal to provide the loopback audio signal samples includes A/D-converting the electrical signal to provide digital samples and noise-filtering the digital samples to provide the loopback audio signal samples.

In one embodiment, A/D-converting the electrical signal to provide the loopback audio signal samples and sampling the decompressed audio signal to provide the reference audio signal samples are performed in synchronization with each other.

In one embodiment, the reference audio signal samples and the loopback audio signal samples are samples within the same time window.

In one embodiment, controlling the timing at which the compressed video signal is decoded, or the timing at which the decompressed video signal is output to the TV set, based on the comparison includes calculating a time difference value between the reference audio signal samples and the loopback audio signal samples and, based on the time difference value, delaying either the output of the decompressed video signal to the TV set or the decoding of the compressed video signal into the decompressed video signal.

In one embodiment, calculating the time difference value and applying the delay includes determining a delay value based on the time difference value, and delaying either the output of the decompressed video signal to the TV set or the decoding of the compressed video signal into the decompressed video signal by the delay value.

In one embodiment, the delay value is equal to the time difference value.

In one embodiment, the delay value is the time difference value minus a processing delay value.

In one embodiment, the method further includes receiving a Dolby Atmos signal that contains the compressed video signal and the compressed audio signal, and separating the Dolby Atmos signal into the compressed video signal and the compressed audio signal.

In one embodiment, decoding the compressed video signal to provide the decompressed video signal includes providing the decompressed video signal in synchronization with the decompressed audio signal by referring to time information contained in a header of the compressed audio signal.

In one embodiment, the TV set includes a TV.

In one embodiment, the TV set includes a TV and a loudspeaker connected to the TV.

In another aspect, a set-top box is provided. The set-top box may include: a demultiplexer configured to demultiplex a signal containing a compressed video signal and a compressed audio signal, outputting the compressed video signal to a first terminal and the compressed audio signal to a second terminal, the second terminal being connected to an output terminal for connection to a TV; an audio decoder configured to decode the compressed audio signal output to the second terminal and output a decompressed audio signal; a delay generator configured to receive the compressed video signal output to the first terminal and output it either as is or with a delay; a video decoder configured to decode the compressed video signal output from the delay generator and output a decompressed video signal to the output terminal; a sound wave converter configured to receive sound waves output from the TV and provide loopback audio signal samples; and a delay value determiner configured to sample the decompressed audio signal to provide reference audio signal samples and to determine and output a delay value based on the reference audio signal samples and the loopback audio signal samples. The delay generator may be further configured to pass the compressed video signal through as is and then, in response to receiving the delay value from the delay value determiner, to output the compressed video signal delayed by the delay value.

In one embodiment, the audio decoder is a software-based Atmos audio decoder.

In one embodiment, the sound wave converter includes a microphone configured to convert sound waves output from the TV into an electrical signal, and the sound wave converter is further configured to A/D-convert the electrical signal to provide the loopback audio signal samples.

In one embodiment, the set-top box further includes a noise removal filter unit configured to apply noise removal filtering to the loopback audio signal samples.

In one embodiment, the reference audio signal samples and the loopback audio signal samples are samples within the same time window.

In one embodiment, the delay value determiner is further configured to calculate a time difference value between the reference audio signal samples and the loopback audio signal samples and to determine the delay value based on the time difference value.

In one embodiment, the audio decoder, the sound wave converter, and the delay value determiner are operated so as to be deactivated in response to the delay generator receiving the delay value from the delay value determiner.

In yet another aspect, a set-top box is provided. The set-top box may include: a demultiplexer configured to demultiplex a signal containing a compressed video signal and a compressed audio signal, outputting the compressed video signal to a first terminal and the compressed audio signal to a second terminal, the second terminal being connected to an output terminal for connection to a TV; an audio decoder configured to decode the compressed audio signal output to the second terminal and output a decompressed audio signal; a video decoder configured to decode the compressed video signal output to the first terminal and output a decompressed video signal; a delay generator configured to receive the decompressed video signal and output it to the output terminal either as is or with a delay; a sound wave converter configured to receive sound waves output from the TV and provide loopback audio signal samples; and a delay value determiner configured to sample the decompressed audio signal to provide reference audio signal samples and to determine and output a delay value based on the reference audio signal samples and the loopback audio signal samples. The delay generator may be further configured to pass the decompressed video signal through as is and then, in response to receiving the delay value from the delay value determiner, to output the decompressed video signal delayed by the delay value.

According to embodiments of the present invention, the time difference between audio and video is corrected during playback of Dolby Atmos content so that the two are synchronized, which has the technical effect of providing the viewer with a more realistic audio effect.

FIG. 1 is a block diagram of a first embodiment of a set-top box with a lip-sync control function.
FIG. 2 is a block diagram of one embodiment of a TV set that can be connected to the set-top box of FIG. 1.
FIG. 3 is a block diagram of a second embodiment of a set-top box with a lip-sync control function.
FIG. 4 conceptually illustrates how the delay value determiner of FIG. 1 calculates the time difference value between the reference audio signal samples and the loopback audio signal samples.
FIG. 5 is a flowchart of a first embodiment of a lip-sync control method performed in a set-top box.
FIG. 6 is a flowchart of a second embodiment of a lip-sync control method performed in a set-top box.

The advantages and features of the present invention, and the methods of achieving them, will become apparent from the embodiments described in detail below together with the accompanying drawings. The present invention is not, however, limited to the embodiments disclosed below and may be implemented in various different forms. The embodiments are provided only to make the disclosure of the present invention complete and to fully convey the scope of the invention to those of ordinary skill in the art to which the invention pertains; the invention is defined only by the scope of the claims.

The terms used in this specification serve only to describe particular embodiments and are not intended to limit the present invention. For example, a component expressed in the singular should be understood to include a plurality of components unless the context clearly indicates the singular only. In this specification, terms such as "include" or "have" merely specify the presence of the features, numbers, steps, operations, components, parts, or combinations thereof described in the specification; their use does not exclude the presence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof. In the embodiments described herein, a "module" or "unit" may mean a functional part that performs at least one function or operation.

In addition, unless otherwise defined, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by one of ordinary skill in the art to which the present invention belongs. Terms defined in commonly used dictionaries should be interpreted as having meanings consistent with their meanings in the context of the related art and, unless explicitly defined in this specification, are not to be interpreted in an idealized or excessively formal sense.

Hereinafter, embodiments of the present invention are described in more detail with reference to the accompanying drawings. In the following description, detailed explanations of well-known functions or configurations are omitted where they might unnecessarily obscure the gist of the present invention.

FIG. 1 is a block diagram of a first embodiment of a set-top box with a lip-sync control function. The components shown in FIG. 1 neither reflect every function of the set-top box nor are all of them essential, so it should be recognized that the set-top box may include more or fewer components than those shown.

As shown in FIG. 1, set-top box 100 may include a demultiplexer 110 configured to demultiplex an AV signal in which compressed video signals and compressed audio signals are multiplexed, for example Dolby Atmos content, thereby separating the compressed video signal from the compressed audio signal. Demultiplexer 110 may output the compressed video signal to a first terminal 112 and the compressed audio signal to a second terminal 113. Second terminal 113 may be connected to an output terminal 123 of set-top box 100 that is used for connection to a TV set. The compressed audio signal is thus output through output terminal 123 of set-top box 100 so that it can be input to the TV set. In one embodiment, output terminal 123 is an HDMI (High Definition Multimedia Interface) terminal. Set-top box 100 may further include an audio decoder 115 configured to decode the compressed audio signal output to second terminal 113 and output a decompressed (decoded) audio signal. In one embodiment, audio decoder 115 is a software-based Atmos audio decoder. Set-top box 100 may further include a delay generator 117 configured to receive the compressed video signal output to first terminal 112 and output it either as is or with a delay. Set-top box 100 may further include a video decoder 120 configured to decode the compressed video signal output from delay generator 117 and output a decompressed (decoded) video signal to output terminal 123. In one embodiment, video decoder 120 is a video decoder conforming to the H.264, H.265, or HEVC (High Efficiency Video Coding) standard. The decompressed video signal may be output through output terminal 123 and input to the TV set. Video decoder 120 may be operated so that, by referring to time information contained in a header of the compressed audio signal input to audio decoder 115, the decompressed video signal is output in synchronization with the decompressed audio signal output from audio decoder 115. Although the signals that set-top box 100 actually outputs are a decompressed video signal and a compressed audio signal that are not synchronized with each other, this synchronization control by video decoder 120 means that, inside set-top box 100, audio decoder 115 outputs a decompressed audio signal that is synchronized with the decompressed video signal.

Set-top box 100 may further include a microphone 130 configured to convert into an electrical signal the sound output, that is, the sound waves, emitted by the TV set that receives the decompressed video signal and the compressed audio signal output from set-top box 100. In one embodiment, microphone 130 is a voice recognition microphone. Set-top box 100 may further include an A/D converter 135 configured to A/D-convert the electrical signal to provide loopback audio signal samples. Microphone 130 and A/D converter 135 may together constitute the sound wave converter. The loopback audio signal samples can be understood as signal samples that set-top box 100 collects by picking up again the audio output that the TV set, using the compressed audio signal received from set-top box 100, emits through its loudspeaker. In one embodiment, A/D converter 135 may be operated to digitally sample the input electrical signal for a predetermined time period. Set-top box 100 may further include a noise removal filter unit 140 connected to A/D converter 135 and configured to apply noise removal filtering to the loopback audio signal samples. In one embodiment, noise removal filter unit 140 is a digital bandpass filter that passes the audible frequency band.
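The bandpass filtering of the loopback samples can be illustrated with a minimal sketch. The patent does not specify a filter design; the code below assumes a very simple one, a cascade of a first-order high-pass and a first-order low-pass filter, and the function name `bandpass` and the 20 Hz and 20 kHz cutoffs are hypothetical choices made only for illustration.

```python
import math

def bandpass(samples, sample_rate_hz, low_hz=20.0, high_hz=20000.0):
    """Filter `samples` with a one-pole high-pass (cutoff low_hz) followed by
    a one-pole low-pass (cutoff high_hz). Illustrative only; a real noise
    removal filter would likely use a steeper design."""
    dt = 1.0 / sample_rate_hz
    rc_hp = 1.0 / (2.0 * math.pi * low_hz)    # RC constant of the high-pass stage
    rc_lp = 1.0 / (2.0 * math.pi * high_hz)   # RC constant of the low-pass stage
    a_hp = rc_hp / (rc_hp + dt)
    a_lp = dt / (rc_lp + dt)
    out = []
    prev_x = 0.0   # previous input sample
    hp = 0.0       # high-pass stage state
    lp = 0.0       # low-pass stage state
    for x in samples:
        hp = a_hp * (hp + x - prev_x)   # removes DC and sub-audible drift
        prev_x = x
        lp = lp + a_lp * (hp - lp)      # attenuates content above high_hz
        out.append(lp)
    return out
```

With a 48 kHz sample rate, a constant (DC) input decays toward zero at the output, while a 1 kHz tone passes through almost unchanged.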

Set-top box 100 may further include a delay value determiner 145 configured to sample the decompressed audio signal output from audio decoder 115 to provide reference audio signal samples. In one embodiment, delay value determiner 145 may be operated to digitally sample the decompressed audio signal for a predetermined time period. In one embodiment, delay value determiner 145 samples the decompressed audio signal in synchronization with the sampling performed by A/D converter 135. In this embodiment, the reference audio signal samples and the loopback audio signal samples are samples within the same time window. Delay value determiner 145 may be further configured to determine and output a delay value based on the reference audio signal samples and the loopback audio signal samples. In one embodiment, delay value determiner 145 may be further configured to calculate a time difference value between the reference audio signal samples and the loopback audio signal samples and to determine the delay value based on the time difference value.
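As a rough illustration of synchronized sampling over a shared time window, the sketch below reads the two sample streams in lockstep for a fixed duration. It assumes both streams are already available as Python iterators running at the same sample rate; the function name `capture_synchronized` and its parameters are hypothetical and do not come from the patent.

```python
from itertools import islice

def capture_synchronized(reference_stream, loopback_stream, sample_rate_hz, window_s):
    """Take the same number of samples from both streams, starting at the
    same instant, so that both buffers cover one common time window."""
    n = int(round(sample_rate_hz * window_s))
    pairs = list(islice(zip(reference_stream, loopback_stream), n))
    reference = [r for r, _ in pairs]
    loopback = [l for _, l in pairs]
    return reference, loopback
```

Pairing the samples with `zip` guarantees that index `i` in one buffer and index `i` in the other refer to the same sampling instant, which is what the later lag estimate relies on.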

Referring to FIG. 4, which conceptually illustrates how delay value determiner 145 calculates the time difference value between the reference audio signal samples and the loopback audio signal samples, delay value determiner 145 may calculate, as the time difference value, the difference (T' - T) between the sample time (T') of any one of the loopback audio signal samples and the sample time (T) of the corresponding reference audio signal sample. In one embodiment, delay value determiner 145 is configured to determine the time difference value by taking the cross-correlation between the reference audio signal samples and the loopback audio signal samples. The reference audio signal samples are obtained by sampling a decompressed audio signal that is synchronized with the decompressed video signal output from set-top box 100, whereas the loopback audio signal samples are collected by set-top box 100 picking up again the audio output that the TV set emits through its loudspeaker. The time difference calculated between the two can therefore be regarded as reflecting the time difference between the video output and the audio output actually reproduced by the TV set. In one embodiment, delay value determiner 145 is configured to use the time difference value as the delay value. In one embodiment, delay value determiner 145 is configured to use the time difference value minus a processing delay value as the delay value. In one embodiment, the processing delay value may be determined a priori, taking into account the delay that may arise in passing through microphone 130, A/D converter 135, and noise removal filter unit 140 and the delay that may arise in delay value determiner 145.
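The cross-correlation step and the subtraction of the processing delay can be sketched as follows. This is a minimal brute-force illustration under stated assumptions, not the patented implementation: it searches non-negative lags only (the loopback is assumed to trail the reference), clamps the result at zero, and uses the hypothetical names `estimate_lag` and `delay_value_ms`.

```python
import random

def estimate_lag(reference, loopback, max_lag):
    """Return the lag in samples (0..max_lag) at which the loopback samples
    best match the reference samples, by brute-force cross-correlation."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        # Correlate reference[i] with loopback[i + lag].
        score = sum(r * l for r, l in zip(reference, loopback[lag:]))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

def delay_value_ms(reference, loopback, sample_rate_hz, max_lag, processing_delay_ms=0.0):
    """Convert the estimated lag to milliseconds and subtract an a priori
    processing delay. Clamping at zero is an assumption of this sketch."""
    time_difference_ms = 1000.0 * estimate_lag(reference, loopback, max_lag) / sample_rate_hz
    return max(0.0, time_difference_ms - processing_delay_ms)

# Usage: a synthetic loopback that is the reference attenuated and delayed by 37 samples.
rng = random.Random(0)
reference = [rng.uniform(-1.0, 1.0) for _ in range(1000)]
loopback = ([0.0] * 37 + [0.5 * r for r in reference])[:len(reference)]
lag = estimate_lag(reference, loopback, max_lag=100)
```

A production system would more likely use an FFT-based correlation over longer windows; the brute-force loop is kept here because it makes the definition of the lag explicit.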

Delay generator 117 may be further configured to receive the compressed video signal output from demultiplexer 110 and pass it through as is and then, in response to receiving the delay value from delay value determiner 145, to output the compressed video signal delayed by the delay value. In one embodiment, audio decoder 115 may be operated so as to be deactivated in response to delay generator 117 receiving the delay value from delay value determiner 145. In this embodiment, video decoder 120 stops its synchronization control with audio decoder 115 and decodes the compressed video signal that arrives from delay generator 117 delayed by the delay value. The decompressed video signal is therefore output with the same delay, and as a result the time difference between the video output and the audio output of the TV set can be compensated. In one embodiment, A/D converter 135, noise removal filter unit 140, and delay value determiner 145 may likewise be operated so as to be deactivated in response to delay generator 117 receiving the delay value from delay value determiner 145.
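The pass-through-then-delay behavior of the delay generator can be modeled with a small queue. The class below is only a sketch under assumptions: packets are opaque objects, time is supplied by the caller in milliseconds, and the names `DelayGenerator`, `set_delay`, `push`, and `pop_ready` are hypothetical rather than taken from the patent.

```python
from collections import deque

class DelayGenerator:
    """Passes packets through unchanged until a delay value is set, then
    holds each newly pushed packet for that many milliseconds."""

    def __init__(self):
        self.delay_ms = 0.0        # zero delay means pass-through
        self._queue = deque()      # (release_time_ms, packet) in arrival order

    def set_delay(self, delay_ms):
        # Called once the delay value determiner has produced a delay value.
        self.delay_ms = max(0.0, delay_ms)

    def push(self, packet, now_ms):
        self._queue.append((now_ms + self.delay_ms, packet))

    def pop_ready(self, now_ms):
        # Release every packet whose hold time has elapsed, in arrival order.
        ready = []
        while self._queue and self._queue[0][0] <= now_ms:
            ready.append(self._queue.popleft()[1])
        return ready
```

The same model applies whether the queued items are compressed video packets (first embodiment) or decoded frames (second embodiment); only the position of the queue relative to the video decoder differs.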

FIG. 2 is a block diagram illustrating an embodiment of a TV set that can be connected to the set-top box of FIG. 1. The components shown in FIG. 2 likewise neither reflect every function of the TV set nor are all essential, so it should be recognized that the TV set may include more or fewer components than those illustrated.

As shown in FIG. 2, the TV set 200 includes an input terminal 210 for connection to the set-top box 100. In an embodiment, the input terminal 210 is an HDMI terminal. The TV set 200 may further include a TV panel 220 connected to the input terminal 210. In an embodiment, the TV panel 220 may be a liquid crystal display (LCD) panel such as a twisted nematic (TN) panel, an in-plane switching (IPS) panel, or a vertical alignment (VA) panel, an organic light emitting diode (OLED) panel, or an active matrix organic light emitting diode (AMOLED) panel, but it should be recognized that the type of the TV panel 220 is not limited thereto. The TV set 200 may further include an audio decoder 230 connected to the input terminal 210. The audio decoder 230 may be configured to decode the compressed audio signal input from the input terminal 210 and output a decompressed audio signal. The TV set 200 may further include a D/A converter 240 connected to the audio decoder 230. The D/A converter 240 may be configured to convert the decompressed audio signal, which is a digital signal, into an analog signal. The TV set 200 may further include a loudspeaker 250 connected to the D/A converter 240. The loudspeaker 250 may be used to output, as sound waves, the audio mixed with the video displayed on the TV panel 220. Although not shown, in an embodiment the TV set 200 may include an external loudspeaker.

FIG. 3 is a block diagram illustrating a second embodiment of a set-top box having a lip sync control function. The components shown in FIG. 3 neither reflect every function of the set-top box nor are all essential, so it should be recognized that the set-top box may include more or fewer components than those illustrated.

The demultiplexer 310, the audio decoder 315, the output terminal 323, the microphone 330, the A/D converter 335, the noise removal filter unit 340, and the delay value determiner 345, which are components of the set-top box 300 shown in FIG. 3, perform substantially the same functions as the corresponding components of the set-top box 100 shown in FIG. 1 and may be configured in the same manner, so their description is omitted. The set-top box 300 shown in FIG. 3 differs from the set-top box 100 shown in FIG. 1 in how the delay generator 317 and the video decoder 320 are connected. The video decoder 320 is configured to receive the compressed video signal from the demultiplexer 310 through the first terminal 312, decode it, and output a decompressed video signal. The delay generator 317 may be configured to receive the decompressed video signal from the video decoder 320 and output it as is to the output terminal 323 and then, in response to receiving the delay value from the delay value determiner 345, to delay the received decompressed video signal by the delay value before outputting it to the output terminal 323. In other words, video signal processing in the set-top box 100 shown in FIG. 1 delays first and decodes afterward, whereas video signal processing in the set-top box 300 shown in FIG. 3 decodes first and delays afterward.
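The difference between the two orderings can be illustrated with a toy model in which each frame carries a presentation time. The sketch below assumes an idealized decoder whose own latency is ignored; the helper names `decode`, `delay`, `delay_then_decode`, and `decode_then_delay`, and the 40 ms frame spacing, are hypothetical and chosen only for illustration.

```python
def decode(frame):
    # Stand-in for the video decoder (real decoding is far more involved).
    return frame.replace("compressed", "decoded")


def delay(timed_frames, delay_ms):
    # Shift every (time_ms, frame) pair later by delay_ms.
    return [(t + delay_ms, f) for t, f in timed_frames]


def delay_then_decode(timed_frames, delay_ms):
    # Ordering of FIG. 1: the compressed signal is delayed, then decoded.
    return [(t, decode(f)) for t, f in delay(timed_frames, delay_ms)]


def decode_then_delay(timed_frames, delay_ms):
    # Ordering of FIG. 3: the signal is decoded, then the result is delayed.
    return delay([(t, decode(f)) for t, f in timed_frames], delay_ms)


stream = [(0, "compressed-0"), (40, "compressed-1"), (80, "compressed-2")]
same = delay_then_decode(stream, 120) == decode_then_delay(stream, 120)
```

Under this idealization both orderings deliver the same decoded frames at the same times, so the two embodiments differ in where the buffering happens rather than in the timing seen at the output terminal.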

In the above description, each of the components/circuits of the set-top boxes 100 and 300 has been described as a separate element; however, all or some of the components/circuits of the set-top boxes 100 and 300 may be implemented as a single IC chip based on an application specific integrated circuit (ASIC), programmable logic device (PLD), or field-programmable gate array (FPGA) design approach.

FIG. 5 is a flowchart illustrating a first embodiment of a lip sync control method performed in a set-top box.

The lip sync control method begins with step S505 of receiving a signal that includes a compressed video signal and a compressed audio signal, for example a Dolby Atmos signal. In step S510, the demultiplexer 110 separates the Dolby Atmos signal into the compressed video signal and the compressed audio signal and outputs them to the first terminal 112 and the second terminal 113, respectively. In step S515, the compressed audio signal is input to the audio decoder 115 so that the audio decoder 115 outputs a decompressed audio signal, and the compressed video signal is input to the video decoder 120 so that the video decoder 120 outputs a decompressed video signal. In this step, the video decoder 120 may perform a synchronization control operation, referring to time information included in the header of the compressed audio signal input to the audio decoder 115, so that the decompressed video signal is output in synchronization with the decompressed audio signal. In step S520, the compressed audio signal and the decompressed video signal are supplied to the TV set 200. Note that this step does not require any substantive operation; it may be carried out implicitly by connecting the output of the video decoder 120 to the output terminal 123 and connecting the second terminal 113 of the demultiplexer 110 to the output terminal 123. In step S525, the sound output from the TV set 200 is converted into an electrical signal using the microphone 130. In step S530, the electrical signal is A/D converted using the A/D converter 135 to provide loopback audio signal samples. In this step, the A/D converter 135 may A/D convert the electrical signal to provide digital samples, and the noise removal filter unit 140 may perform noise filtering on those digital samples to provide the loopback audio signal samples. In step S535, the delay value determiner 145 samples the decompressed audio signal output from the audio decoder 115 to provide reference audio signal samples. In an embodiment, this step may be performed in synchronization with the aforementioned step S530 of A/D converting the electrical signal to provide the loopback audio signal samples. In such an embodiment, the reference audio signal samples and the loopback audio signal samples may be samples within the same time window. In step S540, the delay value determiner 145 calculates a disparity value between the reference audio signal samples and the loopback audio signal samples and determines a delay value based on the calculated disparity value, and the delay generator 117 is caused to delay, by the delay value, the input of the compressed video signal to the video decoder 120. In an embodiment, the delay value determiner 145 determines the delay value to be equal to the disparity value. In another embodiment, the delay value determiner 145 determines, as the delay value, a value obtained by subtracting a processing delay value from the disparity value.
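The noise filtering of step S530 is one place where a processing delay of the kind subtracted in step S540 can arise. As a hedged illustration, the sketch below uses a moving-average filter, which is an assumed stand-in because the description does not say what kind of filter the noise removal filter unit uses. The function names, the 49-tap length, and the 48 kHz sample rate are likewise assumptions.

```python
def moving_average(samples, taps):
    """Causal moving-average filter (assumed example of a noise filter).

    Early outputs average only the samples seen so far.
    """
    out = []
    running = 0.0
    for i, x in enumerate(samples):
        running += x
        if i >= taps:
            running -= samples[i - taps]  # drop the sample leaving the window
        out.append(running / min(i + 1, taps))
    return out


def group_delay_samples(taps):
    # A symmetric FIR filter of length N delays its input by (N - 1) / 2 samples.
    return (taps - 1) / 2


def processing_delay_ms(taps, sample_rate_hz):
    # Filter delay expressed in milliseconds for a given sample rate.
    return group_delay_samples(taps) * 1000.0 / sample_rate_hz


smoothed = moving_average([0, 4, 0, 4, 0, 4], taps=2)
filter_delay = processing_delay_ms(taps=49, sample_rate_hz=48000)
```

Under these assumptions the filter alone contributes 0.5 ms, which is the sort of fixed, known-in-advance contribution that could be folded into a processing delay value determined a priori.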

FIG. 6 is a flowchart illustrating a second embodiment of a lip sync control method performed in a set-top box.

The lip sync control method begins with step S605 of receiving a signal that includes a compressed video signal and a compressed audio signal, for example a Dolby Atmos signal. In step S610, the demultiplexer 310 separates the Dolby Atmos signal into the compressed video signal and the compressed audio signal and outputs them to the first terminal 312 and the second terminal 313, respectively. In step S615, the audio decoder 315 decodes the compressed audio signal and outputs a decompressed audio signal. In step S620, the video decoder 320 decodes the compressed video signal and outputs a decompressed video signal. In this step, the video decoder 320 may perform a synchronization control operation, referring to time information included in the header of the compressed audio signal, so that the decompressed video signal is output in synchronization with the decompressed audio signal. In step S625, the compressed audio signal and the decompressed video signal are supplied to the TV set 200. Note that this step does not require any substantive operation; it may be carried out implicitly in that the output of the video decoder 320 passes through the delay generator 317 to the output terminal 323 without delay and the second terminal 313 of the demultiplexer 310 is connected to the output terminal 323. In step S630, the sound output from the TV set 200 is converted into an electrical signal using the microphone 330. In step S635, the electrical signal is A/D converted using the A/D converter 335 to provide loopback audio signal samples. In this step, the A/D converter 335 may A/D convert the electrical signal to provide digital samples, and the noise removal filter unit 340 may perform noise filtering on those digital samples to provide the loopback audio signal samples. In step S640, the delay value determiner 345 samples the decompressed audio signal output from the audio decoder 315 to provide reference audio signal samples. In an embodiment, this step may be performed in synchronization with the aforementioned step S635 of A/D converting the electrical signal to provide the loopback audio signal samples. In such an embodiment, the reference audio signal samples and the loopback audio signal samples may be samples within the same time window. In step S645, the delay value determiner 345 calculates a disparity value between the reference audio signal samples and the loopback audio signal samples and determines a delay value based on the calculated disparity value, and the delay generator 317 is caused to delay, by the delay value, the supply of the decompressed video signal to the TV set 200. In an embodiment, the delay value determiner 345 determines the delay value to be equal to the disparity value. In another embodiment, the delay value determiner 345 determines, as the delay value, a value obtained by subtracting a processing delay value from the disparity value.
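When the delay is applied to the decompressed video signal, as in step S645, one practical way to realize it is to hold back a whole number of frames. The sketch below is an assumption-laden illustration of that conversion: the description does not say that the delay is quantized to frames, and the function name `frames_to_hold`, the round-to-nearest policy, and the example frame rates are hypothetical.

```python
import math


def frames_to_hold(delay_ms, frame_rate_hz):
    """Whole number of decompressed frames approximating a delay value.

    Assumption: round to the nearest frame (halves round up).
    """
    return math.floor(delay_ms * frame_rate_hz / 1000 + 0.5)


def residual_error_ms(delay_ms, frame_rate_hz):
    # Difference between the requested delay and the delay actually applied.
    applied_ms = frames_to_hold(delay_ms, frame_rate_hz) * 1000.0 / frame_rate_hz
    return delay_ms - applied_ms


frames_at_60 = frames_to_hold(120, 60)
frames_at_30 = frames_to_hold(120, 30)
```

For a 120 ms delay value this yields 7 frames at 60 frames per second and 4 frames at 30 frames per second; the residual is bounded by half a frame period under the rounding policy assumed here.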

In the above description, a statement that one component is connected or coupled to another component should be understood to mean not only that the component is directly connected or coupled to the other component but also that the two may be connected or coupled through one or more other components interposed between them. Other terms describing relationships between components (e.g., "between," "among," etc.) should be interpreted in a similar manner.

In the embodiments disclosed herein, the arrangement of the illustrated components may vary depending on the environment in which the invention is implemented or on its requirements. For example, some components may be omitted, or several components may be integrated and implemented as one. The arrangement order and connections of some components may also be changed.

Although various embodiments of the present invention have been shown and described above, the present invention is not limited to the specific embodiments described. The embodiments may of course be modified in various ways by those of ordinary skill in the art to which the invention pertains without departing from the gist of the invention as claimed in the appended claims, and such modified embodiments should not be understood as separate from the technical spirit or scope of the present invention. Accordingly, the technical scope of the present invention should be defined only by the appended claims.

100, 300: set-top box
110, 310: demultiplexer
112, 312: first terminal
113, 313: second terminal
115, 315: audio decoder
117, 317: delay generating unit
120, 320: video decoder
123, 323: output terminal
130, 330: microphone
135, 335: A/D converter
140, 340: noise removal filter unit
145, 345: delay value determining unit
200: TV set
210: input terminal
220: TV panel
230: audio decoder
240: D/A converter
250: loudspeaker

Claims

1. A lip sync control method comprising:
decoding a compressed video signal to provide a decompressed video signal;
causing the decompressed video signal to be output to a TV set together with a compressed audio signal;
decoding the compressed audio signal to provide a decompressed audio signal;
sampling the decompressed audio signal to provide reference audio signal samples;
providing loopback audio signal samples from a sound output of the TV set; and
controlling a timing at which the compressed video signal is decoded or a timing at which the decompressed video signal is output to the TV set, based on comparing the reference audio signal samples with the loopback audio signal samples.

2. The method of claim 1,
wherein providing the loopback audio signal samples comprises converting the sound output of the TV set into an electrical signal and A/D converting the electrical signal to provide the loopback audio signal samples.

3. The method of claim 2,
wherein A/D converting the electrical signal to provide the loopback audio signal samples comprises A/D converting the electrical signal to provide digital samples and performing noise filtering on the digital samples to provide the loopback audio signal samples.

4. The method of claim 2,
wherein A/D converting the electrical signal to provide the loopback audio signal samples and sampling the decompressed audio signal to provide the reference audio signal samples are performed in synchronization with each other.

5. The method of claim 4,
wherein the reference audio signal samples and the loopback audio signal samples are samples within the same time window.

6. The method of claim 1,
wherein controlling the timing at which the compressed video signal is decoded or the timing at which the decompressed video signal is output to the TV set comprises calculating a disparity value between the reference audio signal samples and the loopback audio signal samples and, based on the disparity value, delaying the output of the decompressed video signal to the TV set or delaying the decoding of the compressed video signal into the decompressed video signal.

7. The method of claim 6,
wherein calculating the disparity value and, based on the disparity value, delaying the output of the decompressed video signal to the TV set or delaying the decoding of the compressed video signal into the decompressed video signal comprises:
determining a delay value based on the disparity value; and
delaying, by the delay value, the output of the decompressed video signal to the TV set or the decoding of the compressed video signal into the decompressed video signal.

8. The method of claim 7,
wherein the delay value is equal to the disparity value.

9. The method of claim 8,
wherein the delay value is a value obtained by subtracting a processing delay value from the disparity value.

10. The method of claim 1, further comprising:
receiving a Dolby Atmos signal comprising the compressed video signal and the compressed audio signal; and
separating the Dolby Atmos signal into the compressed video signal and the compressed audio signal.

11. The method of claim 1,
wherein decoding the compressed video signal to provide the decompressed video signal comprises providing the decompressed video signal in synchronization with the provided decompressed audio signal by referring to time information included in a header of the compressed audio signal.

12. The method of claim 1,
wherein the TV set includes a TV.

13. The method of claim 1,
wherein the TV set includes a TV and a loudspeaker connected to the TV.

14. A set-top box comprising:
a demultiplexer configured to demultiplex a signal including a compressed video signal and a compressed audio signal so as to output the compressed video signal to a first terminal and the compressed audio signal to a second terminal, the second terminal being connected to an output terminal for connection to a TV;
an audio decoder configured to decode the compressed audio signal output to the second terminal to output a decompressed audio signal;
a delay generator configured to receive the compressed video signal output to the first terminal and output it either as is or with a delay;
a video decoder configured to decode the compressed video signal output from the delay generator and output a decompressed video signal to the output terminal;
a sound wave converter configured to receive a sound wave output from the TV and provide loopback audio signal samples; and
a delay value determiner configured to sample the decompressed audio signal to provide reference audio signal samples, and to determine and output a delay value based on the reference audio signal samples and the loopback audio signal samples,
wherein the delay generator is further configured to receive the compressed video signal and output it as is and, in response to receiving the delay value from the delay value determiner, to delay the received compressed video signal by the delay value before outputting it.

15. The set-top box of claim 14,
wherein the audio decoder is a software-based Atmos audio decoder.

16. The set-top box of claim 14,
wherein the sound wave converter includes a microphone configured to convert the sound wave output from the TV into an electrical signal, and
wherein the sound wave converter is further configured to A/D convert the electrical signal to provide the loopback audio signal samples.

17. The set-top box of claim 14,
further comprising a noise removal filter unit configured to remove noise from the loopback audio signal samples.

18. The set-top box of claim 14,
wherein the reference audio signal samples and the loopback audio signal samples are samples within the same time window.

19. The set-top box of claim 14,
wherein the delay value determiner is further configured to calculate a disparity value between the reference audio signal samples and the loopback audio signal samples and to determine the delay value based on the disparity value.

20. The set-top box of claim 14,
wherein the audio decoder, the sound wave converter, and the delay value determiner are operated so as to be deactivated in response to the delay generator receiving the delay value from the delay value determiner.

21. A set-top box comprising:
a demultiplexer configured to demultiplex a signal including a compressed video signal and a compressed audio signal so as to output the compressed video signal to a first terminal and the compressed audio signal to a second terminal, the second terminal being connected to an output terminal for connection to a TV;
an audio decoder configured to decode the compressed audio signal output to the second terminal to output a decompressed audio signal;
a video decoder configured to decode the compressed video signal output to the first terminal to output a decompressed video signal;
a delay generator configured to receive the decompressed video signal and output it to the output terminal either as is or with a delay;
a sound wave converter configured to receive a sound wave output from the TV and provide loopback audio signal samples; and
a delay value determiner configured to sample the decompressed audio signal to provide reference audio signal samples, and to determine and output a delay value based on the reference audio signal samples and the loopback audio signal samples,
wherein the delay generator is further configured to receive the decompressed video signal and output it as is and, in response to receiving the delay value from the delay value determiner, to delay the received decompressed video signal by the delay value before outputting it.

22. The set-top box of claim 21,
wherein the audio decoder is a software-based Atmos audio decoder.

23. The set-top box of claim 21,
wherein the sound wave converter includes a microphone configured to convert the sound wave output from the TV into an electrical signal, and
wherein the sound wave converter is further configured to A/D convert the electrical signal to provide the loopback audio signal samples.

24. The set-top box of claim 21,
further comprising a noise removal filter unit configured to remove noise from the loopback audio signal samples.

25. The set-top box of claim 21,
wherein the reference audio signal samples and the loopback audio signal samples are samples within the same time window.

26. The set-top box of claim 21,
wherein the delay value determiner is further configured to calculate a disparity value between the reference audio signal samples and the loopback audio signal samples and to determine the delay value based on the disparity value.

27. The set-top box of claim 21,
wherein the audio decoder, the sound wave converter, and the delay value determiner are operated so as to be deactivated in response to the delay generator receiving the delay value from the delay value determiner.