JPH099224A

JPH099224A - Dynamic image and speech codec using lip-synchronous controller

Info

Publication number: JPH099224A
Application number: JP15194995A
Authority: JP
Inventors: Yoshikazu Fukuhara; 義和福原; Hideyuki Yoshida; 秀行吉田
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1995-06-19
Filing date: 1995-06-19
Publication date: 1997-01-10

Abstract

PURPOSE: To match the movement of the subject's mouth with the speech. CONSTITUTION: When the movement of the subject imaged by a camera 1 is large, a motion vector detection circuit 4a reduces the block size to improve performance. It then outputs to a lip-sync control circuit 14a a lip-sync control signal Ls whose delay level corresponds to the amount of computation in the block matching arithmetic circuit group, and lip synchronization is performed. When the movement of the subject is small, the block size is enlarged, which substantially shortens the processing time of the block matching arithmetic circuit group, and the lip-sync delay is adaptively shortened.

Description

Detailed Description of the Invention

[0001]

[Industrial Field of Application] The present invention relates to a moving-picture and audio codec device used in communication equipment such as video conference systems. More particularly, it relates to a lip-sync control device that acts on audio data after compression and before expansion.

[0002]

[Prior Art] In recent lip-sync control devices, the delay caused by compressing and expanding image data in the image codec varies with the motion in the picture. Even so, the delay applied to the audio data is usually fixed according to the transmission rate. A conventional lip-sync control device is described below with reference to the drawings.

[0003]

FIG. 5 is a functional block diagram of a moving-picture and audio codec device that uses a conventional motion vector detection circuit. The image path works as follows.

The input video signal (NTSC) from the camera 1 is A/D-converted by the A/D converter 2. The line converter 3 then converts it into the common intermediate format defined by ITU-T H.261 (for example, CIF). The motion vector detection circuit 4 applies motion compensation to the CIF video signal, and the coding and DCT unit 5 encodes the result into a motion-compensated inter-frame prediction signal.

The difference between the input and the reproduced image of the previous frame, generated via the motion vector detection circuit 4, is transformed by the discrete cosine transform (DCT) in the coding and DCT unit 5. The transform coefficients are arranged in zigzag-scan order and then quantized by the quantization unit 6. The variable-length coding unit 7 further compresses the quantized data, and the error detection and correction unit 8 applies error-correction coding. The data is then multiplexed into a single bit stream according to the data structure defined by the multimedia multiplexing and demultiplexing unit 9 and transmitted over the public network N.
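The zigzag reordering step can be illustrated with a short Python sketch. It assumes the conventional 8 × 8 zigzag order used by DCT-based coders such as H.261. The function names are hypothetical, and the sketch covers only the reordering, not the DCT or the quantizer.

```python
def zigzag_order(n: int = 8):
    """Return the (row, col) positions of an n x n block in zigzag-scan order."""
    cells = [(r, c) for r in range(n) for c in range(n)]
    # Visit anti-diagonals s = r + c in turn. Odd diagonals run from top-right
    # to bottom-left (row ascending); even diagonals run the other way.
    return sorted(cells, key=lambda rc: (rc[0] + rc[1],
                                         rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))

def zigzag_scan(block):
    """Flatten a square 2-D coefficient block into a 1-D zigzag sequence."""
    return [block[r][c] for r, c in zigzag_order(len(block))]

coeffs = [[8 * r + c for c in range(8)] for r in range(8)]
print(zigzag_scan(coeffs)[:10])  # [0, 1, 8, 16, 9, 2, 3, 10, 17, 24]
```

Low-frequency coefficients come first, so the long runs of zeros that quantization leaves at high frequencies end up at the tail, where variable-length coding handles them cheaply.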

[0004]

In the audio path, the input audio signal from the microphone 11 is A/D-converted by the A/D converter 12 and compressed by the audio encoder 13. The lip-sync control circuit 14 then delays the audio by an amount matched to the delay of the image path. The CPU 10 sets this delay to a fixed value via the control line L₁. The audio is then multiplexed into a single bit stream according to the data structure defined by the multimedia multiplexing and demultiplexing unit 9 and transmitted over the public network N.

[0005]

The CPU 10 controls the direction of the camera 1 via the control detection unit 15. In accordance with that direction control, it also controls the audio encoder 13 and sets the fixed delay of the lip-sync control circuit 14.

[0006]

FIG. 6 is a block diagram of the conventional motion vector detection circuit 4 shown in FIG. 5.

Reference numeral 16 denotes a window search circuit. It receives block data a of 16 × 16 pixels as the input image and block data b of 32 × 32 pixels as the stored image, and it gradually shifts the position of the block read out of the stored block data b.

Reference numeral 17 denotes an arithmetic element unit with 256 arithmetic circuits that perform the block matching operation and compute the residual f. Reference numeral 18 denotes a comparison circuit that finds the minimum residual fm among the residuals obtained.

Reference numeral 19 denotes an output circuit. As motion vector information, it serially outputs the motion vector (coordinates Vx, Vy and residual fm) together with the residual fs at the same coordinates.

[0007]

In the block matching operation of the arithmetic element unit 17, the residual f is the sum of absolute differences between the block data a and b, given by (Equation 1) below.

[0008]

[Equation 1]
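The image of (Equation 1) is not reproduced in this text. A conventional sum-of-absolute-differences form consistent with [0007] and [0009] is given below. The pixel indices i, j and the offsets (x, y) are assumed notation, not the patent's own.

$$
f(x, y) = \sum_{i=0}^{15} \sum_{j=0}^{15} \left| a(i, j) - b(i + x,\; j + y) \right|, \qquad 0 \le x, y \le 15
$$

$$
f_m = \min_{x, y} f(x, y), \qquad (V_x, V_y) = \arg\min_{x, y} f(x, y)
$$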

[0009]

The 32 × 32-pixel block data b is searched with a 16 × 16-pixel window (block data a). This requires 16 trials in each of the horizontal and vertical directions, for a total of 16 × 16 = 256 trials.

[0010]

The arithmetic circuits 17a, 17b, ..., 17n of the arithmetic element unit 17 produce 256 residual outputs f in total, all of which are fed to the comparison circuit 18. The comparison circuit 18 compares their magnitudes and detects the minimum residual fm. It derives the motion vector coordinates Vx, Vy from the clock count value at that moment.
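The conventional full search (window search 16, arithmetic elements 17, comparison circuit 18) can be modelled in software with Python and NumPy. This is an assumed model, not the hardware:

- The 256 arithmetic circuits working in parallel become a double loop.
- The clock-count-to-coordinate conversion is replaced by tracking the loop indices.
- Following the text, 16 offsets per direction are tried, although a 16-pixel block fits at 17 positions inside a 32-pixel window.

```python
import numpy as np

def full_search(a, b, steps=16):
    """Exhaustive block matching of block `a` against every offset in window `b`.

    Tries steps x steps offsets (16 x 16 = 256 trials, as in the text) and
    returns (fm, vx, vy): the minimum residual and the offset where it occurs.
    """
    h, w = a.shape
    a = a.astype(np.int32)   # avoid uint8 wrap-around in the subtraction
    b = b.astype(np.int32)
    best = None
    for vy in range(steps):           # vertical shift
        for vx in range(steps):       # horizontal shift
            f = int(np.abs(a - b[vy:vy + h, vx:vx + w]).sum())  # residual f (SAD)
            if best is None or f < best[0]:                     # comparison step
                best = (f, vx, vy)
    return best

rng = np.random.default_rng(0)
window = rng.integers(0, 256, size=(32, 32), dtype=np.uint8)   # stored image b
block = window[5:21, 9:25]                                     # input image a
print(full_search(block, window))   # (0, 9, 5)
```

On ties, the strict `<` keeps the first offset found in scan order.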

[0011]

The video path passes no information to the audio path about the delay introduced by such motion vector computation. The lip-sync control circuit 14 in FIG. 5 therefore uses a fixed setting estimated from the average delay of the video path. This fixed setting, roughly 0 to 500 msec in one direction, is set by the CPU 10.

[0012]

[Problems to be Solved by the Invention] The conventional motion vector detection circuit described above executes the block matching operation by full search in the arithmetic element unit 17. It performs every calculation regardless of the motion level of the subject. Because it has no means of switching the lip-sync delay according to the motion level, the delay is fixed. As a result, the movement of the subject's mouth and the voice do not match, which impairs naturalness.

[0013]

The present invention solves the above problems. The first invention provides a motion vector detection circuit that does the following:

- reasonably estimates the motion of pixel blocks between consecutive frames;
- controls (thins out) the number of image frames processed per second;
- performs calculations according to the motion level of the subject, improving inter-frame prediction accuracy;
- compensates for the motion vectors of thinned-out frames by predicting them;
- reduces the number of calculation trials, so computation is efficient.

The calculation delay information of the motion vectors is also transmitted to a lip-sync control circuit, which adaptively delays the audio data.

[0014]

An object of the second invention is to reasonably estimate the transmission rate between consecutive image frames and transmit that inter-frame rate information to a lip-sync control device. The lip-sync control device then adaptively delays the audio data.

[0015]

[Means for Solving the Problems] To achieve its object, the first invention comprises a motion vector detection circuit and a lip-sync control circuit.

The motion vector detection circuit consists of:

- a CPU that controls the direction of the camera according to the movement of the subject;
- an image frame control circuit that controls the number of image frames processed per second while the CPU controls the camera direction;
- a motion vector prediction detection circuit that predicts the motion vectors of image frames thinned out from the output of the image frame control circuit;
- a filter that determines the motion level of the subject in the input image, and a switch circuit group with which the user can select the block size;
- a window search circuit and block matching arithmetic circuit group that perform the block matching operation;
- a comparison circuit group that compares the magnitudes of the outputs of the window search circuit and block matching arithmetic circuit group;
- an output circuit that outputs the result of the comparison circuit group as a motion vector.

The lip-sync control circuit consists of:

- a delay level output circuit that delays the n-bit delay information from the comparison circuit group;
- a lip-sync prediction detection circuit that predicts the lip sync of image frames thinned out from the output of the image frame control circuit;
- a multiplexer that, according to a lip-sync control signal, switches the delay table that determines the amount of delay applied to the encoded audio data.

[0016]

To achieve its object, the second invention comprises a lip-sync control circuit consisting of:

- an image compression circuit that compresses moving images from the camera;
- an audio compression circuit that compresses audio from the microphone;
- a CPU that synchronizes image and audio.

The CPU reads the time-series sequence number TR of each image processed by the image compression circuit. From the TR difference within a unit time, it determines the transmission time between the previous and next image frames. Based on that time, it delays the audio signal using a previously obtained table of inter-frame time versus audio-data synchronization offset.

[0017]

[Operation] According to the first invention, when the movement of the subject is large, the block size is reduced to improve performance. Lip sync then operates at a delay level corresponding to the resulting amount of computation. When the movement of the subject is small, the block size is increased, which greatly shortens computation time, and the lip-sync delay is adaptively shortened as well. The picture, together with its audio, also follows camera movement more closely. In addition, the user can select the block size to adjust picture quality.

[0018]

According to the second invention, the CPU reads the time-series sequence number TR of each image processed by the image compression circuit. It determines the transmission time between the previous and next image frames from the TR difference within a unit time. Based on that time, it delays the audio signal using the previously obtained table of inter-frame time versus audio-data synchronization offset, so that image and audio are synchronized.

[0019]

[Embodiments] FIG. 1 is a functional block diagram of a moving-picture and audio codec device using a lip-sync control device according to a first embodiment of the present invention. Functional blocks identical to those in FIG. 5 of the conventional example carry the same reference numerals, and their description is omitted.

In the conventional example, the CPU 10 fixes the audio delay that is matched to the image-path delay. In this embodiment, the motion vector detection circuit 4a instead applies a lip-sync control signal Ls adaptively to the lip-sync control circuit 14a via the control line L₂.

[0020]

FIG. 2 is a block diagram of the motion vector detection circuit 4a and lip-sync control circuit 14a of FIG. 1. In the motion vector detection circuit 4a, reference numeral 21 denotes a switch circuit group (hereinafter, the filter and switch circuit group). It consists of a filter that determines the motion level of the subject in the input image and switches 21a, 21b, ..., 21n with which the user can select the block size.

Reference numeral 22 denotes a window search circuit and block matching arithmetic circuit group that performs the block matching operation. Its blocks have block sizes 22a, 22b, ..., 22n

[0021]

[Outside 1]

[0022]

arranged so that the block size halves at each step from left to right in FIG. 2. As described later, a smaller block size increases the amount of computation, so the delay also grows from left to right. Reference numeral 23 denotes a comparison circuit group that compares the magnitudes of the outputs of the window search circuit and block matching arithmetic circuit group 22.

[0023]

The filter and switch circuit group 21, the window search circuit and block matching arithmetic circuit group 22, and the comparison circuit group 23 each consist of n + 1 units in one-to-one correspondence. Reference numeral 24 denotes an output circuit.

Reference numeral 26 denotes an image frame control circuit that controls the number of image frames processed per second while the CPU 10 controls the direction of the camera 1. Reference numeral 27 denotes a motion vector prediction detection circuit that predicts the motion vector 20 of image frames thinned out from the output of the image frame control circuit 26. Reference numeral 28 denotes switches with which the user selects among the block sizes 22a, 22b, ..., 22n.

[0024]

In the lip-sync control circuit 14a, reference numeral 29 denotes a delay level output circuit that delays the n-bit delay information (the lip-sync control signal Ls in FIG. 1) from the comparison circuit group 23. Reference numeral 30 denotes a lip-sync prediction detection circuit that predicts the lip sync of image frames thinned out from the output of the image frame control circuit 26.

Reference numeral 31 denotes a multiplexer. According to the lip-sync control signal Ls, it switches the delay table 32 (0, 5, ..., 500 ms), which determines the amount of delay applied to the audio data 33 encoded by the audio encoder 13.
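The multiplexer 31 and delay table 32 can be modelled roughly in Python. The following are assumptions:

- The table runs from 0 to 500 ms in uniform 5 ms steps (101 entries, so at least 7 bits of Ls).
- Ls is used directly as a table index.
- The delay is realised with a FIFO.

Each element passed through the FIFO may stand for a PCM sample or a unit of encoded audio data 33. `AudioDelayLine` and its parameters are hypothetical names.

```python
from collections import deque

# Delay table 32: 0, 5, ..., 500 ms, assumed to be uniform 5 ms steps (101 entries).
DELAY_TABLE_MS = list(range(0, 501, 5))

class AudioDelayLine:
    """Hypothetical software model of multiplexer 31 and delay table 32."""

    def __init__(self, sample_rate_hz=8000):
        self.sample_rate_hz = sample_rate_hz   # audio units per second
        self.buffer = deque()
        self.delay_samples = 0

    def select(self, ls):
        """Treat lip-sync control signal Ls as an index into the delay table."""
        ls = max(0, min(ls, len(DELAY_TABLE_MS) - 1))
        delay_ms = DELAY_TABLE_MS[ls]
        self.delay_samples = delay_ms * self.sample_rate_hz // 1000
        return delay_ms

    def process(self, sample):
        """Accept one audio unit and return the one from delay_samples ago."""
        self.buffer.append(sample)
        # If the delay was shortened, drop the oldest units to catch up.
        while len(self.buffer) > self.delay_samples + 1:
            self.buffer.popleft()
        if len(self.buffer) == self.delay_samples + 1:
            return self.buffer.popleft()
        return 0  # placeholder output while the buffer is still filling

line = AudioDelayLine(sample_rate_hz=1000)       # 1 unit per ms keeps the demo short
print(line.select(1))                            # 5
print([line.process(s) for s in range(1, 11)])   # [0, 0, 0, 0, 0, 1, 2, 3, 4, 5]
```

Dropping the oldest units when the delay shrinks makes a new, shorter Ls take effect at once, at the cost of discarding a little audio.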

[0025]

FIG. 3 shows the geometric relationship between the reference image for block matching and the search area in the previous image frame in the first embodiment. In the figure, C denotes the reference image, that is, the current frame composed of M × N blocks, and D denotes the search-area image in the previous image frame. The operation of the motion vector detection circuit and lip-sync control circuit of FIG. 2, configured as described above, is explained next.

[0026]

Assume that the image of frame (n − 1), the frame immediately preceding frame n, is stored at both the transmitting and receiving ends. The moving area of the subject is detected from this information and the difference information of the next frame, n.

To do so, the picture is divided into sub-blocks. For each sub-block, a motion vector (direction and amount of movement) is found by determining how many pixels, and in which direction, the sub-block must be moved to obtain a pattern match within frame n. This is done for every sub-block, and the motion vectors are transmitted.

At the same time, the transmitting end uses these motion vectors to build motion-predicted image data for frame n. The difference from the actual frame n image data, called the prediction error, is encoded and transmitted. The receiving end restores frame n from three inputs: the already stored frame (n − 1), the received motion vectors, and the prediction error.
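The predict, encode-error and reconstruct loop above can be sketched in Python with NumPy. Under the stated assumptions, the transmitter and receiver then hold identical frames. The assumptions are:

- Frames are signed integer arrays, so the prediction error does not wrap.
- The vector for each block points from its position in frame n to its match in frame n − 1 and stays inside the frame.
- The vectors cover every block.

The DCT, quantization and entropy coding of the error are omitted, so reconstruction here is lossless. All function names are hypothetical.

```python
import numpy as np

def predict_frame(prev, vectors, bs):
    """Motion-compensated prediction of frame n from stored frame n-1.

    vectors[(y, x)] = (dy, dx): the bs x bs block at (y, x) in frame n is
    predicted from the block at (y + dy, x + dx) in frame n-1. The vectors
    must cover every block of the frame.
    """
    pred = np.empty_like(prev)
    for (y, x), (dy, dx) in vectors.items():
        pred[y:y + bs, x:x + bs] = prev[y + dy:y + dy + bs, x + dx:x + dx + bs]
    return pred

def encode(cur, prev, vectors, bs):
    """Transmitter: prediction error = actual frame n minus its prediction."""
    return cur - predict_frame(prev, vectors, bs)

def decode(prev, vectors, bs, residual):
    """Receiver: restore frame n from frame n-1, the vectors and the error."""
    return predict_frame(prev, vectors, bs) + residual

rng = np.random.default_rng(1)
prev = rng.integers(0, 256, size=(16, 16)).astype(np.int16)   # frame n-1
cur = rng.integers(0, 256, size=(16, 16)).astype(np.int16)    # frame n
vecs = {(0, 0): (0, 0), (0, 8): (0, 0), (8, 0): (0, 0), (8, 8): (-8, -8)}
err = encode(cur, prev, vecs, bs=8)
print(np.array_equal(decode(prev, vecs, 8, err), cur))  # True
```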

[0027]

Consider the block matching operation of the block matching arithmetic circuit group 22 in FIG. 2. In FIG. 3, the previous frame is the entire (M + 2p) × (N + 2p) area. The size of the sub-block C (reference image) is set according to its motion level:

- If there is no motion, the whole (M + 2p) × (N + 2p) area is used as sub-block C.
- If the motion level is large, block matching is performed with blocks of at most N × N.

The search compares C with each area of the previous frame D that corresponds to sub-block C (the D area).

[0028]

The residual f is computed as the sum of absolute differences between sub-block C and block D, given by (Equation 2).

[0029]

[Equation 2]
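The image of (Equation 2) is also not reproduced. The form below assumes it is the same SAD criterion as (Equation 1), applied to an M × N sub-block C. The candidate block is taken from the (M + 2p) × (N + 2p) area of the previous frame D, indexed from its top-left corner and shifted by (u, v) within the ±p range of [0030]; p here stands for the P of [0030]. The index and offset symbols are assumed notation, not the patent's own.

$$
f(u, v) = \sum_{i=0}^{M-1} \sum_{j=0}^{N-1} \left| C(i, j) - D(i + p + u,\; j + p + v) \right|, \qquad -p \le u, v \le p - 1
$$

Each of u and v takes 2p values, which gives the (2p)² cost-function evaluations per block counted in [0030].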

[0030]

In this process, motion is estimated within a range of ±P pixels or lines per frame. A brute-force search over all pixel shifts (horizontal) and all line shifts (vertical) requires (2P)² evaluations of the cost function.

[0031]

The amount of cost-function computation changes when the block size (M × N) is varied according to the motion level, as described above. For example, the block size (M × N) can be set large when the picture is static, which reduces the cost-function computation and improves efficiency.
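To put numbers on [0030] and [0031], the short calculation below counts cost-function evaluations per frame for several block sizes. It assumes a hypothetical CIF luminance frame of 352 × 288 pixels and p = 8, ignores partial edge blocks, and counts (2p)² shifts per block.

```python
def cost_evaluations(width, height, block_w, block_h, p):
    """Cost-function (SAD) evaluations per frame: blocks x (2p)^2 shifts."""
    blocks = (width // block_w) * (height // block_h)
    return blocks * (2 * p) ** 2

# Hypothetical setting: a CIF luminance frame (352 x 288) and p = 8.
for size in (32, 16, 8):
    print(size, cost_evaluations(352, 288, size, size, p=8))
# 32 25344
# 16 101376
# 8 405504
```

Each halving of the block edge quadruples the number of evaluations. However, each evaluation on a larger block sums more pixel differences. How much processing time is saved therefore depends on how the hardware parallelizes a single SAD.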

[0032]

As shown in FIG. 2, a switch 28 sits between adjacent stages of the filter and switch circuit group 21 and opens when the next stage turns on. For example, suppose the block data b of the stored image is a still image. The first-stage filter and switch circuit 21a then turns on. The switch 28 between it and the filter and switch circuit 21b enables only the first-stage block size 22a (M/2⁰ × N/2⁰) of the window search circuit and block matching arithmetic circuit group 22 and disables the subsequent stages. Matching is therefore performed with the largest block size.

[0033]

When the motion level of the block data b of the stored image rises from this state, only the second stage is enabled: the filter and switch circuit 21b and unit 22b (block size M/2¹ × N/2¹) of the window search circuit and block matching arithmetic circuit group 22. Changing the block size according to the motion vector in this way keeps the amount of computation as small as possible and improves the responsiveness of the communication equipment.

[0034]

The filter and switch circuit group 21 is also configured so that the CPU 10 can switch among the block sizes 22a, 22b, ..., 22n. This additionally allows manual control that fixes the block size.
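The cascade of [0032] to [0034] can be sketched as a threshold lookup. Each time the motion level reaches the next threshold, the next stage turns on and the previous one is cut off, so stage k selects block size M/2^k × N/2^k. A CPU-supplied stage overrides the automatic choice. The motion-level measure, the threshold values, M = N = 32 and the function names are all hypothetical.

```python
from bisect import bisect_right

def select_stage(motion_level, thresholds, manual_stage=None):
    """Pick the active stage k of the filter and switch circuit group 21.

    thresholds: ascending motion levels at which the next stage turns on
    (and switch 28 cuts off the previous one). manual_stage models the CPU
    fixing the block size ([0034]).
    """
    if manual_stage is not None:
        return manual_stage
    return bisect_right(thresholds, motion_level)   # number of thresholds reached

def block_size(m, n, stage):
    """Block size of unit 22a, 22b, ...: M/2^k x N/2^k."""
    return m // 2 ** stage, n // 2 ** stage

THRESHOLDS = [4, 16, 64]                 # hypothetical motion-level thresholds
k = select_stage(20, THRESHOLDS)
print(k, block_size(32, 32, k))          # 2 (8, 8)
```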

[0035]

The operation of FIG. 1, using the motion vector detection circuit 4a and lip-sync control circuit 14a of FIG. 2, is described next. While the CPU 10 is controlling the direction of the camera 1, it also controls the block size and the number of image frames based on that direction control. For example, when the camera 1 is being moved quickly, its movement can be followed more closely by the following two means.

[0036]

First, the CPU 10 can set the block size as large as the allowable range permits. This reduces the cost-function computation and gives priority to following the motion, at the expense of picture quality (for example, block distortion). The lip-sync control signal Ls sent over the control line L₂ then carries n bits of information based on the selected block size.

[0037]

Second, to follow the motion better without lowering picture quality, the CPU 10 sets the block size small. The cost-function computation increases and processing takes longer, so the CPU compensates by controlling (thinning out) the number of image frames processed per second, which reduces the amount of information. The motion vectors of the thinned-out frames are supplemented by predicting them from the motion vectors of the preceding and following frames.

Likewise, the lip-sync control signal Ls on the control line L₂ carries n bits of information based on the selected block size. For thinned-out frames, it carries n bits predicted from the delay levels of the preceding and following frames. Table 1 shows an example of the delay levels of the lip-sync control signal.
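The patent does not say how a thinned-out frame's motion vector and delay level are predicted from its neighbours. The sketch below assumes the simplest choice, the midpoint of the preceding and following frames. The delay level is then clamped to the n-bit range of Ls. Function names are hypothetical, and Python's `round()` rounds exact halves to the nearest even integer.

```python
def predict_vector(prev_mv, next_mv):
    """Motion vector for a thinned-out frame: midpoint of the neighbouring
    frames' vectors, rounded to whole pixels (assumed predictor)."""
    return tuple(round((p + q) / 2) for p, q in zip(prev_mv, next_mv))

def predict_delay_level(prev_level, next_level, n_bits):
    """Delay level for a thinned-out frame, clamped to the n-bit range of Ls."""
    level = round((prev_level + next_level) / 2)
    return max(0, min(level, 2 ** n_bits - 1))

print(predict_vector((2, -4), (6, 0)))      # (4, -2)
print(predict_delay_level(3, 7, n_bits=4))  # 5
```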

[0038]

[Table 1]

[0039]

As described above, this embodiment performs the motion vector computation with a block size corresponding to the speed of the subject's movement. The lip-sync delay is therefore also set based on the selected block size. As a result, the movement of the subject's mouth matches the voice regardless of the subject's motion level.

[0040]

FIG. 4 is a functional block diagram of a moving-picture and audio codec device using a lip-sync control device according to a second embodiment of the present invention. Functional blocks identical to those in FIG. 1 carry the same reference numerals, and their description is omitted. In FIG. 4:

- 34 denotes an image compression circuit that compresses moving images from the camera 1;
- 35 denotes an audio compression circuit that compresses audio from the microphone 11;
- 36 denotes a signal delay circuit that, under control of the CPU 10, delays the audio compressed by the audio compression circuit 35 by a predetermined time;
- 37 denotes a previously obtained table of inter-frame time versus audio-data synchronization offset;
- 38 denotes a line interface (I/F).

[0041]

In operation, the image signal from the camera 1 is first compressed by the image compression circuit 34 using the ITU-T H.261 algorithm. The multimedia multiplexing and demultiplexing unit 9 then multiplexes it with the audio signal and other data, and the result is transmitted to the public network N via the line interface 38.

[0042]

The CPU 10 reads the time-series sequence number TR of each image processed by the H.261 algorithm and calculates from it the rate at which images are output. The rate is obtained as follows:

1. The CPU 10 reads a TR value and stores it as n2.
2. After a set unit time, it reads the TR value again and designates it n1.
3. Computing n1 − n2 gives the number of image frames within the unit time.

The transmission time between the previous and next image frames is determined from this TR difference.

[0043]

The CPU then takes the determined time and looks up the corresponding audio delay in the previously obtained table 37 of inter-frame time versus audio-data synchronization offset. It sets the signal delay circuit 36, which sits in the audio signal processing line, to that delay, so that image and audio are synchronized.
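The procedure of [0042] and [0043] can be sketched in Python under several assumptions:

- TR is the 5-bit H.261 counter, so the difference is taken modulo 32. The unit time must therefore be short enough that fewer than 32 frames pass.
- The contents of table 37 below are invented for illustration.
- When no new frame arrives, the longest delay is used.

Function and constant names are hypothetical.

```python
import bisect

TR_MODULUS = 32  # assumption: TR is a 5-bit counter that wraps modulo 32

# Hypothetical contents of table 37: (upper bound of inter-frame time in ms,
# audio delay in ms). Longer frame intervals map to longer audio delays.
SYNC_TABLE = [(40, 100), (80, 150), (150, 250), (300, 400), (float("inf"), 500)]

def frames_per_unit(n2, n1):
    """Frames advanced between TR reading n2 and reading n1 one unit time later."""
    return (n1 - n2) % TR_MODULUS

def audio_delay_ms(n2, n1, unit_time_ms):
    """Look up the audio delay for the inter-frame time implied by n1 - n2."""
    frames = frames_per_unit(n2, n1)
    if frames == 0:                       # no new frame in the unit time
        return SYNC_TABLE[-1][1]          # fall back to the longest delay
    interval_ms = unit_time_ms / frames   # time between previous and next frame
    bounds = [bound for bound, _ in SYNC_TABLE]
    return SYNC_TABLE[bisect.bisect_left(bounds, interval_ms)][1]

print(audio_delay_ms(10, 20, unit_time_ms=1000))  # 250
```

Ten frames in one second means frames arrive 100 ms apart, which falls in the (80, 150] row of the hypothetical table. A faster frame rate yields a shorter audio delay, matching the behaviour described in [0044].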

[0044]

As described above, this embodiment judges the image processing speed from the rate at which image frames are output by the image compression block and sets the lip-sync delay accordingly. When the frame rate is high the delay is small, and when it is low the delay is increased. The movement of the subject's mouth therefore matches the audio.

[0045]

[Effects of the Invention] As described above, the first invention performs the motion vector computation with a block size matched to the speed of the subject's movement, and the lip-sync delay is likewise set based on the selected block size. The movement of the subject's mouth therefore matches the audio regardless of the subject's motion level.

[0046]

The second invention judges the image processing speed from the rate at which image frames are output by the image compression block and sets the lip-sync delay accordingly. When the frame rate is high the delay is small, and when it is low the delay is increased, so that the movement of the subject's mouth matches the audio.

[Brief description of drawings]

FIG. 1 is a functional block diagram of a moving-picture and audio codec device using a lip-sync control device according to a first embodiment of the present invention.

FIG. 2 is a block diagram of the motion vector detection circuit and lip-sync control circuit of FIG. 1.

FIG. 3 is a diagram of the geometric relationship between the reference image for block matching and the search area of the previous image frame in the first embodiment of the present invention.

FIG. 4 is a functional block diagram of a moving image and audio codec device using a lip sync control device according to a second embodiment of the present invention.

FIG. 5 is a functional block diagram of a moving image and audio codec device using a conventional motion vector detection circuit.

FIG. 6 is a functional block diagram of the motion vector detection circuit of FIG. 5.

[Explanation of symbols]

1 ... Camera
2, 12 ... A/D converter
3 ... Line converter
4a ... Motion vector detection circuit
5 ... Coding and DCT unit
6 ... Quantization unit
7 ... Variable-length coding unit
8 ... Error detection/correction unit
9 ... Multimedia multiplexing/demultiplexing unit
10 ... CPU
11 ... Microphone
13 ... Audio encoder
14a ... Lip sync control circuit
15 ... Control detection unit
21 ... Filter and switch circuit group
22 ... Window search circuit and block matching arithmetic circuit group
23 ... Comparison circuit group
24 ... Output circuit
26 ... Image frame control circuit
27 ... Motion vector prediction detection circuit
28 ... Switch
29 ... Delay level output circuit
30 ... Lip sync prediction detection circuit
31 ... Multiplexer
32 ... Delay table
34 ... Image compression circuit
35 ... Audio compression circuit
36 ... Signal delay circuit
37 ... Table of inter-frame time versus audio data synchronization offset
38 ... Line interface (I/F)

Claims

[Claims]

1. A moving picture and audio codec device using a lip sync control device, comprising: a CPU that controls the direction of a camera according to the movement of a subject; a motion vector detection circuit comprising an image frame control circuit that controls the number of image frames processed per second while the CPU is controlling the direction of the camera, a motion vector prediction detection circuit that predicts, from the output of the image frame control circuit, the motion vectors of thinned-out image frames, a filter and switch circuit group that determines the motion level of the subject in the input image and with which the user can select the block size, a window search circuit and block matching arithmetic circuit group that performs block matching calculations, a comparison circuit group that compares the outputs of the window search circuit and block matching arithmetic circuit group, and an output circuit that outputs the result of the comparison circuit group as a motion vector; and a lip sync control circuit comprising a delay level output circuit that outputs n-bit delay information derived from the comparison circuit group, a lip sync prediction detection circuit that predicts, from the output of the image frame control circuit, the lip sync of thinned-out image frames, and a multiplexer that, according to a lip sync control signal, switches the delay table that determines the delay amount of the encoded audio data.

2. The moving picture and audio codec device using the lip sync control device according to claim 1, wherein, in controlling the operation of the block matching arithmetic circuit group, the CPU controls the block size so as to reduce the amount of cost-function calculation and the amount of information.

3. A moving picture and audio codec device using a lip sync control device, comprising: an image compression circuit that compresses moving images from a camera; an audio compression circuit that compresses audio from a microphone; and a lip sync control circuit including a CPU that reads the time-series sequence number (TR) data of each image processed by the image compression circuit, determines the transmission time between the previous image frame and the next image frame from the difference in TR data within a unit time, and delays the audio signal by an amount obtained from the determined time and a previously prepared table of inter-frame time versus audio data synchronization offset, thereby synchronizing image and audio.
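Claim 3 derives the audio delay from TR data rather than from measured timestamps. A minimal sketch follows, assuming (this is not stated in the patent) that TR is a 5-bit counter advanced once per 29.97 Hz picture period, as the temporal reference is in ITU-T H.261, so the difference between consecutive TR values must be taken modulo 32. The offset table values and all function names are hypothetical.

```python
TR_MODULUS = 32                 # assumed 5-bit TR counter (as in H.261)
PICTURE_PERIOD_MS = 1001 / 30   # assumed one TR step per 29.97 Hz picture, ~33.37 ms

# Hypothetical pre-measured table: (max inter-frame time in ms, audio delay in ms).
SYNC_OFFSET_TABLE = [(50, 120), (100, 200), (200, 320), (float("inf"), 450)]


def inter_frame_time_ms(tr_prev: int, tr_next: int) -> float:
    """Time between two transmitted frames from their TR difference.
    Python's % keeps the result non-negative, which handles counter wrap-around."""
    steps = (tr_next - tr_prev) % TR_MODULUS
    return steps * PICTURE_PERIOD_MS


def audio_delay_for_tr(tr_prev: int, tr_next: int) -> int:
    """Audio delay for the first table row covering the inter-frame time."""
    t = inter_frame_time_ms(tr_prev, tr_next)
    return next(delay for max_ms, delay in SYNC_OFFSET_TABLE if t <= max_ms)
```

With these assumptions, a TR step from 30 to 2 wraps to 4 picture periods (about 133 ms) and selects a 320 ms audio delay. The last table row uses an unbounded limit so that every interval maps to some delay.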