JP2003061098A

JP2003061098A - Image processor, image processing method, recording medium and program

Info

Publication number: JP2003061098A
Application number: JP2001250392A
Authority: JP
Inventors: Tadashi Ohira (大平正)
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2001-08-21
Filing date: 2001-08-21
Publication date: 2003-02-28

Abstract

PROBLEM TO BE SOLVED: To provide an image processor that composites sign language images with the principal moving images and audio signals, so as to give hearing-impaired persons video services that are natural and easy to understand. SOLUTION: The image processor performs object-based coding of moving images to create a database of sign language images. It comprises: a moving image input means for receiving a moving image of a signer; an object extracting means that extracts only the signer as an object to obtain a moving image of shape information; a natural image encoding means that encodes the moving image of the signer; a shape information image encoding means that encodes the moving image of the shape information; a multiplexing means that multiplexes the encoded natural image with the encoded shape information image; a text input means that receives the meaning of the sign language image as text data; and a database storage means that stores the multiplexed image and the text data as a database.

Description

Detailed Description of the Invention

[0001]

Technical Field: The present invention relates to a system for interpreting from hearing persons to hearing-impaired persons. In particular, it relates to a technique for generating sign language images for content that has images and audio, multiplexing them with that content, and composite-displaying them as needed.

[0002]

Prior Art: Sign language is a language for hearing-impaired persons. It conveys information to the other party through means such as the position, orientation, moving direction, and moving speed of the hands, as well as facial expression. Its system differs from the natural languages used by hearing persons, which developed around speech. A hearing-impaired person conversing with a hearing person therefore finds sign language easier and faster than written notes or oral speech in a natural language belonging to the spoken-language system. For this reason, a system that mixes natural language and sign language is desired.

[0003] Conventional methods of generating sign language include:

- using CG animation;
- directly concatenating and displaying images captured word by word.

[0004] With CG animation, a simple animation makes fine details hard to see. In particular, it obscures hand movement, which is the most important element of sign language. Creating a complex animation, on the other hand, requires a sophisticated development environment.

[0005] Meanwhile, ISO (International Organization for Standardization) and ITU (International Telecommunication Union) are working on the international standardization of MPEG-4. MPEG-4 is a general-purpose, next-generation multimedia coding standard usable in many fields such as computing, broadcasting, and communications. A major feature of MPEG-4 is its support for object-based coding.

[0006] MPEG (Moving Picture Experts Group)-1 and MPEG-2 encode the entire rectangular image. Object-based coding works differently. It uses a shape information image, generated in advance by some method, to cut out a person or other object in the image, and then encodes each image object separately. Hereinafter, to distinguish it from the shape information image, the image that is ordinarily processed is called a natural image.

[0007] A shape information image is a kind of image that represents the shape of the object. It has exactly the same number of horizontal and vertical pixels as the natural image to be encoded. There are two kinds of shape information image:

- Binary alpha plane: each pixel is represented by 1 bit. A region with pixel value "1" usually represents the object region, and a region with value "0" represents the region outside the object.
- Grayscale alpha plane: each pixel is represented by 2 or more bits. A region with pixel values from "1 to 255" represents the object region, and a region with value "0" represents the region outside the object.
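Both alpha-plane conventions above can be reduced to the same object/non-object mask. This is a minimal sketch: plain Python lists stand in for image buffers, and the function names are hypothetical rather than part of MPEG-4.

```python
# Hypothetical helpers: images are lists of rows of integer pixel values.

def object_mask_from_binary(plane):
    """Binary alpha plane: 1 = object region, 0 = outside the object."""
    return [[px == 1 for px in row] for row in plane]

def object_mask_from_grayscale(plane):
    """Grayscale alpha plane: 1..255 = object region, 0 = outside the object."""
    for row in plane:
        for px in row:
            if not 0 <= px <= 255:
                raise ValueError("grayscale alpha values must lie in 0..255")
    return [[1 <= px <= 255 for px in row] for row in plane]

def same_size(natural, plane):
    """A shape image must match the natural image's width and height exactly."""
    return len(natural) == len(plane) and all(
        len(a) == len(b) for a, b in zip(natural, plane))

binary = [[0, 1, 1],
          [0, 1, 0]]
gray = [[0, 128, 255],
        [0, 64, 0]]
print(object_mask_from_binary(binary) == object_mask_from_grayscale(gray))  # True
```

The two planes describe the same object region. The grayscale plane additionally carries per-pixel opacity, which matters when the object is blended onto a background.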

[0008] For other details of MPEG-4, refer to the ISO/IEC international standard documents.

[0009]

Problem to Be Solved by the Invention: For a sign-language conversation to proceed smoothly, the signer's image must be clear. For the shape of the signer's hands to be recognized correctly, the image must also be displayed with a three-dimensional appearance.

[0010] The conventional method of simply synthesizing captured images of sign language produces three-dimensional-looking images, but it requires a large amount of information to be stored. A sign language image dictionary must store at least about 60 color images per word, for 2,000 or more words. The amount of information per word is therefore large, and a large storage capacity is required.
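A rough calculation shows the scale of the storage problem. Only the figures of about 60 images per word and 2,000 words come from the text. The 320x240 resolution and uncompressed 24-bit color are illustrative assumptions.

```python
FRAMES_PER_WORD = 60      # from the text (approximate lower bound)
WORDS = 2000              # from the text (lower bound)
WIDTH, HEIGHT = 320, 240  # assumed resolution, for illustration only
BYTES_PER_PIXEL = 3       # assumed 24-bit RGB, no compression

def raw_dictionary_bytes(frames_per_word, words, width, height, bpp):
    """Total bytes if every dictionary frame is stored as raw pixels."""
    return frames_per_word * words * width * height * bpp

total = raw_dictionary_bytes(FRAMES_PER_WORD, WORDS, WIDTH, HEIGHT, BYTES_PER_PIXEL)
print(total)                   # 27648000000
print(round(total / 1e9, 1))   # 27.6 (gigabytes)
```

Under these assumptions the raw dictionary needs about 27.6 GB. This is the kind of volume that compression, and especially coding only the signer object, is meant to reduce.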

[0011] An object of the present invention is to provide two means:

- means for efficiently constructing a database of arbitrarily shaped sign language images by using an object-based coding apparatus and method;
- means for giving hearing-impaired persons a natural, easy-to-understand video service by compositing sign language images onto a main AV stream (full-screen moving images and audio).

[0012]

Means for Solving the Problem: According to one aspect of the present invention, there is provided an image processing apparatus comprising:

- database creation means for creating a database of sign language images;
- multiplexing means for encoding a main moving image and multiplexing it with a sign language image in the database;
- means for decoding the main moving image and the sign language image from the multiplexed image, then compositing and outputting them.

[0013] According to another aspect of the present invention, there is provided an image processing apparatus that performs object-based coding of moving images to create a database of sign language images. The apparatus comprises:

- moving image input means for inputting a moving image of a signer;
- object extraction means for extracting only the signer as an object to obtain a moving image of shape information;
- natural image coding means for encoding the moving image of the signer;
- shape information image coding means for encoding the moving image of the shape information;
- multiplexing means for multiplexing the encoded natural image and the encoded shape information image;
- text input means for inputting the meaning of the sign language image as text data;
- database storage means for storing the multiplexed image and the text data as a database.

[0014] According to still another aspect of the present invention, there is provided an image processing apparatus comprising:

- coding means for encoding a main moving image signal and an audio signal;
- speech recognition means for extracting and recognizing speech information from the audio signal;
- sign language image search means for searching a database of sign language images for a desired sign language image, using the recognized speech data;
- multiplexing means for multiplexing and outputting the encoded main moving image signal, the encoded audio signal, and the sign language image.

[0015] According to still another aspect of the present invention, there is provided an image processing apparatus comprising:

- first distribution means for distributing an input signal into (a) a set consisting of a main moving image signal and an audio signal and (b) a sign language image;
- first decoding means for decoding the main moving image signal and the audio signal;
- second distribution means for distributing the sign language image into a natural image of the signer and a shape information image of the signer;
- second decoding means for decoding the natural image of the signer;
- third decoding means for decoding the shape information image of the signer;
- compositing means for compositing the main moving image and the sign language image from the main moving image signal and the signer's natural image and shape information image.

[0016] According to still another aspect of the present invention, there is provided an image processing method comprising:

- a database creation step of creating a database of sign language images;
- a multiplexing step of encoding a main moving image and multiplexing it with a sign language image in the database;
- a step of decoding the main moving image and the sign language image from the multiplexed image, then compositing and outputting them.

[0017] According to still another aspect of the present invention, there is provided an image processing method that performs object-based coding of moving images to create a database of sign language images. The method comprises:

- a moving image input step of inputting a moving image of a signer;
- an object extraction step of extracting only the signer as an object to obtain a moving image of shape information;
- a natural image coding step of encoding the moving image of the signer;
- a shape information image coding step of encoding the moving image of the shape information;
- a multiplexing step of multiplexing the encoded natural image and the encoded shape information image;
- a text input step of inputting the meaning of the sign language image as text data;
- a database storage step of storing the multiplexed image and the text data as a database.

[0018] According to still another aspect of the present invention, there is provided an image processing method comprising:

- a coding step of encoding a main moving image signal and an audio signal;
- a speech recognition step of extracting and recognizing speech information from the audio signal;
- a sign language image search step of searching a database of sign language images for a desired sign language image, using the recognized speech data;
- a multiplexing step of multiplexing and outputting the encoded main moving image signal, the encoded audio signal, and the sign language image.

[0019] According to still another aspect of the present invention, there is provided an image processing method comprising:

- a first distribution step of distributing an input signal into (a) a set consisting of a main moving image signal and an audio signal and (b) a sign language image;
- a first decoding step of decoding the main moving image signal and the audio signal;
- a second distribution step of distributing the sign language image into a natural image of the signer and a shape information image of the signer;
- a second decoding step of decoding the natural image of the signer;
- a third decoding step of decoding the shape information image of the signer;
- a compositing step of compositing the main moving image and the sign language image from the main moving image signal and the signer's natural image and shape information image.

[0020] According to still another aspect of the present invention, there is provided a computer-readable recording medium recording a program that causes a computer to execute:

- a database creation procedure for creating a database of sign language images;
- a multiplexing procedure for encoding a main moving image and multiplexing it with a sign language image in the database;
- a procedure for decoding the main moving image and the sign language image from the multiplexed image, then compositing and outputting them.

[0021] According to still another aspect of the present invention, there is provided a computer-readable recording medium recording a program that performs object-based coding of moving images to create a database of sign language images. The program causes a computer to execute:

- a moving image input procedure for inputting a moving image of a signer;
- an object extraction procedure for extracting only the signer as an object to obtain a moving image of shape information;
- a natural image coding procedure for encoding the moving image of the signer;
- a shape information image coding procedure for encoding the moving image of the shape information;
- a multiplexing procedure for multiplexing the encoded natural image and the encoded shape information image;
- a text input procedure for inputting the meaning of the sign language image as text data;
- a database storage procedure for storing the multiplexed image and the text data as a database.

[0022] According to still another aspect of the present invention, there is provided a computer-readable recording medium recording a program that causes a computer to execute:

- a coding procedure for encoding a main moving image signal and an audio signal;
- a speech recognition procedure for extracting and recognizing speech information from the audio signal;
- a sign language image search procedure for searching a database of sign language images for a desired sign language image, using the recognized speech data;
- a multiplexing procedure for multiplexing and outputting the encoded main moving image signal, the encoded audio signal, and the sign language image.

[0023] According to still another aspect of the present invention, there is provided a computer-readable recording medium recording a program that causes a computer to execute:

- a first distribution procedure for distributing an input signal into (a) a set consisting of a main moving image signal and an audio signal and (b) a sign language image;
- a first decoding procedure for decoding the main moving image signal and the audio signal;
- a second distribution procedure for distributing the sign language image into a natural image of the signer and a shape information image of the signer;
- a second decoding procedure for decoding the natural image of the signer;
- a third decoding procedure for decoding the shape information image of the signer;
- a compositing procedure for compositing the main moving image and the sign language image from the main moving image signal and the signer's natural image and shape information image.

[0024] According to still another aspect of the present invention, there is provided a program that causes a computer to execute:

- a database creation procedure for creating a database of sign language images;
- a multiplexing procedure for encoding a main moving image and multiplexing it with a sign language image in the database;
- a procedure for decoding the main moving image and the sign language image from the multiplexed image, then compositing and outputting them.

[0025] According to still another aspect of the present invention, there is provided a program that performs object-based coding of moving images to create a database of sign language images. The program causes a computer to execute:

- a moving image input procedure for inputting a moving image of a signer;
- an object extraction procedure for extracting only the signer as an object to obtain a moving image of shape information;
- a natural image coding procedure for encoding the moving image of the signer;
- a shape information image coding procedure for encoding the moving image of the shape information;
- a multiplexing procedure for multiplexing the encoded natural image and the encoded shape information image;
- a text input procedure for inputting the meaning of the sign language image as text data;
- a database storage procedure for storing the multiplexed image and the text data as a database.

[0026] According to still another aspect of the present invention, there is provided a program that causes a computer to execute:

- a coding procedure for encoding a main moving image signal and an audio signal;
- a speech recognition procedure for extracting and recognizing speech information from the audio signal;
- a sign language image search procedure for searching a database of sign language images for a desired sign language image, using the recognized speech data;
- a multiplexing procedure for multiplexing and outputting the encoded main moving image signal, the encoded audio signal, and the sign language image.

[0027] According to still another aspect of the present invention, there is provided a program that causes a computer to execute:

- a first distribution procedure for distributing an input signal into (a) a set consisting of a main moving image signal and an audio signal and (b) a sign language image;
- a first decoding procedure for decoding the main moving image signal and the audio signal;
- a second distribution procedure for distributing the sign language image into a natural image of the signer and a shape information image of the signer;
- a second decoding procedure for decoding the natural image of the signer;
- a third decoding procedure for decoding the shape information image of the signer;
- a compositing procedure for compositing the main moving image and the sign language image from the main moving image signal and the signer's natural image and shape information image.

[0028] With the configuration described above, the present invention makes it possible to efficiently construct a database of arbitrarily shaped sign language images. By compositing sign language images with the main moving image and audio signal (full-screen moving images and audio), it can provide video services that are natural and easy for hearing-impaired persons to understand.

[0029]

Embodiments of the Invention: Embodiments of the present invention are described in detail below with reference to the drawings.

(First Embodiment) The object-based coding method is described with reference to FIGS. 7 and 8. FIG. 7(a) shows a natural image, and FIG. 7(b) shows the shape information image corresponding to FIG. 7(a). Because the object-based coding method encodes only the object portion rather than the entire image, it can encode images with high efficiency.
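A simplified illustration of why coding only the object is efficient is to count how many 16x16 blocks of a binary alpha plane touch the object at all. Blocks that lie entirely outside the object carry no texture to encode. This is an assumption-laden sketch, not the MPEG-4 shape-coding algorithm: it ignores bounding boxes, boundary-block handling, and motion.

```python
BLOCK = 16  # assumed block size (MPEG macroblocks are 16x16)

def blocks_with_object(alpha, block=BLOCK):
    """Count blocks containing at least one object pixel (alpha != 0)."""
    h, w = len(alpha), len(alpha[0])
    count = 0
    for by in range(0, h, block):
        for bx in range(0, w, block):
            if any(alpha[y][x]
                   for y in range(by, min(by + block, h))
                   for x in range(bx, min(bx + block, w))):
                count += 1
    return count

def total_blocks(alpha, block=BLOCK):
    """Number of blocks covering the whole frame (ceiling division)."""
    h, w = len(alpha), len(alpha[0])
    return ((h + block - 1) // block) * ((w + block - 1) // block)

# 64x64 frame whose object occupies the top-left 20x20 pixels.
frame = [[1 if y < 20 and x < 20 else 0 for x in range(64)] for y in range(64)]
print(blocks_with_object(frame), total_blocks(frame))  # 4 16
```

Here only 4 of the 16 blocks need texture data. A rectangular (frame-based) coder would spend bits on all 16.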

[0030] FIG. 8(a) shows the main image as displayed, and FIG. 8(b) shows a sign language image object composited onto that image. In object-based coding the shape of the object is already explicit, so the object can easily be composited with a background image.
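Compositing a coded object onto a background can be sketched as standard alpha blending with its shape information image. The sketch makes several assumptions:

- the foreground is already placed at the intended position and is the same size as the background;
- pixels are single-channel grayscale integers (color would repeat this per channel);
- the function name is hypothetical.

```python
def composite(background, foreground, alpha, max_alpha=255):
    """Blend foreground over background using an alpha plane.

    alpha: 0 = outside the object, max_alpha = fully opaque object.
    Pass max_alpha=1 for a binary alpha plane (pure replacement).
    """
    out = []
    for bg_row, fg_row, a_row in zip(background, foreground, alpha):
        out.append([(a * fg + (max_alpha - a) * bg) // max_alpha
                    for bg, fg, a in zip(bg_row, fg_row, a_row)])
    return out

bg = [[10, 10, 10]]
fg = [[200, 200, 200]]
print(composite(bg, fg, [[0, 1, 1]], max_alpha=1))  # [[10, 200, 200]]
print(composite(bg, fg, [[0, 255, 128]]))           # [[10, 200, 105]]
```

With a binary plane the object simply replaces the background pixels. A grayscale plane additionally gives soft, partially transparent edges.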

[0031] FIG. 1 is a block diagram showing the configuration of an apparatus for creating a database of sign language images, according to the first embodiment of the present invention. This embodiment describes an implementation based on MPEG-4. Following MPEG-4 terminology:

- a moving image consisting of natural images is called a texture image;
- a moving image consisting of the shape information of the object to be coded, corresponding to this texture, is called a shape image.

Two kinds of moving images, the texture image and the shape image, are input to this apparatus.

[0032] The configuration of FIG. 1 is as follows.

- 100: a camera that photographs the signer.
- 101: a shape generator. It separates the signer region from the background region in the image captured by the camera 100, and generates a moving image composed of a binary alpha plane in which the signer region is 1 and the background region is 0.
- 102: a texture encoder that encodes the natural image of the signer from the camera.
- 103: a shape encoder that encodes the shape image of the signer from the shape generator 101.
- 104: a multiplexer that multiplexes the streams from the texture encoder and the shape encoder.
- 105: a sign language image database that stores the output of the multiplexer in association with separately input text corresponding to the sign language image.
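The shape generator's task, marking the signer as 1 and the background as 0, can be sketched as a simple chroma key. The embodiment films against a uniform backdrop such as a blue background, which is what makes a key like this workable. The blue-dominance test and the `margin` value are illustrative assumptions, not the patent's method.

```python
def shape_from_blue_screen(rgb_frame, margin=40):
    """Return a binary alpha plane: 1 = signer, 0 = blue background.

    Assumed heuristic: a pixel is background when its blue channel exceeds
    both its red and green channels by more than `margin`.
    """
    return [[0 if b - max(r, g) > margin else 1 for (r, g, b) in row]
            for row in rgb_frame]

frame = [[(20, 30, 200), (180, 150, 120)],   # blue backdrop, skin tone
         (( 25, 35, 210), (40, 40, 60))]     # blue backdrop, dark clothing
print(shape_from_blue_screen(frame))  # [[0, 1], [0, 1]]
```

A production keyer would also handle blue spill, shadows, and noisy edges. A clean binary mask still matters because the shape encoder codes exactly this plane.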

[0033] Next, the operation of FIG. 1 is described.

1. The camera 100 captures images of the signer. A background such as a blue backdrop is used during shooting, so that the images can easily be composited with other images later.
2. The shape generator 101 generates a shape information image of the signer from the captured images.
3. The captured image of the signer himself (natural image) is input to the texture encoder 102, and the shape information image is input to the shape encoder 103. Each is then object-encoded.
4. The multiplexer 104 multiplexes the two sets of encoded data and outputs them as a sign language image stream.
5. The stream is stored in the database 105 together with text data giving the meaning of the sign language image.
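The final stage of FIG. 1, multiplexing the two encoded streams and storing them under their text meaning, can be sketched as below. The container format is hypothetical: a 1-byte stream ID plus a 4-byte big-endian length before each payload. The stream ID values and class name are also illustrative and are not MPEG-4 Systems syntax.

```python
import struct
from dataclasses import dataclass, field

TEXTURE_ID, SHAPE_ID = 0x01, 0x02  # hypothetical stream IDs

def mux_texture_shape(texture_bits: bytes, shape_bits: bytes) -> bytes:
    """Concatenate both encoded streams, each prefixed by ID and length."""
    out = bytearray()
    for sid, payload in ((TEXTURE_ID, texture_bits), (SHAPE_ID, shape_bits)):
        out += struct.pack(">BI", sid, len(payload)) + payload
    return bytes(out)

@dataclass
class SignImageDatabase:
    """Maps the meaning of a sign (text data) to its multiplexed stream."""
    entries: dict = field(default_factory=dict)

    def store(self, text: str, texture_bits: bytes, shape_bits: bytes) -> None:
        self.entries[text] = mux_texture_shape(texture_bits, shape_bits)

    def lookup(self, text: str):
        return self.entries.get(text)  # None if the sign is not registered

db = SignImageDatabase()
db.store("hello", b"\x10\x20", b"\x30")
print(db.lookup("hello").hex())  # 01000000021020020000000130
```

Keying the table by the text meaning is what later lets the speech-recognition side retrieve signs by word.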

[0034] FIG. 2 is a block diagram showing the overall configuration of an apparatus, according to the first embodiment of the present invention, that encodes and multiplexes the main audio/video and sign language images. The configuration of FIG. 2 is as follows.

- 200: a main AV encoder that receives and encodes the video and audio signals constituting the main content.
- 201: a speech recognizer that recognizes speech in the input audio and outputs it as text data.
- 203: a sign language image searcher that uses the text data from the speech recognizer 201 to retrieve the corresponding sign language images from the sign language image database.
- 202: a multiplexer that multiplexes the main stream from the main AV encoder with the sign language image stream from the sign language image searcher 203.

[0035] Next, the operation of FIG. 2 is described.

1. The encoder 200 encodes the main video and audio source signals and outputs them as the main stream.
2. At the same time, the speech recognizer 201 recognizes and interprets the speech in the audio source signal.
3. The sign language image searcher 203 retrieves the sign language image stream matching the interpreted speech from the stored sign language image database 105.
4. The multiplexer 202 multiplexes the main stream and the sign language image stream and outputs the final bit stream.
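The search step can be sketched as a word-by-word lookup of the recognized text in a table keyed by sign meaning. The code assumes the recognizer's output can be split on whitespace. Japanese text would instead need morphological analysis to find word boundaries. Returning unmatched words separately is an illustrative design choice, not something the text specifies.

```python
def find_sign_streams(recognized_text, sign_db):
    """Look up each recognized word in a word -> sign stream mapping.

    Returns (found, missing): found is a list of (word, stream) in speech
    order; missing lists words with no registered sign.
    """
    found, missing = [], []
    for word in recognized_text.lower().split():
        stream = sign_db.get(word)
        if stream is None:
            missing.append(word)
        else:
            found.append((word, stream))
    return found, missing

sign_db = {"hello": b"S1", "thanks": b"S2"}
print(find_sign_streams("Hello everyone thanks", sign_db))
# ([('hello', b'S1'), ('thanks', b'S2')], ['everyone'])
```

Keeping the found streams in speech order lets the multiplexer lay them out on the timeline in the same sequence as the audio.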

[0036] FIG. 3 is a block diagram showing the overall configuration of an apparatus, according to the first embodiment of the present invention, that decodes and composites the main audio/video and sign language images. The configuration of FIG. 3 is as follows.

- 300: a distributor that receives the bit stream and splits it into the main stream and the sign language image stream.
- 301: a main AV decoder that decodes the main stream from the distributor 300 into an audio signal and a video signal.
- 302: a distributor that splits the sign language image stream into the signer's texture stream and a shape stream carrying the shape information.
- 303: a texture decoder that decodes the signer's texture image signal from the texture stream.
- 304: a shape decoder that decodes the signer's shape image signal from the shape stream.
- 305: a sign language image synthesizer that receives an image synthesis instruction signal. When the signal is ON, it composites the video signal from the main AV decoder 301 with the sign language image signals from the texture decoder 303 and the shape decoder 304, and outputs the result. When the signal is OFF, it outputs only the audio and video signals from the main AV decoder 301.

[0037] Next, the operation of FIG. 3 is described.

1. The stream is input, and the distributor 300 splits it into the main stream and the sign language image stream.
2. The main AV decoder 301 decodes the main stream and outputs the main video signal and main audio signal.
3. The sign language image distributor 302 splits the sign language image stream into a texture stream, which carries the signer's texture image signal, and a shape stream, which carries the signer's shape image signal.
4. The texture decoder 303 decodes the texture stream into the signer's texture image signal, and the shape decoder 304 decodes the shape stream into the signer's shape image signal.
5. The sign language image synthesizer 305 takes the main video signal from the main AV decoder 301, the texture image signal from the texture decoder 303, and the shape image signal from the shape decoder 304. It outputs a video signal in which the main video signal and the sign language image signal are composited.
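Both distributors (300 and 302) perform the same basic operation: they split one multiplexed stream back into its component streams by identifier. This sketch assumes a hypothetical packet layout of a 1-byte stream ID, a 4-byte big-endian length, and then the payload. It is not the MPEG-4 Systems demultiplexer.

```python
import struct

HEADER = struct.Struct(">BI")  # assumed layout: stream ID, payload length

def demultiplex(stream: bytes) -> dict:
    """Split a length-prefixed stream into {stream_id: concatenated payload}."""
    parts = {}
    offset = 0
    while offset < len(stream):
        sid, length = HEADER.unpack_from(stream, offset)  # raises if header cut off
        offset += HEADER.size
        payload = stream[offset:offset + length]
        if len(payload) != length:
            raise ValueError("truncated packet")
        parts[sid] = parts.get(sid, b"") + payload
        offset += length
    return parts

data = HEADER.pack(1, 3) + b"abc" + HEADER.pack(2, 2) + b"xy"
print(demultiplex(data))  # {1: b'abc', 2: b'xy'}
```

Applied once, this separates the main stream from the sign stream. Applied again to the sign stream, it separates texture from shape, mirroring the two-stage distribution in FIG. 3.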

[0038] FIG. 4 shows the overall stream structure when the main audio/video stream and the sign language image stream are multiplexed using MPEG-4. With MPEG-4, every stream is packetized in fixed time units, and a time stamp (TS) indicating the display time is attached to each packet before multiplexing, which makes it easy to manage the display timing of each stream.
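The idea of stamping each packet with its display time and then multiplexing can be sketched as a merge by time stamp. This is a simplified stand-in, not the MPEG-4 Systems packet format: a packet here is a hypothetical `(time_stamp, stream_name, payload)` tuple, and each frame is assumed to occupy one fixed period.

```python
import heapq
from typing import Iterable, List, Tuple

# Hypothetical simplified packet: (display time stamp, stream name, payload).
Packet = Tuple[int, str, bytes]


def packetize(stream: str, frames: List[bytes], period: int) -> List[Packet]:
    """Attach a display time stamp (frame index * period) to each frame."""
    return [(i * period, stream, f) for i, f in enumerate(frames)]


def multiplex(*streams: Iterable[Packet]) -> List[Packet]:
    """Interleave already time-ordered packet lists by display time stamp."""
    return list(heapq.merge(*streams, key=lambda p: p[0]))
```

Because every packet carries its own display time, a receiver can demultiplex the streams and still present the main video and the sign language image in step, even when the two streams use different frame periods.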

[0039] FIG. 5 is an operation flowchart of the sign language image synthesizer 305. First, the main audio/video signals and the sign language image signals (texture image and shape image) are input (400). Next, the main video signal, which must always be output, is output (401). Next, the instruction signal for synthesizing and displaying the sign language image is input (402). When synthesis is instructed, that is, when the synthesis display instruction signal is ON, a video signal in which the sign language image is synthesized with the main video is output (404). When the instruction signal is OFF, the main video signal is output without synthesis (401), and the audio signal is output at the same time (403).
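The branch in the FIG. 5 flowchart can be written as a small per-frame function. This sketch follows the reading that audio is output only in the OFF branch (403), which matches [0040]'s statement that the synthesizer 305 switches between the audio and the sign language image. The `Output` type and the `composite` callback are hypothetical names; `composite` stands in for whatever shape-based overlay the synthesizer uses.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class Output:
    video: Any
    audio: Optional[Any]  # None when audio is not output


def synthesizer_step(main_video: Any, main_audio: Any, texture: Any, shape: Any,
                     synthesize_on: bool,
                     composite: Callable[[Any, Any, Any], Any]) -> Output:
    # Step 400: the main audio/video and sign language signals arrive as arguments.
    # Step 402: check the synthesis display instruction signal.
    if synthesize_on:
        # Step 404: output the main video with the sign language image synthesized.
        return Output(video=composite(main_video, texture, shape), audio=None)
    # Steps 401 and 403: output the main video unchanged together with the audio.
    return Output(video=main_video, audio=main_audio)
```

The alternative in [0040], where audio stays on at all times, would amount to returning `main_audio` in both branches.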

[0040] Other configurations of the first embodiment will be described. In this embodiment, the international standard MPEG-4 is used as the object-based encoding method, but the invention may also be applied to other object-based encoding methods. Likewise, encoding and decoding that do not use object-based encoding may use schemes other than MPEG-4, such as MPEG-1 or MPEG-2. Although the sign language image synthesizer 305 switches between the audio and the sign language image, it may instead switch only the sign language image (synthesized or not output) while the audio remains output at all times.

[0041] The effects of the first embodiment will be described. Because this embodiment constructs the sign language image database from encoded sign language image streams, it provides a means of storing many sign language images with a small amount of data. Because the sign language images are created from natural images, they preserve the signer's subtle facial expressions and movements better than images such as CG. Because the sign language images are encoded with object-based encoding, they are easy to combine with other images.

[0042] (Second Embodiment) FIG. 6 is a block diagram showing the configuration of a computer according to the second embodiment. Reference numeral 500 denotes a central processing unit (CPU) that controls the entire computer and performs various kinds of processing, and 501 denotes a memory that provides the operating system (OS), software, and data needed to control the computer, as well as the storage area needed for calculation. The memory 501 is also used as a work area when the CPU 500 performs various processes.

[0043] Reference numeral 502 denotes a bus that connects the various devices and exchanges data and control signals, 503 denotes a storage device that stores various software, 504 denotes a storage device that stores moving image data, and 505 denotes a monitor that displays images, system messages from the computer, and the like.

[0044] Reference numeral 507 denotes a communication interface that transmits encoded data to a communication circuit and is connected to a LAN, a public line, a wireless line, broadcast waves, or the like outside the apparatus. Reference numeral 506 denotes a terminal used to start the computer and to set various conditions such as the bit rate.

[0045] The memory 501 stores the OS that controls the entire computer and runs the various software, as well as the software to be run. It also contains an area into which image data is read for encoding, a code area that temporarily stores code data, and a working area that stores parameters for various calculations.

[0046] The operation of FIG. 6 will be described. In the above configuration, before processing begins, the moving image data to be encoded is selected via the terminal 506 from the sign language moving images and corresponding text data stored in the storage device 504, and the computer is instructed to start. The software stored in the storage device 503 is then loaded into the memory 501 via the bus 502 and started, and the CPU 500 sequentially executes the processing shown in FIGS. 1, 2, and 3.

[0047] The computer in this embodiment functions as an apparatus that efficiently constructs the database of arbitrarily shaped sign language images described in the first embodiment and synthesizes the sign language images with the main AV stream (full-screen moving image and audio).

[0048] As is clear from the above description, the first and second embodiments use an object-based encoding apparatus and method to efficiently construct a database of arbitrarily shaped sign language images and synthesize the sign language images with the main AV stream (full-screen moving image and audio), thereby providing a means of delivering video services that are natural and easy to understand for hearing-impaired people.
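The sign language image database can be sketched as a mapping from each sign's meaning (the text data entered at storage time, per claim 2) to its encoded object-based stream, searched with words produced by voice recognition (per claim 3). The class and method names are hypothetical, and exact word-by-word lookup is an assumption made for illustration: the publication does not state how recognized speech is matched against the stored text.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SignLanguageDB:
    # Hypothetical layout: meaning of the sign (text data) -> encoded stream
    # in which the signer's texture and shape images are already multiplexed.
    entries: Dict[str, bytes] = field(default_factory=dict)

    def store(self, meaning: str, encoded_stream: bytes) -> None:
        """Database storage: keep the multiplexed stream together with its text."""
        self.entries[meaning] = encoded_stream

    def search(self, recognized_words: List[str]) -> List[bytes]:
        """Sign language image search: look up one stream per recognized word,
        skipping words that have no stored sign (assumed exact matching)."""
        return [self.entries[w] for w in recognized_words if w in self.entries]
```

The streams returned by `search` would then be multiplexed with the encoded main audio/video, which is the role of the multiplexer 202 in the encoding apparatus of FIG. 2.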

[0049] This embodiment can be realized by a computer executing a program. Means for supplying the program to the computer, for example a recording medium such as a CD-ROM on which the program is recorded, or a transmission medium such as the Internet that transmits the program, can also be applied as embodiments of the present invention. The above program, recording medium, and transmission medium are included within the scope of the present invention. As the recording medium, for example, a flexible disk, a hard disk, an optical disk, a magneto-optical disk, a CD-ROM, a magnetic tape, a non-volatile memory card, or a ROM can be used.

[0050] Each of the above embodiments is merely an example of how the present invention may be embodied, and the technical scope of the present invention should not be construed as limited by them. That is, the present invention can be implemented in various forms without departing from its technical idea or its main features.

[0051]

[Effects of the Invention] As described above, by efficiently constructing a database of arbitrarily shaped sign language images and synthesizing the sign language images with the main moving image and audio signal (full-screen moving image and audio), video services that are natural and easy to understand for hearing-impaired people can be provided.

[Brief description of drawings]

FIG. 1 is a block diagram showing the overall configuration of a sign language image database creation apparatus according to the first embodiment of the present invention.

FIG. 2 is a block diagram showing the overall configuration of an apparatus that encodes and multiplexes the main audio/video and sign language images according to the first embodiment of the present invention.

FIG. 3 is a block diagram showing the overall configuration of an apparatus that decodes and synthesizes the main audio/video and sign language images according to the first embodiment of the present invention.

FIG. 4 is a diagram showing the overall stream structure when the main audio/video stream and the sign language image stream are multiplexed.

FIG. 5 is an operation flowchart for synthesizing a sign language image.

FIG. 6 is a block diagram showing the configuration of a computer according to the second embodiment of the present invention.

FIG. 7 is a diagram showing an example of a natural image and a shape information image when a signer is object-encoded.

FIG. 8 is a diagram showing an example of an image in which a sign language image is synthesized with a main image.

[Explanation of symbols]

100 Camera
101 Shape generator
102 Texture encoder
103 Shape encoder
104, 202 Multiplexer
105 Sign language image database
200 Main AV encoder
201 Voice recognizer
203 Sign language image searcher
300 Distributor
301 Main AV decoder
302 Sign language image distributor
303 Texture decoder
304 Shape decoder
305 Sign language image synthesizer
500 CPU
501 Memory
502 Bus
503, 504 Storage media
505 Monitor
506 Terminal
507 Communication I/F

Front page continuation: (51) Int.Cl.⁷, identification code, FI, theme code (reference): H04N 7/08 G10L 3/00 551G 7/081 551C. F-terms (reference): 5C023 AA06 AA17 AA37 AA38 BA02 BA11 CA01 CA05 DA04 5C052 AA01 AC08 CC11 DD04 DD06 5C059 KK37 MA00 MB03 MB06 MB12 MB23 PP04 PP28 PP29 RB01 RB18 RC19 RC32 SS06 SS12 SS19 SS30 UA02 UA05 5C063 AB07 AC01 AC05 CA11 CA20 CA36 5D015 KK01

Claims

1. An image processing apparatus comprising: database creating means for creating a database of sign language images; multiplexing means for encoding a main moving image and multiplexing it with a sign language image from the database; and means for decoding the main moving image and the sign language image from the multiplexed image, and synthesizing and outputting them.

2. An image processing apparatus for creating a database of sign language images by object-based encoding of moving images, comprising: moving image input means for inputting a moving image of a signer; object extraction means for extracting only the signer as an object to obtain a moving image of shape information; natural image encoding means for encoding the moving image of the signer; shape information image encoding means for encoding the moving image of the shape information; multiplexing means for multiplexing the encoded natural image and shape information image; text input means for inputting the meaning of the sign language image as text data; and database storage means for storing the multiplexed image and the text data as a database.

3. An image processing apparatus comprising: encoding means for encoding a main moving image signal and an audio signal; voice recognition means for extracting and recognizing voice information from the audio signal; sign language image search means for searching a database of sign language images for a desired sign language image using the recognized voice data; and multiplexing means for multiplexing and outputting the encoded main moving image signal, the audio signal, and the sign language image.

4. An image processing apparatus comprising: first distribution means for separating an input signal into a set consisting of a main moving image signal and an audio signal, and a sign language image; first decoding means for decoding the main moving image signal and the audio signal; second distribution means for separating the sign language image into a natural image of the signer and a shape information image of the signer; second decoding means for decoding the natural image of the signer; third decoding means for decoding the shape information image of the signer; and synthesizing means for synthesizing the main moving image and the sign language image from the main moving image signal, the natural image of the signer, and the shape information image.

5. The image processing apparatus according to claim 4, further comprising: input means for inputting an instruction signal instructing synthesis of the main moving image and the sign language image; and switching means for synthesizing and outputting the main moving image and the sign language image when the instruction signal instructs synthesis, and for outputting the main moving image without synthesizing it with the sign language image when synthesis is not instructed.

6. An image processing apparatus wherein the switching means synthesizes and outputs the main moving image and the sign language image when the instruction signal instructs synthesis, and outputs the main moving image and the audio signal when synthesis is not instructed.

7. An image processing method comprising: a database creating step of creating a database of sign language images; a multiplexing step of encoding a main moving image and multiplexing it with a sign language image from the database; and a step of decoding the main moving image and the sign language image from the multiplexed image, and synthesizing and outputting them.

8. An image processing method for creating a database of sign language images by object-based encoding of moving images, comprising: a moving image input step of inputting a moving image of a signer; an object extraction step of extracting only the signer as an object to obtain a moving image of shape information; a natural image encoding step of encoding the moving image of the signer; a shape information image encoding step of encoding the moving image of the shape information; a multiplexing step of multiplexing the encoded natural image and shape information image; a text input step of inputting the meaning of the sign language image as text data; and a database storing step of storing the multiplexed image and the text data as a database.

9. An image processing method comprising: an encoding step of encoding a main moving image signal and an audio signal; a voice recognition step of extracting and recognizing voice information from the audio signal; a sign language image search step of searching a database of sign language images for a desired sign language image using the recognized voice data; and a multiplexing step of multiplexing and outputting the encoded main moving image signal, the audio signal, and the sign language image.

10. An image processing method comprising: a first distribution step of separating an input signal into a set consisting of a main moving image signal and an audio signal, and a sign language image; a first decoding step of decoding the main moving image signal and the audio signal; a second distribution step of separating the sign language image into a natural image of the signer and a shape information image of the signer; a second decoding step of decoding the natural image of the signer; a third decoding step of decoding the shape information image of the signer; and a synthesizing step of synthesizing the main moving image and the sign language image from the main moving image signal, the natural image of the signer, and the shape information image.

11. A computer-readable recording medium on which is recorded a program for causing a computer to execute: a database creation procedure for creating a database of sign language images; a multiplexing procedure for encoding a main moving image and multiplexing it with a sign language image from the database; and a procedure for decoding the main moving image and the sign language image from the multiplexed image, and synthesizing and outputting them.

12. A computer-readable recording medium on which is recorded a program for creating a database of sign language images by object-based encoding of moving images, the program causing a computer to execute: a moving image input procedure for inputting a moving image of a signer; an object extraction procedure for extracting only the signer as an object to obtain a moving image of shape information; a natural image encoding procedure for encoding the moving image of the signer; a shape information image encoding procedure for encoding the moving image of the shape information; a multiplexing procedure for multiplexing the encoded natural image and shape information image; a text input procedure for inputting the meaning of the sign language image as text data; and a database storage procedure for storing the multiplexed image and the text data as a database.

13. A computer-readable recording medium on which is recorded a program for causing a computer to execute: an encoding procedure for encoding a main moving image signal and an audio signal; a voice recognition procedure for extracting and recognizing voice information from the audio signal; a sign language image search procedure for searching a database of sign language images for a desired sign language image using the recognized voice data; and a multiplexing procedure for multiplexing and outputting the encoded main moving image signal, the audio signal, and the sign language image.

14. A computer-readable recording medium on which is recorded a program for causing a computer to execute: a first distribution procedure for separating an input signal into a set consisting of a main moving image signal and an audio signal, and a sign language image; a first decoding procedure for decoding the main moving image signal and the audio signal; a second distribution procedure for separating the sign language image into a natural image of the signer and a shape information image of the signer; a second decoding procedure for decoding the natural image of the signer; a third decoding procedure for decoding the shape information image of the signer; and a synthesizing procedure for synthesizing the main moving image and the sign language image from the main moving image signal, the natural image of the signer, and the shape information image.

15. A program for causing a computer to execute: a database creation procedure for creating a database of sign language images; a multiplexing procedure for encoding a main moving image and multiplexing it with a sign language image from the database; and a procedure for decoding the main moving image and the sign language image from the multiplexed image, and synthesizing and outputting them.

16. A program for creating a database of sign language images by object-based encoding of moving images, the program causing a computer to execute: a moving image input procedure for inputting a moving image of a signer; an object extraction procedure for extracting only the signer as an object to obtain a moving image of shape information; a natural image encoding procedure for encoding the moving image of the signer; a shape information image encoding procedure for encoding the moving image of the shape information; a multiplexing procedure for multiplexing the encoded natural image and shape information image; a text input procedure for inputting the meaning of the sign language image as text data; and a database storage procedure for storing the multiplexed image and the text data as a database.

17. A program for causing a computer to execute: an encoding procedure for encoding a main moving image signal and an audio signal; a voice recognition procedure for extracting and recognizing voice information from the audio signal; a sign language image search procedure for searching a database of sign language images for a desired sign language image using the recognized voice data; and a multiplexing procedure for multiplexing and outputting the encoded main moving image signal, the audio signal, and the sign language image.

18. A program for causing a computer to execute: a first distribution procedure for separating an input signal into a set consisting of a main moving image signal and an audio signal, and a sign language image; a first decoding procedure for decoding the main moving image signal and the audio signal; a second distribution procedure for separating the sign language image into a natural image of the signer and a shape information image of the signer; a second decoding procedure for decoding the natural image of the signer; a third decoding procedure for decoding the shape information image of the signer; and a synthesizing procedure for synthesizing the main moving image and the sign language image from the main moving image signal, the natural image of the signer, and the shape information image.