JP7485454B2

JP7485454B2 - Sign language translation processing device, sign language translation processing system, sign language translation processing method, program, and recording medium

Info

Publication number: JP7485454B2
Application number: JP2022125151A
Authority: JP
Inventors: 浩臣岡田; 祐樹樋口; 恭聖山本; 天斗山本
Original assignee: NEC Solution Innovators Ltd
Current assignee: NEC Solution Innovators Ltd
Priority date: 2022-08-05
Filing date: 2022-08-05
Publication date: 2024-05-16
Anticipated expiration: 2042-08-05
Also published as: JP2024021935A

Description

The present invention relates to a sign language translation processing device, a sign language translation processing system, a sign language translation processing method, a program, and a recording medium.

As a means for deaf and hearing people to converse, Patent Document 1, for example, describes a technique that converts sign language information into speech or text.

JP 2005-197889 A

The invention described in Patent Document 1 converts sign language information into speech or text by referring to a sign language pattern library. When a sign language pattern library is used, sign language patterns must be collected that cover a wide variety of signer attributes, such as gender, height, and clothing color. If sign language information that differs from the patterns stored in the library is input, the translation accuracy decreases.

Accordingly, an object of the present invention is to provide a sign language translation processing device, a sign language translation processing system, a sign language translation processing method, a program, and a recording medium for performing sign language translation with high accuracy.

In order to achieve the above object, the sign language translation processing device of the present invention
includes an image acquisition unit, a hand skeletal information acquisition unit, a body skeletal information acquisition unit, an information integration unit, and a motion recognition unit,
the image acquisition unit acquires images of a body including a hand,
the images are a plurality of time-series images captured over time,
the hand skeletal information acquisition unit acquires hand skeletal information from the acquired images,
the hand skeletal information includes hand skeletal coordinates,
the body skeletal information acquisition unit acquires body skeletal information from the acquired images,
the body skeletal information includes body skeletal coordinates,
the information integration unit integrates the hand skeletal information and the body skeletal information to calculate the position of the hand skeleton on the body and the position of the hand, and
the motion recognition unit arranges the integrated and calculated position of the hand skeleton and position of the hand, together with the body skeletal information, in chronological order, and recognizes hand motion information from the arranged position of the hand skeleton, position of the hand, and body skeletal information to estimate a sign language word.

The sign language translation processing system of the present invention
includes a sign language translation processing device and a user terminal,
the sign language translation processing device is the sign language translation processing device of the present invention,
the user terminal is capable of acquiring images of a body including a hand, and
the images are a plurality of time-series images captured over time.

The sign language translation processing method of the present invention
includes an image acquisition step, a hand skeletal information acquisition step, a body skeletal information acquisition step, an information integration step, and a motion recognition step,
the image acquisition step acquires images of a body including a hand,
the images are a plurality of time-series images captured over time,
the hand skeletal information acquisition step acquires hand skeletal information from the acquired images,
the hand skeletal information includes hand skeletal coordinates,
the body skeletal information acquisition step acquires body skeletal information from the acquired images,
the body skeletal information includes body skeletal coordinates,
the information integration step integrates the hand skeletal information and the body skeletal information to calculate the position of the hand skeleton on the body and the position of the hand, and
the motion recognition step arranges the integrated and calculated position of the hand skeleton and position of the hand, together with the body skeletal information, in chronological order, and recognizes hand motion information from the arranged position of the hand skeleton, position of the hand, and body skeletal information to estimate a sign language word.

The program of the present invention
includes an image acquisition procedure, a hand skeletal information acquisition procedure, a body skeletal information acquisition procedure, an information integration procedure, and a motion recognition procedure,
the image acquisition procedure acquires images of a body including a hand,
the images are a plurality of time-series images captured over time,
the hand skeletal information acquisition procedure acquires hand skeletal information from the acquired images,
the hand skeletal information includes hand skeletal coordinates,
the body skeletal information acquisition procedure acquires body skeletal information from the acquired images,
the body skeletal information includes body skeletal coordinates,
the information integration procedure integrates the hand skeletal information and the body skeletal information to calculate the position of the hand skeleton on the body and the position of the hand,
the motion recognition procedure arranges the integrated and calculated position of the hand skeleton and position of the hand, together with the body skeletal information, in chronological order, and recognizes hand motion information from the arranged position of the hand skeleton, position of the hand, and body skeletal information to estimate a sign language word, and
the program causes a computer to execute each of the above procedures.

The recording medium of the present invention is a computer-readable recording medium on which the program of the present invention is recorded.

According to the present invention, sign language translation can be performed with high accuracy.

FIG. 1 is a block diagram showing an example of the configuration of the sign language translation processing device of Embodiment 1.
FIG. 2 is a block diagram showing an example of the hardware configuration of the sign language translation processing device of Embodiment 1.
FIG. 3 is a flowchart showing an example of processing in the sign language translation processing device of Embodiment 1.
FIG. 4 is a block diagram showing an example of the configuration of the sign language translation processing device of Embodiment 2.
FIG. 5 is a block diagram showing an example of the hardware configuration of the sign language translation processing device of Embodiment 2.
FIG. 6 is a flowchart showing an example of processing in the sign language translation processing device of Embodiment 2.
FIG. 7 is an explanatory diagram showing an example of the sign language translation processing system of Embodiment 4.

Next, embodiments of the present invention will be described with reference to the drawings. The present invention is not limited to the following embodiments. In the drawings, identical parts are given identical reference numerals. Unless otherwise specified, the description of each embodiment applies to the other embodiments, and the configurations of the embodiments can be combined.

[Embodiment 1]
FIG. 1 is a block diagram showing an example of the configuration of the sign language translation processing device 10 of the present embodiment. As shown in FIG. 1, the device 10 includes an image acquisition unit 11, a hand skeletal information acquisition unit 12, a body skeletal information acquisition unit 13, an information integration unit 14, and a motion recognition unit 15.

The device 10 may be, for example, a single device that includes all of the above units, or a device in which the units are connectable to one another via a communication network. The device 10 can also be connected to an external device, described later, via the communication network. The communication network is not particularly limited; any known network can be used, and it may be wired or wireless. Examples of the communication network include an Internet line, the WWW (World Wide Web), a telephone line, a LAN (Local Area Network), a SAN (Storage Area Network), DTN (Delay Tolerant Networking), LPWA (Low Power Wide Area), and L5G (Local 5G). Examples of wireless communication include Wi-Fi (registered trademark), Bluetooth (registered trademark), local 5G, and LPWA. The wireless communication may take the form of direct communication between devices (ad hoc communication), infrastructure communication, indirect communication via an access point, or the like. The device 10 may, for example, be incorporated into a server as a system. The device 10 may also be, for example, a personal computer (PC, for example, a desktop or notebook type), a smartphone, a tablet terminal, or the like on which the program of the present invention is installed. The device 10 may also take a cloud computing or edge computing form in which, for example, at least one of the units is on a server and the remaining units are on a terminal.

FIG. 2 illustrates a block diagram of the hardware configuration of the device 10. The device 10 includes, for example, a central processing unit (CPU, GPU, etc.) 101, a memory 102, a bus 103, a storage device 104, an input device 105, an output device 106, and a communication device 107. The parts of the device 10 are connected to one another via the bus 103 through their respective interfaces (I/F).

The central processing unit 101 operates in cooperation with the other components through a controller (system controller, I/O controller, etc.) and is responsible for overall control of the device 10. In the device 10, the central processing unit 101 executes, for example, the program of the present invention and other programs, and reads and writes various information. Specifically, for example, the central processing unit 101 functions as the image acquisition unit 11, the hand skeletal information acquisition unit 12, the body skeletal information acquisition unit 13, the information integration unit 14, and the motion recognition unit 15. The central processing unit 101 may include a computing device such as a CPU, a GPU (Graphics Processing Unit), or an APU (Accelerated Processing Unit), or a combination of these.

The bus 103 can also be connected to, for example, an external device. Examples of the external device include an external storage device (an external database, etc.), a printer, an external input device, an external display device, and an external imaging device. The device 10 can be connected to an external network (the communication network described above) through, for example, the communication device 107 connected to the bus 103, and can also be connected to other devices via the external network.

The memory 102 is, for example, a main memory (main storage device). When the central processing unit 101 performs processing, the memory 102 reads various operating programs, such as the program of the present invention, stored in the storage device 104 described later, and the central processing unit 101 receives the data from the memory 102 and executes the programs. The main memory is, for example, a RAM (random access memory). The memory 102 may also be, for example, a ROM (read-only memory).

The storage device 104 is also called an auxiliary storage device, in contrast to the main memory (main storage device). As described above, the storage device 104 stores operating programs including the program of the present invention. The storage device 104 may be, for example, a combination of a recording medium and a drive that reads from and writes to the recording medium. The recording medium is not particularly limited and may be internal or external; examples include an HD (hard disk), CD-ROM, CD-R, CD-RW, MO, DVD, flash memory, and a memory card. The storage device 104 may also be, for example, a hard disk drive (HDD) or a solid state drive (SSD) in which the recording medium and the drive are integrated.

In the device 10, the memory 102 and the storage device 104 can also store various information, such as log information, information acquired from an external database (not shown) or an external device, information generated by the device 10, and information used by the device 10 when executing processing. In this case, the memory 102 and the storage device 104 may store, for example, hand motion information in association with sign language conversation. At least part of the information may be stored, for example, on an external server other than the memory 102 and the storage device 104, or may be stored in a distributed manner across a plurality of terminals using blockchain technology or the like.

The device 10 further includes, for example, an input device 105 and an output device 106. Examples of the input device 105 include pointing devices such as a touch panel, a track pad, and a mouse; a keyboard; imaging means such as a camera and a scanner; card readers such as an IC card reader and a magnetic card reader; and audio input means such as a microphone. Examples of the output device 106 include display devices such as an LED (light-emitting diode) display and a liquid crystal display; audio output devices such as a speaker; and a printer. In Embodiment 1, the input device 105 and the output device 106 are configured separately, but they may be configured as a single unit, such as a touch panel display.

Next, an example of the sign language translation processing method of the present embodiment will be described based on flowchart S10 of FIG. 3. The sign language translation processing method of the present embodiment is carried out as follows, for example, using the device 10 of FIG. 1 or FIG. 2. The method is not limited to use of the device 10 of FIG. 1 or FIG. 2.

First, the image acquisition unit 11 acquires images of a body including a hand (S11). The images are a plurality of time-series images captured over time. The body may be, for example, the whole body including the upper and lower body, or the upper body only. The images may be captured, for example, by a camera provided in the device 10, or images captured by a camera outside the device 10 may be acquired via the communication device 107. The images are acquired, for example, frame by frame. In the present invention, unless otherwise specified, "hand" may refer to both hands or to one hand.
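As an illustration of S11, the following is a minimal sketch of a buffer that holds frames in capture order. The names `Frame` and `FrameBuffer`, the buffer size, and the use of a timestamp for ordering are assumptions made for this example and do not appear in the specification.

```python
from collections import deque
from dataclasses import dataclass
from typing import Any, Deque, List


@dataclass
class Frame:
    index: int        # position in the capture sequence
    timestamp: float  # capture time in seconds
    image: Any        # pixel data; opaque to this sketch


class FrameBuffer:
    """Keeps the most recent frames (hypothetical helper for S11)."""

    def __init__(self, max_frames: int = 30) -> None:
        # A bounded deque drops the oldest frame once max_frames is reached.
        self._frames: Deque[Frame] = deque(maxlen=max_frames)

    def push(self, frame: Frame) -> None:
        self._frames.append(frame)

    def window(self) -> List[Frame]:
        # Sort by timestamp in case frames arrive out of order,
        # for example when they are received over a network.
        return sorted(self._frames, key=lambda f: f.timestamp)
```

A caller would push one `Frame` per captured image and pass `window()` to the later steps, which expect time-series input.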

Next, the hand skeletal information acquisition unit 12 acquires hand skeletal information from the acquired images (S12). The hand skeletal information includes hand skeletal coordinates. The hand skeletal information can be detected and acquired, for example, by a conventionally known method. The hand skeleton includes, for example, the joints of the hand. The hand skeletal coordinates may be acquired, for example, using a hand skeleton detection model.

In addition, the body skeletal information acquisition unit 13 acquires body skeletal information from the acquired images (S13). The body skeletal information includes body skeletal coordinates. The body skeletal information can be detected and acquired, for example, by a conventionally known method. The body skeletal coordinates may be acquired, for example, using a body skeleton detection model. Although not shown in FIG. 3, the body skeletal coordinates acquired in S13 may be saved. If they are saved, then when body skeletal information is acquired for the next frame, the saved body skeletal coordinates and the body image of the next frame can be integrated to obtain the body skeletal coordinates of the next frame. This improves the accuracy with which body skeletal coordinates are acquired from the next frame onward.
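The specification does not say how the saved coordinates are combined with the next frame. One possible reading, shown here only as a sketch, blends the saved coordinates with the raw detection for the new frame and falls back to the saved value when a joint is not detected. The function name `merge_with_saved`, the joint names, and the blending weight `alpha` are assumptions for this example, not the patented method.

```python
from typing import Dict, Optional, Tuple

Point = Tuple[float, float]
Skeleton = Dict[str, Optional[Point]]


def merge_with_saved(saved: Skeleton, detected: Skeleton,
                     alpha: float = 0.7) -> Skeleton:
    """Blend saved joint coordinates with a new detection.

    alpha weights the new detection; (1 - alpha) weights the saved value.
    """
    merged: Skeleton = {}
    for joint in set(saved) | set(detected):
        old = saved.get(joint)
        new = detected.get(joint)
        if new is None:
            merged[joint] = old   # joint missed this frame: reuse saved value
        elif old is None:
            merged[joint] = new   # joint seen for the first time
        else:
            merged[joint] = (
                alpha * new[0] + (1 - alpha) * old[0],
                alpha * new[1] + (1 - alpha) * old[1],
            )
    return merged
```

The merged result would then be saved in turn and used when the following frame is processed.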

In FIG. 3, after the images of the body including the hand are acquired (S11), acquisition of the hand skeletal information (S12) and acquisition of the body skeletal information (S13) are performed in parallel. This is merely an example; the body skeletal information may be acquired after the hand skeletal information, or the hand skeletal information may be acquired after the body skeletal information.
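The two orderings described above can be sketched as follows. The detector callables `detect_hand` and `detect_body` are placeholders supplied by the caller; the function name `run_s12_s13` and the use of a thread pool are assumptions for this example.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Tuple

Detector = Callable[[Any], Any]


def run_s12_s13(image: Any, detect_hand: Detector, detect_body: Detector,
                parallel: bool = True) -> Tuple[Any, Any]:
    """Run hand (S12) and body (S13) skeleton detection on one image."""
    if not parallel:
        # Sequential variant: hand first, then body.
        return detect_hand(image), detect_body(image)
    # Parallel variant: both detectors receive the same image.
    with ThreadPoolExecutor(max_workers=2) as pool:
        hand_future = pool.submit(detect_hand, image)
        body_future = pool.submit(detect_body, image)
        return hand_future.result(), body_future.result()
```

Because both detectors read the same image and neither depends on the other's output, the two variants return the same pair of results.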

Next, the information integration unit 14 integrates the hand skeletal information and the body skeletal information (S14). Based on the integrated information, the position of the hand skeleton on the body and the position of the hand are calculated (S15).
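One way to picture S14 and S15 is to express the hand joints in a coordinate frame anchored to the body. The sketch below assumes 2D image coordinates, uses the midpoint of the shoulders as the origin and the shoulder width as the unit of length, and takes the centroid of the hand joints as the hand position. These choices, the joint names, and the function name `integrate` are assumptions for illustration; the specification does not fix them.

```python
from math import hypot
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def integrate(hand_joints: List[Point],
              body: Dict[str, Point]) -> Tuple[List[Point], Point]:
    """Return (hand joints relative to the body, hand position)."""
    if not hand_joints:
        raise ValueError("no hand joints detected")
    ls, rs = body["left_shoulder"], body["right_shoulder"]
    origin = ((ls[0] + rs[0]) / 2, (ls[1] + rs[1]) / 2)
    scale = hypot(ls[0] - rs[0], ls[1] - rs[1])
    if scale == 0:
        raise ValueError("shoulders coincide; cannot normalize")
    # Position of each hand joint on the body, in shoulder-width units.
    relative = [((x - origin[0]) / scale, (y - origin[1]) / scale)
                for x, y in hand_joints]
    # Hand position taken as the centroid of the relative joints.
    n = len(relative)
    hand_position = (sum(p[0] for p in relative) / n,
                     sum(p[1] for p in relative) / n)
    return relative, hand_position
```

Normalizing by a body-derived length is one way to make the result less sensitive to the signer's size and distance from the camera, which is in line with the stated aim of not depending on height.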

Next, the motion recognition unit 15 arranges the integrated and calculated position of the hand skeleton and position of the hand, together with the body skeletal information, in chronological order, and recognizes hand motion information from the arranged position of the hand skeleton, position of the hand, and body skeletal information to estimate a sign language word (S16). The sign language is not particularly limited; examples include Japanese Sign Language, Signed Japanese (manually coded Japanese), and intermediate-type sign language. It may also be a sign language used in any country of the world. The sign language word can be estimated, for example, by a conventionally known method. The hand motion information may be recognized, for example, using a motion recognition model.
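The chronological arrangement in S16 can be sketched as below. The record type `FrameFeatures`, the use of frame-to-frame displacement of the hand position as the motion information, and the pluggable `recognizer` callable are all assumptions for this example; the specification leaves the recognition model open.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]


@dataclass
class FrameFeatures:
    timestamp: float
    hand_position: Point
    hand_joints: List[Point]
    body_joints: List[Point]


def to_motion(records: Sequence[FrameFeatures]) -> List[Point]:
    """Order records by time and return frame-to-frame hand displacements."""
    ordered = sorted(records, key=lambda r: r.timestamp)
    return [
        (b.hand_position[0] - a.hand_position[0],
         b.hand_position[1] - a.hand_position[1])
        for a, b in zip(ordered, ordered[1:])
    ]


def estimate_word(records: Sequence[FrameFeatures],
                  recognizer: Callable[[List[Point]], str]) -> str:
    """Pass the motion sequence to a caller-supplied recognizer."""
    return recognizer(to_motion(records))
```

In practice the recognizer would be a trained motion recognition model; here it is left as a callable so that the ordering step can be read on its own.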

Furthermore, when the device 10 includes, for example, an output unit (not shown), the output unit may output the acquired spoken-language conversation to a user terminal device as text or audio. When the output unit is included, the central processing unit 101 may, for example, function as the output unit. The user terminal device may be, for example, a personal computer (PC, for example, a desktop or notebook type), a smartphone, a tablet terminal, or the like. When the output is text, it may be output, for example, by a display device of the output device 106 such as an LED display or a liquid crystal display. When the output is audio, it may be output, for example, by an audio output device such as a speaker.

According to the present embodiment, as described above, the sign language is estimated based on the hand skeletal coordinates and the body skeletal coordinates. Therefore, unlike conventional sign language translation techniques that use a sign language pattern library, highly accurate sign language translation is possible regardless of differences in a person's gender, height, clothing color, and the like. A further advantage is that there is no need to prepare a huge sign language pattern library that anticipates such differences.

[Embodiment 2]
The present embodiment is another example of the sign language translation processing device and the sign language translation processing method of the present invention. As shown in FIG. 4, the device 10 may further include, for example, a correction unit 16. As shown in FIG. 5, the central processing unit 101 may, for example, function as the correction unit 16.

Next, an example of the sign language translation processing method of the present embodiment will be described based on flowchart S20 of FIG. 6. The sign language translation processing method of the present embodiment is carried out as follows, for example, using the device 10 of FIG. 4 or FIG. 5. The method is not limited to use of the device 10 of FIG. 4 or FIG. 5.

First, as in S11 of Embodiment 1, images of a body including a hand are acquired (S21). Next, the hand skeletal information is acquired. Here, the hand skeletal information further includes hand detection region coordinates, and the hand detection region coordinates are acquired first (S220). The hand skeletal coordinates are then acquired (S221). Next, the correction unit 16 corrects the hand detection region coordinates based on the hand skeletal coordinates and saves them (S222). The saved, corrected coordinates can be used, for example, when acquiring the hand skeletal coordinates of the next frame. As described above, the hand skeletal information can be acquired, for example, by a conventionally known method. The hand detection region coordinates may be acquired, for example, using a hand detection model.
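A minimal sketch of the correction in S222 follows, assuming the detection region is an axis-aligned box `(x_min, y_min, x_max, y_max)` that is recomputed to enclose the detected hand joints with a margin. The function name `corrected_region`, the margin ratio, and the clamping to the image size are assumptions for this example; the specification does not state how the correction is computed.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def corrected_region(joints: List[Point], margin: float = 0.2,
                     image_size: Optional[Tuple[float, float]] = None) -> Box:
    """Recompute the hand detection region from hand skeletal coordinates."""
    if not joints:
        raise ValueError("no hand joints to fit a region to")
    xs = [x for x, _ in joints]
    ys = [y for _, y in joints]
    x0, x1, y0, y1 = min(xs), max(xs), min(ys), max(ys)
    # Pad the tight box so the hand stays inside it if it moves slightly.
    pad_x = (x1 - x0) * margin
    pad_y = (y1 - y0) * margin
    box = (x0 - pad_x, y0 - pad_y, x1 + pad_x, y1 + pad_y)
    if image_size is not None:
        width, height = image_size
        box = (max(0.0, box[0]), max(0.0, box[1]),
               min(width, box[2]), min(height, box[3]))
    return box
```

The returned box would be saved and used as the search region when the hand skeletal coordinates of the next frame are acquired.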

The remainder of the flow is the same as S13 to S16 in Embodiment 1 (S22 to S26).

As in the present embodiment, acquiring the hand detection region coordinates before acquiring the hand skeletal coordinates allows the hand skeletal information to be acquired more quickly than when the hand skeletal coordinates are acquired without first acquiring the hand detection region coordinates. In addition, as described above, correcting and saving the hand detection region coordinates improves, for example, the accuracy of detecting the hand skeletal coordinates in the next frame.

[Embodiment 3]
In Embodiments 1 and 2, when the device 10 further includes a storage unit, the storage device 104, for example, functions as the storage unit. The storage unit can store, for example, hand motion information in association with sign language words. In this case, the motion recognition unit can estimate a sign language word from the recognized hand motion information and the stored, associated hand motion information and sign language words.
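The lookup described in this embodiment could be realized in many ways. The sketch below stores each sign language word with a reference motion sequence and returns the word whose sequence is closest to the observed motion under dynamic time warping (DTW). DTW is a technique chosen for this example and is not named in the specification; the class name `MotionStore` and the word labels in the usage are likewise hypothetical.

```python
from math import hypot, inf
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def dtw_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Dynamic time warping distance between two 2D motion sequences."""
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = hypot(a[i - 1][0] - b[j - 1][0], a[i - 1][1] - b[j - 1][1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1],
                                 cost[i - 1][j - 1])
    return cost[n][m]


class MotionStore:
    """Stores hand motion information in association with words."""

    def __init__(self) -> None:
        self._entries: List[Tuple[str, List[Point]]] = []

    def register(self, word: str, motion: Sequence[Point]) -> None:
        self._entries.append((word, list(motion)))

    def estimate(self, motion: Sequence[Point]) -> Optional[str]:
        # Return the word whose stored motion is nearest to the input.
        if not self._entries:
            return None
        return min(self._entries,
                   key=lambda e: dtw_distance(motion, e[1]))[0]
```

DTW tolerates differences in signing speed because it aligns sequences of different lengths before comparing them, which is why it is used here instead of a frame-by-frame comparison.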

[Embodiment 4]
Next, FIG. 7 shows an example of a sign language translation processing system that includes the device 10 of any one of Embodiments 1 to 3 and a user terminal. The user terminal is capable of acquiring images of a body including a hand.

As shown in FIG. 7, a deaf person (a signer) inputs sign language to the device 10. In FIG. 7, the device 10 may be, for example, equipment such as a smartphone or a tablet terminal, and the signer may sign toward the camera of that equipment to input the sign language to the device 10. The device 10 performs sign language translation by carrying out, for example, the processing of any one of Embodiments 1 to 3, and outputs the translation result to the user terminal. The processing of any one of Embodiments 1 to 3 may be performed by the equipment that includes the device 10, or by a server that includes the units of the device 10. The output translation result may be, for example, displayed as text on the display screen of the user terminal or output as audio through a speaker. A hearing person (a non-signer) can check the sign language translation result output to the user terminal.

[Embodiment 5]
The program of the present embodiment is a program for causing a computer to execute each of the steps described above. Specifically, the program of the present embodiment causes a computer to execute an image acquisition procedure, a hand skeletal information acquisition procedure, a body skeletal information acquisition procedure, an information integration procedure, and a motion recognition procedure.

The image acquisition procedure acquires an image of a body including a hand,
the image is a plurality of time-series images captured over time,
the hand skeletal information acquisition procedure acquires hand skeletal information from the acquired image,
the hand skeletal information includes hand skeletal coordinates,
the body skeletal information acquisition procedure acquires body skeletal information from the acquired image,
the body skeletal information includes body skeletal coordinates,
the information integration procedure integrates the hand skeletal information and the body skeletal information to calculate the position of the hand skeleton on the body and the position of the hand, and
the motion recognition procedure arranges the integrated and calculated hand skeleton position and hand position, together with the body skeletal information, in chronological order, recognizes hand motion information from the arranged hand skeleton position, hand position, and body skeletal information, and estimates a sign language word.
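The flow of the procedures above (integration, chronological ordering, and word estimation) can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of the invention: the `Frame` layout, the shoulder-based normalization in `integrate`, and the nearest-template matching in `estimate_word` are all hypothetical choices. In particular, the specification does not fix an integration formula, and the template matching here merely stands in for a motion recognition model.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


@dataclass
class Frame:
    """Skeletal information taken from one time-series image (hypothetical layout)."""
    timestamp: float
    hand_keypoints: List[Point]        # hand skeletal coordinates (pixels)
    body_keypoints: Dict[str, Point]   # body skeletal coordinates (pixels)


def integrate(frame: Frame) -> Tuple[List[Point], Point]:
    """Express the hand skeleton and the hand position relative to the body.

    Assumption: the body frame is anchored at the shoulder midpoint and
    scaled by the shoulder width.
    """
    if not frame.hand_keypoints:
        raise ValueError("frame has no hand keypoints")
    lx, ly = frame.body_keypoints["left_shoulder"]
    rx, ry = frame.body_keypoints["right_shoulder"]
    cx, cy = (lx + rx) / 2.0, (ly + ry) / 2.0
    scale = ((lx - rx) ** 2 + (ly - ry) ** 2) ** 0.5 or 1.0
    skeleton = [((x - cx) / scale, (y - cy) / scale)
                for x, y in frame.hand_keypoints]
    n = len(skeleton)
    hand_pos = (sum(p[0] for p in skeleton) / n, sum(p[1] for p in skeleton) / n)
    return skeleton, hand_pos


def hand_trajectory(frames: Sequence[Frame]) -> List[Point]:
    """Arrange the integrated hand positions in chronological order."""
    ordered = sorted(frames, key=lambda f: f.timestamp)
    return [integrate(f)[1] for f in ordered]


def estimate_word(trajectory: List[Point],
                  templates: Dict[str, List[Point]]) -> str:
    """Return the word whose stored trajectory is closest to the input.

    Assumes every template has the same length as `trajectory`.
    """
    def distance(a: List[Point], b: List[Point]) -> float:
        return sum((ax - bx) ** 2 + (ay - by) ** 2
                   for (ax, ay), (bx, by) in zip(a, b))
    return min(templates, key=lambda word: distance(trajectory, templates[word]))
```

Normalizing by the body skeleton is one way to make the hand position independent of where the person stands in the image, which is the kind of variation the specification identifies as a weakness of pattern libraries.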

The program of the present embodiment can also be described as a program that causes a computer to function as an image acquisition procedure, a hand skeletal information acquisition procedure, a body skeletal information acquisition procedure, an information integration procedure, and a motion recognition procedure.

The description of the sign language translation processing device and the sign language translation processing method of the present invention applies to the program of the present embodiment. In each of the procedures, for example, "procedure" can be read as "processing". The program of the present embodiment may be recorded on, for example, a computer-readable recording medium. The recording medium is, for example, a non-transitory computer-readable storage medium. The recording medium is not particularly limited, and examples include random access memory (RAM), read-only memory (ROM), a hard disk (HD), an optical disc, and a floppy (registered trademark) disk (FD).

The present invention has been described above with reference to the embodiments, but the present invention is not limited to those embodiments. Various modifications that a person skilled in the art can understand may be made to the configuration and details of the present invention within its scope.

<Additional Notes>
Some or all of the above embodiments may be described as follows, but are not limited to the following:
(Appendix 1)
A sign language translation processing device including an image acquisition unit, a hand skeletal information acquisition unit, a body skeletal information acquisition unit, an information integration unit, and a motion recognition unit, wherein
the image acquisition unit acquires an image of a body including a hand,
the image is a plurality of time-series images captured over time,
the hand skeletal information acquisition unit acquires hand skeletal information from the acquired image,
the hand skeletal information includes hand skeletal coordinates,
the body skeletal information acquisition unit acquires body skeletal information from the acquired image,
the body skeletal information includes body skeletal coordinates,
the information integration unit integrates the hand skeletal information and the body skeletal information to calculate the position of the hand skeleton on the body and the position of the hand, and
the motion recognition unit arranges the integrated and calculated hand skeleton position and hand position, together with the body skeletal information, in chronological order, recognizes hand motion information from the arranged hand skeleton position, hand position, and body skeletal information, and estimates a sign language word.
(Appendix 2)
The sign language translation processing device according to Appendix 1, further including an output unit, wherein
the output unit outputs the sign language word to a user terminal device as text or audio.
(Appendix 3)
The sign language translation processing device according to Appendix 1 or 2, wherein
the hand skeletal coordinates are detected using a hand skeleton detection model,
the body skeletal coordinates are detected using a body skeleton detection model, and
the hand motion information is recognized using a motion recognition model.
(Appendix 4)
The sign language translation processing device according to any one of Appendices 1 to 3, further including a correction unit, wherein
the hand skeletal information further includes hand detection area coordinates, and
the correction unit corrects the hand detection area coordinates based on the hand skeletal coordinates.
(Appendix 5)
The sign language translation processing device according to Appendix 4, wherein
the hand detection area coordinates are detected using a hand detection model.
(Appendix 6)
The sign language translation processing device according to any one of Appendices 1 to 5, further including a storage unit that stores hand motion information and sign language words in association with each other, wherein
the motion recognition unit estimates a sign language word from the recognized hand motion information and from the hand motion information and sign language words stored in association with each other.
(Appendix 7)
A sign language translation processing system including a sign language translation processing device and a user terminal, wherein
the sign language translation processing device is the sign language translation processing device according to any one of Appendices 1 to 6,
the user terminal is capable of acquiring an image of a body including a hand, and
the image is a plurality of time-series images captured over time.
(Appendix 8)
A sign language translation processing method including an image acquisition step, a hand skeletal information acquisition step, a body skeletal information acquisition step, an information integration step, and a motion recognition step, wherein
the image acquisition step acquires an image of a body including a hand,
the image is a plurality of time-series images captured over time,
the hand skeletal information acquisition step acquires hand skeletal information from the acquired image,
the hand skeletal information includes hand skeletal coordinates,
the body skeletal information acquisition step acquires body skeletal information from the acquired image,
the body skeletal information includes body skeletal coordinates,
the information integration step integrates the hand skeletal information and the body skeletal information to calculate the position of the hand skeleton on the body and the position of the hand, and
the motion recognition step arranges the integrated and calculated hand skeleton position and hand position, together with the body skeletal information, in chronological order, recognizes hand motion information from the arranged hand skeleton position, hand position, and body skeletal information, and estimates a sign language word.
(Appendix 9)
The sign language translation processing method according to Appendix 8, further including an output step, wherein
the output step outputs the sign language word to a user terminal device as text or audio.
(Appendix 10)
The sign language translation processing method according to Appendix 8 or 9, wherein
the hand skeletal coordinates are detected using a hand skeleton detection model,
the body skeletal coordinates are detected using a body skeleton detection model, and
the hand motion information is recognized using a motion recognition model.
(Appendix 11)
The sign language translation processing method according to any one of Appendices 8 to 10, further including a correction step, wherein
the hand skeletal information further includes hand detection area coordinates, and
the correction step corrects the hand detection area coordinates based on the hand skeletal coordinates.
(Appendix 12)
The sign language translation processing method according to Appendix 11, wherein
the hand detection area coordinates are detected using a hand detection model.
(Appendix 13)
The sign language translation processing method according to any one of Appendices 8 to 12, wherein the motion recognition step estimates a sign language word from the recognized hand motion information and from hand motion information and sign language words stored in association with each other.
(Appendix 14)
A program for causing a computer to execute procedures including an image acquisition procedure, a hand skeletal information acquisition procedure, a body skeletal information acquisition procedure, an information integration procedure, and a motion recognition procedure, wherein
the image acquisition procedure acquires an image of a body including a hand,
the image is a plurality of time-series images captured over time,
the hand skeletal information acquisition procedure acquires hand skeletal information from the acquired image,
the hand skeletal information includes hand skeletal coordinates,
the body skeletal information acquisition procedure acquires body skeletal information from the acquired image,
the body skeletal information includes body skeletal coordinates,
the information integration procedure integrates the hand skeletal information and the body skeletal information to calculate the position of the hand skeleton on the body and the position of the hand, and
the motion recognition procedure arranges the integrated and calculated hand skeleton position and hand position, together with the body skeletal information, in chronological order, recognizes hand motion information from the arranged hand skeleton position, hand position, and body skeletal information, and estimates a sign language word.
(Appendix 15)
The program according to Appendix 14, wherein the procedures further include an output procedure, and
the output procedure outputs the sign language word to a user terminal device as text or audio.
(Appendix 16)
The program according to Appendix 14 or 15, wherein
the hand skeletal coordinates are detected using a hand skeleton detection model,
the body skeletal coordinates are detected using a body skeleton detection model, and
the hand motion information is recognized using a motion recognition model.
(Appendix 17)
The program according to any one of Appendices 14 to 16, wherein the procedures further include a correction procedure,
the hand skeletal information further includes hand detection area coordinates, and
the correction procedure corrects the hand detection area coordinates based on the hand skeletal coordinates.
(Appendix 18)
The program according to Appendix 17, wherein
the hand detection area coordinates are detected using a hand detection model.
(Appendix 19)
The program according to any one of Appendices 14 to 18, wherein the motion recognition procedure estimates a sign language word from the recognized hand motion information and from hand motion information and sign language words stored in association with each other.
(Appendix 20)
A computer-readable recording medium on which is recorded a program for causing a computer to execute procedures including an image acquisition procedure, a hand skeletal information acquisition procedure, a body skeletal information acquisition procedure, an information integration procedure, and a motion recognition procedure, wherein
the image acquisition procedure acquires an image of a body including a hand,
the image is a plurality of time-series images captured over time,
the hand skeletal information acquisition procedure acquires hand skeletal information from the acquired image,
the hand skeletal information includes hand skeletal coordinates,
the body skeletal information acquisition procedure acquires body skeletal information from the acquired image,
the body skeletal information includes body skeletal coordinates,
the information integration procedure integrates the hand skeletal information and the body skeletal information to calculate the position of the hand skeleton on the body and the position of the hand, and
the motion recognition procedure arranges the integrated and calculated hand skeleton position and hand position, together with the body skeletal information, in chronological order, recognizes hand motion information from the arranged hand skeleton position, hand position, and body skeletal information, and estimates a sign language word.
(Appendix 21)
The recording medium according to Appendix 20, wherein the procedures further include an output procedure, and
the output procedure outputs the sign language word to a user terminal device as text or audio.
(Appendix 22)
The recording medium according to Appendix 20 or 21, wherein
the hand skeletal coordinates are detected using a hand skeleton detection model,
the body skeletal coordinates are detected using a body skeleton detection model, and
the hand motion information is recognized using a motion recognition model.
(Appendix 23)
The recording medium according to any one of Appendices 20 to 22, wherein the procedures further include a correction procedure,
the hand skeletal information further includes hand detection area coordinates, and
the correction procedure corrects the hand detection area coordinates based on the hand skeletal coordinates.
(Appendix 24)
The recording medium according to Appendix 23, wherein
the hand detection area coordinates are detected using a hand detection model.
(Appendix 25)
The recording medium according to any one of Appendices 20 to 24, wherein the motion recognition procedure estimates a sign language word from the recognized hand motion information and from hand motion information and sign language words stored in association with each other.
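The correction described in the notes above, in which the hand detection area coordinates are corrected from the hand skeletal coordinates, can be illustrated with the following sketch. It rests on an assumption that the text does not confirm: here "correction" is taken to mean recomputing the detection area as the tightest box around the skeletal keypoints, padded by a margin. The names `correct_detection_area` and `margin` are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def correct_detection_area(box: Box, keypoints: List[Point],
                           margin: float = 0.1) -> Box:
    """Return a hand detection area recomputed from the hand skeletal coordinates.

    Assumption: the corrected area is the tightest box around the keypoints,
    padded on each side by `margin` times the keypoint extent. The detected
    box is returned unchanged when no keypoints are available.
    """
    if not keypoints:
        return box
    xs = [x for x, _ in keypoints]
    ys = [y for _, y in keypoints]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    pad_x = (x_max - x_min) * margin
    pad_y = (y_max - y_min) * margin
    return (x_min - pad_x, y_min - pad_y, x_max + pad_x, y_max + pad_y)
```

With this choice, a detected box that cuts off a fingertip is widened to cover every keypoint, and an oversized box is tightened around the hand.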

According to the present invention, sign language can be translated with high accuracy. The present invention can be applied, for example, to a sign language translation processing device intended for smooth communication between deaf and hearing people; however, its field of application is not limited, and it is applicable to a wide range of fields that use sign language translation processing devices.

10 Sign language translation processing device
11 Image acquisition unit
12 Hand skeletal information acquisition unit
13 Body skeletal information acquisition unit
14 Information integration unit
15 Motion recognition unit
16 Correction unit
101 CPU
102 Memory
103 Bus
104 Storage device
105 Input device
106 Output device
107 Communication device

Claims

A sign language translation processing device including an image acquisition unit, a hand skeletal information acquisition unit, a body skeletal information acquisition unit, an information integration unit, a motion recognition unit, and a correction unit, wherein
the image acquisition unit acquires an image of a body including a hand of a sign language speaker performing sign language,
the sign language is sign language for communication between the sign language speaker and a non-sign language speaker,
the image is a plurality of time-series images captured over time,
the hand skeletal information acquisition unit acquires hand skeletal information from the acquired image,
the hand skeletal information includes hand skeletal coordinates and hand detection area coordinates,
the body skeletal information acquisition unit acquires body skeletal information from the acquired image,
the body skeletal information includes body skeletal coordinates,
the information integration unit integrates the hand skeletal information and the body skeletal information to calculate the position of the hand skeleton on the body and the position of the hand,
the motion recognition unit arranges the integrated and calculated hand skeleton position and hand position, together with the body skeletal information, in chronological order, recognizes hand motion information from the arranged hand skeleton position, hand position, and body skeletal information, and estimates a sign language word, and
the correction unit corrects the hand detection area coordinates based on the hand skeletal coordinates.

The sign language translation processing device according to claim 1, further including an output unit, wherein
the output unit outputs the sign language word to a user terminal device as text or audio.

The sign language translation processing device according to claim 1 or 2, wherein
the hand skeletal coordinates are detected using a hand skeleton detection model,
the body skeletal coordinates are detected using a body skeleton detection model, and
the hand motion information is recognized using a motion recognition model.

The sign language translation processing device according to claim 1 or 2, wherein
the hand detection area coordinates are detected using a hand detection model.

The sign language translation processing device according to claim 1 or 2, further including a storage unit that stores hand motion information and sign language words in association with each other, wherein
the motion recognition unit estimates a sign language word from the recognized hand motion information and from the hand motion information and sign language words stored in association with each other.

A sign language translation processing system including a sign language translation processing device and a user terminal, wherein
the sign language translation processing device is the sign language translation processing device according to claim 1 or 2,
the user terminal is capable of acquiring an image of a body including a hand, and
the image is a plurality of time-series images captured over time.

A sign language translation processing method including an image acquisition step, a hand skeletal information acquisition step, a body skeletal information acquisition step, an information integration step, a motion recognition step, and a correction step, wherein
the image acquisition step acquires an image of a body including a hand of a sign language speaker performing sign language,
the sign language is sign language for communication between the sign language speaker and a non-sign language speaker,
the image is a plurality of time-series images captured over time,
the hand skeletal information acquisition step acquires hand skeletal information from the acquired image,
the hand skeletal information includes hand skeletal coordinates and hand detection area coordinates,
the body skeletal information acquisition step acquires body skeletal information from the acquired image,
the body skeletal information includes body skeletal coordinates,
the information integration step integrates the hand skeletal information and the body skeletal information to calculate the position of the hand skeleton on the body and the position of the hand,
the motion recognition step arranges the integrated and calculated hand skeleton position and hand position, together with the body skeletal information, in chronological order, recognizes hand motion information from the arranged hand skeleton position, hand position, and body skeletal information, and estimates a sign language word, and
the correction step corrects the hand detection area coordinates based on the hand skeletal coordinates.

A program for causing a computer to execute procedures including an image acquisition procedure, a hand skeletal information acquisition procedure, a body skeletal information acquisition procedure, an information integration procedure, a motion recognition procedure, and a correction procedure, wherein
the image acquisition procedure acquires an image of a body including a hand of a sign language speaker performing sign language,
the sign language is sign language for communication between the sign language speaker and a non-sign language speaker,
the image is a plurality of time-series images captured over time,
the hand skeletal information acquisition procedure acquires hand skeletal information from the acquired image,
the hand skeletal information includes hand skeletal coordinates and hand detection area coordinates,
the body skeletal information acquisition procedure acquires body skeletal information from the acquired image,
the body skeletal information includes body skeletal coordinates,
the information integration procedure integrates the hand skeletal information and the body skeletal information to calculate the position of the hand skeleton on the body and the position of the hand,
the motion recognition procedure arranges the integrated and calculated hand skeleton position and hand position, together with the body skeletal information, in chronological order, recognizes hand motion information from the arranged hand skeleton position, hand position, and body skeletal information, and estimates a sign language word, and
the correction procedure corrects the hand detection area coordinates based on the hand skeletal coordinates.

A computer-readable recording medium on which is recorded a program for causing a computer to execute procedures including an image acquisition procedure, a hand skeletal information acquisition procedure, a body skeletal information acquisition procedure, an information integration procedure, a motion recognition procedure, and a correction procedure, wherein
the image acquisition procedure acquires an image of a body including a hand of a sign language speaker performing sign language,
the sign language is sign language for communication between the sign language speaker and a non-sign language speaker,
the image is a plurality of time-series images captured over time,
the hand skeletal information acquisition procedure acquires hand skeletal information from the acquired image,
the hand skeletal information includes hand skeletal coordinates and hand detection area coordinates,
the body skeletal information acquisition procedure acquires body skeletal information from the acquired image,
the body skeletal information includes body skeletal coordinates,
the information integration procedure integrates the hand skeletal information and the body skeletal information to calculate the position of the hand skeleton on the body and the position of the hand,
the motion recognition procedure arranges the integrated and calculated hand skeleton position and hand position, together with the body skeletal information, in chronological order, recognizes hand motion information from the arranged hand skeleton position, hand position, and body skeletal information, and estimates a sign language word, and
the correction procedure corrects the hand detection area coordinates based on the hand skeletal coordinates.