JP2000305930A

JP2000305930A - Language conversion rule preparing device, language converter and program recording medium

Info

Publication number: JP2000305930A
Application number: JP11156484A
Authority: JP
Inventors: Yumi Wakita; 由実脇田
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1998-06-04
Filing date: 1999-06-03
Publication date: 2000-11-02
Anticipated expiration: 2019-06-03
Also published as: JP3441400B2

Abstract

PROBLEM TO BE SOLVED: To reliably convert an input spoken sentence into the target language even when the sentence contains an unlearned portion or speech recognition is partly in error, and to create the phrase dictionary and inter-phrase rules required for conversion automatically, with as little manual work as possible. SOLUTION: The device comprises a language rule creation unit 2 that automatically and statistically learns syntactic or semantic constraint rules for words or word strings from a bilingual corpus and describes the rules in a form in which the target-language portion and the source-language portion correspond to each other; a speech recognition unit 7 that performs speech recognition on a source-language spoken sentence using the created language rules and outputs the recognition result; and a language conversion unit 9 that converts the source-language sentence into a target-language sentence using the same language rules.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001] [Field of the Invention] The present invention relates to a language conversion device that converts input speech or input text into another language or another style and outputs the result, and to a language conversion rule creating device that creates the conversion rules.

[0002] [Description of the Related Art] The prior art is described below using, as an example, one kind of language conversion device: an apparatus that translates input speech into another language (hereinafter called interpreting).

[0003] The interpreting apparatus achieves interpretation by sequentially executing speech recognition, which converts an utterance input as an acoustic signal into an output sentence represented as a word text string, and language translation, which takes the sentence represented as a word text string and translates it into a sentence in another language. The language translation unit in turn consists of a language analysis unit that analyzes the syntactic or semantic structure of the input sentence, a language conversion unit that converts the sentence into another language based on the analysis result, and an output sentence generation unit that generates a natural output sentence from the translation result.

[0004] However, when the speech recognition unit misrecognizes part of the utterance, or when the utterance itself is syntactically or semantically unnatural (for example, when back-channel responses or rephrasings are inserted, or the speaker stops before completing the sentence), the analysis fails even if the speech recognition result is input to the language analysis unit, and as a result no translation is output.

[0005] To solve this problem, a configuration has been proposed in which a sentence is divided into phrases, rules are formulated separately for the inside of phrases and for the relations between phrases, and incomplete utterances are analyzed using only the intra-phrase rules so that an analysis result can still be output (for example, Takezawa and Morimoto, IEICE Transactions D-II, Vol. J79-D-II(12)). FIG. 14 shows an example of conventional intra-phrase and inter-phrase rules. In this example, for corpus example 301, 「今晩シングルの部屋の予約お願いね」 ("Please book a single room tonight"), the intra-phrase rules are described as a tree structure such as intra-phrase rule 302, based on grammar rules that are also common to written language, and the inter-phrase rules are described by the adjacency probabilities between phrases in the training corpus. For example, the inter-phrase rules are described as in inter-phrase rule 303.

[0006] When an input sentence is analyzed, the intra-phrase rules are applied sequentially from the beginning of the sentence, and at the end of each phrase the analysis proceeds by connecting phrases so that each phrase is followed by a phrase candidate with a high adjacency probability. With this analysis method, even when part of the sentence is misrecognized and the usual analysis of the whole sentence fails, the phrases in the parts that contain no misrecognition are still analyzed correctly. The framework can therefore output a partial translation by translating only the phrases that were successfully analyzed.
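
The adjacency probabilities used to connect phrases can be illustrated with a short sketch. It assumes the training corpus has already been segmented into phrases; the function name `adjacency_probabilities`, the toy corpus, and the maximum-likelihood estimate are illustrative assumptions, not details taken from the cited prior art.

```python
from collections import Counter, defaultdict

def adjacency_probabilities(segmented_corpus):
    """Estimate P(next phrase | phrase) from sentences given as lists of phrases."""
    pair_counts = defaultdict(Counter)
    for phrases in segmented_corpus:
        # Count every pair of phrases that appear side by side.
        for left, right in zip(phrases, phrases[1:]):
            pair_counts[left][right] += 1
    probs = {}
    for left, followers in pair_counts.items():
        total = sum(followers.values())
        probs[left] = {right: n / total for right, n in followers.items()}
    return probs

# Hypothetical phrase-segmented training sentences (English glosses).
corpus = [
    ["tonight", "single room", "reservation please"],
    ["tonight", "twin room", "reservation please"],
    ["tomorrow", "single room", "reservation please"],
]
probs = adjacency_probabilities(corpus)
```

During analysis, the phrase following `"single room"` would then be chosen from `probs["single room"]`, preferring candidates with a high probability.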

[0007] Another approach to this problem has also been proposed. Instead of performing language analysis according to a conventional grammar, bilingual phrases of corresponding source-language and target-language sentences are extracted from example utterances, including utterances that a conventional grammar cannot analyze; a bilingual phrase dictionary is created in which these phrase pairs are described in as generalized a form as possible; and language analysis and language conversion are performed using this dictionary (for example, Furuse, Sumita, and Iida, Transactions of the Information Processing Society of Japan, Vol. 35, No. 3, 1994-3). FIG. 15 shows a conventional language conversion rule creating device. Before interpretation is performed, a bilingual phrase dictionary is created in advance from a bilingual corpus of utterances. Here too, taking into account that some words may be wrong or omitted, each example utterance is divided into phrases, and intra-phrase rules and inter-phrase dependency rules are created. First, the morphological analysis unit 360 performs morphological analysis of the source-language and target-language sentences and converts each sentence into a morpheme sequence. Next, the phrase determination unit 361 divides the source-language and target-language morpheme sequences into phrases and creates intra-phrase rules and inter-phrase dependency rules. The phrase units are determined manually, considering not only that each unit is semantically coherent but also that it is a partial sentence whose correspondence in the translation is clear. For example, the bilingual example 「部屋の予約をお願いしたいんですが」 / "I'd like to reserve a room" is divided into two bilingual phrases, (a) 「部屋の予約」 / "reserve a room" and (b) 「をお願いしたいんですが」 / "I'd like to", and the dependency 「(a)を(b)する」 / "(b) to (a)" is formulated as a rule. The bilingual phrases are stored in the bilingual phrase dictionary 362, and the inter-phrase dependencies, expressed in bilingual form, are stored in the inter-phrase rule table 363. This processing is performed for every utterance in the bilingual corpus. Because the phrase division and dependencies are determined from factors such as the semantic information of the sentence and the degree to which it remains grammatical, they are difficult to determine automatically for each sentence and have conventionally been determined manually.

[0008] [Problems to be Solved by the Invention] However, in the sentence analysis means of the first conventional example, the phrases handled are language-dependent phrases that depend only on the source language, and they often do not match the phrase units of the target language. Consequently, even when a phrase that is correct in the source language is input to the language conversion unit, it often cannot be accepted in the end. The framework of the first conventional example could also be used with language-independent phrases, but in that case the analysis of language-independent phrases would have to be created manually. This raises new problems: development takes time, and inconsistency in the manual creation criteria distorts the performance of the rules.

[0009] In the bilingual phrase dictionary creation method of the second conventional example, there is no means for automatically analyzing the semantic and grammatical information of utterances, so the dictionary must be created manually. As a result, development takes time, and inconsistency in the manual creation criteria distorts the performance of the rules. For example, when the target task of the interpreting apparatus is changed, or when the source or target language is changed, the rules already constructed cannot be reused and new rules must be created from scratch, which is inefficient and laborious.

[0010] Moreover, the phrase dictionary 362 and the inter-phrase rules 363 determine phrase units with emphasis on the correspondences in the bilingual corpus; whether those units are appropriate for recognition by the speech recognition unit 364 has not been evaluated. It is difficult to determine phrase units while judging manually whether each phrase is suitable for speech recognition, and there is no guarantee that an adequate recognition rate will be obtained when recognition is performed using the phrases so determined.

[0011] An object of the present invention is to solve the above problems and to provide a language conversion device that can always convert into the target language even when the input spoken sentence contains an unlearned portion or speech recognition is partly in error, and that can create the phrase dictionary and inter-phrase rules needed for conversion automatically, with as little manual work as possible.

[0012] [Means for Solving the Problems] To solve the problems described above, a first aspect of the present invention (corresponding to claim 1) is a language conversion device comprising: storage means for storing language rules obtained by learning grammatical or semantic constraint rules for words or word strings from a training database (hereinafter called a bilingual corpus) in which sentences to be converted, input as speech or text (hereinafter called source-language sentences; the corresponding converted sentences are called target-language sentences), are paired with target-language sentences; a speech recognition unit that performs speech recognition on input speech using the stored language rules and outputs the recognition result as a sentence to be converted; and a language conversion unit that converts the sentence to be converted into a converted sentence using the same language rules as those used by the speech recognition unit.

[0013] A second aspect of the present invention (corresponding to claim 2) is the language conversion device according to the first aspect, wherein the language rules are created by dividing the sentence to be converted and the converted sentence into portions in which both form a semantic unit (called style-independent phrases), and by formulating the language rules within the style-independent phrases separately from the language rules between the style-independent phrases.

[0014] A third aspect of the present invention (corresponding to claim 3) is the language conversion device according to the second aspect, wherein the language rules are created by formulating grammatical or semantic rules within the style-independent phrases and co-occurrence or connection relations between the style-independent phrases.

[0015] A fourth aspect of the present invention (corresponding to claim 4) is the language conversion device according to the first aspect, further comprising a speech synthesis unit that synthesizes speech from the converted sentence using the same language rules as those used by the language conversion unit.

[0016] A fifth aspect of the present invention (corresponding to claim 5) is the language conversion device according to any one of the first to fourth aspects, comprising: an inter-rule distance calculation unit that, for each language rule group in which language rules having the same target-language sentence are gathered into the same category, calculates the acoustic distance between the sentences to be converted of the language rules in that group; and an optimum rule creation unit that optimizes the rule group by merging language rules whose calculated distance is small, in order to raise the recognition level of speech recognition.

[0017] A sixth aspect of the present invention (corresponding to claim 6) is a language conversion rule creating device comprising: a bilingual corpus; a phrase extraction unit that calculates the adjacency frequency of words or parts of speech in the source-language and target-language sentences of the bilingual corpus and extracts partial sentences that form semantic units (hereinafter called phrases) by concatenating words and parts of speech with high frequency; a phrase determination unit that determines corresponding phrases by examining the relation between source-language and target-language phrases among the phrases extracted by the phrase extraction unit; and a phrase dictionary that stores the corresponding phrases so determined. The phrase dictionary is used for language conversion, and the language conversion performs language or style conversion by matching an input source-language sentence against the corresponding phrases stored in the phrase dictionary.
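
As a rough illustration of extracting phrases from adjacency frequencies, the sketch below counts adjacent token pairs and performs one pass of merging pairs that occur often enough. The threshold `min_count`, the greedy left-to-right merge, and the toy sentences are assumptions made for the example; this passage does not specify how the frequencies are thresholded or how merging is ordered.

```python
from collections import Counter

def adjacent_pair_counts(sentences):
    """Count how often each pair of tokens appears side by side."""
    counts = Counter()
    for tokens in sentences:
        counts.update(zip(tokens, tokens[1:]))
    return counts

def merge_frequent_pairs(sentences, min_count):
    """One greedy left-to-right pass joining adjacent tokens seen at least min_count times."""
    counts = adjacent_pair_counts(sentences)
    merged = []
    for tokens in sentences:
        out, i = [], 0
        while i < len(tokens):
            if i + 1 < len(tokens) and counts[(tokens[i], tokens[i + 1])] >= min_count:
                out.append(tokens[i] + " " + tokens[i + 1])
                i += 2  # both tokens were consumed by the merged unit
            else:
                out.append(tokens[i])
                i += 1
        merged.append(out)
    return merged

# Hypothetical tokenized sentences (tokens may be words or part-of-speech names).
sentences = [
    ["i", "would", "like", "a", "room"],
    ["i", "would", "like", "breakfast"],
    ["a", "room", "please"],
]
phrases = merge_frequent_pairs(sentences, min_count=2)
```

Running the pass repeatedly on its own output would grow longer units; the same procedure would be applied to the source-language and target-language sides separately before corresponding phrases are paired.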

[0018] A seventh aspect of the present invention (corresponding to claim 7) is the language conversion rule creating device according to the sixth aspect, wherein the phrase determination unit determines corresponding phrases by examining the co-occurrence relation between source-language and target-language phrases.

[0019] An eighth aspect of the present invention (corresponding to claim 8) is the language conversion rule creating device according to the sixth aspect, further comprising: a morphological analysis unit that converts the source-language sentences of the bilingual corpus into word strings; and a part-of-speech conversion unit that uses the result of the morphological analysis unit to create a bilingual corpus in which some or all of the words in the source-language and target-language sentences are replaced with part-of-speech names, wherein the phrase extraction unit extracts phrases from the bilingual corpus converted by the part-of-speech conversion unit.

[0020] A ninth aspect of the present invention (corresponding to claim 9) is the language conversion rule creating device according to the eighth aspect, having a bilingual word dictionary of the source and target languages, wherein the part-of-speech conversion unit converts into parts of speech those words that are associated in the bilingual word dictionary and whose source-language word is a content word.

[0021] A tenth aspect of the present invention (corresponding to claim 10) is the language conversion rule creating device according to the sixth aspect, further comprising: a morphological analysis unit that converts the source-language sentences of the bilingual corpus into word strings; and a semantic coding unit that uses the result of the morphological analysis unit to create a bilingual corpus in which some or all of the words in the source-language and target-language sentences are replaced with codes from a table (hereinafter called a classified vocabulary table) that classifies words by treating semantically similar words as the same class and assigns the same code to words in the same class, wherein the phrase extraction unit extracts phrases from the bilingual corpus whose words were replaced with codes by the semantic coding unit.

[0022] An eleventh aspect of the present invention (corresponding to claim 11) is the language conversion rule creating device according to the tenth aspect, having a bilingual word dictionary of the source and target languages, wherein the semantic coding unit converts into semantic codes only those words that are associated in the bilingual word dictionary.

[0023] A twelfth aspect of the present invention (corresponding to claim 12) is the language conversion rule creating device according to the sixth aspect, wherein the phrase extraction unit extracts phrases also using a phrase definition table that stores, as source-language and target-language pairs, words or part-of-speech strings that are to be preferentially regarded as phrases.

[0024] A thirteenth aspect of the present invention (corresponding to claim 13) is the language conversion rule creating device according to any one of the sixth to thirteenth aspects, having a sentence complexity calculation unit that calculates the perplexity (sentence complexity) of the corpus, wherein the phrase extraction unit extracts phrases using the adjacency frequency of words or word classes together with the perplexity.
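
For readers unfamiliar with perplexity, the sketch below computes it for a toy corpus. It assumes a unigram model with maximum-likelihood probabilities purely for illustration; this passage does not say which language model the sentence complexity calculation unit uses, and the function name `unigram_perplexity` is hypothetical.

```python
import math
from collections import Counter

def unigram_perplexity(sentences):
    """Perplexity 2 ** (-(1/N) * sum(log2 P(token))) under a unigram model of the corpus itself."""
    tokens = [t for sentence in sentences for t in sentence]
    counts = Counter(tokens)
    n = len(tokens)
    log_prob = sum(math.log2(counts[t] / n) for t in tokens)
    return 2 ** (-log_prob / n)

# Two equally likely tokens give a perplexity of 2.
example = unigram_perplexity([["a", "b"], ["a", "b"]])
```

A phrase extraction step could, for example, accept a candidate merge only if the perplexity of the re-segmented corpus does not increase; that acceptance criterion is likewise an assumption, not a detail stated here.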

[0025] A fourteenth aspect of the present invention (corresponding to claim 14) is a program recording medium storing a program for causing a computer to execute all or some of the functions of the components of the language conversion device or the language conversion rule creating device according to any one of the first to thirteenth aspects.

[0026] [Embodiments of the Invention] Embodiments of the present invention are described below with reference to the drawings.

[0027] (First Embodiment) The first embodiment is described first.

[0028] In the first embodiment, as in the conventional example, an interpreting apparatus that converts between different languages is used as an example of the language conversion device. FIG. 1 is a block diagram of the interpreting apparatus of this embodiment.

[0029] In the interpreting apparatus of this embodiment, before interpretation is performed, the language analysis unit 2 first learns the language rules of the source and target languages of utterances from the training database 1, which contains a bilingual corpus, a bilingual word dictionary, and the like. FIG. 3 shows an example of learning the language rules.

[0030] The language rule creation unit 2 uses, for example, a bilingual corpus annotated with part-of-speech tags to replace the content words of the source-language and target-language sentences with their parts of speech. Furthermore, when a phrase in the source language and a phrase in the target language correspond to each other as a unit, that unit is delimited as a style-independent phrase. That is, when a style-dependent phrase in the source language and a style-dependent phrase in the target language correspond as a unit, that unit defines the boundary of a style-independent phrase. When the target-language style-dependent phrase corresponding to a source-language style-dependent phrase does not correspond as a unit, style-dependent phrases are concatenated and phrase boundaries are adjusted until the corresponding portions form a unit, and the result is taken as a style-independent phrase. In FIG. 3, the bilingual corpus sentence pair 26, 「今晩、部屋の予約をしたいんですが」 / "I'd like to room-reservation tonight", is converted by the part-of-speech conversion of content words 30 into 「＜普通名詞＞｜＜普通名詞＞の＜サ変名詞＞｜をしたいんですが」 (<common noun> | <common noun> の <verbal noun> | をしたいんですが), as shown at 27. The sentence is also delimited into the style-independent phrases 「＜普通名詞＞」, 「＜普通名詞＞の＜サ変名詞＞」, and 「をしたいんですが」. Next, for each style-independent phrase, the mixed sequence of parts of speech and words, the words that fill the part-of-speech slots, and the frequency with which the phrase appears in the bilingual corpus are described as the style-independent intra-phrase rules 3. These rules are described for every sentence in the bilingual corpus. In FIG. 3, this corresponds to the intra-phrase rule description 31 writing the rules into 3. In 3 of FIG. 3, rule 1 has 「＜普通名詞＞」 (<common noun>) on the Japanese side and "<noun>" on the English side. The words filling the part-of-speech slot are 「今晩」 in Japanese and "tonight" in English. If they appear in the bilingual corpus, pairs such as 「明日」 / "tomorrow" are also described under rule 1.
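
One way to picture the intra-phrase rules is as a table keyed by the part-of-speech pattern on each side, holding the filler words and the occurrence count. The sketch below assumes phrases arrive already aligned and tagged as `(word, part_of_speech, is_content_word)` triples; that input format, the function names, and the dictionary layout are hypothetical and are not taken from FIG. 3.

```python
from collections import defaultdict

def phrase_pattern(phrase):
    """Replace content words with <part-of-speech>; keep function words as they are."""
    return " ".join(f"<{pos}>" if is_content else word for word, pos, is_content in phrase)

def build_intra_phrase_rules(aligned_phrases):
    """Collect, per (source pattern, target pattern), the filler word pairs and a frequency."""
    rules = defaultdict(lambda: {"fillers": set(), "count": 0})
    for src, tgt in aligned_phrases:
        rule = rules[(phrase_pattern(src), phrase_pattern(tgt))]
        rule["count"] += 1
        src_words = tuple(word for word, _, is_content in src if is_content)
        tgt_words = tuple(word for word, _, is_content in tgt if is_content)
        rule["fillers"].add((src_words, tgt_words))
    return dict(rules)

# Hypothetical aligned style-independent phrases.
aligned = [
    ([("今晩", "普通名詞", True)], [("tonight", "noun", True)]),
    ([("明日", "普通名詞", True)], [("tomorrow", "noun", True)]),
]
rules = build_intra_phrase_rules(aligned)
rule_1 = rules[("<普通名詞>", "<noun>")]
```

Here both 「今晩」 / "tonight" and 「明日」 / "tomorrow" end up under the same rule, mirroring how rule 1 is described above.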

[0031] Furthermore, the co-occurrence relations among the intra-phrase rules are described as the style-independent inter-phrase rules 4. For example, when the co-occurrence relations are formulated as a phrase bi-gram, the adjacency frequency of each pair of style-independent phrases is recorded.

[0032] In FIG. 3, this corresponds to the inter-phrase rule description 32 writing 28. Item 28 is an example of a phrase bi-gram. For example, the rule number pair "(rule 1)(rule 2)" has an occurrence frequency of 4, which means that rule 1 and rule 2 appeared side by side in a sentence four times during learning from the bilingual corpus. In the example of 28, rule 2 and rule 3 appeared side by side six times.
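
The phrase bi-gram counts can be sketched as follows. Each training sentence is assumed to have been reduced to the sequence of intra-phrase rule numbers it uses; the sequences below are made-up data chosen only so that the counts match the two frequencies quoted above, and rule 5 is a placeholder.

```python
from collections import Counter

def phrase_bigram_counts(rule_sequences):
    """Count how often two intra-phrase rules appear side by side in a sentence."""
    counts = Counter()
    for sequence in rule_sequences:
        counts.update(zip(sequence, sequence[1:]))
    return counts

# Hypothetical rule-number sequences for six training sentences.
rule_sequences = [[1, 2, 3]] * 4 + [[5, 2, 3]] * 2
bigrams = phrase_bigram_counts(rule_sequences)
```

With this data, `bigrams[(1, 2)]` is 4 and `bigrams[(2, 3)]` is 6.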

[0033] In addition, the syntactic structure between style-independent phrases is also described in the style-independent inter-phrase rules 4. In FIG. 3, this corresponds to the inter-phrase rule description 32 writing 29. That is, because style-independent phrases appear in different orders in Japanese and English, the inter-phrase rule description 32 represents the language structure as a tree at 25 so that the order relations can be matched.

[0034] The sentence generation rules 5 describe target-language rules that are missing from the language rules 3 and 4. For example, in Japanese-to-English translation, they include rules for articles and indefinite articles and rules for the third-person singular.

[0035] The intra-phrase language rules 3 and/or the inter-phrase language rules 4 are examples of the storage means of the present invention.

[0036] During interpretation, the uttered source-language speech is first input through the microphone 6 and passed to the speech recognition unit 7. In the speech recognition unit, recognition word candidates are predicted sequentially along the time axis using, for example, the mixed sequences of parts of speech and words described as the style-independent intra-phrase language rules 3 and the phrase bi-gram serving as the style-independent inter-phrase language rules 4. The recognition score is the sum of an acoustic score, based on the distance between the pre-trained acoustic model 8 and the input speech, and a language score given by the phrase bi-gram, and the continuous word string that is the recognition candidate is determined by N-best search. The continuous word string so determined is input to the language conversion unit 9. In the intra-phrase language rules 3 and the inter-phrase language rules 4, the rules have been formulated in advance with the source and target languages in correspondence. Using these rules, the language conversion unit 9 converts the continuous word string into a target-language phrase sequence and outputs it. If the input source-language phrase sequence matches a syntactic structure between phrases that has already been learned, the target-language phrase sequence is rearranged according to that syntactic structure before being output.
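
The way the recognition score combines the two knowledge sources can be sketched as a rescoring of finished hypotheses. This is a simplification: the text describes a time-synchronous N-best search, whereas the code below only ranks a given list of candidates. The log-probability language score, the floor for unseen pairs, the convention that a higher acoustic score is better, and all names and numbers are assumptions for illustration.

```python
import heapq
import math

def language_score(rule_seq, bigram_counts, left_counts, floor=1e-6):
    """Sum of log P(right | left) over adjacent phrase rules; unseen pairs get a floor."""
    score = 0.0
    for left, right in zip(rule_seq, rule_seq[1:]):
        total = left_counts.get(left, 0)
        prob = bigram_counts.get((left, right), 0) / total if total else 0.0
        score += math.log(max(prob, floor))
    return score

def n_best(hypotheses, bigram_counts, left_counts, n):
    """Rank (rule sequence, acoustic score) pairs by acoustic score plus language score."""
    scored = [
        (acoustic + language_score(seq, bigram_counts, left_counts), seq)
        for seq, acoustic in hypotheses
    ]
    return heapq.nlargest(n, scored)

# Hypothetical counts and candidate hypotheses.
bigram_counts = {(1, 2): 4, (2, 3): 6}
left_counts = {1: 4, 2: 10}  # how often each rule occurs as the left element
hypotheses = [((1, 2, 3), -10.0), ((1, 3), -9.0)]
ranked = n_best(hypotheses, bigram_counts, left_counts, n=2)
```

Although the second hypothesis has the better acoustic score, the phrase bi-gram penalizes the unseen pair (1, 3), so the sequence (1, 2, 3) is ranked first.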

[0037] The output target-language sentence is input to the output sentence generation unit 10, which corrects grammatical unnaturalness. For example, it adds definite and indefinite articles and adjusts pronouns and verbs for third person, plural, past tense, and so on. The corrected target-language translation is output, for example, as text.

[0038] In the embodiment described above, the language rules used for speech recognition are learned in units of portions that form a meaningful unit in both the source and target languages, and recognition is performed under the constraints of these rules. This resolves the problem that no translation at all is output for the whole sentence when the input spoken sentence contains an unlearned portion or speech recognition is partly in error, and it makes possible a language conversion device that outputs an appropriate translation for the correctly recognized portions.

【００３９】なお、本実施の形態では、言語変換装置の
１つの例として通訳装置を例にあげて説明したが、これ
は他の言語変換装置、例えばくだけた発話文を書き言葉
のようなテキスト文に変換する言語変換装置において
も、同様に使用することが出来る。In this embodiment, an interpreter was described as one example of a language conversion device. The same approach can also be used in other language conversion devices, for example one that converts casual spoken utterances into written-style text.

【００４０】（第２の実施の形態）次に第２の実施の形
態について図面を参照しながら説明する。本実施の形態
でも、第１の実施の形態同様、通訳装置を用いて説明す
る。図２は本実施の形態の通訳装置のブロック図であ
る。(Second Embodiment) Next, a second embodiment will be described with reference to the drawings. In this embodiment, as in the first embodiment, description will be made using an interpreter. FIG. 2 is a block diagram of the interpreter of the present embodiment.

【００４１】本実施の形態の通訳装置は、まず通訳する
前に、予め言語規則作成部１１で対訳コーパスや対訳単
語辞書を有している学習データベース１から発声文の原
言語及び目的言語のフレーズ内言語規則１２、フレーズ
間言語規則１３を学習する。学習される規則は、第１の
実施の形態における言語規則の学習と同様である。次に
学習された言語規則の最適化を行う。最適化の例を図４
に示す。Before interpreting, the interpreter of this embodiment first has the language rule creation unit 11 learn the intra-phrase language rules 12 and the inter-phrase language rules 13 for the source language and target language of uttered sentences from the learning database 1, which holds a bilingual corpus and a bilingual word dictionary. The rules are learned in the same way as the language rules in the first embodiment. The learned language rules are then optimized. FIG. 4 shows an example of the optimization.

【００４２】まず、学習された体型非依存フレーズにお
いて、目的言語フレーズが同じであるフレーズを同カテ
ゴリーとしてまとめる。図４において、１２は言語規則
であり、規則間距離算出１４で、３３のようにカテゴリ
ーとしてまとめる。規則１、規則２、規則３は目的言語
規則が「I'd like to」と同じであるので、同カテゴリ
ーになる。また、規則４は、目的言語規則が「please」
となっているので、規則１、規則２、規則３とは別のカ
テゴリーに分類される。次に同カテゴリーに含まれる原
言語フレーズ間の音響的距離を規則間距離算出部１４で
算出する。図４において、１５が原言語フレーズ間の音
響的距離を算出した例である。１５では、規則１と規則
２の距離は７となっており、規則１と規則３の距離は２
となっている。First, among the learned style-independent phrases, phrases with the same target-language phrase are grouped into one category. In FIG. 4, 12 denotes the language rules, and the inter-rule distance calculation unit 14 groups them into categories as shown at 33. Rules 1, 2, and 3 fall into the same category because they share the target-language rule "I'd like to". Rule 4 has the target-language rule "please", so it is placed in a different category from rules 1, 2, and 3. Next, the inter-rule distance calculation unit 14 calculates the acoustic distance between the source-language phrases within each category. In FIG. 4, 15 is an example of the calculated acoustic distances between source-language phrases: the distance between rule 1 and rule 2 is 7, and the distance between rule 1 and rule 3 is 2.

【００４３】同カテゴリー規則における原言語フレーズ
の音響的距離は次のように算出する。まず、カテゴリー
内の全ての目的言語フレーズにおける混合列の品詞部分
に、同品詞であれば同じ単語を当てはめ、全ての混合列
を単語列に変換する。次に各単語列の発音が類似してい
るかを調べるために、各単語列の文字列の違いに対する
距離を、（数１）を用いて算出し、規則間距離テーブル
１５に記述する。ｎ個の単語からなるフレーズＸ＝[ x
1,x2,x3,...xn]（ｘは各単語）とｍ個の単語からなるフ
レーズＹ＝[ y1,y2,y3,..ym]との間の距離をD(Xn,Ym)と
して、The acoustic distance between the source-language phrases of rules in the same category is calculated as follows. First, for the part-of-speech slots of the mixed sequences in all target-language phrases in the category, the same word is substituted wherever the part of speech is the same, so that every mixed sequence becomes a word string. Next, to check whether the word strings are pronounced similarly, the distance corresponding to the difference between the character strings of the word strings is calculated using (Equation 1) and written to the inter-rule distance table 15. For a phrase X = [x1, x2, x3, ..., xn] of n words (each x is a word) and a phrase Y = [y1, y2, y3, ..., ym] of m words, the distance between them is written D(Xn, Ym) and is given by

【００４４】[0044]

【数１】 (Equation 1)
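Equation 1 itself is not reproduced in this text. As a stand-in only, the following sketch measures the difference between the character strings of two word strings with a plain character-level edit distance (Levenshtein distance). The choice of this metric and the name `edit_distance` are assumptions made for illustration; they are not taken from the patent.

```python
def edit_distance(x: str, y: str) -> int:
    """Character-level Levenshtein distance (assumed stand-in for Equation 1)."""
    # prev[j] is the distance between the characters of x processed so far
    # (before the current one) and the first j characters of y.
    prev = list(range(len(y) + 1))
    for i, cx in enumerate(x, start=1):
        curr = [i]
        for j, cy in enumerate(y, start=1):
            cost = 0 if cx == cy else 1
            curr.append(min(prev[j] + 1,          # delete cx
                            curr[j - 1] + 1,      # insert cy
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]
```

A smaller value means the two strings are closer; identical strings give 0.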

【００４５】次に最適規則作成部１６で、距離値が一定
値以内であるフレーズの中で、最も出現数の多い規則の
みを残し、他の規則を消去する。たとえば、図４の例で
は、上記一定値を２とした場合、３３において、同カテ
ゴリーである規則１と規則３との規則間距離は２であ
り、上記一定値２以下である。従って、この２つの規則
の出現頻度の多い規則１を採用し、規則３を規則から削
除する。それに合わせて出現数も書き換える。Next, among phrases whose distance is within a fixed value, the optimum rule creation unit 16 keeps only the rule with the most occurrences and deletes the others. In the example of FIG. 4, if the fixed value is 2, then at 33 the distance between rule 1 and rule 3, which are in the same category, is 2 and therefore does not exceed it. Of these two rules, rule 1, which occurs more often, is kept and rule 3 is deleted. The occurrence count is updated accordingly.

【００４６】フレーズ内言語規則１２に書かれている全
ての規則に対して上記最適規則化を行った後、消去され
なかった言語規則のみをフレーズ内最適言語規則１７と
して保管する。最適化された規則に従い、フレーズ間規
則１３の中の除去された規則を採用した規則で書き換
え、合わせて出現数も修正する。図４において、最適規
則作成１６により規則３は削除され、規則１として１本
化される。それにあわせて、規則１の出現数は、１７の
ように削除された規則３との和である１５となってい
る。After this optimization has been applied to every rule in the intra-phrase language rules 12, only the language rules that were not deleted are stored as the optimized intra-phrase language rules 17. Following the optimized rules, each deleted rule that appears in the inter-phrase rules 13 is rewritten as the rule that was kept in its place, and the occurrence counts are corrected to match. In FIG. 4, the optimum rule creation unit 16 deletes rule 3 and folds it into rule 1. Accordingly, as shown at 17, the occurrence count of rule 1 becomes 15, the sum of its own count and that of the deleted rule 3.
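The grouping and pruning steps of this embodiment can be sketched roughly as follows. This is a minimal illustration under stated assumptions: the function name `optimize_rules`, the data layout, the placeholder source phrases, the individual occurrence counts, and the distance between rules 2 and 3 are all hypothetical. Only the distances 7 and 2, the fixed value 2, and the merged count of 15 come from the description of FIG. 4.

```python
from collections import defaultdict

def optimize_rules(rules, distances, threshold):
    """rules: {rule_id: (source_phrase, target_phrase, count)}
    distances: {(id_a, id_b): distance} between rules of one category.
    Returns the surviving rules with merged occurrence counts."""
    # Rules with the same target-language phrase form one category.
    categories = defaultdict(list)
    for rule_id, (_, target, _) in rules.items():
        categories[target].append(rule_id)

    kept = {}
    for members in categories.values():
        # Visit frequent rules first so that they absorb close neighbours.
        members = sorted(members, key=lambda r: rules[r][2], reverse=True)
        removed = set()
        for a in members:
            if a in removed:
                continue
            source, target, count = rules[a]
            for b in members:
                if b == a or b in removed or b in kept:
                    continue
                d = distances.get((a, b), distances.get((b, a)))
                if d is not None and d <= threshold:
                    count += rules[b][2]  # merge the deleted rule's count
                    removed.add(b)
            kept[a] = (source, target, count)
    return kept

# Hypothetical data (see the assumptions listed above).
rules = {
    1: ("src-1", "I'd like to", 10),
    2: ("src-2", "I'd like to", 8),
    3: ("src-3", "I'd like to", 5),
    4: ("src-4", "please", 6),
}
distances = {(1, 2): 7, (1, 3): 2, (2, 3): 6}
optimized = optimize_rules(rules, distances, threshold=2)
```

With this data, rule 3 is folded into rule 1, while rules 2 and 4 are left unchanged.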

【００４７】文生成規則５には、コーパスから作成され
た上記言語規則で不足している目的言語規則を記述して
おく。たとえば、日英翻訳の場合には、冠詞および不定
冠詞規則や三人称単数化規則などがその内容として記述
されている。The sentence generation rules 5 describe target-language rules that are missing from the language rules created from the corpus. For Japanese-to-English translation, for example, they include article and indefinite-article rules and third-person-singular rules.

【００４８】通訳の際には、まず発声された原言語音声
はマイクロホン６から入力され音声認識部７に入力され
る。音声認識部では、たとえば、体型非依存フレーズ内
言語規則１７として記述されている品詞および列単語の
混合列と体型非依存フレーズ間言語規則１８としてのフ
レーズ隣接頻度とにより、時系列に沿って順次認識単語
候補が予測される。予め学習されている音響モデル８と
入力音声との距離値をベースとした音響スコアとフレー
ズbi-gramによる言語スコアとの和を認識スコアとし、N
best-searchにより認識候補である連続単語列が決定さ
れる。このように決定された連続単語列は言語変換部９
に入力される。言語規則１７、１８では、予め原言語と
目的言語とが対応しながら規則化されている。言語変換
部９では、上記規則を用いて、本連続単語列は目的言語
のフレーズ列に変換され出力される。この際、入力され
た原言語フレーズ列が、既に学習されたフレーズ間の構
文構造に当てはまる場合には、目的語のフレーズ列は構
文構造に沿って修正された後出力される。When interpreting, the uttered source-language speech is first picked up by the microphone 6 and input to the speech recognition unit 7. The speech recognition unit predicts recognition word candidates one after another along the time axis, using, for example, the mixed sequences of parts of speech and words described in the style-independent intra-phrase language rules 17 and the phrase adjacency frequencies in the style-independent inter-phrase language rules 18. The recognition score is the sum of an acoustic score, based on the distance between the pre-trained acoustic model 8 and the input speech, and a language score given by the phrase bi-gram; an N-best search then determines the continuous word string that is the recognition candidate. This continuous word string is input to the language conversion unit 9. In the language rules 17 and 18, the source language and the target language are described in advance in correspondence with each other. Using these rules, the language conversion unit 9 converts the continuous word string into a phrase string in the target language and outputs it. If the input source-language phrase string matches an inter-phrase syntactic structure that has already been learned, the target-language phrase string is modified according to that structure before it is output.

【００４９】出力された目的言語文は出力文生成部１０
に入力され、文法的な不自然さを修正する。たとえば、
定冠詞や不定冠詞の付与、代名詞、動詞における３人称
化や複数化や過去形化などの最適化などが行われる。修
正後の目的言語翻訳結果文はたとえばテキストとして出
力される。The output target-language sentence is input to the output sentence generation unit 10, which corrects grammatical unnaturalness. For example, it adds definite and indefinite articles and adjusts pronouns and verbs for third person, plural, and past tense. The corrected target-language translation is then output, for example as text.

【００５０】以上の実施の形態では、音声認識で使用す
る言語規則を学習する際に、原言語と目的言語とがとも
に意味をもつ一かたまりとなった部分を単位として規則
化を行った後、規則化されている目的言語部分が同じで
ある原言語フレーズが音響的に類似している場合には、
類似している中から最も出現頻度の高い規則のみを採用
し残りの規則を消去することにより、なるべく言語規則
の性能を落とさずに、体型非依存フレーズを単位にする
ことによる規則数の増加を押さえ、従って高性能な認識
及び言語変換を可能にする通訳装置を実現するものであ
る。In the embodiment above, the language rules used for speech recognition are learned in units of chunks that are meaningful in both the source language and the target language. Then, when source-language phrases whose target-language part is the same are acoustically similar, only the most frequent of the similar rules is kept and the rest are deleted. This limits the growth in the number of rules caused by using style-independent phrases as the unit while degrading the language rules as little as possible, and so realizes an interpreter capable of high-performance recognition and language conversion.

【００５１】なお、本実施の形態では、言語変換装置の
１つの例として通訳装置を例にあげて説明したが、これ
は他の言語変換装置、例えばくだけた発話文を書き言葉
のようなテキスト文に変換する言語変換装置において
も、同様に使用することが出来る。In the present embodiment, an interpreter was described as one example of a language conversion device. The same approach can also be used in other language conversion devices, for example one that converts casual spoken utterances into written-style text.

【００５２】（実施の形態３）本実施の形態では、言語
変換装置の一例として、従来例同様、異なる言語間の変
換を行う通訳装置を用いて説明する。図５は本実施の形
態の通訳装置のブロック図である。(Embodiment 3) In the present embodiment, as an example of a language conversion device, an interpreter that performs conversion between different languages will be described as in the conventional example. FIG. 5 is a block diagram of the interpreter of the present embodiment.

【００５３】なお、本実施の形態のうち、対訳コーパス
１０１、内容語定義表１０３、対訳単語辞書１０７、形
態素解析部１０２、品詞化部１０４、フレーズ抽出部１
０５、フレーズ決定部１０６は、対訳フレーズ間規則表
１０８、対訳フレーズ辞書１０９は、本発明の言語変換
規則作成装置の例である。また、本実施の形態の対訳フ
レーズ辞書１０９は本発明の請求項６記載のフレーズ辞
書の例である。In this embodiment, the bilingual corpus 101, the content word definition table 103, the bilingual word dictionary 107, the morphological analysis unit 102, the part-of-speech conversion unit 104, the phrase extraction unit 105, the phrase determination unit 106, the bilingual inter-phrase rule table 108, and the bilingual phrase dictionary 109 together are an example of the language conversion rule creation device of the present invention. The bilingual phrase dictionary 109 of this embodiment is an example of the phrase dictionary according to claim 6 of the present invention.

【００５４】本実施の形態の通訳装置は、まず通訳する
前に、形態素解析部１０２で対訳コーパス１０１内の原
言語文の形態素解析を行うことで原言語文のみ品詞タグ
が付与された対訳コーパスを作成する。たとえば、図６
の１２０の「部屋の予約をお願いしたいんですが」の発
声文例では、１２１のような品詞タグが原言語文に与え
られる。次に、品詞化部１０４で、上記コーパスの品詞
タグ付き原言語文において、一部の単語名を品詞名に置
きかえた品詞化対訳コーパスを作成する。この際に品詞
名に変換される単語は以下の条件を満たすものとする。（１）内容語テーブルに記載の品詞に対応する単語であ
る。（２）対訳単語辞書に登録されている単語で、対訳単語
辞書の目的言語訳に相当する単語が、コーパス内の相当
する目的言語対訳文に存在する。Before interpreting, the interpreter of this embodiment first has the morphological analysis unit 102 analyze the source-language sentences in the bilingual corpus 101, producing a bilingual corpus in which only the source-language sentences carry part-of-speech tags. For example, for the utterance 「部屋の予約をお願いしたいんですが」 ("I would like to request a room reservation") at 120 in FIG. 6, part-of-speech tags such as those at 121 are attached to the source-language sentence. Next, the part-of-speech conversion unit 104 creates a part-of-speech-converted bilingual corpus in which some of the words in the tagged source-language sentences are replaced with their part-of-speech names. A word is replaced with its part-of-speech name only if it satisfies the following conditions.
(1) The word belongs to a part of speech listed in the content word table.
(2) The word is registered in the bilingual word dictionary, and the word given as its target-language translation in that dictionary appears in the corresponding target-language sentence in the corpus.
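The two replacement conditions can be illustrated with a short sketch. The function name `to_pos_corpus`, the part-of-speech labels, the morpheme segmentation, and the English target sentence are hypothetical choices made for this example, since FIG. 6 is not reproduced here.

```python
def to_pos_corpus(tagged_source, target_words, content_pos, bilingual_dict):
    """tagged_source: list of (word, pos) pairs for one source sentence.
    target_words: list of words of the paired target sentence.
    Returns the source words, with qualifying words replaced by <pos>."""
    result = []
    for word, pos in tagged_source:
        translation = bilingual_dict.get(word)
        # Condition (1): the part of speech is listed in the content word table.
        is_content_word = pos in content_pos
        # Condition (2): the dictionary translation occurs in the target sentence.
        has_translation = translation is not None and translation in target_words
        result.append(f"<{pos}>" if is_content_word and has_translation else word)
    return result

# Hypothetical tagging and target sentence.
tagged = [("部屋", "common noun"), ("の", "particle"), ("予約", "sa-hen noun"),
          ("を", "particle"), ("お願い", "sa-hen noun"), ("し", "verb"),
          ("たい", "auxiliary verb"), ("ん", "particle"),
          ("です", "auxiliary verb"), ("が", "particle")]
target = ["I'd", "like", "to", "make", "a", "reservation", "for", "a", "room"]
content_pos = {"common noun", "sa-hen noun", "verb"}
bilingual_dict = {"部屋": "room", "予約": "reservation"}

converted = to_pos_corpus(tagged, target, content_pos, bilingual_dict)
```

In this data 「お願い」 and 「し」 have content-word tags but no entry in the bilingual word dictionary, so they fail condition (2) and stay as surface words.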

【００５５】図６の内容語定義表１０３の例では、内容
語テーブルに記載されている一般名詞、さ変名詞、動詞
の中で、対訳単語辞書１０７に登録されている「部屋」
と「予約」のみが品詞化され、１２２のようにこれらの
単語を品詞名に置き換えたコーパスが作成される。さら
に、相当する目的言語対訳文内の単語名も１２３のよう
に日本語品詞名に置き換える。In the example of the content word definition table 103 in FIG. 6, of the common nouns, sa-hen nouns (nouns that form verbs with する), and verbs listed in the content word table, only 「部屋」 ("room") and 「予約」 ("reservation"), which are registered in the bilingual word dictionary 107, are converted to parts of speech, and a corpus in which these words are replaced with their part-of-speech names is created as shown at 122. The corresponding words in the target-language sentence are likewise replaced with the Japanese part-of-speech names, as shown at 123.

【００５６】次に、上記の一部の内容語が品詞名に置き
換えられたコーパスについて、フレーズ抽出部１０５
は、原言語文、目的言語文別々に、各単語または品詞の
２連鎖出現頻度（以後 bi-gramと呼ぶ）を算出する。算
出式を（数２）に示す。Next, for the corpus in which some content words have been replaced with part-of-speech names, the phrase extraction unit 105 calculates the two-element chain frequency of each word or part of speech (hereinafter called the bi-gram), separately for the source-language sentences and the target-language sentences. The formula is given in (Equation 2).

【００５７】[0057]

【数２】 (Equation 2)

【００５８】コーパス内の全原言語文及び目的言語文を
対象にbi-gramを算出した後、フレーズ抽出部５で、最
も出現頻度の高かった２単語または品詞対を１つの単語
とみなして連結し、再度bi-gramを算出する。これによ
り、たとえば頻度高く隣接する「お」「願い」、「願
い」「し」、「し」「ます」などの単語対が連結され、
「お願いします」というフレーズ候補が形成される。目
的言語では「I'd」「like」、「like」「to」の単語対
が連結される。全原言語文及び目的言語文別々に、以上
の連結とbi-gram算出とを、bi-gramの値が全て一定閾値
を超えなくなるまで繰り返す。そして、連結された単語
も含めた個々の単語をフレーズ候補として抽出する。After the bi-grams have been calculated over all source-language and target-language sentences in the corpus, the phrase extraction unit 105 joins the most frequent pair of words or parts of speech, treats it as a single word, and calculates the bi-grams again. In this way, frequently adjacent word pairs such as 「お」+「願い」, 「願い」+「し」, and 「し」+「ます」 are joined, forming the phrase candidate 「お願いします」 ("please"). In the target language, the word pairs "I'd" + "like" and "like" + "to" are joined. This joining and bi-gram calculation is repeated, separately for the source-language and target-language sentences, until no bi-gram value exceeds a fixed threshold. Each resulting word, including the joined ones, is then extracted as a phrase candidate.
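The repeated join-and-recount loop described above can be sketched as follows. Because Equation 2 is not reproduced in this text, the sketch simply uses the raw count of each adjacent pair as its bi-gram value; that substitution, the function name `merge_frequent_pairs`, the `joiner` parameter, and the sample sentences are assumptions for illustration.

```python
from collections import Counter

def merge_frequent_pairs(sentences, threshold, joiner=""):
    """sentences: list of token lists. Repeatedly joins the most frequent
    adjacent pair into one token until no pair count exceeds threshold."""
    sentences = [list(s) for s in sentences]
    while True:
        counts = Counter()
        for s in sentences:
            counts.update(zip(s, s[1:]))
        if not counts:
            break
        (left, right), freq = counts.most_common(1)[0]
        if freq <= threshold:
            break
        merged = left + joiner + right
        for idx, s in enumerate(sentences):
            out, i = [], 0
            while i < len(s):
                if i + 1 < len(s) and s[i] == left and s[i + 1] == right:
                    out.append(merged)  # treat the pair as a single word
                    i += 2
                else:
                    out.append(s[i])
                    i += 1
            sentences[idx] = out
    return sentences

# Hypothetical corpora, processed separately for each language.
source = [["お", "願い", "し", "ます"],
          ["予約", "を", "お", "願い", "し", "ます"]]
target = [["I'd", "like", "to", "reserve", "a", "room"],
          ["I'd", "like", "to", "check", "in"]]
source_phrases = merge_frequent_pairs(source, threshold=1)
target_phrases = merge_frequent_pairs(target, threshold=1, joiner=" ")
```

Each pass joins only the single most frequent pair, so a longer phrase such as 「お願いします」 is built up over several passes.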

【００５９】次にフレーズ決定部１０６で、原言語文と
目的言語文対において、各フレーズが同時に出現してい
る頻度を算出する。ｉ番目の原言語フレーズをＪ[ｉ]、
ｊ番目の目的言語フレーズをＥ[ｊ]とすると、フレーズ
Ｊ[ｉ]とＥ[ｊ]との共起頻度Ｋ[ｉ，ｊ]は、算出式を
（数３）にて算出される。Next, the phrase determination unit 106 calculates how often each pair of phrases appears together in a source-language sentence and its target-language counterpart. With the i-th source-language phrase written J[i] and the j-th target-language phrase written E[j], the co-occurrence frequency K[i, j] of J[i] and E[j] is calculated by (Equation 3).

【００６０】[0060]

【数３】 (Equation 3)

【００６１】たとえば、図７の例では、フレーズ列とし
て記述された３つの対訳文１３０のうち、原言語フレー
ズの「お願いします」と目的言語フレーズの「I'd like
to」との共起頻度は２/（２＋３）、「したいんです
が」と目的言語フレーズの共起頻度は１/ (１＋３)とな
る。この頻度が一定値以上のフレーズ対を対訳フレーズ
として決定し、頻度と共にフレーズ番号を付けて対訳フ
レーズ辞書１０９に登録する。さらに、対訳フレーズと
して決定されなかったフレーズ候補の中で、既に品詞化
されている単語は、それ単独で対訳フレーズとして対訳
フレーズ辞書１０９に登録する。それ以外の部分は、対
訳対の中で各々の単語列どうしを一対としてフレーズ辞
書に登録する。For example, in FIG. 7, among the three bilingual sentences 130 written as phrase strings, the co-occurrence frequency of the source-language phrase 「お願いします」 and the target-language phrase "I'd like to" is 2/(2+3), and that of 「したいんですが」 and the target-language phrase is 1/(1+3). Phrase pairs whose frequency is at or above a fixed value are determined to be bilingual phrases and are registered in the bilingual phrase dictionary 109 with a phrase number and the frequency. Among the phrase candidates not determined to be bilingual phrases, any word that has already been converted to a part of speech is registered in the bilingual phrase dictionary 109 on its own as a bilingual phrase. For the remainder, the leftover word strings of the sentence pair are registered in the phrase dictionary as a pair.
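Equation 3 is not reproduced in this text, but the two worked values 2/(2+3) and 1/(1+3) are consistent with dividing the number of sentence pairs in which both phrases occur by the sum of the number of pairs containing the source phrase and the number containing the target phrase. The sketch below computes the frequency under that inferred reading, which is an assumption. The function name `cooccurrence` and the three sentence pairs are also hypothetical, chosen only so that the counts match the text, and the sketch assumes that "I'd like to" is the target phrase in both worked values.

```python
from fractions import Fraction

def cooccurrence(pairs, src_phrase, tgt_phrase):
    """pairs: list of (source_phrases, target_phrases), one per sentence pair.
    Returns n(both) / (n(source) + n(target)), the inferred form of K[i, j]."""
    n_src = sum(src_phrase in s for s, _ in pairs)
    n_tgt = sum(tgt_phrase in t for _, t in pairs)
    n_both = sum(src_phrase in s and tgt_phrase in t for s, t in pairs)
    if n_src + n_tgt == 0:
        return Fraction(0)
    return Fraction(n_both, n_src + n_tgt)

# Hypothetical stand-in for the three bilingual sentences at 130 in FIG. 7.
pairs = [
    (["<noun>", "を", "お願いします"], ["I'd like to", "<noun>"]),
    (["<noun>", "の", "<noun>", "を", "お願いします"],
     ["I'd like to", "<noun>", "<noun>"]),
    (["<noun>", "を", "したいんですが"], ["I'd like to", "<noun>"]),
]
k_onegai = cooccurrence(pairs, "お願いします", "I'd like to")
k_shitai = cooccurrence(pairs, "したいんですが", "I'd like to")
```

Exact fractions are used so that the results can be compared directly with the values quoted in the text.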

【００６２】たとえば、図７の例では、１３１のように
対訳フレーズ辞書１０９に登録される。For example, in the example shown in FIG. 7, it is registered in the bilingual phrase dictionary 109 as 131.

【００６３】このようにして、フレーズ登録を行なった
後、一文に共起するフレーズ番号を記録し、フレーズ番
号対として対訳フレーズ間規則表１０８に登録する。図
７の例では１３２となる。After the phrases have been registered in this way, the phrase numbers that co-occur in one sentence are recorded and registered as phrase number pairs in the bilingual inter-phrase rule table 108. In the example of FIG. 7, this is 132.

【００６４】また、上記フレーズ番号対のフレーズbi-g
ramを求め、これも対訳フレーズ間規則表１０８に記録
する。すなわち、原言語コーパスを、対訳フレーズ辞書
に登録されたフレーズ番号列で表し、フレーズ番号で表
されたコーパスを用いてフレーズbi-gramを求め、これ
も対訳フレーズ間規則表８に記録する。フレーズiに続
くフレーズjの出現確立を表すフレーズbi-gramは（数
４）で表される。The phrase bi-gram of these phrase number pairs is also calculated and recorded in the bilingual inter-phrase rule table 108. That is, the source-language corpus is rewritten as sequences of the phrase numbers registered in the bilingual phrase dictionary, the phrase bi-gram is calculated from this phrase-number corpus, and the result is likewise recorded in the bilingual inter-phrase rule table 108. The phrase bi-gram, which represents the probability that phrase j follows phrase i, is given by (Equation 4).

【００６５】[0065]

【数４】 (Equation 4)

【００６６】例えば図７の１３２では、例えばフレーズ
３とフレーズ１のフレーズbi-gramを求める。またフレ
ーズ４、フレーズ５、フレーズ２のフレーズ間規則に関
してはフレーズ４、フレーズ５及びフレーズ５、フレー
ズ２のbi-gramをそれぞれ求め、対訳フレーズ間規則表
１０８に記録する。For example, at 132 in FIG. 7, the phrase bi-gram of phrase 3 followed by phrase 1 is calculated. For the inter-phrase rule consisting of phrase 4, phrase 5, and phrase 2, the bi-grams of phrases 4 and 5 and of phrases 5 and 2 are calculated, and all of them are recorded in the bilingual inter-phrase rule table 108.
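As Equation 4 is not reproduced in this text, the sketch below estimates the phrase bi-gram in the usual relative-frequency way: the number of times phrase j directly follows phrase i, divided by the number of times phrase i is followed by any phrase. This estimator is an assumption, as are the function name `phrase_bigrams` and the third phrase-number sequence; the sequences (3, 1) and (4, 5, 2) follow the example at 132.

```python
from collections import Counter
from fractions import Fraction

def phrase_bigrams(corpus):
    """corpus: list of phrase-number sequences, one per source sentence.
    Returns {(i, j): estimated probability that phrase j follows phrase i}."""
    pair_counts = Counter()
    first_counts = Counter()  # how often phrase i has a successor
    for seq in corpus:
        for i, j in zip(seq, seq[1:]):
            pair_counts[(i, j)] += 1
            first_counts[i] += 1
    return {pair: Fraction(n, first_counts[pair[0]])
            for pair, n in pair_counts.items()}

# The third sequence is hypothetical and added only to vary the counts.
corpus = [[3, 1], [4, 5, 2], [4, 5, 1]]
bigrams = phrase_bigrams(corpus)
```

Here phrase 5 is followed once by phrase 2 and once by phrase 1, so each of those bi-grams is 1/2.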

【００６７】通訳の際には、まず発声された原言語音声
は音声認識部１１０に入力される。音声認識部１１３で
は、たとえば、対訳フレーズ辞書１０９にフレーズとし
て記述されている単語のネットワークと対訳フレーズ間
規則表１０８にて記述されているフレーズbi-gramとに
より、時系列に沿って順次認識単語候補が予測される。
予め学習されている音響モデル１１３と入力音声との距
離値をベースとした音響スコアとフレーズbi-gramによ
る言語スコアとの和を認識スコアとし、Nbest-searchに
より認識候補である連続単語列が決定される。When interpreting, the uttered source-language speech is first input to the speech recognition unit 110. The speech recognition unit 110 predicts recognition word candidates one after another along the time axis, using, for example, the network of words described as phrases in the bilingual phrase dictionary 109 and the phrase bi-grams described in the bilingual inter-phrase rule table 108. The recognition score is the sum of an acoustic score, based on the distance between the pre-trained acoustic model 113 and the input speech, and a language score given by the phrase bi-gram; an N-best search then determines the continuous word string that is the recognition candidate.
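The scoring rule (acoustic score plus phrase bi-gram language score) can be illustrated by re-ranking a given list of candidates. This is not the N-best search itself, only the final comparison. Several details are assumptions: scores are treated as log-domain values where larger is better, unseen phrase pairs receive a small floor probability, and the names `recognition_score` and `best_hypothesis` and all numeric values are hypothetical.

```python
import math

def recognition_score(acoustic_score, phrase_seq, bigram_prob, floor=1e-6):
    """Sum of an acoustic score and a phrase bi-gram language score."""
    language_score = sum(math.log(bigram_prob.get(pair, floor))
                         for pair in zip(phrase_seq, phrase_seq[1:]))
    return acoustic_score + language_score

def best_hypothesis(hypotheses, bigram_prob):
    """hypotheses: list of (acoustic_score, phrase_seq) candidates."""
    return max(hypotheses,
               key=lambda h: recognition_score(h[0], h[1], bigram_prob))

# Hypothetical candidates and bi-gram probabilities.
bigram_prob = {(4, 5): 0.9, (5, 2): 0.5}
hypotheses = [(-10.0, [4, 5, 2]), (-9.5, [4, 2, 5])]
best = best_hypothesis(hypotheses, bigram_prob)
```

The second candidate has the slightly better acoustic score, but its phrase order was never observed, so the language score pushes the first candidate ahead.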

【００６８】認識された連続単語列は、言語変換部１１
１に入力される。言語変換部１１１では、入力された連
続単語列を対訳フレーズ辞書１０９内のフレーズ列に変
換し、各フレーズ列に相当するフレーズ間規則を探索す
る。そして、各フレーズの対訳である目的言語フレーズ
と目的言語のフレーズ間規則とから、入力原言語認識結
果文を目的言語文に変換する。The recognized continuous word string is input to the language conversion unit 111. The language conversion unit 111 converts the input word string into a string of phrases from the bilingual phrase dictionary 109 and searches for the inter-phrase rule that matches the phrase string. It then converts the recognized source-language sentence into a target-language sentence using the target-language phrase that is the translation of each phrase and the target-language inter-phrase rule.

【００６９】このように本実施の形態では、音声認識部
１１０と言語変換部１１１とでともに対訳フレーズ辞書
１０９と対訳フレーズ間規則表１０８が使用される。Thus, in this embodiment, the speech recognition unit 110 and the language conversion unit 111 both use the bilingual phrase dictionary 109 and the bilingual inter-phrase rule table 108.

【００７０】変換された目的言語文は出力文生成部１１
２に入力され、統語的な不自然さを修正する。たとえ
ば、定冠詞や不定冠詞の付与、代名詞、動詞における３
人称化や複数化や過去形化などの最適化などが行われ
る。修正後の目的言語翻訳結果文はたとえばテキストと
して出力される。The converted target-language sentence is input to the output sentence generation unit 112, which corrects syntactic unnaturalness. For example, it adds definite and indefinite articles and adjusts pronouns and verbs for third person, plural, and past tense. The corrected target-language translation is then output, for example as text.

【００７１】以上の実施例では、原言語フレーズと目的
言語フレーズが対応した形で規則を記述しておき、この
フレーズの単位で認識を行ないうことで、入力文の一部
が未知部分文であったり、音声認識が一部誤ったとして
も、正しく認識および解析された部分は適切に処理され
出力される言語変換装置を可能にする。また、原言語文
及び目的言語文各々における単語または品詞の隣接頻度
と、対訳における頻度の高い単語列または品詞列の共起
関係を用いて自動的に対訳フレーズとフレーズ間規則を
決定し、この対訳フレーズ規則を用いて通訳を行うこと
により、なるべく人手をかけずに、自動的に効率よくし
かも品質の高い対訳フレーズ辞書を生成できる言語規則
作成装置を可能とする。In the embodiment above, rules are described with source-language phrases and target-language phrases in correspondence, and recognition is performed in units of these phrases. As a result, even if part of the input sentence is an unknown sub-sentence or speech recognition is partly wrong, the language conversion device still processes and outputs the portions that were recognized and analyzed correctly. In addition, bilingual phrases and inter-phrase rules are determined automatically from the adjacency frequencies of words or parts of speech in the source-language and target-language sentences and from the co-occurrence of frequent word strings or part-of-speech strings across the translation pairs, and interpretation is performed with these bilingual phrase rules. This makes possible a language rule creation device that generates a high-quality bilingual phrase dictionary automatically and efficiently, with as little manual work as possible.

【００７２】なお、本実施の形態では、言語変換装置の
１つの例として通訳装置を例にあげて説明したが、これ
は他の言語変換装置、例えばくだけた発話文を書き言葉
のようなテキスト文に変換する言語変換装置において
も、同様に使用することが出来る。In this embodiment, an interpreter was described as one example of a language conversion device. The same approach can also be used in other language conversion devices, for example one that converts casual spoken utterances into written-style text.

【００７３】（実施の形態４）本実施の形態も、言語変
換装置の一例として、第３の実施の形態同様、異なる言
語間の変換を行う通訳装置を用いて説明する。図８は本
実施の形態の通訳装置のブロック図である。(Embodiment 4) In this embodiment, as in the third embodiment, an interpreter for performing conversion between different languages will be described as an example of a language converter. FIG. 8 is a block diagram of the interpreter of the present embodiment.

【００７４】なお、本実施の形態のうち、対訳コーパス
１０１、内容語定義表１０３、対訳単語辞書１０７、形
態素解析部１０２、品詞化部１０４、フレーズ抽出部１
４２、フレーズ決定部１４３は、対訳フレーズ間規則表
１４５、対訳フレーズ辞書１４４、フレーズ定義表１４
１は、本発明の言語変換規則作成装置の例である。ま
た、本実施の形態の対訳フレーズ辞書１４４は本発明の
請求項６記載のフレーズ辞書の例である。In this embodiment, the bilingual corpus 101, the content word definition table 103, the bilingual word dictionary 107, the morphological analysis unit 102, the part-of-speech conversion unit 104, the phrase extraction unit 142, the phrase determination unit 143, the bilingual inter-phrase rule table 145, the bilingual phrase dictionary 144, and the phrase definition table 141 together are an example of the language conversion rule creation device of the present invention. The bilingual phrase dictionary 144 of this embodiment is an example of the phrase dictionary according to claim 6 of the present invention.

【００７５】本実施の形態の通訳装置は、まず通訳する
前に、第３の実施の形態同様、形態素解析後、品詞タグ
が付与された対訳コーパスを作成する。The translator according to this embodiment creates a bilingual corpus to which a part-of-speech tag is attached after morphological analysis, as in the third embodiment, before interpreting.

【００７６】次に、フレーズ抽出部１４２で、予めフレ
ーズとして抽出したい単語または品詞列を規則化して記
述してあるフレーズ定義表１４１に従い、規則に相当す
る単語または品詞を連結する。たとえば図９の１４１の
例では、「動詞＋助動詞」や「格助詞＋動詞」などの規
則により、「を＋(動詞)＋たい」が単語として連結され
る。このように、上記の一部の内容語が品詞名に置き換
えられ、さらに上記のような単語または品詞列が連結さ
れ一単語とみなされたコーパスについて、原言語文、目
的言語文別々に、各単語または品詞の２連鎖出現頻度
（以後 bi-gramと呼ぶ）を算出する。算出式は（数２）
と同様である。Next, the phrase extraction unit 142 joins words or parts of speech that match the rules in the phrase definition table 141, in which the word or part-of-speech sequences to be extracted as phrases are described in advance as rules. In the example at 141 in FIG. 9, rules such as "verb + auxiliary verb" and "case particle + verb" cause 「を＋(動詞)＋たい」 (を + (verb) + たい) to be joined as a single word. For the resulting corpus, in which some content words have been replaced with part-of-speech names and such word or part-of-speech sequences have been joined and treated as single words, the two-element chain frequency of each word or part of speech (hereinafter called the bi-gram) is calculated separately for the source-language and target-language sentences. The formula is the same as (Equation 2).
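Joining by the phrase definition table can be sketched as a single left-to-right pass that chains neighbouring tokens whenever their part-of-speech pair matches a rule. The function name `apply_phrase_definitions`, the representation of rules as part-of-speech pairs, the "+" joiner, and the tagged sample sentence are assumptions for illustration; only the two rule patterns and the joined result 「を＋(動詞)＋たい」 come from the text.

```python
def apply_phrase_definitions(tagged, rules):
    """tagged: list of (word, pos). rules: set of (pos, pos) pairs to join.
    Neighbouring tokens are chained while each adjacent part-of-speech pair
    matches a rule. Returns the joined surface strings."""
    groups = []
    prev_pos = None
    for word, pos in tagged:
        if groups and (prev_pos, pos) in rules:
            groups[-1] += "+" + word  # extend the current phrase
        else:
            groups.append(word)       # start a new unit
        prev_pos = pos
    return groups

# Hypothetical tagged sentence; <noun> and <verb> are words that have
# already been replaced with part-of-speech names.
rules = {("verb", "auxiliary verb"), ("case particle", "verb")}
tagged = [("<noun>", "noun"), ("を", "case particle"), ("<verb>", "verb"),
          ("たい", "auxiliary verb"), ("ん", "nominalizer"),
          ("です", "copula"), ("が", "conjunctive particle")]
joined = apply_phrase_definitions(tagged, rules)
```

Because the pass compares each token with its immediate predecessor, the two rules chain together and produce the three-element unit in one step.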

【００７７】さらに、bi-gramの値が全て一定閾値を超
えなくなるまで、第３の実施の形態と同等に、処理を繰
り返す。そして、連結された単語も含めた個々の単語を
フレーズ候補として抽出し、フレーズ決定部で、第３の
実施の形態と同様に対訳フレーズ辞書１４４と対訳フレ
ーズ間規則表１４５を作成する。図９の１５１はフレー
ズ定義表１４１に従って単語または品詞が連結されたコ
ーパスの例であり、１５２が作成された対訳フレーズ辞
書１４４の例である。The process is then repeated, as in the third embodiment, until no bi-gram value exceeds the fixed threshold. Each resulting word, including the joined ones, is extracted as a phrase candidate, and the phrase determination unit creates the bilingual phrase dictionary 144 and the bilingual inter-phrase rule table 145 in the same way as in the third embodiment. In FIG. 9, 151 is an example of a corpus in which words or parts of speech have been joined according to the phrase definition table 141, and 152 is an example of the resulting bilingual phrase dictionary 144.

【００７８】通訳の際の動作も第３の実施の形態と同様
である。The operation at the time of interpretation is the same as that of the third embodiment.

【００７９】以上の実施の形態では、予め定義されてい
るフレーズとみなしたい単語または品詞列の規則に従っ
て単語または品詞を連結した後、原言語文及び目的言語
文各々における単語または品詞の隣接頻度と、対訳にお
ける頻度の高い単語列または品詞列の共起関係を用いて
自動的に対訳フレーズとフレーズ間規則を決定し、この
対訳フレーズ規則を用いて言語または文体変換とを行う
ことにより、人手を最小限度に押さえた範囲で、さらに
効率よく品質の高い対訳フレーズ辞書を生成できる言語
変換規則作成装置を提供することが出来る。In the embodiment above, words or parts of speech are first joined according to predefined rules for the word or part-of-speech sequences that should be treated as phrases. Bilingual phrases and inter-phrase rules are then determined automatically from the adjacency frequencies of words or parts of speech in the source-language and target-language sentences and from the co-occurrence of frequent word strings or part-of-speech strings across the translation pairs, and language or style conversion is performed with these bilingual phrase rules. This provides a language conversion rule creation device that generates a high-quality bilingual phrase dictionary even more efficiently while keeping manual work to a minimum.

【００８０】なお、本実施の形態の対訳フレーズは、本
発明の対応するフレーズの例である。The bilingual phrases of the present embodiment are examples of the corresponding phrases of the present invention.

【００８１】さらに、本実施の形態では、言語変換装置
の１つの例として通訳装置を例にあげて説明したが、こ
れは他の言語変換装置、例えばくだけた発話文を書き言
葉のようなテキスト文に変換する言語変換装置において
も、同様に使用することが出来る。Furthermore, in this embodiment, an interpreter was described as one example of a language conversion device. The same approach can also be used in other language conversion devices, for example one that converts casual spoken utterances into written-style text.

【００８２】（実施の形態５）第３の実施の形態では、
言語規則を構築する際に、コーパスの一部の単語を品詞
化することで、より一般的で品質の高い規則の構築を実
現しているが、品詞化の代わりに意味コード化すること
でも同様の効果が期待できる。以下に図１０を参照しな
がら、本実施の形態を説明する。本実施の形態でも、異
なる言語間の変換を行う通訳装置を用いて説明する。(Embodiment 5) In the third embodiment, converting some of the words in the corpus to parts of speech when building the language rules yields more general, higher-quality rules. A similar effect can be expected from converting words to semantic codes instead of parts of speech. This embodiment is described below with reference to FIG. 10. As before, an interpreter that converts between different languages is used for the description.

【００８３】なお、本実施の形態のうち、対訳コーパス
２０１、分類語彙表２１６、対訳単語辞書２０７、形態
素解析部２０２、意味コード化部２１５、フレーズ抽出
部２０５、フレーズ決定部２０６は、対訳フレーズ間規
則表２０８、対訳フレーズ辞書２０９は、本発明の言語
変換規則作成装置の例である。また、本実施の形態の対
訳フレーズ辞書２０９は本発明の請求項６記載のフレー
ズ辞書の例である。In this embodiment, the bilingual corpus 201, the classified vocabulary table 216, the bilingual word dictionary 207, the morphological analysis unit 202, the semantic coding unit 215, the phrase extraction unit 205, the phrase determination unit 206, the bilingual inter-phrase rule table 208, and the bilingual phrase dictionary 209 together are an example of the language conversion rule creation device of the present invention. The bilingual phrase dictionary 209 of this embodiment is an example of the phrase dictionary according to claim 6 of the present invention.

【００８４】本実施の形態の通訳装置は、第３の実施の
形態同様、形態素解析部２０２で対訳コーパス２０１内
の原言語文の形態素解析を行うことで品詞タグが原言語
文に与えられる。次に、意味コード化部２１５で、原言
語文の形態素列において、各形態素と分類語彙表２１６
に書かれている単語とを比較し、分類語彙表２１６で意
味コードが与えられている単語と一致した形態素につい
ては、形態素名を意味コードに置きかえることで、入力
形態素列を一部の形態素が意味コード化された形態素列
に変換する。この際に意味コード化される形態素には以
下の条件を満たすものとする。（条件）対訳単語辞書に登録されている単語で、対訳単
語辞書の目的言語訳に相当する単語が、コーパス内の相
当する目的言語対訳文に存在する。In the interpreter of this embodiment, as in the third embodiment, the morphological analysis unit 202 analyzes the source-language sentences in the bilingual corpus 201 and attaches part-of-speech tags to them. Next, the semantic coding unit 215 compares each morpheme in the morpheme sequence of a source-language sentence with the words listed in the classified vocabulary table 216. For a morpheme that matches a word assigned a semantic code in the table, the morpheme name is replaced with that semantic code, so that the input morpheme sequence becomes one in which some morphemes are semantically coded. A morpheme is semantically coded only if it satisfies the following condition.
(Condition) The word is registered in the bilingual word dictionary, and the word given as its target-language translation in that dictionary appears in the corresponding target-language sentence in the corpus.
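The semantic coding step, including the matching replacement on the target-language side, can be sketched as follows. The function name `to_semantic_codes`, the code strings such as `C_ROOM`, the morpheme segmentation, and the English target sentence are all hypothetical; the actual codes of the classified vocabulary table are not given in this text.

```python
def to_semantic_codes(morphemes, target_words, vocab_codes, bilingual_dict):
    """Replaces a source morpheme with its semantic code when the morpheme
    has a code and its dictionary translation occurs in the target sentence.
    The matching target word is replaced with the same code."""
    source_out = list(morphemes)
    target_out = list(target_words)
    for idx, morpheme in enumerate(morphemes):
        code = vocab_codes.get(morpheme)
        translation = bilingual_dict.get(morpheme)
        if code is None or translation not in target_out:
            continue  # no code, no dictionary entry, or no aligned word
        source_out[idx] = code
        target_out[target_out.index(translation)] = code
    return source_out, target_out

# Hypothetical data for the utterance used in the text.
morphemes = ["部屋", "の", "予約", "を", "お願い", "し", "たい", "ん", "です", "が"]
target = ["I'd", "like", "to", "make", "a", "reservation", "for", "a", "room"]
vocab_codes = {"部屋": "C_ROOM", "予約": "C_RESERVE", "お願い": "C_REQUEST"}
bilingual_dict = {"部屋": "room", "予約": "reservation"}

coded_source, coded_target = to_semantic_codes(
    morphemes, target, vocab_codes, bilingual_dict)
```

In this data 「お願い」 has a code but no entry in the bilingual word dictionary, so it fails the condition and is left unchanged.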

【００８５】図１１の例では、対訳単語辞書に登録され
ておりしかも分類語彙表でコードが与えられている「部
屋」と「予約」のみが意味コード化され、２１３２のよ
うにこれらの形態素を意味コードに置き換えた形態素列
が作成される。さらに、相当する目的言語対訳文内の単
語名も２１３３のように意味コードに置き換える。In the example of FIG. 11, only 「部屋」 ("room") and 「予約」 ("reservation"), which are registered in the bilingual word dictionary and are assigned codes in the classified vocabulary table, are semantically coded, and a morpheme sequence in which these morphemes are replaced with semantic codes is created as shown at 2132. The corresponding words in the target-language sentence are likewise replaced with semantic codes, as shown at 2133.

【００８６】次に、上記の一部の内容語が意味コードに
置き換えられたコーパスについて、フレーズ抽出部２０
５で、原言語文、目的言語文別々に、各単語または意味
コードの２連鎖出現頻度を算出する。算出式を（数５）
に示す。Next, for the corpus in which some content words have been replaced with semantic codes, the phrase extraction unit 205 calculates the two-element chain frequency of each word or semantic code, separately for the source-language and target-language sentences. The formula is given in (Equation 5).

【００８７】[0087]

【数５】 (Equation 5)

【００８８】コーパス内の全原言語文及び目的言語文を
対象にbi-gramを算出した後、フレーズ抽出部で、最も
出現頻度の高かった２単語または意味コード対を１つの
単語とみなして連結し、再度bi-gramを算出する。これ
により、たとえば頻度高く隣接する「お」「願い」、
「願い」「し」、「し」「ます」などの単語対が連結さ
れ、「お願いします」というフレーズ候補が形成され
る。目的言語では「I'd」「like」、「like」「to」の
単語対が連結される。After the bi-grams have been calculated over all source-language and target-language sentences in the corpus, the phrase extraction unit joins the most frequent pair of words or semantic codes, treats it as a single word, and calculates the bi-grams again. In this way, frequently adjacent word pairs such as 「お」+「願い」, 「願い」+「し」, and 「し」+「ます」 are joined, forming the phrase candidate 「お願いします」 ("please"). In the target language, the word pairs "I'd" + "like" and "like" + "to" are joined.

【００８９】全原言語文及び目的言語文別々に、以上の
連結とbi-gram算出とを、bi-gramの値が全て一定閾値を
超えなくなるまで繰り返す。そして、連結された単語も
含めた個々の単語をフレーズ候補として抽出する。The joining and bi-gram calculation described above are repeated, separately for the source language sentences and the target language sentences, until no bi-gram value exceeds a fixed threshold. The individual words, including the joined ones, are then extracted as phrase candidates.
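The repeated join-and-recount loop can be sketched as below. Because (Equation 5) is not reproduced in this text, the sketch assumes the bi-gram value is simply the raw count of adjacent pairs; the function names, the `joiner` parameter, and the toy corpus are illustrative assumptions, not the patent's definitions.

```python
from collections import Counter


def bigram_counts(sentences):
    """Count adjacent token pairs over all sentences."""
    counts = Counter()
    for sentence in sentences:
        counts.update(zip(sentence, sentence[1:]))
    return counts


def merge_pair(sentence, pair, joiner=""):
    """Join every non-overlapping occurrence of `pair` into one token."""
    out, i = [], 0
    while i < len(sentence):
        if i + 1 < len(sentence) and (sentence[i], sentence[i + 1]) == pair:
            out.append(sentence[i] + joiner + sentence[i + 1])
            i += 2
        else:
            out.append(sentence[i])
            i += 1
    return out


def extract_phrase_candidates(sentences, threshold, joiner=""):
    """Merge the most frequent pair until no count exceeds `threshold`."""
    sentences = [list(s) for s in sentences]
    while True:
        counts = bigram_counts(sentences)
        if not counts:
            break
        pair, freq = counts.most_common(1)[0]
        if freq <= threshold:
            break
        sentences = [merge_pair(s, pair, joiner) for s in sentences]
    candidates = {token for s in sentences for token in s}
    return sentences, candidates


# Toy source-language corpus (hypothetical).
corpus = [["お", "願い", "し", "ます"]] * 3 + [["部屋", "を", "お", "願い", "し", "ます"]]
merged, candidates = extract_phrase_candidates(corpus, threshold=2)
```

The same function would be run once on the source language side and once on the target language side, for example with `joiner=" "` for English so that "I'd", "like", "to" can join into a single candidate.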

【００９０】以下第３の実施の形態と同様にフレーズ決
定部２０６にて対訳フレーズを決定し、対訳フレーズ辞
書２０９に登録する。さらに第３の実施の形態と同様に
フレーズ間言語規則及びフレーズbi-gramを作成し、対
訳フレーズ間規則表２０８に登録する。Thereafter, as in the third embodiment, the phrase determining unit 206 determines the bilingual phrases and registers them in the bilingual phrase dictionary 209. Further, as in the third embodiment, inter-phrase language rules and phrase bi-grams are created and registered in the bilingual inter-phrase rule table 208.

【００９１】通訳の際も第３の実施の形態と同様に動作
する。At the time of interpretation, the operation is the same as in the third embodiment.

【００９２】以上の実施の形態では、原言語フレーズと
目的言語フレーズが対応した形で規則を記述しておき、
このフレーズの単位で認識を行ないうことで、入力文の
一部が未知部分文であったり、音声認識が一部誤ったと
しても、正しく認識および解析された部分は適切に処理
され出力される言語変換装置を可能にする。また、原言
語文及び目的言語文各々における単語または意味コード
の隣接頻度と、対訳における頻度の高い単語列または意
味コード列の共起関係を用いて自動的に対訳フレーズと
フレーズ間規則を決定し、この対訳フレーズ規則を用い
て通訳を行うことにより、なるべく人手をかけずに、自
動的に効率よくしかも品質の高い対訳フレーズ辞書を生
成できる言語規則作成装置を可能とする。In the above embodiment, the rules are described in a form in which source language phrases and target language phrases correspond to each other, and recognition is performed in units of these phrases. This makes possible a language conversion device in which, even if part of the input sentence is an unknown partial sentence or speech recognition is partly wrong, the correctly recognized and analyzed parts are appropriately processed and output. In addition, bilingual phrases and inter-phrase rules are determined automatically from the adjacency frequency of words or semantic codes in the source language sentences and the target language sentences and from the co-occurrence relationship of frequent word strings or semantic code strings across the translation pairs, and interpretation is performed using these bilingual phrase rules. This makes possible a language rule creating device that can generate a high-quality bilingual phrase dictionary automatically and efficiently with as little manual work as possible.
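The benefit of phrase-level rules for partially unknown input can be illustrated with a minimal sketch. It assumes a greedy longest-match lookup against a phrase dictionary and simply skips tokens that no phrase covers; the patent's inter-phrase rules and phrase bi-grams are deliberately left out, and the function name, the `max_len` parameter, and the dictionary entries are hypothetical.

```python
def convert_by_phrases(tokens, phrase_dict, max_len=4):
    """Translate the parts of `tokens` covered by known phrases.

    phrase_dict maps a tuple of source tokens to a target phrase
    (assumed layout). Tokens not covered by any phrase are skipped,
    so a partial result is still produced.
    """
    out, i = [], 0
    while i < len(tokens):
        # Try the longest span first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in phrase_dict:
                out.append(phrase_dict[key])
                i += n
                break
        else:
            i += 1  # unknown or misrecognized token: skip it
    return out


# Hypothetical phrase dictionary and an input with uncovered tokens.
phrases = {
    ("部屋", "の", "予約"): "room reservation",
    ("お願い", "し", "ます"): "I'd like to",
}
result = convert_by_phrases(
    ["部屋", "の", "予約", "を", "ええと", "お願い", "し", "ます"], phrases
)
```

Here the two known phrases are converted and the uncovered tokens are dropped, which mirrors the claim that correctly recognized parts are still output.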

【００９３】なお、本実施の形態では、言語変換装置の
１つの例として通訳装置を例にあげて説明したが、これ
は他の言語変換装置、例えばくだけた発話文を書き言葉
のようなテキスト文に変換する言語変換装置においても
同様に使用することが出来る。In the present embodiment, an interpreting device has been described as one example of a language conversion device, but the same approach can be used in other language conversion devices, for example one that converts casual spoken utterances into written-style text.

【００９４】（実施の形態６）第５の実施の形態では、
言語規則を構築する際に、隣接頻度の高い単語または品
詞、意味コードを連結してフレーズを作成していたが、
フレーズを作成した後に、文複雑度を評価することで、
より品質が高く、認識率を保証できるフレーズを形成す
ることができる。(Embodiment 6) In the fifth embodiment, when building the language rules, phrases were created by joining words, parts of speech, or semantic codes with a high adjacency frequency. By additionally evaluating the sentence complexity after the phrases have been created, phrases of higher quality that can guarantee the recognition rate can be formed.

【００９５】以下に図１２を参照しながら、言語変換規
則作成装置の実施の形態を説明する。An embodiment of the language conversion rule creating device will be described below with reference to FIG.

【００９６】なお、本実施の形態における対訳フレーズ
辞書は本発明の請求項６記載のフレーズ辞書の例であ
る。The bilingual phrase dictionary in the present embodiment is an example of the phrase dictionary according to claim 6 of the present invention.

【００９７】先の実施の形態同様、形態素解析後、意味
コード化部２１３で一部の形態素を意味コードに変換し
た対訳コーパスを作成する。さらに、フレーズ抽出部
で、原言語文、目的言語文別々に、各単語または意味コ
ードのbi-gramを算出する。算出式は（数５）と同様で
ある。As in the previous embodiment, after morphological analysis, a semantic encoding unit 213 creates a bilingual corpus in which some morphemes are converted into semantic codes. Further, the phrase extraction unit calculates a bi-gram of each word or meaning code separately for the source language sentence and the target language sentence. The calculation formula is the same as (Equation 5).

【００９８】さらに、bi-gramの値が全て一定閾値を超
えなくなるまで、先の実施の形態と同等に、処理を繰り
返す。そして、連結された単語も含めた個々の単語をフ
レーズ候補として抽出する。Further, the processing is repeated in the same way as in the previous embodiment until no bi-gram value exceeds a fixed threshold. The individual words, including the joined ones, are then extracted as phrase candidates.

【００９９】上記の処理を行う際に、文複雑度算出部２
１８で、各単語または意味コードのbi-gramを算出し、b
i-gramの値によって連結処理を行う際に、各単語対を連
結した場合と連結しない場合との文複雑度を算出し比較
する。文複雑度は（数６）で算出されるものである。During the above processing, when the bi-gram of each word or semantic code is calculated and joining is performed according to the bi-gram values, the sentence complexity calculation unit 218 calculates and compares the sentence complexity for the case where each word pair is joined and the case where it is not. The sentence complexity is calculated by (Equation 6).

【０１００】[0100]

【数６】 (Equation 6)

【０１０１】比較した結果、フレーズ抽出部２１７で各
単語または意味コードを連結することで文複雑度が増加
するものについては、フレーズ候補から除去する。As a result of the comparison, the phrase extraction unit 217 removes from the phrase candidates any pair of words or semantic codes whose joining increases the sentence complexity.
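The join-or-not comparison can be sketched as follows. (Equation 6) is not reproduced in this text, so the sketch substitutes a common definition as an assumption: per-token perplexity of an add-one-smoothed bi-gram model that is trained and evaluated on the same non-empty corpus. The function names are hypothetical.

```python
import math
from collections import Counter


def perplexity(sentences):
    """Per-token perplexity of an add-one-smoothed bi-gram model.

    Assumption: the model is estimated and evaluated on `sentences`,
    which must be non-empty. This stands in for (Equation 6).
    """
    contexts, bigrams, predicted = Counter(), Counter(), set()
    padded = [["<s>"] + list(s) + ["</s>"] for s in sentences]
    for toks in padded:
        contexts.update(toks[:-1])
        bigrams.update(zip(toks, toks[1:]))
        predicted.update(toks[1:])
    vocab = len(predicted)
    log_prob, n = 0.0, 0
    for toks in padded:
        for a, b in zip(toks, toks[1:]):
            log_prob += math.log((bigrams[(a, b)] + 1) / (contexts[a] + vocab))
            n += 1
    return math.exp(-log_prob / n)


def merge_pair(sentence, pair):
    """Join every non-overlapping occurrence of `pair` into one token."""
    out, i = [], 0
    while i < len(sentence):
        if i + 1 < len(sentence) and (sentence[i], sentence[i + 1]) == pair:
            out.append(sentence[i] + sentence[i + 1])
            i += 2
        else:
            out.append(sentence[i])
            i += 1
    return out


def keep_merge(sentences, pair):
    """True if joining `pair` does not increase the sentence complexity."""
    merged = [merge_pair(s, pair) for s in sentences]
    return perplexity(merged) <= perplexity(sentences)
```

A candidate pair for which `keep_merge` returns False would be dropped from the phrase candidates. Note that joining changes the number of tokens, so how the complexity is normalized affects the comparison; the per-token normalization used here is one possible choice.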

【０１０２】上記処理でフレーズ候補に残ったフレーズ
を対象に、先の実施の形態と同条件でフレーズを決定
し、対訳フレーズ辞書２０９とフレーズ間規則表２０８
を決定する。For the phrases that remain as phrase candidates after the above processing, the phrases are determined under the same conditions as in the previous embodiment, and the bilingual phrase dictionary 209 and the inter-phrase rule table 208 are determined.

【０１０３】以上の実施の形態では、対訳フレーズを決
定する際に、意味コードによる単語クラス化された対訳
コーパスの文複雑度を用いて決定することにより、コー
パスから対訳フレーズを自動的に抽出することを可能と
し、人手をなるべく用いずに、効率よく品質の高い対訳
フレーズ辞書を生成できる。また、文複雑度の尺度が、
音声認識に適切なフレーズかどうかの尺度と密接に関係
があるため、認識精度を保証しながら、自動的にフレー
ズ抽出することが可能となる。In the above embodiment, bilingual phrases are determined using the sentence complexity of a bilingual corpus whose words have been classified by semantic code. This makes it possible to extract bilingual phrases from the corpus automatically and to generate a high-quality bilingual phrase dictionary efficiently with as little manual work as possible. Moreover, because the sentence complexity measure is closely related to whether a phrase is suitable for speech recognition, phrases can be extracted automatically while recognition accuracy is guaranteed.

【０１０４】なお、本実施の形態では、一部の単語を意
味コード化したコーパスを扱ってフレーズ抽出する例を
説明したが、品詞化したコーパスを扱ってフレーズ抽出
する場合でも同様の効果が期待できる。In the present embodiment, an example has been described in which phrases are extracted from a corpus in which some words are converted into semantic codes. However, a similar effect can be expected when phrases are extracted from a corpus in which words are converted into parts of speech.

【０１０５】さらに、第４の実施の形態では、品詞タグ
が付与された対訳コーパスを扱ってフレーズ定義表によ
りフレーズを抽出する例を説明したが、第５の実施の形
態で説明したように一部の単語を意味コード化したコー
パスを扱って、フレーズ定義表によりフレーズを抽出す
る場合でも同様の効果が期待できる。Further, in the fourth embodiment, an example has been described in which phrases are extracted using a phrase definition table from a bilingual corpus to which part-of-speech tags are attached. A similar effect can be expected when phrases are extracted using a phrase definition table from a corpus in which some words are converted into semantic codes, as described in the fifth embodiment.

【０１０６】さらに、第１〜５の実施の形態では言語変
換装置は、音声認識部、言語変換部、出力文生成部から
構成されるとして説明したが、これに限らない。図１３
に示すように、出力文生成部２１２が出力した翻訳結果
文を音声合成する音声合成部を設けても構わない。そし
てこの音声合成部は、音声合成する際に音声認識部２１
０、言語変換部２１１で用いられたのと同じ対訳フレー
ズ間規則表２０８、対訳フレーズ辞書２０９を用いて音
声合成を行う。このようにすれば入力音声文に未学習部
分があったり、音声認識が一部誤りを起こしても、全文
に対する音声合成結果が全く出力されないという問題点
を解決し、正しく認識された部分については、適切な音
声を出力できることが期待できる。Further, in the first to fifth embodiments, the language conversion device has been described as comprising a speech recognition unit, a language conversion unit, and an output sentence generation unit, but it is not limited to this configuration. As shown in FIG. 13, a speech synthesis unit that synthesizes speech from the translation result sentence output by the output sentence generation unit 212 may be provided. This speech synthesis unit performs synthesis using the same bilingual inter-phrase rule table 208 and bilingual phrase dictionary 209 that are used by the speech recognition unit 210 and the language conversion unit 211. This avoids the problem that no synthesis result at all is output for the whole sentence when the input speech contains an unlearned part or speech recognition is partly wrong; appropriate speech can be expected to be output for the correctly recognized parts.

【０１０７】さらに、本発明の言語変換装置または言語
変換規則作成装置の各構成要素の全部または一部の機能
を専用のハードウェアを用いて実現しても構わないし、
またコンピュータのプログラムによってソフトウェア的
に実現しても構わない。Furthermore, all or some of the functions of each component of the language conversion device or language conversion rule creation device of the present invention may be realized using dedicated hardware, or may be realized in software by a computer program.

【０１０８】さらに、本発明の言語変換装置または言語
変換規則作成装置の各構成要素の全部または一部の機能
をコンピュータに実行させるためのプログラムを格納し
ていることを特徴とするプログラム記録媒体も本発明に
属する。Further, a program recording medium that stores a program for causing a computer to execute all or some of the functions of each component of the language conversion device or language conversion rule creation device of the present invention also belongs to the present invention.

【０１０９】[0109]

【発明の効果】以上説明したところから明らかなよう
に、本発明は、必ず目的言語文に変換可能な認識結果を
出力でき、従って、入力文の一部が未知部分文であった
り、音声認識が一部誤ったとしても、正しく認識および
解析された部分は適切に処理され出力されることを可能
にする言語変換規則作成装置および言語変換装置を提供
することが出来る。As is clear from the above description, the present invention can always output a recognition result that can be converted into a target language sentence. It can therefore provide a language conversion rule creating device and a language conversion device in which, even if part of the input sentence is an unknown partial sentence or speech recognition is partly wrong, the correctly recognized and analyzed parts are appropriately processed and output.

【０１１０】また、本発明は、入力音声文に未学習部分
があったり、音声認識が一部誤りを起こしても、正しく
認識され適切な解析規則が当てはまった部分のみの変換
が可能であり、部分的な変換結果を必ず出力することを
可能にする言語変換規則作成装置および言語変換装置を
提供することが出来る。Further, the present invention can provide a language conversion rule creating device and a language conversion device that, even if the input speech contains an unlearned part or speech recognition is partly wrong, can convert just the parts that were correctly recognized and matched an appropriate analysis rule, and can therefore always output a partial conversion result.

【０１１１】また、本発明は、なるべく人手をかけずに
自動的に言語規則を作成することを可能にする言語変換
規則作成装置を提供することが出来る。Further, the present invention can provide a language conversion rule creating apparatus which can automatically create a language rule with minimum effort.

【０１１２】また、本発明は、なるべく人手をかけずに
自動的に、かつ、より効率よく高品質な言語規則を作成
することを可能にする言語変換規則作成装置を提供する
ことが出来る。Further, the present invention can provide a language conversion rule creating apparatus which can create a high-quality language rule automatically and more efficiently with as little labor as possible.

【０１１３】また、本発明は、自動的に、かつ、より効
率よく高品質な言語規則を作成することを可能にする言
語変換規則作成装置を提供することが出来る。Further, according to the present invention, it is possible to provide a language conversion rule creating apparatus which enables automatic and efficient creation of high quality language rules.

[Brief description of the drawings]

【図１】本発明の第１の実施の形態における言語変換装
置の構成を示すブロック図FIG. 1 is a block diagram showing a configuration of a language conversion device according to a first embodiment of the present invention.

【図２】本発明の第２の実施の形態における言語変換装
置の構成を示すブロック図FIG. 2 is a block diagram showing a configuration of a language conversion device according to a second embodiment of the present invention.

【図３】本発明の第１の実施の形態における言語規則の
作成を説明する図FIG. 3 is a view for explaining creation of a language rule according to the first embodiment of the present invention;

【図４】本発明の第２の実施の形態における最適言語規
則の作成を説明する図FIG. 4 is a view for explaining creation of an optimal language rule according to the second embodiment of the present invention;

【図５】本発明の第３の実施の形態における言語変換装
置及び言語規則作成装置の構成を示すブロック図FIG. 5 is a block diagram showing a configuration of a language conversion device and a language rule creation device according to a third embodiment of the present invention.

【図６】本発明の第３の実施の形態における言語変換規
則の作成を説明する図FIG. 6 is a view for explaining creation of a language conversion rule according to the third embodiment of the present invention;

【図７】本発明の第３の実施の形態における対訳フレー
ズ間規則表と対訳フレーズ辞書の例を示す図。FIG. 7 is a diagram showing an example of a bilingual phrase rule table and a bilingual phrase dictionary according to the third embodiment of the present invention.

【図８】本発明の第４の実施の形態における言語変換装
置及び言語規則作成装置の構成を示すブロック図FIG. 8 is a block diagram illustrating a configuration of a language conversion device and a language rule creation device according to a fourth embodiment of the present invention.

【図９】本発明の第４の実施の形態におけるフレーズ定
義表の例を説明する図FIG. 9 is a diagram illustrating an example of a phrase definition table according to the fourth embodiment of the present invention.

【図１０】本発明の第５の実施の形態における言語変換
装置及び言語規則作成装置の構成を示すブロック図FIG. 10 is a block diagram showing a configuration of a language conversion device and a language rule creation device according to a fifth embodiment of the present invention.

【図１１】本発明の第５の実施の形態における言語規則
の作成を説明する図FIG. 11 is a diagram illustrating creation of a language rule according to a fifth embodiment of the present invention.

【図１２】本発明の第６の実施の形態における言語変換
規則作成装置の構成を示すブロック図FIG. 12 is a block diagram illustrating a configuration of a language conversion rule creation device according to a sixth embodiment of the present invention.

【図１３】音声合成部を有する言語変換装置の構成例を
示すブロック図FIG. 13 is a block diagram illustrating a configuration example of a language conversion device having a speech synthesis unit.

【図１４】従来の言語変換装置で用いられる言語規則の
例を示す図FIG. 14 is a diagram showing an example of a language rule used in a conventional language conversion device.

【図１５】従来の言語変換装置の構成を示すブロック図FIG. 15 is a block diagram showing a configuration of a conventional language conversion device.

[Explanation of symbols]

１対訳コーパス２言語規則再生部３フレーズ内言語規則４フレーズ間言語規則５文生成規則６マイクロフォン７音声認識部８音響モデル９言語変換部１０出力文生成部１０１対訳コーパス１０２形態素解析部１０３内容語定義表１０４品詞化部１０５フレーズ抽出部１０６フレーズ決定部１０７対訳単語辞書１０８対訳フレーズ間規則表１０９対訳フレーズ辞書１１０音声認識１１１言語変換１１２出力文生成１１３音響モデル１１４文生成規則
Reference Signs List
1 bilingual corpus
2 language rule reproducing unit
3 intra-phrase language rule
4 inter-phrase language rule
5 sentence generation rule
6 microphone
7 speech recognition unit
8 acoustic model
9 language conversion unit
10 output sentence generation unit
101 bilingual corpus
102 morphological analysis unit
103 content word definition table
104 part-of-speech conversion unit
105 phrase extraction unit
106 phrase determination unit
107 bilingual word dictionary
108 bilingual inter-phrase rule table
109 bilingual phrase dictionary
110 speech recognition
111 language conversion
112 output sentence generation
113 acoustic model
114 sentence generation rule

フロントページの続き (51)Int.Cl.⁷ 識別記号 ＦＩ テーマコード(参考) Ｇ０６Ｆ 15/38 Ｚ Continued from the front page: (51) Int.Cl.⁷, identification symbol, FI, theme code (reference): G06F 15/38 Z

Claims

[Claims]

1. A language conversion device comprising: storage means for storing language rules obtained by learning grammatical or semantic constraint rules for words or word strings from a learning database (hereinafter referred to as a bilingual corpus) in which a sentence to be subjected to language conversion, input by voice or text (hereinafter referred to as a source language sentence), is paired with the corresponding converted sentence (hereinafter referred to as a target language sentence); a speech recognition unit that performs speech recognition of the input speech using the stored language rules and outputs the recognition result as a sentence to be subjected to language conversion; and a language conversion unit that converts the sentence to be subjected to language conversion into a language-converted sentence using the same language rules as those used in the speech recognition unit.

2. The language conversion device according to claim 1, wherein the language rules are formed by dividing the sentence to be subjected to language conversion and the converted sentence into parts in which both sentences form a semantic unit (referred to as form-independent phrases), and by formulating separate rules for the language rules within the form-independent phrases and the language rules between the form-independent phrases.

3. The language conversion device according to claim 2, wherein the language rules are created by formulating as rules the grammatical or semantic rules within the form-independent phrases and the co-occurrence or connection relationships between the form-independent phrases.

4. The language conversion device according to claim 1, further comprising a speech synthesis unit that performs speech synthesis on the language-converted sentence using the same language rules as those used in the language conversion unit.

5. The language conversion device according to any one of claims 1 to 4, further comprising: an inter-rule distance calculation unit that, for a language rule group in which language rules having the same target language sentence are grouped into the same category, calculates the acoustic inter-rule distance between the sentences to be subjected to language conversion of the language rules included in the group; and an optimum rule creation unit that optimizes the rule group by merging language rules whose calculated distances are close, in order to raise the recognition level of speech recognition.

6. A language conversion rule creating device comprising: a bilingual corpus; a phrase extraction unit that calculates the adjacency frequency of words or parts of speech in the source language sentences and target language sentences of the bilingual corpus, and joins frequently adjacent words or parts of speech to extract partial sentences that form semantic units (hereinafter referred to as phrases); a phrase determination unit that determines corresponding phrases by examining the relationship between source-language and target-language phrases among the phrases extracted by the phrase extraction unit; and a phrase dictionary that stores the determined corresponding phrases, wherein the phrase dictionary is used when language conversion is performed, and when a source language sentence is input, language or style conversion is performed by matching the input sentence against the corresponding phrases stored in the phrase dictionary.

7. The language conversion rule creating device according to claim 6, wherein the phrase determining unit determines a corresponding phrase by examining a co-occurrence relationship between phrases in a source language and a target language.

8. The language conversion rule creating device according to claim 6, further comprising: a morphological analysis unit that converts the source language sentences of the bilingual corpus into word strings; and a part-of-speech conversion unit that, using the result of the morphological analysis unit, creates a bilingual corpus in which some or all of the words of the source language sentences and target language sentences are replaced with part-of-speech names.

9. The language conversion rule creating device according to claim 8, further comprising a bilingual word dictionary of the source language and the target language, wherein the part-of-speech conversion unit converts into parts of speech the words that are matched in the bilingual word dictionary and whose source-language word is a content word.

10. The language conversion rule creating device according to claim 6, further comprising: a morphological analysis unit that converts the source language sentences of the bilingual corpus into word strings; and a semantic encoding unit that, using the result of the morphological analysis unit and a table in which words are classified so that semantically similar words belong to the same class and words in the same class are assigned the same code (hereinafter referred to as the classification vocabulary table), creates a bilingual corpus in which some or all of the words in the source language sentences and target language sentences are replaced with codes from the classification vocabulary table, wherein the phrase extraction unit extracts phrases from the bilingual corpus whose words have been replaced with codes by the semantic encoding unit.

11. A bilingual word dictionary for a source language and a target language, wherein the semantic coding unit converts only words associated with the bilingual word dictionary into semantic codes. Language conversion rule creation device described.

12. The language conversion rule creating device according to claim 6, wherein the phrase extraction unit extracts phrases by also using a phrase definition table that stores in advance, as pairs of source language and target language, the word or part-of-speech sequences to be preferentially regarded as phrases.

13. The language conversion rule creating device according to any one of claims 6 to 13, further comprising a sentence complexity calculation unit that calculates the perplexity (sentence complexity) of a corpus, wherein the phrase extraction unit extracts phrases using the adjacency frequency of words or word classes and the sentence complexity.

14. A program recording medium storing a program for causing a computer to execute all or part of the functions of each component of the language conversion device or language conversion rule creating device according to any one of claims 1 to 13.