JPH07129596A

JPH07129596A - Natural language processor

Info

Publication number: JPH07129596A
Application number: JP5294663A
Authority: JP
Inventors: Koji Inai; 幸治稲井
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 1993-10-29
Filing date: 1993-10-29
Publication date: 1995-05-19

Abstract

PURPOSE: To make it easy to obtain the analysis result the user intends by accurately analyzing a wide variety of sentences on a small-scale system, without enlarging dictionary capacity. CONSTITUTION: For ordinary text, a dictionary search and dictionary control part 4 extracts a candidate word group using only a standard independent word dictionary 2a and a standard auxiliary word dictionary 2b, and a morphological analysis part 5 performs morphological analysis. For text in a specific field, an extended independent word dictionary 2c and an extended auxiliary word dictionary 2d suited to that field are added. For a different field, the extended independent word dictionary 2c and the extended auxiliary word dictionary 2d are replaced. When several dictionaries are searched, each dictionary is given a priority, so that a more precise criterion is available when selecting words during morphological analysis. With this constitution, dictionary 2 can be kept to the absolute minimum.

Description

Detailed Description of the Invention

[0001]

1. Field of the Invention

The present invention relates to a natural language processing device, such as a machine translation system, a Japanese text-to-speech synthesis system, or a computer that handles mixed kanji-kana text.

[0002]

2. Description of the Related Art

Two kinds of system are conventionally known. Machine translation systems accept a text sentence, analyze the input character sequence, and then translate it. Speech synthesis systems synthesize parameters according to predetermined rules in order to produce speech. Speech synthesis systems are used, for example, in welfare applications that let visually impaired people listen to and understand text without a helper. They are also used in situations where it is difficult to obtain character information visually. Such natural language processing devices are expected to analyze input text as accurately as possible.

[0003]

In the natural language processing devices described above, morphological analysis of mixed kanji-kana text has two parts. First, an independent word dictionary, an auxiliary word dictionary, and a user dictionary are searched. Second, the optimal words are selected from the search results. Accurate morphological analysis depends on two factors: the number of vocabulary entries recorded in the dictionaries, and the method used to select the optimal word from several candidate words.

[0004]

The more vocabulary a dictionary records, the better the morphological analysis results. Natural language processing devices that analyze newspapers and similar text therefore carry dictionaries on the scale of 500,000 words. When the sentences to be analyzed are limited to a specific field, a different approach is taken: only that field's vocabulary is added, and the dictionary is otherwise kept minimal. To select among several candidate words, connection rules between words are used. Other criteria are also applied, such as the importance of each entry recorded in the dictionary and the longest-match principle.
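The longest-match principle mentioned above can be illustrated with a minimal sketch. It assumes a toy lexicon and a hypothetical helper name, and it is not the selection method of any particular system. At each position it takes the longest dictionary entry that matches and falls back to a single character when nothing matches.

```python
def longest_match_segment(text: str, lexicon: set[str], max_len: int = 8) -> list[str]:
    """Greedy left-to-right segmentation using the longest-match principle.

    At each position the longest lexicon entry that matches is taken; if no
    entry matches, a single character is emitted as an unknown word.
    """
    result: list[str] = []
    i = 0
    while i < len(text):
        # Try the longest possible piece first, then shrink it one character at a time.
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if piece in lexicon or length == 1:
                result.append(piece)
                i += length
                break
    return result


# Toy lexicon (hypothetical, for illustration only).
lexicon = {"米国", "米", "国産", "産業", "業界", "界"}
print(longest_match_segment("米国産業界", lexicon))  # ['米国', '産業', '界']
```

Because the greedy choice of 米国 is never revisited, the alternative reading 米／国産／業界 is never considered. This is why such a criterion is combined with connection rules and entry importance rather than used alone.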

[0005]

Problems to Be Solved by the Invention

However, the conventional natural language processing devices described above have three problems. First, the dictionary becomes large, which makes such devices hard to implement anywhere except in large dedicated systems for analyzing newspapers and the like. Second, adding vocabulary for a specific field yields a system dedicated to that field, which is then poorly suited to analyzing text from other fields. Third, a device that uses several dictionaries has more candidate words to choose from. It therefore needs an appropriate selection criterion, but such a criterion is difficult to set.

[0006]

An object of the present invention is therefore to provide a natural language processing device with two properties. It can accurately analyze a wide variety of sentences on a small-scale system without increasing dictionary capacity. It also makes it easy to obtain the analysis result the user intends.

[0007]

Means for Solving the Problems

To achieve the above object, the natural language processing device according to claim 1 performs morphological analysis of mixed kanji-kana text and is characterized by comprising:

- a standard dictionary storing general vocabulary;
- dictionary reading means, into which a plurality of detachable extended dictionaries are loaded, each storing only the vocabulary specific to a particular field, and which reads that field-specific vocabulary;
- search and extraction means that, according to the character string to be searched in the mixed kanji-kana text, combines the standard dictionary with the plurality of extended dictionaries, searches the combined dictionaries, and extracts a candidate word group for that character string; and
- morphological analysis means that performs morphological analysis on the mixed kanji-kana text and on the candidate word group produced by the search and extraction means.

[0008]

In the natural language processing device according to claim 2, the standard dictionary consists of a standard independent word dictionary storing general independent words and a standard auxiliary word dictionary storing general auxiliary words. The plurality of extended dictionaries includes at least a plurality of extended independent word dictionaries, each storing only the independent words specific to a particular field.

[0009]

The natural language processing device according to claim 3 further comprises priority setting means, which sets priorities for the standard dictionary and the extended dictionaries. The search and extraction means attaches to each word in the retrieved candidate word group the priority set by the priority setting means. The morphological analysis means then performs morphological analysis, giving a predetermined preference to the higher-priority words in the candidate word group extracted by the search and extraction means.

[0010]

In the natural language processing device according to claim 4, the plurality of extended dictionaries consists of storage media such as IC cards, magnetic disks, magneto-optical disks, or optical disks.

[0011]

Operation

In the present invention, general mixed kanji-kana text is processed using only the standard independent word dictionary and the standard auxiliary word dictionary. For mixed kanji-kana text in a specific field, an extended independent word dictionary and an extended auxiliary word dictionary suited to that field are added. When the field changes, at least the extended independent word dictionary is replaced. The dictionaries can therefore be kept to the minimum configuration required.

[0012]

When several dictionaries are searched, assigning a priority to each dictionary provides a more accurate criterion for selecting words during morphological analysis. A more accurate criterion can also be obtained by structuring the dictionaries as follows:

- the standard dictionary consists of a standard independent word dictionary storing general independent words and a standard auxiliary word dictionary storing general auxiliary words;
- the extended dictionaries include at least a plurality of extended independent word dictionaries, each storing only the independent words specific to a particular field.

The extended dictionaries may be stored on media such as IC cards, magnetic disks, magneto-optical disks, or optical disks.

[0013]

Embodiments

The present invention will now be described with reference to the drawings. FIG. 1 is a block diagram showing the configuration of a speech synthesis system to which the natural language processing device of the present invention is applied.

In the figure, reference numeral 1 denotes an input unit for entering mixed kanji-kana text, such as a keyboard, an OCR (optical character reader), or a magnetic disk. Dictionary 2 consists of a storage device such as IC memory or a magnetic disk. Its minimum configuration is two dictionaries:

- a standard independent word dictionary 2a storing general independent words;
- a standard auxiliary word dictionary 2b storing general auxiliary words (words that are not independent words).

In addition, dictionary 2 includes several extended independent word dictionaries 2c, each recording only the independent words of a specific field (for example, computing). It also includes several extended auxiliary word dictionaries 2d, each recording only the auxiliary words of a specific field. Each dictionary stores the spellings of the words that serve as the reference for morphemes, together with their accompanying information (for example, reading, part of speech, and accent).

[0014]

The sentence analysis unit 3 consists of a dictionary search/dictionary management unit 4, a morphological analysis unit 5, and a phonetic symbol generation unit 6. The dictionary search/dictionary management unit 4 searches for the words contained in the mixed kanji-kana text entered from the input unit 1. It matches them against the word spellings and accompanying information stored in dictionary 2. It also switches the dictionaries to be searched and changes their priorities.

[0015]

The configuration of dictionary 2 and of the dictionary search/dictionary management unit 4 will now be described with reference to FIG. 2. As described above, dictionary 2 consists of:

- the standard independent word dictionary 2a;
- a plurality of extended independent word dictionaries 2c1, 2c2, ...;
- the standard auxiliary word dictionary 2b;
- a plurality of extended auxiliary word dictionaries 2d1, 2d2, ....

The dictionary search/dictionary management unit 4 consists of a dictionary search unit 4a, a dictionary management unit 4b, an independent word dictionary management table 4c, and an auxiliary word dictionary management table 4d. The standard independent word dictionary 2a and the extended independent word dictionaries 2c1, 2c2, ... are managed through table 4c. The standard auxiliary word dictionary 2b and the extended auxiliary word dictionaries 2d1, 2d2, ... are managed through table 4d.

[0016]

The independent word dictionary management table 4c and the auxiliary word dictionary management table 4d record which dictionaries are currently in use and each dictionary's priority. The dictionary management unit 4b modifies each table according to the user's instructions:

- an instruction to use a new dictionary adds it to table 4c or 4d;
- an instruction to stop using a dictionary deletes it from table 4c or 4d;
- an instruction to change priorities updates the priorities in table 4c or 4d.

When the dictionary search unit 4a is given a character string to search for, it searches the dictionaries registered in tables 4c and 4d. It attaches to each search result the priority of the dictionary that result came from, and supplies the results to the morphological analysis unit 5 shown in FIG. 1.
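The management tables and the priority-tagged search described above can be modeled in a minimal sketch. The class, function names, and dictionary contents are hypothetical, and a real table 4c or 4d would also track the storage location of each dictionary. Each table maps a dictionary name to a priority (1 = highest). The search walks the registered dictionaries in priority order and tags each hit with its dictionary's priority.

```python
from dataclasses import dataclass, field


@dataclass
class DictionaryManagementTable:
    """Hypothetical model of management table 4c or 4d: dictionary name -> priority (1 = highest)."""
    entries: dict[str, int] = field(default_factory=dict)

    def add(self, name: str, priority: int) -> None:
        """Register a dictionary that the user wants to start using."""
        self.entries[name] = priority

    def remove(self, name: str) -> None:
        """Stop using a dictionary; removing an unregistered name is a no-op."""
        self.entries.pop(name, None)

    def set_priority(self, name: str, priority: int) -> None:
        """Change the priority of an already registered dictionary."""
        if name not in self.entries:
            raise KeyError(name)
        self.entries[name] = priority


def search(table: DictionaryManagementTable,
           dictionaries: dict[str, dict[str, list[str]]],
           query: str) -> list[tuple[str, str, str, int]]:
    """Look up `query` in every registered dictionary, highest priority first,
    and tag each hit with the priority of the dictionary it came from."""
    hits = []
    for name, priority in sorted(table.entries.items(), key=lambda item: item[1]):
        for info in dictionaries.get(name, {}).get(query, []):
            hits.append((query, info, name, priority))
    return hits


# Hypothetical dictionary contents: spelling -> list of accompanying-information strings.
dictionaries = {
    "2a": {"ファイル": ["noun: file (general)"]},
    "2c1": {"ファイル": ["noun: file (computing)"]},
}
table = DictionaryManagementTable()
table.add("2a", 2)
table.add("2c1", 1)
print(search(table, dictionaries, "ファイル"))
# [('ファイル', 'noun: file (computing)', '2c1', 1), ('ファイル', 'noun: file (general)', '2a', 2)]
```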

[0017]

Next, the morphological analysis unit 5 analyzes morphemes based on two inputs: the mixed kanji-kana text entered from the input unit 1, and the word group retrieved by the dictionary search/dictionary management unit 4. It converts the text into a kana character string and then splits it into words and bunsetsu (phrases). Japanese, unlike English, is not written with spaces between words. As a result, an expression such as 「米国産業界」 ("U.S. industry") can be segmented in two ways:

- 「米国／産業・界」 ("U.S. / industry / world");
- 「米／国産／業界」 ("rice / domestic production / industry").

The morphological analysis unit 5 therefore consults dictionary 2 and uses word-sequence relations and statistical properties to split the input text into words and bunsetsu, which detects the word and bunsetsu boundaries.
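The two readings of 「米国産業界」 can be reproduced by enumerating every way to cover the string with dictionary entries. The sketch below assumes a hypothetical toy lexicon and uses plain exhaustive recursion rather than the analysis method of this embodiment. It only makes the ambiguity concrete; choosing between the readings is the job of the connection relations and statistics mentioned above.

```python
def all_segmentations(text: str, lexicon: set[str]) -> list[list[str]]:
    """Return every way to split `text` into lexicon entries (exhaustive recursion).

    The number of results can grow exponentially with text length, so this is only
    suitable for illustrating ambiguity on short strings.
    """
    if not text:
        return [[]]
    results = []
    for end in range(1, len(text) + 1):
        head = text[:end]
        if head in lexicon:
            for rest in all_segmentations(text[end:], lexicon):
                results.append([head] + rest)
    return results


lexicon = {"米国", "米", "国産", "産業", "業界", "界"}  # hypothetical toy lexicon
for seg in all_segmentations("米国産業界", lexicon):
    print(" / ".join(seg))
# 米 / 国産 / 業界
# 米国 / 産業 / 界
```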

[0018]

The phonetic symbol generation unit 6 generates a phonetic symbol string from the data supplied by the morphological analysis unit 5. The speech unit storage unit 7 consists of a storage device such as IC memory or a magnetic disk, and stores speech units. A speech unit consists of the waveform data used to generate the synthesized sound for one CV (consonant-vowel) unit. The speech unit data used for this waveform synthesis is structured as follows.

[0019]

The voiced portion of a speech unit is built from real speech. Complex cepstrum analysis is applied to the voiced portion of real speech to extract an impulse and a unit response waveform for one pitch period. Each impulse/response pair forms one set, and as many sets are stored as there are pitch periods needed. The unvoiced portion of a speech unit is simply the waveform of the unvoiced portion of real speech, cut out and stored as is. Consequently, when speech units are CV units, there are two cases:

- when the consonant C of a unit is unvoiced, the unit consists of the cut-out unvoiced waveform plus several impulse/unit-response sets;
- when the consonant C is voiced, the unit consists only of several impulse/unit-response sets.

[0020]

Next, the speech synthesis rule unit 8 derives a synthesized waveform pattern and a pitch pattern. It uses the data supplied by the phonetic symbol generation unit 6, the information in the speech unit storage unit 7, and phonological and prosodic rules.

First, the speech synthesis rule unit 8 concatenates the speech unit data loaded from the speech unit storage unit 7 in the order dictated by the input text, which yields a synthesized speech waveform without intonation. It also divides the input text into segments of suitable length according to predetermined prosodic rules, and detects the breaks (pauses) between them.

For example, as shown in FIG. 3, the sentence 「きれいな花を山田さんからもらいました」 ("I received beautiful flowers from Mr. Yamada") may be input (FIG. 3(A)). It is split into 「きれいな」 (kireina), 「はな」 (hana), 「やまださんから」 (yamadasan kara), and 「もらいました」 (moraimashita), and a pause is detected between 「はな」 and 「やまださんから」 (FIG. 3(B)).

[0021]

The speech synthesis rule unit 8 further detects the accent of each bunsetsu, based on the prosodic rules and the basic accent of each word. The accent of a single Japanese bunsetsu can be represented perceptually with two levels, high and low. The unit of this representation is the kana character, hereinafter called a mora. The accent position of a bunsetsu can then be distinguished according to its content. For example, 端 (hashi, "edge"), 箸 (hashi, "chopsticks"), and 橋 (hashi, "bridge") are all two-mora words, and they are classified as follows:

- 端 is type 0 (no accent);
- 箸 is type 1 (accent on the first mora);
- 橋 is type 2 (accent on the second mora).

In this embodiment, the speech synthesis rule unit 8 accordingly classifies the bunsetsu of the input text as type 1, type 2, type 0, and type 4 (FIG. 3(C)). This detects accents and pauses bunsetsu by bunsetsu.
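The accent types above can be turned into two-level pitch patterns. The sketch below assumes the conventional mora rules of standard Tokyo Japanese, not a rule set taken from this patent, and uses a hypothetical function name:

- type 0: the first mora is low, and the rest stay high, including a following particle;
- type 1: only the first mora is high;
- type k ≥ 2: the first mora is low, morae 2 through k are high, and pitch falls after mora k.

The optional particle shows why 端 and 橋 differ even though both are LH on their own.

```python
def accent_pattern(morae: int, accent_type: int, with_particle: bool = False) -> str:
    """Two-level (H/L) pitch pattern of a word under standard Tokyo Japanese rules.

    accent_type 0 = no accent drop; accent_type k >= 1 = pitch falls after mora k.
    If with_particle is True, one following particle mora is appended.
    """
    if morae < 1 or not 0 <= accent_type <= morae:
        raise ValueError("accent type must be between 0 and the number of morae")
    levels = []
    for i in range(1, morae + 1):
        if accent_type == 1:
            high = i == 1
        elif accent_type == 0:
            high = i >= 2
        else:
            high = 2 <= i <= accent_type
        levels.append("H" if high else "L")
    if with_particle:
        # A following particle stays high only when the word has no accent drop (type 0).
        levels.append("H" if accent_type == 0 else "L")
    return "".join(levels)


for word, accent_type in [("端 edge", 0), ("箸 chopsticks", 1), ("橋 bridge", 2)]:
    print(word, accent_pattern(2, accent_type, with_particle=True))
# 端 edge LHH
# 箸 chopsticks HLL
# 橋 bridge LHL
```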

[0022]

Based on the detected accents and pauses, the speech synthesis rule unit 8 then generates a basic pitch pattern that expresses the intonation of the entire input text. The accent of a Japanese bunsetsu can be represented perceptually with two levels. The actual intonation, however, falls gradually from the accent position (FIG. 3(D)).

[0023]

Furthermore, when Japanese bunsetsu are strung together into a sentence, the intonation falls gradually from one pause toward the next (FIG. 3(E)). Based on these characteristics of Japanese, the speech synthesis rule unit 8 generates, for each mora, a parameter representing the intonation of the entire input text. It then sets parameters between morae by interpolation, so that the intonation changes smoothly, as it would in human speech. Finally, it combines the per-mora parameters and the interpolated parameters in the order of the input text. The result, hereinafter called the pitch pattern, represents the intonation of the input text read aloud (FIG. 3(F)).
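The per-mora parameters and the interpolation step can be sketched in a deliberately simplified model. All constants and function names are hypothetical. The model places each high mora a fixed amount above a baseline that falls by a constant per mora and resets after every pause. It then fills in values between neighboring morae by linear interpolation. It does not separately model the fall from the accent position described in [0022], and the patent does not specify the interpolation method.

```python
def mora_targets(phrases, start_hz=150.0, decline_per_mora=3.0, accent_boost=30.0):
    """One pitch target (Hz) per mora.

    phrases: list of (levels, pause_after) where levels is an H/L string, one letter per mora.
    A baseline falls by decline_per_mora at every mora and is reset after each pause;
    high morae sit accent_boost Hz above the baseline. All constants are hypothetical.
    """
    targets = []
    baseline = start_hz
    for levels, pause_after in phrases:
        for level in levels:
            targets.append(baseline + (accent_boost if level == "H" else 0.0))
            baseline -= decline_per_mora
        if pause_after:
            baseline = start_hz  # declination restarts after a pause
    return targets


def interpolate(targets, steps_per_mora=4):
    """Linearly interpolate between consecutive mora targets to get a smooth contour."""
    if not targets:
        return []
    contour = []
    for a, b in zip(targets, targets[1:]):
        for s in range(steps_per_mora):
            contour.append(a + (b - a) * s / steps_per_mora)
    contour.append(targets[-1])
    return contour


targets = mora_targets([("LH", False), ("HL", True), ("LH", False)],
                       start_hz=100.0, decline_per_mora=10.0, accent_boost=20.0)
print(targets)  # [100.0, 110.0, 100.0, 70.0, 100.0, 110.0]
print(interpolate(targets[:2], steps_per_mora=2))  # [100.0, 105.0, 110.0]
```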

[0024]

Next, the speech synthesis unit 9 synthesizes the speech waveform from the data obtained from the speech synthesis rule unit 8 (the synthesized waveform data and the pitch pattern). The waveform synthesis proceeds as follows. In voiced portions of the synthesized speech, the impulses in the synthesized waveform data are arranged according to the pitch pattern. The unit response waveform corresponding to each arranged impulse is then superimposed on that impulse.

[0025]

In unvoiced portions of the synthesized speech, the cut-out waveform in the synthesized waveform data is used unchanged as the desired synthesized waveform. The result is synthesized speech whose intonation follows the changes in the pitch pattern. Because the synthesized speech uses impulses as its source information, stretching or compressing the pitch period has almost no effect on that source information. Even when the pitch pattern changes substantially, the spectral envelope is not distorted. High-quality synthesized speech of arbitrary content, close to the human voice, is therefore obtained. Finally, the output unit 10 outputs the speech waveform synthesized by the speech synthesis unit 9 to a destination such as a speaker or a magnetic disk.
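The voiced-segment synthesis in [0024] and [0025] amounts to placing impulses one pitch period apart and adding a scaled unit response at each impulse. The sketch below assumes a per-sample pitch contour. It also reuses the stored (amplitude, response) pairs cyclically when there are fewer pairs than impulses; that reuse is a simplifying assumption, and the function names are hypothetical.

```python
import itertools


def place_impulses(f0_hz, sample_rate):
    """Return impulse positions spaced one pitch period apart.

    f0_hz is a per-sample pitch contour (the pitch pattern resampled to the output rate).
    """
    positions = []
    t = 0
    while t < len(f0_hz):
        positions.append(t)
        t += max(1, round(sample_rate / f0_hz[t]))
    return positions


def overlap_add(positions, units, length):
    """Superimpose one scaled unit response at each impulse position.

    units: list of (impulse_amplitude, unit_response) pairs; if there are fewer pairs
    than impulses they are reused cyclically (a simplifying assumption).
    """
    out = [0.0] * length
    for pos, (amplitude, response) in zip(positions, itertools.cycle(units)):
        for n, value in enumerate(response):
            if pos + n < length:
                out[pos + n] += amplitude * value
    return out


positions = place_impulses([1000.0] * 20, sample_rate=8000)
print(positions)  # [0, 8, 16]
signal = overlap_add(positions, [(1.0, [1.0, 0.5])], length=20)
```

Raising or lowering the pitch only moves the impulse positions. The shape of each unit response is left untouched, which matches the reason given above for why the spectral envelope is not distorted.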

[0026]

Next, the configuration of the natural language processing device in this embodiment will be described with reference to FIG. 4, a schematic diagram of the device. In the figure, the natural language processing device 11 internally holds four components: the standard independent word dictionary 2a, the standard auxiliary word dictionary 2b, the independent word dictionary management table 4c, and the auxiliary word dictionary management table 4d. It reads the extended independent word dictionaries 2c1, 2c2, ... and the extended auxiliary word dictionaries 2d1, 2d2, ... through an extended dictionary reading device 12.

[0027]

An example procedure for adding the extended independent word dictionary 2c1 and updating the independent word dictionary management table 4c is described below.

1) The natural language processing device 11 registers its built-in standard independent word dictionary 2a in the independent word dictionary management table 4c, and its built-in standard auxiliary word dictionary 2b in the auxiliary word dictionary management table 4d. At this point, both dictionaries have priority "1".
2) The extended independent word dictionary 2c1 is inserted into the extended dictionary reading device 12.
3) The insertion into the extended dictionary reading device 12 is detected, and the extended independent word dictionary 2c1 is read.
4) The device determines that the inserted dictionary is the extended independent word dictionary 2c1 and updates the independent word dictionary management table 4c.

[0028]

The device may be designed to give extended dictionaries precedence over the standard dictionary. In that case, the priority of the standard independent word dictionary 2a is changed to "2", and the extended independent word dictionary 2c1 is added to the independent word dictionary management table 4c with priority "1". Alternatively, in a natural language processing device with several built-in dictionary reading devices, each reading device may be assigned a priority. Each extended dictionary then takes the priority of the reading device it is inserted in. In that case, a mechanism for switching the priorities of the reading devices may be provided.

[0029]

For example, as shown in FIG. 3, suppose that the natural language processing device assigns priorities to its reading devices as follows:

- the upper dictionary reading device 12a has priority "1";
- the lower dictionary reading device 12b has priority "2";
- the reading device for the built-in dictionaries (not shown) has priority "3".

When no extended dictionary is inserted in reading device 12a or 12b, the standard dictionary is the only dictionary, so it gets priority "1". When an extended dictionary is inserted in the lower reading device 12b, the extended dictionary gets priority "1" and the standard dictionary gets priority "2".
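The rule in [0028] and [0029] can be written as a small function. In that rule, an extended dictionary inherits its reading device's priority, and the effective numbers are renumbered so that the best occupied device gets "1". This is a minimal sketch; the function name and slot labels are hypothetical. Given the example slot priorities, it ranks only the occupied devices.

```python
def assign_priorities(slot_priority: dict[str, int], inserted: dict[str, str]) -> dict[str, int]:
    """Map each loaded dictionary to an effective priority (1 = highest).

    slot_priority: fixed priority of every reading device, e.g. {"12a": 1, "12b": 2, "builtin": 3}.
    inserted: reading device -> dictionary currently loaded in it.
    Only occupied devices are ranked, so the best occupied device always gets priority 1.
    """
    occupied = sorted((priority, slot) for slot, priority in slot_priority.items() if slot in inserted)
    return {inserted[slot]: rank for rank, (_, slot) in enumerate(occupied, start=1)}


slots = {"12a": 1, "12b": 2, "builtin": 3}  # priorities from the example above
print(assign_priorities(slots, {"builtin": "standard"}))
# {'standard': 1}
print(assign_priorities(slots, {"builtin": "standard", "12b": "extension"}))
# {'extension': 1, 'standard': 2}
```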

[0030]

Next, the operation of the natural language processing device of this embodiment will be described with reference to FIG. 5, a flowchart showing an example of the morphological analysis processing of the natural language processing device 11. First, depending on the field of the text to be input, extended dictionaries are loaded into the extended dictionary reading device 12. These may include the extended independent word dictionaries 2c1, 2c2, ... and the extended auxiliary word dictionaries 2d1, 2d2, .... Next, the dictionary search/dictionary management unit 4 sets the priorities of all dictionaries in the independent word dictionary management table 4c and the auxiliary word dictionary management table 4d, following the procedure described above. This covers the built-in standard independent word dictionary 2a and standard auxiliary word dictionary 2b, as well as the extended dictionaries 2c1, 2c2, ... and 2d1, 2d2, ....

[0031]

Next, the dictionary search/dictionary management unit 4 of the sentence analysis unit 3 searches dictionary 2 for the text entered from the input unit 1. Dictionary 2 here comprises the standard independent word dictionary 2a, the standard auxiliary word dictionary 2b, the extended independent word dictionaries 2c1, 2c2, ..., and the extended auxiliary word dictionaries 2d1, 2d2, .... The search matches the text against the stored word spellings and accompanying information, in the priority order set in table 4c or 4d, and extracts a word group.

[0032]

Next, the morphological analysis unit 5 performs morphological analysis on the mixed kanji-kana text entered from the input unit 1 and on the word group retrieved by the dictionary search unit 4a. It converts the text into a kana character string and splits it into words and bunsetsu. The morphological analysis that produces the analysis result from the dictionary search results will now be described with reference to FIG. 7. FIG. 7 is a flowchart showing an example of the morphological analysis procedure of this embodiment.

[0033] First, in step S1, a gazing point (the current analysis position) used to judge morpheme connections is set. When this morphological analysis process runs for the first time, the gazing point is placed at the beginning of the sentence. Next, in step S2, the group of morpheme candidates at the beginning of the sentence is acquired and a group of hypotheses is generated. Then, in step S3, the gazing point is advanced by one position and it is determined whether it has reached the end of the sentence. If it has not, the process proceeds to step S4.

[0034] In step S4, it is determined whether any hypothesis needs to be connected to a morpheme starting at the gazing point. If the result of step S4 is "NO", that is, if no such hypothesis exists, the process returns to step S3 and the gazing point is advanced by one more position; this repeats until the result of step S4 is "YES". When such a hypothesis exists (the gazing point at that time is called the connection point), the process proceeds to step S5, and the group of morphemes starting at the gazing point is acquired.

[0035] Next, in step S6, each hypothesis whose connection point is the gazing point is checked for whether it can be connected to the morphemes acquired in step S5. If a connectable morpheme exists, the process proceeds to step S7 and the hypothesis is extended with that morpheme; if several morphemes are connectable, a new hypothesis is generated for each of them. If no connectable morpheme exists in step S6, the process proceeds to step S8 and the hypothesis is deleted.

[0036] The processing above is performed for every applicable hypothesis. When the gazing point reaches the end of the sentence in step S3, the analysis ends and the process proceeds to step S9, where the analysis result is output. If several hypotheses remain and must be narrowed down, this can be done during any of steps S3 to S8 by applying criteria such as the longest-match principle.
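The hypothesis-driven procedure of steps S1 to S9 can be sketched in Python as follows. This is a simplified illustration, not the embodiment's implementation: the lexicon is assumed to be a plain dict from spelling to part-of-speech tag, the connection test `can_connect` is a caller-supplied function standing in for whatever connection rules step S6 actually applies, and narrowing is done only once at the end by a left-to-right longest-match ordering rather than inside steps S3 to S8. All function names are hypothetical.

```python
from typing import Callable

Morpheme = tuple[str, str]         # (surface form, part-of-speech tag)
Hypothesis = tuple[Morpheme, ...]  # morphemes covering a prefix of the text


def morphemes_at(text: str, pos: int, lexicon: dict[str, str]) -> list[Morpheme]:
    """Steps S2/S5: every lexicon entry whose spelling starts at offset pos."""
    return [(w, tag) for w, tag in lexicon.items() if text.startswith(w, pos)]


def end_of(h: Hypothesis) -> int:
    """Offset just past the last morpheme of a hypothesis."""
    return sum(len(surface) for surface, _ in h)


def analyze(text: str, lexicon: dict[str, str],
            can_connect: Callable[[Morpheme, Morpheme], bool]) -> list[Hypothesis]:
    # S1/S2: gazing point at the sentence start; one hypothesis per initial morpheme.
    hypotheses = [(m,) for m in morphemes_at(text, 0, lexicon)]
    point = 0
    while True:
        point += 1                                  # S3: advance the gazing point
        if point >= len(text):                      # end of sentence reached
            break
        waiting = [h for h in hypotheses if end_of(h) == point]
        if not waiting:                             # S4 "NO": keep advancing
            continue
        group = morphemes_at(text, point, lexicon)  # S5: morphemes at the connection point
        for h in waiting:                           # S6: connection check
            hypotheses.remove(h)
            for m in group:
                if can_connect(h[-1], m):
                    hypotheses.append(h + (m,))     # S7: extend, branching if several fit
            # S8: a hypothesis with no connectable morpheme stays deleted
    complete = [h for h in hypotheses if end_of(h) == len(text)]
    # S9 with simplified narrowing: prefer the longer morpheme, left to right.
    return sorted(complete, key=lambda h: tuple(-len(s) for s, _ in h))


lexicon = {"東京": "noun", "都": "suffix", "東": "noun",
           "京都": "noun", "に": "particle", "行く": "verb"}
no_double_particle = lambda a, b: not (a[1] == "particle" and b[1] == "particle")
results = analyze("東京都に行く", lexicon, no_double_particle)
```

With this toy lexicon both segmentations of 東京都に行く survive, and the longest-match ordering ranks 東京/都/に/行く ahead of 東/京都/に/行く.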

[0037] Next, the phonetic symbol generation unit 6 shown in FIG. 1 assigns a pronunciation to each morpheme in the analysis result described above; that is, the reading and accent type of each morpheme are converted into phonetic symbols. In this way, the sentence analysis unit 3 analyzes the text input with reference to the given dictionary 2, converts it into a kana character string, decomposes it into words and phrases, detects the basic accent of each word, and outputs these results to the speech synthesis rule unit 8.
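How a reading and an accent type might be turned into a phonetic symbol string is sketched below. The notation is an assumption for illustration only, since the text does not define its symbol set: the reading is split into morae (small ャ, ュ, ョ and similar kana attach to the preceding character), and an apostrophe is placed after the accent nucleus, with accent type 0 (no downstep) left unmarked. `split_morae` and `to_phonetic` are hypothetical names.

```python
# Small kana that combine with the preceding kana into a single mora.
SMALL_KANA = set("ャュョァィゥェォヮ")


def split_morae(reading: str) -> list[str]:
    """Split a katakana reading into morae."""
    morae: list[str] = []
    for ch in reading:
        if ch in SMALL_KANA and morae:
            morae[-1] += ch
        else:
            morae.append(ch)
    return morae


def to_phonetic(reading: str, accent_type: int) -> str:
    """Mark the accent nucleus with an apostrophe (hypothetical notation)."""
    morae = split_morae(reading)
    if not 0 <= accent_type <= len(morae):
        raise ValueError("accent type out of range for this reading")
    if accent_type == 0:  # flat (heiban) accent: no downstep to mark
        return "".join(morae)
    return "".join(morae[:accent_type]) + "'" + "".join(morae[accent_type:])
```

For example, `to_phonetic("ハシ", 1)` gives `ハ'シ` while `to_phonetic("ハシ", 2)` gives `ハシ'`, so two homographic readings stay distinguishable after conversion.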

[0038] The detected word and phrase boundaries and basic accents are processed by the speech synthesis rule unit 8 according to predetermined phonological rules, producing synthesized waveform data that represent the text input read aloud without intonation. The same detection results are also processed by the speech synthesis rule unit 8 according to predetermined prosodic rules, producing a pitch pattern that represents the intonation of the entire text input. The pitch pattern is output to the speech synthesis unit 9 together with the synthesized waveform data; there a synthesized voice is generated from the pitch pattern and the waveform data, and the output unit 10 plays it through a speaker. Alternatively, the synthesized waveform data may be stored on a magnetic disk.
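The prosodic rules of the speech synthesis rule unit 8 are not given in the text. As a rough stand-in, the sketch below derives a high or low level for each mora from the accent type using the familiar Tokyo-dialect pattern (accent type 1 starts high and drops after the first mora; otherwise the first mora is low, the following morae are high up to the accent nucleus, and type 0 stays high), then applies a gradual declination across the sentence to obtain one pitch target per mora. The function names and the `base_hz`, `high_ratio`, and `decline` values are hypothetical.

```python
def mora_pitch(n_morae: int, accent_type: int) -> str:
    """Return 'H' or 'L' per mora for a simplified Tokyo-dialect accent."""
    pattern = []
    for i in range(1, n_morae + 1):
        if accent_type == 1:
            pattern.append("H" if i == 1 else "L")
        elif i == 1:
            pattern.append("L")
        elif accent_type == 0 or i <= accent_type:
            pattern.append("H")
        else:
            pattern.append("L")
    return "".join(pattern)


def pitch_targets(phrases, base_hz=120.0, high_ratio=1.25, decline=0.97):
    """phrases: list of (n_morae, accent_type). Returns one F0 target (Hz) per mora,
    with a multiplicative declination applied across the whole sentence."""
    targets = []
    k = 0  # mora index over the whole sentence
    for n_morae, accent_type in phrases:
        for level in mora_pitch(n_morae, accent_type):
            f0 = base_hz * (high_ratio if level == "H" else 1.0) * decline ** k
            targets.append(round(f0, 1))
            k += 1
    return targets
```

A real system would smooth these targets into a continuous contour; the point here is only that word boundaries and accent types are enough to lay out a sentence-level pitch pattern.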

[0039] With the configuration described above, only the standard independent word dictionary and the standard auxiliary word dictionary are used for general text. For text from a specific field, an extended independent word dictionary and an extended auxiliary word dictionary suited to that field are added; the extended auxiliary word dictionary is not always needed. For text from a different field, the extended independent word dictionary and the extended auxiliary word dictionary (where one is needed) are swapped out. In this way, the dictionary can be kept to the minimum necessary configuration.
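How dictionaries could be combined per field is illustrated with a small Python class. It is only a sketch under stated assumptions: dictionaries are sets of spellings keyed by names such as "2c1", the standard dictionaries 2a and 2b are always active, and selecting a field replaces whatever extended dictionaries were mounted before. `DictionaryConfig`, `use_field`, `active`, and `lookup` are hypothetical names.

```python
class DictionaryConfig:
    """Hypothetical holder for the set of dictionaries to be searched."""

    def __init__(self, standard_independent, standard_auxiliary):
        # Standard dictionaries 2a and 2b are always part of the configuration.
        self.standard = {"2a": set(standard_independent),
                         "2b": set(standard_auxiliary)}
        self.extended = {}  # extended dictionaries for the current field
        self.field = None   # None means general text

    def use_field(self, field, extended=None):
        """Replace the mounted extended dictionaries with those for `field`.
        Passing None returns to the standard dictionaries only."""
        self.field = field
        self.extended = {} if field is None else dict(extended or {})

    def active(self):
        """Dictionaries to search: the standard ones plus the mounted extensions."""
        return {**self.standard, **self.extended}

    def lookup(self, word):
        """Names of the active dictionaries that contain `word`."""
        return [name for name, words in self.active().items() if word in words]


cfg = DictionaryConfig({"音声"}, {"を"})
cfg.use_field("speech", {"2c1": {"音声合成"}, "2d1": set()})
```

Switching to another field with `use_field` drops the previous extensions instead of accumulating them, which keeps the searched vocabulary at the minimum needed for the current text.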

[0040] When several dictionaries are searched, each vocabulary item in each dictionary naturally carries its own priority; assigning an additional priority to each dictionary provides a more precise criterion for selecting words during morphological analysis. Furthermore, providing an extended auxiliary word dictionary in addition to the standard auxiliary word dictionary makes accurate morphological analysis possible even for expressions peculiar to a specific field.
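One possible way to combine a dictionary's priority with a vocabulary item's own priority is sketched below. The text does not state how the two are combined, so the product used here (word priority multiplied by a per-dictionary weight, with higher values preferred) is purely an assumption, as are the `Entry` type and the `rerank` function.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    surface: str          # spelling of the candidate word
    word_priority: float  # priority stored with the vocabulary item (higher = preferred)
    dictionary: str       # dictionary the entry came from


def rerank(entries, dictionary_weight):
    """Update each entry's priority as word priority x dictionary weight and
    return (priority, surface, dictionary) tuples, best first."""
    scored = [(e.word_priority * dictionary_weight[e.dictionary], e) for e in entries]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(round(score, 3), e.surface, e.dictionary) for score, e in scored]


ranking = rerank(
    [Entry("音声", 0.6, "2a"), Entry("音声合成", 0.5, "2c1")],
    {"2a": 1.0, "2c1": 1.5},
)
```

Giving the field-specific dictionary 2c1 a larger weight lets 音声合成 (0.75) outrank 音声 (0.6) even though its own stored priority is lower, which is the kind of field-dependent preference the combined priority is meant to express.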

[0041]

[Effect of the Invention] According to the present invention, a standard dictionary storing general words and a plurality of extended dictionaries storing words of specific fields are prepared, and the combination of the standard dictionary and the extended dictionaries is changed to suit the sentence being analyzed. As a result, a wide variety of sentences can be analyzed with a relatively small system. In addition, when several dictionaries are used, a priority is given to each dictionary; the priority of each vocabulary item obtained by the dictionary search is updated from that dictionary priority and the priority given to the vocabulary item itself, and the updated priority is used in morphological analysis. This has the advantage that the analysis result intended by the user can be obtained easily.

[Brief description of drawings]

FIG. 1 is a block diagram showing the configuration of a natural language processing apparatus according to an embodiment of the present invention.

FIG. 2 is a block diagram showing the configuration of the dictionary search / dictionary management unit and the dictionary in the embodiment.

FIG. 3 is a schematic diagram showing the procedure for generating a basic pitch pattern in the embodiment.

FIG. 4 is a perspective view showing the external configuration of the natural language processing apparatus in the embodiment.

FIG. 5 is a flowchart showing an example of the morphological analysis process of the Japanese language processing apparatus in the embodiment.

[Explanation of symbols]

1 Input unit
2 Dictionary
2a Standard independent word dictionary (standard dictionary)
2b Standard auxiliary word dictionary (standard dictionary)
2c Extended independent word dictionary (extended dictionary)
2d Extended auxiliary word dictionary (extended dictionary)
3 Sentence analysis unit
4 Dictionary search / dictionary management unit (search and extraction means, priority setting means)
5 Morphological analysis unit (morphological analysis means)
6 Phonetic symbol generation unit
7 Speech unit storage unit
8 Speech synthesis rule unit
9 Speech synthesis unit
10 Output unit
12 Extended dictionary reading device (dictionary reading means)

Continuation of the front page: (51) Int.Cl.⁶ (identification code, office reference number, FI, technical indication): G10L 5/02 J

Claims


1. A natural language processing device for performing morphological analysis of kanji-kana mixed sentences, comprising: a standard dictionary storing general vocabulary; a plurality of detachable extended dictionaries, each storing only vocabulary peculiar to a specific field; dictionary reading means for reading the field-specific vocabulary; search and extraction means for combining the standard dictionary with the plurality of extended dictionaries according to the search target character string of the kanji-kana mixed sentence, searching the combined dictionaries, and extracting a candidate word group for the search target character string; and morphological analysis means for performing morphological analysis on the kanji-kana mixed sentence and the candidate word group extracted by the search and extraction means.

2. The natural language processing device according to claim 1, wherein the standard dictionary comprises a standard independent word dictionary storing general independent words and a standard auxiliary word dictionary storing general auxiliary words, and the plurality of extended dictionaries comprise at least a plurality of extended independent word dictionaries, each storing only independent words peculiar to a specific field.

3. The natural language processing device according to claim 1, further comprising priority setting means for setting priorities for the standard dictionary and the extended dictionaries, wherein the search and extraction means assigns the priority set by the priority setting means to each word of the retrieved candidate word group, and the morphological analysis means performs morphological analysis giving predetermined preference to words with higher priority in the candidate word group extracted by the search and extraction means.

4. The natural language processing device according to claim 1, wherein the plurality of extended dictionaries are storage media such as IC cards, magnetic disks, magneto-optical disks, or optical disks.