JP2009103921A

JP2009103921A - Abbreviated word determining apparatus, computer program, text analysis apparatus, and speech synthesis apparatus

Info

Publication number: JP2009103921A
Application number: JP2007275651A
Authority: JP
Inventors: Hideki Kojima (小島 英樹)
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2007-10-23
Filing date: 2007-10-23
Publication date: 2009-05-14
Anticipated expiration: 2027-10-23
Also published as: JP5125404B2

Abstract

PROBLEM TO BE SOLVED: To provide an abbreviated-word determining apparatus for determining whether text data is an abbreviated word, a computer program for realizing the abbreviated-word determining apparatus by a computer, a text analysis apparatus having the abbreviated-word determining apparatus, and a speech synthesis apparatus having the text analysis apparatus.

SOLUTION: A control part 1 divides text data into morphemes, based on the content of the registration in a language dictionary 4a, and assigns an accent type to each of the morphemes. The control part 1 determines whether a morpheme that is not registered in the language dictionary 4a (an unknown word) is an abbreviated word, based on the content of the registration in a personal name dictionary 4b or a compound word dictionary 4c. The control part 1 assigns different accent types to an unknown word determined to be an abbreviated word and an unknown word determined not to be an abbreviated word.

COPYRIGHT: (C)2009, JPO&INPIT

Description

The present invention relates to an abbreviation determination device that determines whether text data is an abbreviation, a computer program for implementing the abbreviation determination device on a computer, a text analysis device including the abbreviation determination device, and a speech synthesizer including the text analysis device.

Text-to-speech synthesis technology, which synthesizes speech from text data, is applied to, for example, IVR (Interactive Voice Response) systems, voice guidance on how to operate in-vehicle information terminals and mobile phones, reading e-mail aloud, and support systems for people with visual or speech impairments.

A conventional text-to-speech synthesizer has a language dictionary, prepared in advance, that stores morphemes in association with the accent type of each morpheme. It divides input text data into morphemes based on the contents registered in the language dictionary and assigns an accent type to each morpheme. Based on the morphemes and the accent types assigned to them, it then generates the prosody for each morpheme according to predetermined prosody generation rules and converts the generated prosody into a speech waveform to obtain synthesized speech.

In such a conventional text-to-speech synthesizer, when the text data contains a morpheme that is not registered in the language dictionary, the text may be divided at the wrong position, and an incorrect accent type may be assigned to a morpheme. When the text is divided at the wrong position or an incorrect accent type is assigned, it is difficult to generate correct synthesized speech.

In addition, when a conventional text-to-speech synthesizer extracts a morpheme that is not registered in the language dictionary while dividing text data into morphemes, it separates that morpheme as an unknown word. In many cases it is configured to assign to each such unknown word an accent type having the accent nucleus on, for example, the third mora from the end. This is because many loanwords, such as "Australia" (オーストラリア) and "Chernobyl" (チェルノブイリ), have the accent nucleus on the third mora from the end.
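The default rule for unknown words described above can be sketched as follows. This is only an illustration under stated assumptions, not the implementation of any actual synthesizer: the names `split_moras` and `default_unknown_accent`, the katakana-only input, and the handling of words shorter than three moras are all hypothetical. The accent type is expressed as the position of the accent nucleus counted from the start of the word.

```python
# Small kana that attach to the preceding kana and do not form a mora of their own.
# The small tsu "ッ", the long-vowel mark "ー" and "ン" each count as one mora.
SMALL_KANA = set("ァィゥェォャュョヮ")


def split_moras(reading: str) -> list[str]:
    """Split a katakana reading into mora-sized units (simplified)."""
    moras: list[str] = []
    for ch in reading:
        if ch in SMALL_KANA and moras:
            moras[-1] += ch  # e.g. "チ" + "ェ" -> "チェ"
        else:
            moras.append(ch)
    return moras


def default_unknown_accent(reading: str) -> int:
    """Place the accent nucleus on the third mora from the end (1-based position).

    Assumption: words shorter than three moras get the nucleus on the first mora.
    """
    mora_count = len(split_moras(reading))
    return max(mora_count - 2, 1)


print(default_unknown_accent("オーストラリア"))  # 7 moras, nucleus on mora 5
print(default_unknown_accent("チェルノブイリ"))  # 6 moras, nucleus on mora 4
```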

Abbreviations such as Matsuken (マツケン, registered trademark; from 松平健, Ken Matsudaira), Kimutaku (キムタク; from 木村拓也, Takuya Kimura), cosplay (コスプレ; costume play), chideji (地デジ; terrestrial digital broadcasting), and rendora (連ドラ; serial drama) have long been in frequent use. Because such abbreviations are not registered in the language dictionary, conventional text-to-speech synthesizers often treat them as unknown words and, as described above, assign an accent type having the accent nucleus on, for example, the third mora from the end. However, such abbreviations usually have the flat (type 0) accent. When an accent type with the nucleus on the third mora from the end is assigned to them, correct synthesized speech cannot be generated, and the output may sound as if it were read with a regional accent.

Patent Document 1 discloses a device in which a formal name dictionary, registering the formal names that correspond to abbreviations, and an abbreviation dictionary, registering abbreviations predicted from the formal names, are prepared in advance. When the device detects in a text document an abbreviation registered in the abbreviation dictionary, it converts the abbreviation into the corresponding formal name. With such a device, even when the text document contains an abbreviation that is not registered in the formal name dictionary, the abbreviation can be handled properly as an abbreviation rather than as an unknown word.

Patent Document 1: JP 2004-326367 A

By using an abbreviation dictionary as in Patent Document 1 described above, abbreviations registered in the abbreviation dictionary can be divided into the correct morphemes when a text document is divided into morphemes, and the correct accent type can be assigned to them, so correct synthesized speech can be generated. However, because new abbreviations appear every day, it is impossible to keep registering every new abbreviation in the abbreviation dictionary.

The present invention has been made in view of these circumstances. Its object is to provide an abbreviation determination device that can easily determine whether text data is an abbreviation formed by shortening a personal name, a computer program for implementing the abbreviation determination device on a computer, a text analysis device including the abbreviation determination device, and a speech synthesizer including the text analysis device.

Another object of the present invention is to provide an abbreviation determination device that can easily determine whether text data is an abbreviation formed by shortening a compound word, a computer program for implementing the abbreviation determination device on a computer, a text analysis device including the abbreviation determination device, and a speech synthesizer including the text analysis device.

The abbreviation determination device according to the present invention determines whether text data is an abbreviation. It stores the surnames and given names used in personal names in a personal name storage means, extracts a predetermined number of characters from the beginning of the text data, and judges whether a surname beginning with the extracted characters is stored in the personal name storage means. When it judges that such a surname is stored, the device extracts a predetermined number of characters from the beginning of the text data that remains after the previously extracted characters are removed, and judges whether a given name beginning with those characters is stored in the personal name storage means. When it judges that such a given name is stored, the device determines that the text data is an abbreviation. It is therefore possible to determine easily whether the text data is an abbreviation formed by taking a predetermined number of characters from the beginning of each of the surname and the given name of a personal name.
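The surname and given-name check described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `is_name_abbreviation`, the use of katakana readings as dictionary entries, the default prefix lengths of two characters, and the requirement that the two prefixes together make up the whole text are all hypothetical choices, not details taken from the patent.

```python
def is_name_abbreviation(
    text: str,
    surnames: set[str],
    given_names: set[str],
    head_len: int = 2,
    tail_len: int = 2,
) -> bool:
    """Return True if text looks like <surname prefix> + <given-name prefix>."""
    # Assumption: the two prefixes must account for the whole text.
    if len(text) != head_len + tail_len:
        return False

    # Step 1: leading characters must begin some stored surname.
    head = text[:head_len]
    if not any(surname.startswith(head) for surname in surnames):
        return False

    # Step 2: leading characters of the remainder must begin some stored given name.
    tail = text[head_len:head_len + tail_len]
    return any(given.startswith(tail) for given in given_names)


# Tiny illustrative dictionaries (katakana readings).
SURNAMES = {"マツダイラ", "キムラ"}
GIVEN_NAMES = {"ケン", "タクヤ"}

print(is_name_abbreviation("マツケン", SURNAMES, GIVEN_NAMES))  # True
print(is_name_abbreviation("マツリカ", SURNAMES, GIVEN_NAMES))  # False
```

Because surnames and given names are looked up independently, a sketch like this also accepts a surname prefix from one person combined with a given-name prefix from another.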

The abbreviation determination device according to the present invention determines whether text data is an abbreviation. It stores a plurality of compound words, each in association with the constituent words that make it up, in a compound word storage means. It extracts a predetermined number of characters from the beginning of the text data and judges whether a compound word containing a constituent word that begins with the extracted characters is stored in the compound word storage means. When it judges that such a compound word is stored, the device extracts a predetermined number of characters from the beginning of the text data that remains after the previously extracted characters are removed, and judges whether a constituent word beginning with those characters is included among the constituent words of the compound word judged to be stored. When it judges that such a constituent word is included, the device determines that the text data is an abbreviation. It is therefore possible to determine easily whether the text data is an abbreviation formed by taking a predetermined number of characters from the beginning of each of two constituent words of a compound word.
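The compound-word check described above can be sketched in the same way. The sketch assumes a hypothetical dictionary that maps each compound word to an ordered list of its constituent words (katakana readings); the function name `is_compound_abbreviation` and the requirement that the second prefix come from a later constituent of the same compound are assumptions made for illustration.

```python
def is_compound_abbreviation(
    text: str,
    compounds: dict[str, list[str]],
    head_len: int = 2,
    tail_len: int = 2,
) -> bool:
    """Return True if text is <prefix of one constituent> + <prefix of a later one>."""
    # Assumption: the two prefixes must account for the whole text.
    if len(text) != head_len + tail_len:
        return False
    head, tail = text[:head_len], text[head_len:]

    for constituents in compounds.values():
        for i, word in enumerate(constituents):
            if not word.startswith(head):
                continue
            # The second prefix must match a constituent of the SAME compound.
            if any(other.startswith(tail) for other in constituents[i + 1:]):
                return True
    return False


# Tiny illustrative dictionary: compound word -> constituent words.
COMPOUNDS = {
    "コスチュームプレイ": ["コスチューム", "プレイ"],
    "レンゾクドラマ": ["レンゾク", "ドラマ"],
    "チジョウデジタルホウソウ": ["チジョウ", "デジタル", "ホウソウ"],
}

print(is_compound_abbreviation("コスプレ", COMPOUNDS))  # True
print(is_compound_abbreviation("コスドラ", COMPOUNDS))  # False: prefixes come from different compounds
```

Passing `head_len=1` covers the pattern in which only one character is taken from the first constituent, as in チデジ.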

The abbreviation determination device according to the present invention extracts from the beginning of the text data a number of characters corresponding to two syllables, and judges whether a surname beginning with the extracted characters is stored in the personal name storage means, or whether a compound word containing a constituent word that begins with the extracted characters is stored in the compound word storage means. The device then extracts a number of characters corresponding to two syllables from the beginning of the text data that remains after the previously extracted characters are removed, and judges whether a given name beginning with those characters is stored in the personal name storage means, or whether a constituent word beginning with those characters is included among the constituent words of the compound word judged to be stored. It is therefore possible to determine easily whether the text data is an abbreviation formed by taking the characters for two syllables from the beginning of each of the surname and the given name of a personal name, or an abbreviation formed by taking the characters for two syllables from the beginning of each of two constituent words of a compound word.

The abbreviation determination device according to the present invention extracts from the beginning of the text data a number of characters corresponding to one syllable, and judges whether a surname beginning with the extracted characters is stored in the personal name storage means, or whether a compound word containing a constituent word that begins with the extracted characters is stored in the compound word storage means. The device then extracts a number of characters corresponding to two syllables from the beginning of the text data that remains after the previously extracted characters are removed, and judges whether a given name beginning with those characters is stored in the personal name storage means, or whether a constituent word beginning with those characters is included among the constituent words of the compound word judged to be stored. It is therefore possible to determine easily whether the text data is an abbreviation formed by taking the characters for one syllable from the beginning of the surname and the characters for two syllables from the beginning of the given name, or an abbreviation formed by taking the characters for one syllable from the beginning of one constituent word of a compound word and the characters for two syllables from the beginning of another constituent word.
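The two extraction patterns (two syllables plus two syllables, and one syllable plus two syllables) can be sketched as a step that proposes candidate splits of a katakana reading. The sketch is an illustration only: the names `kana_units` and `candidate_splits` are hypothetical, and treating one kana, together with any small kana that follows it, as the unit corresponding to one "syllable" is an assumption about how the text is segmented.

```python
# Small kana attach to the preceding kana and are kept in the same unit.
SMALL_KANA = set("ァィゥェォャュョヮ")

# (units taken from the first word, units taken from the second word)
PATTERNS = [(2, 2), (1, 2)]


def kana_units(reading: str) -> list[str]:
    """Split a katakana reading into units such as "ショ", "ー", "コ"."""
    units: list[str] = []
    for ch in reading:
        if ch in SMALL_KANA and units:
            units[-1] += ch
        else:
            units.append(ch)
    return units


def candidate_splits(reading: str, patterns=PATTERNS) -> list[tuple[str, str]]:
    """Return (head, tail) pairs for every pattern that matches the reading's length."""
    units = kana_units(reading)
    splits = []
    for head_n, tail_n in patterns:
        if len(units) == head_n + tail_n:
            splits.append(("".join(units[:head_n]), "".join(units[head_n:])))
    return splits


print(candidate_splits("コスプレ"))    # [('コス', 'プレ')]
print(candidate_splits("チデジ"))      # [('チ', 'デジ')]
print(candidate_splits("ナカショー"))  # [('ナカ', 'ショー')]
```

Each head and tail produced this way would then be checked against the personal name or compound word dictionary.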

The abbreviation determination device according to the present invention divides document data containing a plurality of pieces of text data into individual pieces of text data. It extracts a predetermined number of characters from the beginning of a piece of text data and judges whether a surname beginning with the extracted characters is stored in the personal name storage means, or whether a compound word containing a constituent word that begins with the extracted characters is stored in the compound word storage means. The device then extracts a predetermined number of characters from the beginning of the text data that remains after the previously extracted characters are removed, and judges whether a given name beginning with those characters is stored in the personal name storage means, or whether a constituent word beginning with those characters is included among the constituent words of the compound word judged to be stored. When it judges that the given name is stored, or that the constituent word is included, the device determines that the text data is a candidate abbreviation.

The device also stores, in a co-occurrence data storage means, a plurality of pieces of text data in association with the co-occurrence data that co-occurs with each of them. It obtains from the co-occurrence data storage means the co-occurrence data corresponding to the text data determined to be a candidate abbreviation, and judges whether the obtained co-occurrence data is contained in the text data of the document data. When it is, the device confirms the candidate as an abbreviation. It is therefore possible to determine reliably whether the text data is an abbreviation formed by taking a predetermined number of characters from the beginning of each of the surname and the given name of a personal name, or from the beginning of each of two constituent words of a compound word, based on whether the data appearing together with the text data is its registered co-occurrence data.

The abbreviation determination method according to the present invention determines whether text data is an abbreviation. The method extracts a predetermined number of characters from the beginning of the text data and judges whether a surname beginning with the extracted characters is stored in a personal name storage means that stores the surnames and given names used in personal names. When it is judged that such a surname is stored, the method extracts a predetermined number of characters from the beginning of the text data that remains after the previously extracted characters are removed, and judges whether a given name beginning with those characters is stored in the personal name storage means. When it is judged that such a given name is stored, the method determines that the text data is an abbreviation.

The abbreviation determination method according to the present invention determines whether text data is an abbreviation. The method extracts a predetermined number of characters from the beginning of the text data and judges whether a compound word containing a constituent word that begins with the extracted characters is stored in a compound word storage means that stores a plurality of compound words, each in association with its constituent words. When it is judged that such a compound word is stored, the method extracts a predetermined number of characters from the beginning of the text data that remains after the previously extracted characters are removed, and judges whether a constituent word beginning with those characters is included among the constituent words of the compound word judged to be stored. When it is judged that such a constituent word is included, the method determines that the text data is an abbreviation.

The computer program according to the present invention, when read and executed by a computer, allows the abbreviation determination device described above to be implemented on that computer.

The text analysis device according to the present invention analyzes text data and includes any of the abbreviation determination devices described above. It stores morphemes in association with accent types in a morpheme storage means, divides the text data into morphemes based on the contents of the morpheme storage means, and assigns an accent type to each morpheme. The abbreviation determination device determines whether a morpheme that is not stored in the morpheme storage means is an abbreviation, and the text analysis device assigns a predetermined accent type to a morpheme determined to be an abbreviation. Among the morphemes not stored in the morpheme storage means, those determined to be abbreviations thus receive a different accent type from the others, so an accent type suited to abbreviations can be assigned.

The text analysis device according to the present invention stores abbreviations in association with accent types in an abbreviation storage means. Based on the contents of the abbreviation storage means, it assigns an accent type to each morpheme determined to be an abbreviation by the abbreviation determination device, and it assigns a predetermined accent type to a morpheme that is not stored in the abbreviation storage means. Abbreviations stored in advance in the abbreviation storage means thus receive their own registered accent types, and abbreviations that are not stored receive the predetermined accent type, so an accent type suited to each abbreviation can be assigned.
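The tiered accent assignment described above can be sketched as a lookup with fallbacks. Everything named here is hypothetical: the function `assign_accent`, the dictionaries passed to it, and the two callbacks. Using the flat type (0) as the predetermined accent type for unregistered abbreviations is an assumption based on the earlier remark that abbreviations usually have the flat accent, and the accent values in the example are illustrative placeholders, not real accent data.

```python
from typing import Callable

FLAT_ACCENT = 0  # assumed "predetermined accent type" for unregistered abbreviations


def assign_accent(
    morpheme: str,
    language_dict: dict[str, int],
    abbreviation_dict: dict[str, int],
    is_abbreviation: Callable[[str], bool],
    unknown_accent: Callable[[str], int],
    abbreviation_default: int = FLAT_ACCENT,
) -> int:
    """Pick an accent type for one morpheme."""
    # 1. Known morpheme: use the accent type registered in the language dictionary.
    if morpheme in language_dict:
        return language_dict[morpheme]
    # 2. Unknown word judged to be an abbreviation: registered type if available,
    #    otherwise the predetermined type.
    if is_abbreviation(morpheme):
        return abbreviation_dict.get(morpheme, abbreviation_default)
    # 3. Any other unknown word: default rule for unknown words.
    return unknown_accent(morpheme)


# Illustrative placeholder data and callbacks.
language_dict = {"サンバ": 1}
abbreviation_dict = {"チデジ": 1}
is_abbrev = lambda w: w in {"チデジ", "コスプレ"}
third_from_end = lambda w: max(len(w) - 2, 1)

print(assign_accent("コスプレ", language_dict, abbreviation_dict, is_abbrev, third_from_end))
```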

The text analysis device according to the present invention analyzes text data and includes the abbreviation determination device having the co-occurrence data storage means described above. It stores text data in association with accent types in a text storage means. The abbreviation determination device divides the document data into pieces of text data based on the contents of the text storage means, and an accent type is assigned to each piece of text data. The abbreviation determination device also determines whether a piece of text data that is not stored in the text storage means is an abbreviation, and the text analysis device assigns a predetermined accent type to text data determined to be an abbreviation. Among the pieces of text data not stored in the text storage means, those determined to be abbreviations thus receive a different accent type from the others, so an accent type suited to abbreviations can be assigned.

The text analysis device according to the present invention stores abbreviations in association with accent types in an abbreviation storage means. Based on the contents of the abbreviation storage means, it assigns an accent type to each piece of text data determined to be an abbreviation by the abbreviation determination device, and it assigns a predetermined accent type to text data that is not stored in the abbreviation storage means. Abbreviations stored in advance in the abbreviation storage means thus receive their own registered accent types, and abbreviations that are not stored receive the predetermined accent type, so an accent type suited to each abbreviation can be assigned.

The speech synthesizer according to the present invention generates synthesized speech from text data and includes any of the text analysis devices described above. Based on the morphemes produced by the morpheme dividing means of the text analysis device and the accent type assigned to each morpheme, it generates the prosody for each morpheme and generates synthesized speech from the generated prosody.

The speech synthesizer according to the present invention generates synthesized speech from text data and includes a text analysis device equipped with the abbreviation determination device having the co-occurrence data storage means described above. Based on the pieces of text data produced by the dividing means of the abbreviation determination device and the accent type that the text analysis device assigns to each of them, it generates the prosody for each piece of text data and generates synthesized speech from the generated prosody.

In the present invention, when text data consists of a predetermined number of leading characters of a surname combined with a predetermined number of leading characters of a given name, the text data is determined to be an abbreviation formed by shortening a personal name. Many recent abbreviations combine the first two syllables of the surname with the first two syllables of the given name. Accordingly, when text data made up of the first two characters of a surname and the first two characters of a given name is treated as an abbreviation, it is easy to determine whether the text data is an abbreviation such as Matsuken (マツケン; 松平健, Ken Matsudaira), Kimutaku (キムタク; 木村拓也, Takuya Kimura), or Nakasho (ナカショー; 中川翔子, Shoko Nakagawa).

In the present invention, when text data consists of a predetermined number of leading characters of each of two constituent words of a compound word, the text data is determined to be an abbreviation formed by shortening the compound word. Many recent abbreviations combine the first two syllables of the first constituent word of the formal name with the first two syllables of the second constituent word. Accordingly, when text data made up of the first two characters of the first constituent word and the first two characters of the second constituent word is treated as an abbreviation, it is easy to determine whether the text data is an abbreviation such as cosplay (コスプレ; costume play), rendora (連ドラ; serial drama), or kintore (筋トレ; muscle training). Likewise, when text data made up of the first character of the first constituent word and the first two characters of the second constituent word is treated as an abbreviation, it is easy to determine whether the text data is an abbreviation such as chideji (地デジ; terrestrial digital broadcasting).

In the present invention, document data is divided into pieces of text data. When a piece of text data is determined to be a candidate abbreviation, it is judged whether the co-occurrence data registered for that text data is contained in the document data that contains the text data; if it is, the text data is confirmed as an abbreviation. A candidate is therefore confirmed as an abbreviation only when it appears together with co-occurrence data that is likely to accompany the genuine abbreviation, which prevents abbreviations from being misjudged. For example, suppose that "samba" is registered as co-occurrence data for "Matsuken", and that the text data "Matsuken" in the document data "Matsuken danced the samba" has been determined to be a candidate abbreviation. Because "samba", the co-occurrence data for "Matsuken", is contained in the document data, this "Matsuken" can be confirmed as an abbreviation.
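The confirmation step can be sketched with the "Matsuken" and "samba" example from the paragraph above. The function name `confirm_abbreviation`, the representation of the document as a list of already segmented tokens, and the dictionary layout are assumptions made for illustration.

```python
def confirm_abbreviation(
    candidate: str,
    document_tokens: list[str],
    cooccurrence: dict[str, set[str]],
) -> bool:
    """Confirm a candidate abbreviation if a registered co-occurring word
    appears elsewhere in the same document."""
    co_words = cooccurrence.get(candidate, set())
    return any(
        token in co_words for token in document_tokens if token != candidate
    )


# Co-occurrence data: "サンバ" (samba) is registered for "マツケン" (Matsuken).
COOCCURRENCE = {"マツケン": {"サンバ"}}

# "マツケンがサンバを踊った" (Matsuken danced the samba), already segmented.
doc = ["マツケン", "が", "サンバ", "を", "踊った"]
print(confirm_abbreviation("マツケン", doc, COOCCURRENCE))  # True

# No registered co-occurring word in the document: the candidate is not confirmed.
other_doc = ["マツケン", "が", "歌った"]
print(confirm_abbreviation("マツケン", other_doc, COOCCURRENCE))  # False
```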

In the present invention, text data is divided into morphemes based on the contents of a morpheme storage means that stores morphemes in association with accent types, and an accent type is assigned to each morpheme. When a morpheme not stored in the morpheme storage means is determined by the abbreviation determination device to be an abbreviation, a predetermined accent type is assigned to it. Among morphemes not stored in the morpheme storage means, those determined to be abbreviations thus receive an accent type different from that of the others, so an accent type suited to abbreviations can be assigned. Consequently, when prosody is generated for each morpheme from the accent types assigned in this way and synthesized speech is generated from that prosody, appropriate prosody and therefore appropriate synthesized speech are obtained, and synthesized speech can be output with the correct accent even for morphemes not stored in the morpheme storage means (unknown words).

In the present invention, based on the contents of an abbreviation storage means that stores abbreviations in association with accent types, an accent type is assigned to each morpheme determined by the abbreviation determination device to be an abbreviation, and a predetermined accent type is assigned to morphemes not stored in the abbreviation storage means. Because each abbreviation stored in the abbreviation storage means receives its own registered accent type, a more appropriate accent type can be assigned. Consequently, when prosody is generated for each morpheme from the accent types assigned in this way and synthesized speech is generated from that prosody, appropriate prosody and therefore appropriate synthesized speech are obtained.

In the present invention, document data is divided into pieces of text data based on the contents of a text storage means that stores text data in association with accent types, and an accent type is assigned to each piece of text data. When text data not stored in the text storage means is determined by the abbreviation determination device to be an abbreviation, a predetermined accent type is assigned to it. Among text data not stored in the text storage means, text data determined to be an abbreviation thus receives an accent type different from that of other text data, so an accent type suited to abbreviations can be assigned. Consequently, when prosody is generated for each piece of text data from the accent types assigned in this way and synthesized speech is generated from that prosody, appropriate prosody and therefore appropriate synthesized speech are obtained, and synthesized speech can be output with the correct accent even for morphemes not stored in the text storage means (unknown words).

In the present invention, based on the contents of an abbreviation storage means that stores abbreviations in association with accent types, an accent type is assigned to each piece of text data determined by the abbreviation determination device to be an abbreviation, and a predetermined accent type is assigned to text data not stored in the abbreviation storage means. Because each abbreviation stored in the abbreviation storage means receives its own registered accent type, a more appropriate accent type can be assigned. Consequently, when prosody is generated for each piece of text data from the accent types assigned in this way and synthesized speech is generated from that prosody, appropriate prosody and therefore appropriate synthesized speech are obtained.

The abbreviation determination device, text analysis device, and speech synthesis device according to the present invention are described in detail below with reference to the drawings illustrating each embodiment. Each embodiment describes a configuration in which the computer program according to the present invention is read by a known personal computer or the like and executed by its CPU or the like, thereby realizing the abbreviation determination device, text analysis device, and speech synthesis device according to the present invention. These devices may, however, also be realized by hardware that performs equivalent functions.

(Embodiment 1)
A text analysis device according to the present invention that includes the abbreviation determination device according to the present invention is described in detail below with reference to the drawings illustrating Embodiment 1. FIG. 1 is a block diagram showing a configuration example of the text analysis device according to Embodiment 1. The text analysis device 10 according to Embodiment 1 includes a control unit 1, a ROM 2, a RAM 3, an HDD 4, an operation unit 5, a display unit 6, and the like, and these hardware units are connected to one another via a bus 1a.

The control unit 1 consists of a CPU (Central Processing Unit), an MPU (Micro Processor Unit), or the like. It reads control programs stored in advance in the ROM 2 or the HDD 4 into the RAM 3 as needed and executes them, and it controls the operation of the hardware units described above. The ROM 2 stores in advance the various control programs needed to operate the text analysis device 10 as the abbreviation determination device and text analysis device of the present invention. The RAM 3 consists of SRAM, flash memory, or the like, and temporarily stores various data generated while the control unit 1 executes a control program.

The operation unit 5 is a keyboard, a mouse, and the like, and provides the various operation keys the user needs to operate the text analysis device 10. When the user operates an operation key, the operation unit 5 sends a control signal corresponding to that key to the control unit 1, and the control unit 1 executes the processing corresponding to the control signal received from the operation unit 5.
The display unit 6 is a liquid crystal display, a CRT display, or the like. In accordance with instructions from the control unit 1, it displays the operating state of the text analysis device 10, information entered via the operation unit 5, information to be reported to the user, and the like.

The HDD 4 is a large-capacity storage device. It stores in advance the various control programs needed to operate the text analysis device 10 as the abbreviation determination device and text analysis device of the present invention, text data, a language dictionary 4a as shown in FIG. 2, a personal name dictionary 4b as shown in FIG. 3, a compound word dictionary 4c as shown in FIG. 4, screen information for reporting various information to the user, and the like.

The language dictionary 4a, the personal name dictionary 4b, and the compound word dictionary 4c need not be stored in the HDD 4 in advance. If the text analysis device 10 includes a driver (not shown) capable of reading data stored in an external memory (not shown), the dictionaries stored in the external memory may be read by the driver and stored in the HDD 4. If the text analysis device 10 includes a communication unit (not shown) that can connect to a network such as the Internet, the dictionaries may be downloaded from an external device via the network and stored in the HDD 4. The text data stored in the HDD 4 may be text data created on the text analysis device 10, or text data created on an external device and loaded into the text analysis device 10 via an external memory (not shown) or a network (not shown).

FIG. 2 is a schematic diagram showing the registered contents of the language dictionary 4a. As shown in FIG. 2, the language dictionary (morpheme storage means) 4a registers the notation, reading, and accent type of each word (morpheme) in association with one another. The part of speech of each morpheme may also be registered in the language dictionary 4a.

FIG. 3 is a schematic diagram showing the registered contents of the personal name dictionary 4b. The personal name dictionary (personal name storage means) 4b stores the surnames and given names used in personal names, and has a list of surnames as shown in FIG. 3(a) and a list of given names as shown in FIG. 3(b).

FIG. 4 is a schematic diagram showing the registered contents of the compound word dictionary 4c. As shown in FIG. 4, the compound word dictionary (compound word storage means) 4c registers each compound word, the constituent words that make it up, and the reading of each constituent word in association with one another. Embodiment 1 is described using an example in which the constituent words in the constituent word column of the compound word dictionary 4c are registered in the order in which they form the compound word, but the registration order is not limited to this. It is desirable, however, that each constituent word registered in the constituent word column be associated with its reading registered in the reading column.

The functions realized in the text analysis device 10 configured as described above when the control unit 1 executes the control programs stored in the ROM 2 and the HDD 4 are described below. FIG. 5 is a functional block diagram showing a functional configuration example of the text analysis device 10. In the text analysis device 10 of Embodiment 1, the control unit 1 executes the control programs stored in the ROM 2 and the HDD 4 and thereby realizes the functions of a morphological analysis unit 11, an abbreviation determination unit (the abbreviation determination device according to the present invention) 12, an abbreviation accent assignment unit 13, and the like. The following description uses as an example the processing in which the text analysis device 10 analyzes the text data "Matsuken danced the samba." (マツケンが、サンバを踊った。).

The morphological analysis unit (morpheme dividing means) 11 reads the text data stored in the HDD 4 into the RAM 3, divides the text data into morphemes based on the registered contents of the language dictionary 4a, and assigns an accent type to each morpheme. It then sends each morpheme, together with the accent type assigned to it, to the abbreviation determination unit 12. For a morpheme to which no accent type could be assigned from the language dictionary 4a (an unknown word), the morphological analysis unit 11 sends the morpheme together with information indicating that its accent type is unknown (that it is an unknown word) to the abbreviation determination unit 12.

Because "Matsuken" is not registered in the language dictionary 4a of Embodiment 1, it is treated as an unknown word. The morphological analysis unit 11 therefore divides the text data "Matsuken danced the samba." into morphemes as "Matsuken (unknown word) / ga (1-mora type 0) / samba (3-mora type 1) / o (1-mora type 0) / odotta (4-mora type 0)" and sends this phonetic character string to the abbreviation determination unit 12.

The parentheses in the phonetic character string contain the accent type of each morpheme or, if the morpheme is an unknown word, information indicating that it is an unknown word. Strictly speaking, "odotta" (踊った, "danced") is not a morpheme but what is called a phrase or accent phrase; since this distinction is unrelated to the essence of the present invention, it is treated here as a single morpheme.

The abbreviation determination unit 12 determines, based on the registered contents of the personal name dictionary 4b and the compound word dictionary 4c, whether a morpheme treated as an unknown word by the morphological analysis unit 11 is an abbreviation. Here, the abbreviation determination unit 12 determines whether "Matsuken" is an abbreviation. The details of this determination process are described later.

When the abbreviation determination unit 12 determines that a morpheme treated as an unknown word by the morphological analysis unit 11 is an abbreviation, it attaches information indicating that the morpheme is an abbreviation and sends the result to the abbreviation accent assignment unit 13. Specifically, when "Matsuken" is determined to be an abbreviation, the abbreviation determination unit 12 sends the phonetic character string "Matsuken (abbreviation) / ga (1-mora type 0) / samba (3-mora type 1) / o (1-mora type 0) / odotta (4-mora type 0)" to the abbreviation accent assignment unit 13.

Conversely, when the abbreviation determination unit 12 determines that a morpheme treated as an unknown word by the morphological analysis unit 11 is not an abbreviation, it leaves attached the unknown-word information received from the morphological analysis unit 11 and sends the result to the abbreviation accent assignment unit 13. Specifically, when "Matsuken" is determined not to be an abbreviation, the abbreviation determination unit 12 sends the phonetic character string "Matsuken (unknown word) / ga (1-mora type 0) / samba (3-mora type 1) / o (1-mora type 0) / odotta (4-mora type 0)" to the abbreviation accent assignment unit 13.

The abbreviation accent assignment unit (accent assignment means) 13 assigns the flat (type 0) accent type, which serves as the predetermined accent type, to each morpheme determined by the abbreviation determination unit 12 to be an abbreviation. Specifically, when the abbreviation determination unit 12 has determined that "Matsuken" is an abbreviation, the abbreviation accent assignment unit 13 assigns the 4-mora type 0 accent type to the abbreviation "Matsuken" and sets the accent type of "Matsuken" in the phonetic character string received from the abbreviation determination unit 12 to "4-mora type 0".

As a result, the abbreviation accent assignment unit 13 outputs the phonetic character string "Matsuken (4-mora type 0) / ga (1-mora type 0) / samba (3-mora type 1) / o (1-mora type 0) / odotta (4-mora type 0)". When the abbreviation determination unit 12 has determined that "Matsuken" is not an abbreviation, the abbreviation accent assignment unit 13 outputs the phonetic character string received from the abbreviation determination unit 12 unchanged.
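
The accent assignment step can be sketched as follows. The data representation is an assumption made for illustration: the phonetic character string is modeled as a list of (reading, tag) pairs, where the tag is either the string "unknown", the string "abbreviation", or a (mora count, accent type) tuple. The mora count is taken as the number of kana characters, which is a simplification.

```python
def assign_abbreviation_accent(morphemes):
    """Replace the tag of every morpheme marked "abbreviation" with the flat
    accent type (type 0); all other morphemes pass through unchanged."""
    result = []
    for reading, tag in morphemes:
        if tag == "abbreviation":
            # Flat (type 0) accent; mora count approximated by character count.
            tag = (len(reading), 0)
        result.append((reading, tag))
    return result


phonetic_string = [
    ("マツケン", "abbreviation"),
    ("ガ", (1, 0)),
    ("サンバ", (3, 1)),
    ("オ", (1, 0)),
    ("オドッタ", (4, 0)),
]
print(assign_abbreviation_accent(phonetic_string)[0])
```

A morpheme still tagged "unknown" is left as it is, mirroring the case in which the phonetic character string is output unchanged.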

The abbreviation determination process performed by the abbreviation determination unit 12 is described below. In Embodiment 1, the abbreviation determination unit 12 determines whether a morpheme treated as an unknown word by the morphological analysis unit 11 is an abbreviation of a personal name or an abbreviation of a compound word. Many recent abbreviations take one of three forms: the first two syllables (two characters) of a surname combined with the first two syllables (two characters) of a given name; the first two syllables (two characters) of the first word (constituent word) of a compound's formal name combined with the first two syllables (two characters) of its second word (constituent word); or the first syllable (one character) of the first word combined with the first two syllables (two characters) of the second word. Embodiment 1 therefore determines whether a morpheme treated as an unknown word by the morphological analysis unit 11 is an abbreviation of one of these forms.

The abbreviation determination unit 12 of Embodiment 1 first extracts, from the phonetic character string received from the morphological analysis unit 11, the morpheme treated as an unknown word, and determines whether the extracted unknown word (morpheme) has three syllables or four syllables. A three-syllable unknown word is likely to be, for example, an abbreviation combining the first character of a surname with the first two characters of a given name, or an abbreviation combining the first character of a compound word's first constituent word with the first two characters of its second constituent word. A four-syllable unknown word is likely to be, for example, an abbreviation combining the first two characters of a surname with the first two characters of a given name, or an abbreviation combining the first two characters of a compound word's first constituent word with the first two characters of its second constituent word.
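
The length-based split described above can be sketched as follows. The function name is hypothetical, the unknown word is given as a katakana reading, and each kana character is counted as one syllable, which is a simplifying assumption.

```python
def split_candidate(unknown):
    """Split an unknown word into (head, tail) according to its length:
    3 syllables -> 1 + 2, 4 syllables -> 2 + 2. Other lengths are not
    treated as abbreviation candidates and yield None."""
    if len(unknown) == 3:
        return unknown[:1], unknown[1:]
    if len(unknown) == 4:
        return unknown[:2], unknown[2:]
    return None


print(split_candidate("チデジ"))    # three syllables
print(split_candidate("マツケン"))  # four syllables
```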

Accordingly, when the extracted unknown word has three syllables, the abbreviation determination unit (first extraction means) 12 extracts the first syllable from the three-syllable unknown word (text data). It then determines whether the extracted syllable matches the first character of any surname registered in the surname list of the personal name dictionary 4b, that is, whether the surname list contains a surname beginning with the first syllable of the unknown word. If it does, the abbreviation determination unit (second extraction means) 12 extracts the first two syllables of what remains of the three-syllable unknown word after the extracted syllable is removed, that is, the last two syllables of the unknown word.

The abbreviation determination unit 12 then determines whether the extracted two syllables match the first two characters of any given name registered in the given name list of the personal name dictionary 4b, that is, whether the given name list contains a name beginning with the last two syllables of the three-syllable unknown word. If it does, the abbreviation determination unit (determination means) 12 determines that the unknown word is an abbreviation of a personal name.
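
The personal name check can be sketched as follows. The dictionary entries and the function name are hypothetical (the actual contents of FIG. 3 are not reproduced here), names are given as katakana readings, and each kana character is counted as one syllable. The parameter `n_surname` selects how many leading syllables are compared against the surname list.

```python
# Hypothetical contents of the personal name dictionary (readings only).
SURNAMES = ["マツダイラ", "キムラ"]
GIVEN_NAMES = ["ケン", "タクヤ"]


def is_name_abbreviation(unknown, n_surname,
                         surnames=SURNAMES, given_names=GIVEN_NAMES):
    """Return True if the first `n_surname` syllables of `unknown` begin some
    registered surname and the remaining two syllables begin some registered
    given name. The surname and given name are matched independently."""
    head, tail = unknown[:n_surname], unknown[n_surname:]
    if len(head) != n_surname or len(tail) != 2:
        return False
    if not any(s.startswith(head) for s in surnames):
        return False  # no surname starts with the head: not a name abbreviation
    return any(g.startswith(tail) for g in given_names)


print(is_name_abbreviation("キタク", n_surname=1))
```

Because the surname and the given name are looked up in separate lists, any registered surname can pair with any registered given name, which makes the rule permissive.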

If the first syllable of the three-syllable unknown word does not match the first character of any surname in the surname list (that is, no surname beginning with that syllable is registered in the surname list of the personal name dictionary 4b), or if the last two syllables of the unknown word do not match the first two characters of any given name in the given name list (that is, no given name beginning with those two syllables is registered in the given name list of the personal name dictionary 4b), the abbreviation determination unit 12 performs the same processing using the compound word dictionary 4c.

Specifically, the abbreviation determination unit (judgment means) 12 extracts the first syllable from the three-syllable unknown word (text data) and determines whether it matches the first character of any constituent word registered as a first constituent word in the constituent word column of the compound word dictionary 4c, that is, whether a constituent word beginning with the first syllable of the unknown word is registered there as a first constituent word. If it is, the abbreviation determination unit 12 reads from the compound word dictionary 4c the second constituent word of the compound word containing that constituent word, and extracts the first two syllables of what remains of the three-syllable unknown word after the extracted syllable is removed, that is, the last two syllables of the unknown word.

The abbreviation determination unit 12 then determines whether the extracted two syllables match the first two characters of the second constituent word read from the compound word dictionary 4c, that is, whether the constituent word beginning with the last two syllables of the three-syllable unknown word is the second constituent word of the same compound word whose first constituent word begins with the first syllable of the unknown word. If they match, the abbreviation determination unit (determination means) 12 determines that the unknown word is an abbreviation of a compound word.

If the first syllable of the three-syllable unknown word does not match the first character of any constituent word registered as a first constituent word (that is, no constituent word beginning with that syllable is registered in the compound word dictionary 4c), or if the last two syllables of the unknown word do not match the first two characters of the second constituent word, the abbreviation determination unit 12 determines that the unknown word is not an abbreviation.
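
The three-syllable compound word check can be sketched as follows. The dictionary entries and names are hypothetical stand-ins for the contents of FIG. 4, each entry pairs a compound word with the katakana readings of its constituent words in order, and each kana character is counted as one syllable. Unlike a lookup in two independent lists, the second constituent word must belong to the same compound word as the matching first constituent word.

```python
# Hypothetical compound word dictionary: (compound word, constituent readings).
COMPOUNDS = [
    ("地上デジタル放送", ["チジョウ", "デジタル", "ホウソウ"]),
    ("連続ドラマ", ["レンゾク", "ドラマ"]),
]


def is_three_syllable_compound_abbreviation(unknown, compounds=COMPOUNDS):
    """Return True if the first syllable of `unknown` begins the first
    constituent word of some compound and the last two syllables begin the
    second constituent word of that same compound."""
    if len(unknown) != 3:
        return False
    head, tail = unknown[:1], unknown[1:]
    for _compound, readings in compounds:
        if len(readings) < 2:
            continue  # a compound needs at least two constituent words
        if readings[0].startswith(head) and readings[1].startswith(tail):
            return True
    return False


print(is_three_syllable_compound_abbreviation("チデジ"))
```

With these entries, チドラ is rejected even though ドラマ is registered, because ドラマ is not the second constituent word of the compound whose first constituent word begins with チ.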

Similarly, when the morpheme treated as an unknown word by the morphological analysis unit 11 has four syllables, the abbreviation determination unit 12 extracts the first two syllables from the four-syllable unknown word. It then determines whether the extracted two syllables match the first two characters of any surname registered in the surname list of the personal name dictionary 4b, that is, whether the surname list contains a surname beginning with the first two syllables of the unknown word. If it does, the abbreviation determination unit 12 extracts the first two syllables of what remains of the four-syllable unknown word after the extracted two syllables are removed, that is, the last two syllables of the unknown word.

The abbreviation determination unit 12 then determines whether the extracted two syllables match the first two characters of any given name registered in the given name list of the personal name dictionary 4b, that is, whether the given name list contains a name beginning with the last two syllables of the four-syllable unknown word. If it does, the abbreviation determination unit 12 determines that the unknown word is an abbreviation of a personal name.

When the first two syllables of the four-syllable unknown word match the first two characters of none of the surnames registered in the surname list, or when its last two syllables match the first two characters of none of the given names registered in the given-name list, the abbreviation determination unit 12 performs the same processing using the compound word dictionary 4c.

Specifically, the abbreviation determination unit 12 extracts the first two syllables from the four-syllable unknown word and determines whether they match the first two characters of any constituent word registered as a first constituent word in the constituent-word column of the compound word dictionary 4c; that is, whether a constituent word beginning with the first two syllables of the unknown word is registered as a first constituent word. When it determines that they match, the abbreviation determination unit 12 reads the second constituent word of the compound word containing that constituent word from the compound word dictionary 4c, and extracts the first two syllables of what remains of the four-syllable unknown word after the extracted two syllables are removed, that is, the last two syllables of the unknown word.

The abbreviation determination unit 12 determines whether the extracted two syllables match the first two characters of the second constituent word read from the compound word dictionary 4c; that is, whether the constituent word beginning with the last two syllables of the four-syllable unknown word is the second constituent word of the compound word whose first constituent word begins with the first two syllables of the unknown word. When it determines that they match, the abbreviation determination unit (determination means) 12 determines that the unknown word is an abbreviation of a compound word.

When the first two syllables of the four-syllable unknown word match the first two characters of none of the constituent words registered as first constituent words, or when its last two syllables do not match the first two characters of the second constituent word, the abbreviation determination unit 12 determines that the unknown word is not an abbreviation.
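The four-syllable check described above can be sketched as follows. This is only an illustration under stated assumptions: the function name, the list-based dictionaries, and the representation of each compound as a (first constituent, second constituent) pair are hypothetical, and one kana character is treated as one syllable, as the text's own wording (syllables compared against leading characters) suggests. The sketch accepts any compound whose two constituents supply both halves.

```python
def is_four_syllable_abbreviation(word, surnames, given_names, compounds):
    """Return True if a four-unit unknown word looks like a 2+2 abbreviation.

    Assumption: `word` and all dictionary entries are kana readings in which
    one character stands for one syllable.
    """
    if len(word) != 4:
        return False
    head, tail = word[:2], word[2:]

    # Personal name dictionary: some surname starts with the head and
    # some given name starts with the tail.
    surname_hit = any(s.startswith(head) for s in surnames)
    given_hit = any(g.startswith(tail) for g in given_names)
    if surname_hit and given_hit:
        return True

    # Otherwise fall back to the compound word dictionary: the first and
    # second constituents of one compound must supply the two halves.
    return any(
        first.startswith(head) and second.startswith(tail)
        for first, second in compounds
    )


# Hypothetical dictionary contents, for illustration only.
SURNAMES = ["マツダイラ", "キムラ"]
GIVEN_NAMES = ["ケン", "タクヤ"]
COMPOUNDS = [("コスチューム", "プレイ"), ("キンニク", "トレーニング")]

print(is_four_syllable_abbreviation("マツケン", SURNAMES, GIVEN_NAMES, COMPOUNDS))
print(is_four_syllable_abbreviation("コスプレ", SURNAMES, GIVEN_NAMES, COMPOUNDS))
```

Because the surname list and the given-name list are consulted independently, a word such as マツタク would also be accepted by this sketch; the text does not describe any additional check in Embodiment 1.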

The abbreviation determination unit 12 performs the above processing on every unknown word in the phonetic character string sent from the morpheme analysis unit 11. It associates information indicating an abbreviation with each morpheme (unknown word) determined to be an abbreviation, leaves the information indicating an unknown word associated with each morpheme determined not to be an abbreviation, and sends the result to the abbreviation accent assigning unit 13.

The text analysis process performed by the text analysis apparatus 10 of Embodiment 1 is described below in detail with reference to a flowchart. FIG. 6 is a flowchart showing the procedure of the text analysis process. The following processing is executed by the control unit 1 in accordance with a control program stored in the ROM 2 or the HDD 4 of the text analysis apparatus 10.

When the user of the text analysis apparatus 10 operates the operation unit 5 to instruct execution of text analysis on a piece of text data, the control unit 1 reads the text data stored in the HDD 4 into the RAM 3 (S1). The control unit 1 (morpheme analysis unit 11) divides the text data read into the RAM 3 into morphemes based on the registered contents of the language dictionary 4a, assigns an accent type to each morpheme (S2), and generates a phonetic character string in which each morpheme is associated with its accent type.

The control unit 1 executes the abbreviation determination process based on the registered contents of the personal name dictionary 4b and the compound word dictionary 4c (S3), determining whether each morpheme that could not be assigned an accent type in step S2 (an unknown word) is an abbreviation. The details of the abbreviation determination process are described later with reference to FIGS. 7 to 11. The control unit 1 (abbreviation accent assigning unit 13) assigns the flat (type 0) accent type to each morpheme determined to be an abbreviation in step S3 (S4), and ends the text analysis process.
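The flow of steps S2 to S4 can be sketched as a small pipeline. This is a minimal sketch, not the patented implementation: it assumes the text has already been segmented into morpheme readings (real morphological analysis is out of scope), models the language dictionary 4a as a hypothetical mapping from reading to accent type, and takes the abbreviation check as a caller-supplied function. The sample readings and accent types follow the example sentence used in Embodiment 2.

```python
FLAT = 0  # flat (type 0) accent


def analyze(morphemes, lexicon, is_abbreviation):
    """Pair each morpheme reading with an accent type.

    lexicon         -- hypothetical stand-in for language dictionary 4a
    is_abbreviation -- callable standing in for the determination of step S3
    Unknown words that are not abbreviations keep accent type None here.
    """
    result = []
    for reading in morphemes:
        if reading in lexicon:            # S2: registered word
            result.append((reading, lexicon[reading]))
        elif is_abbreviation(reading):    # S3: unknown word judged an abbreviation
            result.append((reading, FLAT))  # S4: assign the flat accent type
        else:
            result.append((reading, None))
    return result


# Sample data taken from the example sentence; the lambda is a placeholder.
LEXICON = {"ガ": 0, "サンバ": 1, "オ": 0, "オドッタ": 0}
tokens = ["マツケン", "ガ", "サンバ", "オ", "オドッタ"]
print(analyze(tokens, LEXICON, lambda w: w == "マツケン"))
```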

The abbreviation determination process (step S3 in FIG. 6) within the text analysis process described above is explained below. FIGS. 7 to 11 are flowcharts showing the procedure of the abbreviation determination process. The following processing is executed by the control unit 1 (abbreviation determination unit 12) in accordance with a control program stored in the ROM 2 or the HDD 4 of the text analysis apparatus 10.

Having generated, in step S2 of FIG. 6, the phonetic character string in which each morpheme is associated with an accent type, the control unit 1 extracts an unknown word (morpheme) from the generated phonetic character string (S11) and determines whether the extracted unknown word has three syllables (S12). When it determines that the unknown word has three syllables (S12: YES), the control unit 1 extracts the first syllable from the unknown word (S13) and reads one word from the surname list of the personal name dictionary 4b (S14). The control unit 1 determines whether the one syllable extracted in step S13 matches the first character of the word (surname) read from the surname list (S15) and, when it determines that they do not match (S15: NO), determines whether all words have been read from the surname list (S16).

When it determines that not all words have been read from the surname list (S16: NO), the control unit 1 returns to step S14, reads another word from the surname list of the personal name dictionary 4b (S14), and repeats the determination of whether the one syllable extracted in step S13 matches the first character of the word read. When it determines that all words have been read from the surname list (S16: YES), that is, when no surname beginning with the one syllable extracted in step S13 is registered in the surname list of the personal name dictionary 4b, the control unit 1 proceeds to step S23. When it determines that the one syllable extracted in step S13 matches the first character of the word read from the surname list (S15: YES), that is, when a surname beginning with that syllable is registered in the surname list, the control unit 1 extracts the last two syllables of the unknown word extracted in step S11 (S17).

The control unit 1 reads one word from the given-name list of the personal name dictionary 4b (S18) and determines whether the two syllables extracted in step S17 match the first two characters of the word (given name) read (S19). When it determines that they match (S19: YES), that is, when a given name beginning with the two syllables extracted in step S17 is registered in the given-name list of the personal name dictionary 4b, the control unit 1 determines that the unknown word extracted in step S11 is an abbreviation (S20), and determines whether the above processing has been completed for every unknown word in the phonetic character string generated in step S2 of FIG. 6 (S21). When it determines that processing has been completed for all unknown words (S21: YES), the control unit 1 ends the abbreviation determination process; when it determines that processing has not been completed (S21: NO), it returns to step S11 and repeats the above processing until every unknown word in the phonetic character string has been processed.

When it determines that the two syllables extracted in step S17 do not match the first two characters of the word read from the given-name list (S19: NO), the control unit 1 determines whether all words have been read from the given-name list (S22). When it determines that not all words have been read (S22: NO), it returns to step S18, reads another word from the given-name list of the personal name dictionary 4b (S18), and repeats the determination of whether the two syllables extracted in step S17 match the first two characters of the word read. When it determines that all words have been read from the given-name list (S22: YES), that is, when no given name beginning with the two syllables extracted in step S17 is registered in the given-name list of the personal name dictionary 4b, the control unit 1 proceeds to step S23.

The control unit 1 extracts the first syllable from the unknown word extracted in step S11 (S23) and reads one constituent word registered as a first constituent word in the constituent-word column of the compound word dictionary 4c (S24). The control unit 1 determines whether the one syllable extracted in step S23 matches the first character of the constituent word read from the compound word dictionary 4c (S25) and, when it determines that they do not match (S25: NO), determines whether all first constituent words have been read from the compound word dictionary 4c (S26).

When it determines that not all first constituent words have been read from the compound word dictionary 4c (S26: NO), the control unit 1 returns to step S24, reads another constituent word registered as a first constituent word in the constituent-word column of the compound word dictionary 4c (S24), and repeats the determination of whether the one syllable extracted in step S23 matches the first character of the constituent word read. When it determines that all first constituent words have been read (S26: YES), that is, when no first constituent word beginning with the one syllable extracted in step S23 is registered in the compound word dictionary 4c, the control unit 1 determines that the unknown word extracted in step S11 is not an abbreviation (S31) and proceeds to step S21.

When it determines that the one syllable extracted in step S23 matches the first character of the constituent word read from the compound word dictionary 4c (S25: YES), that is, when a first constituent word beginning with that syllable is registered in the compound word dictionary 4c, the control unit 1 extracts the last two syllables of the unknown word extracted in step S11 (S27). The control unit 1 reads from the compound word dictionary 4c the second constituent word of the compound word containing the first constituent word read in step S24 (S28), and determines whether the two syllables extracted in step S27 match the first two characters of that second constituent word (S29).

When it determines that the extracted two syllables match the first two characters of the second constituent word read from the compound word dictionary 4c (S29: YES), the control unit 1 determines that the unknown word extracted in step S11 is an abbreviation (S30) and proceeds to step S21. When it determines that they do not match (S29: NO), the control unit 1 determines that the unknown word extracted in step S11 is not an abbreviation (S31) and proceeds to step S21.
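The three-syllable branch (steps S13 to S31) can be traced with explicit loops, one comment per flowchart step. This is a sketch under assumptions: the function name and the list and pair representations of the dictionaries are hypothetical, one kana character is treated as one syllable, and the caller is assumed to have already confirmed that the word has three syllables (S12: YES).

```python
def judge_three_syllable(word, surnames, given_names, compounds):
    """Follow steps S13-S31 for a three-unit unknown word; True = abbreviation."""
    head, tail = word[:1], word[1:]            # S13 / S17: 1 + 2 split

    for surname in surnames:                   # S14, S16: scan the surname list
        if surname.startswith(head):           # S15: YES
            for given in given_names:          # S18, S22: scan the given-name list
                if given.startswith(tail):     # S19: YES
                    return True                # S20: abbreviation of a personal name
            break                              # S22: YES -> continue at S23

    for first, second in compounds:            # S24, S26: scan first constituents
        if first.startswith(head):             # S25: YES
            # S28-S29: as described, only the second constituent of this
            # compound is compared; the outcome is S30 or S31 either way.
            return second.startswith(tail)
    return False                               # S26: YES -> S31


# Hypothetical compound entry for 地上デジタル放送 (first two constituents only).
print(judge_three_syllable("チデジ", [], [], [("チジョウ", "デジタル")]))
```

Read literally, the described flow gives its verdict at the first compound whose first constituent matches, so an earlier entry such as a hypothetical ("チカ", "テツ") would cause チデジ to be rejected before ("チジョウ", "デジタル") is reached. Whether the actual flowchart continues scanning in that case cannot be confirmed from the text alone.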

On the other hand, when it determines that the unknown word extracted in step S11 does not have three syllables (S12: NO), the control unit 1 determines whether the unknown word has four syllables (S32) and, when it determines that it does not (S32: NO), proceeds to step S21. When it determines that the unknown word has four syllables (S32: YES), the control unit 1 extracts the first two syllables from the unknown word (S33) and reads one word from the surname list of the personal name dictionary 4b (S34).

The control unit 1 determines whether the two syllables extracted in step S33 match the first two characters of the word (surname) read from the surname list (S35) and, when it determines that they do not match (S35: NO), determines whether all words have been read from the surname list (S36). When it determines that not all words have been read from the surname list (S36: NO), the control unit 1 returns to step S34, reads another word from the surname list of the personal name dictionary 4b (S34), and repeats the determination of whether the two syllables extracted in step S33 match the first two characters of the word read. When it determines that all words have been read from the surname list (S36: YES), that is, when no surname beginning with the two syllables extracted in step S33 is registered in the surname list of the personal name dictionary 4b, the control unit 1 proceeds to step S42.

When it determines that the two syllables extracted in step S33 match the first two characters of the word read from the surname list (S35: YES), that is, when a surname beginning with the two syllables extracted in step S33 is registered in the surname list of the personal name dictionary 4b, the control unit 1 extracts the last two syllables of the unknown word extracted in step S11 (S37).

The control unit 1 reads one word from the given-name list of the personal name dictionary 4b (S38) and determines whether the two syllables extracted in step S37 match the first two characters of the word (given name) read (S39). When it determines that they match (S39: YES), that is, when a given name beginning with the two syllables extracted in step S37 is registered in the given-name list of the personal name dictionary 4b, the control unit 1 determines that the unknown word extracted in step S11 is an abbreviation (S40) and proceeds to step S21.

When it determines that the two syllables extracted in step S37 do not match the first two characters of the word read from the given-name list (S39: NO), the control unit 1 determines whether all words have been read from the given-name list (S41). When it determines that not all words have been read (S41: NO), it returns to step S38, reads another word from the given-name list of the personal name dictionary 4b (S38), and repeats the determination of whether the two syllables extracted in step S37 match the first two characters of the word read. When it determines that all words have been read from the given-name list (S41: YES), that is, when no given name beginning with the two syllables extracted in step S37 is registered in the given-name list of the personal name dictionary 4b, the control unit 1 proceeds to step S42.

The control unit 1 extracts the first two syllables from the unknown word extracted in step S11 (S42) and reads one constituent word registered as a first constituent word in the constituent-word column of the compound word dictionary 4c (S43). The control unit 1 determines whether the two syllables extracted in step S42 match the first two characters of the constituent word read from the compound word dictionary 4c (S44) and, when it determines that they do not match (S44: NO), determines whether all first constituent words have been read from the compound word dictionary 4c (S45).

When it determines that not all first constituent words have been read from the compound word dictionary 4c (S45: NO), the control unit 1 returns to step S43, reads another constituent word registered as a first constituent word in the constituent-word column of the compound word dictionary 4c (S43), and repeats the determination of whether the two syllables extracted in step S42 match the first two characters of the constituent word read. When it determines that all first constituent words have been read (S45: YES), that is, when no first constituent word beginning with the two syllables extracted in step S42 is registered in the compound word dictionary 4c, the control unit 1 determines that the unknown word extracted in step S11 is not an abbreviation (S50) and proceeds to step S21.

When it determines that the two syllables extracted in step S42 match the first two characters of the constituent word read from the compound word dictionary 4c (S44: YES), that is, when a first constituent word beginning with those two syllables is registered in the compound word dictionary 4c, the control unit 1 extracts the last two syllables of the unknown word extracted in step S11 (S46). The control unit 1 reads from the compound word dictionary 4c the second constituent word of the compound word containing the first constituent word read in step S43 (S47), and determines whether the two syllables extracted in step S46 match the first two characters of that second constituent word (S48).

When it determines that the extracted two syllables match the first two characters of the second constituent word read from the compound word dictionary 4c (S48: YES), the control unit 1 determines that the unknown word extracted in step S11 is an abbreviation (S49) and proceeds to step S21. When it determines that they do not match (S48: NO), the control unit 1 determines that the unknown word extracted in step S11 is not an abbreviation (S50) and proceeds to step S21.

As described above, the text analysis apparatus 10 of Embodiment 1 performs the abbreviation determination process on each morpheme (unknown word) that could not be assigned an accent type based on the language dictionary 4a, and can thereby determine whether the unknown word is an abbreviation of a personal name or an abbreviation of a compound word. A predetermined accent type (the flat accent type) is assigned to an unknown word determined to be an abbreviation, and an accent type having its accent nucleus on, for example, the third mora from the end is assigned to an unknown word not determined to be an abbreviation. Different accent types can thus be assigned to unknown words determined to be abbreviations and to unknown words determined not to be abbreviations, so that each receives a suitable accent type.
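The accent rule in this paragraph can be written as a small function. This is a sketch under two assumptions that the text does not state: accent type N is taken to mean that the accent nucleus falls on the N-th mora from the start (with type 0 being flat), and words shorter than three moras, for which "the third mora from the end" is undefined, fall back to type 1. The function name is hypothetical.

```python
def accent_type_for_unknown(mora_count, is_abbreviation):
    """Pick an accent type for an unknown word.

    Abbreviations get the flat type (0). Other unknown words get the nucleus
    on the third mora from the end, i.e. type (mora_count - 2).
    Assumption: words with fewer than three moras fall back to type 1.
    """
    if mora_count < 1:
        raise ValueError("mora_count must be positive")
    if is_abbreviation:
        return 0
    return max(mora_count - 2, 1)


print(accent_type_for_unknown(4, True))   # abbreviation such as マツケン
print(accent_type_for_unknown(4, False))  # other four-mora unknown word
```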

In Embodiment 1 described above, each unknown word is checked against four abbreviation patterns: the first syllable of the surname of a personal name combined with the first two syllables of the given name; the first two syllables of the surname combined with the first two syllables of the given name; the first syllable of the first constituent word of a compound word combined with the first two syllables of the second constituent word; and the first two syllables of the first constituent word combined with the first two syllables of the second constituent word. However, the number of characters making up an abbreviation is not limited to these, and may be made changeable to any number of characters through a user setting entered via the operation unit 5.
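One way to make the character counts configurable is to pass the split patterns as data. The following is a hypothetical sketch, not a description of how the operation unit 5 setting is actually stored: each pattern is a (head length, tail length) pair, the default covers the 1+2 and 2+2 patterns of Embodiment 1, and one kana character is again treated as one syllable.

```python
DEFAULT_PATTERNS = ((1, 2), (2, 2))  # the splits used in Embodiment 1


def matches_pattern(word, firsts, seconds, patterns=DEFAULT_PATTERNS):
    """Return True if `word` splits into a prefix of some entry in `firsts`
    followed by a prefix of some entry in `seconds`, for any allowed pattern.

    `firsts` / `seconds` stand for surnames / given names (the compound word
    dictionary would additionally tie the two constituents to one compound).
    """
    for head_len, tail_len in patterns:
        if len(word) != head_len + tail_len:
            continue
        head, tail = word[:head_len], word[head_len:]
        if any(f.startswith(head) for f in firsts) and any(
            s.startswith(tail) for s in seconds
        ):
            return True
    return False


print(matches_pattern("キムタク", ["キムラ"], ["タクヤ"]))
print(matches_pattern("キムタクヤ", ["キムラ"], ["タクヤ"], patterns=((2, 3),)))
```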

With the configuration described above, when text data contains abbreviations such as Matsuken (松平健, Ken Matsudaira), Kimutaku (木村拓也, Takuya Kimura), Nakasho (中川翔子, Shoko Nakagawa), kosupure (コスチュームプレイ, costume play), rendora (連続ドラマ, serial drama), kintore (筋肉トレーニング, muscle training), and chideji (地上デジタル放送, terrestrial digital broadcasting), the text analysis apparatus 10 of Embodiment 1 can appropriately determine each of these words to be an abbreviation.

(Embodiment 2)
A text analysis apparatus according to the present invention that includes the abbreviation determination device according to the present invention is described below in detail with reference to the drawings showing Embodiment 2. Because the text analysis apparatus of Embodiment 2 can be realized with the same configuration as the text analysis apparatus 10 of Embodiment 1 described above, like components are given the same reference numerals and their description is omitted.

FIG. 12 is a functional block diagram showing an example of the functional configuration of the text analysis apparatus 10 of Embodiment 2. In the text analysis apparatus 10 of Embodiment 2, the control unit 1 executes a control program stored in the ROM 2 or the HDD 4 and thereby realizes, as in the text analysis apparatus 10 of Embodiment 1, the functions of the morpheme analysis unit 11, the abbreviation determination unit 12, the abbreviation accent assigning unit 13, and so on.

The HDD 4 of the text analysis apparatus 10 of Embodiment 2 stores a co-occurrence dictionary 4d such as that shown in FIG. 13. FIG. 13 is a schematic diagram showing the registered contents of the co-occurrence dictionary 4d. As shown in FIG. 13, the co-occurrence dictionary (co-occurrence data storage means) 4d registers the reading of each word (text data) in association with the co-occurring words (co-occurrence data) that co-occur with that word.

The text analysis process performed by the text analysis apparatus 10 of Embodiment 2 configured as described above is explained below, taking as an example the processing in which the text analysis apparatus 10 analyzes the text data 「マツケンが、サンバを踊った。」 ("Matsuken danced the samba.").
The morpheme analysis unit 11 of Embodiment 2, like the morpheme analysis unit 11 of Embodiment 1 described above, reads text data (document data) stored in the HDD 4 into the RAM 3, divides the text data (document data) read into the RAM 3 into morphemes (text data) based on the registered contents of the language dictionary (text storage means) 4a, and assigns an accent type to each morpheme. Specifically, as described in Embodiment 1, the morpheme analysis unit 11 generates the phonetic character string "マツケン (matsuken; unknown word) ・ ガ (ga; 1 mora, type 0) ・ サンバ (sanba; 3 moras, type 1) ・ オ (o; 1 mora, type 0) ・ オドッタ (odotta; 4 moras, type 0)" and sends it to the abbreviation determination unit 12.

The abbreviation determination unit 12 of Embodiment 2 first performs the same processing as the abbreviation determination unit 12 of Embodiment 1 and determines, based on the registered contents of the personal name dictionary 4b or the compound word dictionary 4c, whether each unknown word (morpheme) in the phonetic character string sent from the morpheme analysis unit 11 is a candidate abbreviation of a personal name or a compound word. Here, as described in Embodiment 1, the unknown word "Matsuken" is determined to be a candidate abbreviation. When it determines that an unknown word is a candidate abbreviation, the abbreviation determination unit 12 of Embodiment 2 also acquires the formal name corresponding to the abbreviation based on the registered contents of the personal name dictionary 4b or the compound word dictionary 4c.

Specifically, when the leading one syllable (or two syllables) of the unknown word matches the first one character (or two characters) of any surname registered in the surname list of the personal name dictionary 4b, or of any first constituent word registered in the compound word dictionary 4c, the abbreviation determination unit 12 reads out the matching surname or constituent word and stores it in the RAM 3. Likewise, when the trailing two syllables of the unknown word match the first two characters of any given name registered in the given-name list of the personal name dictionary 4b, or of any second constituent word registered in the compound word dictionary 4c, it reads out the matching given name or constituent word and stores it in the RAM 3. In this way the abbreviation determination unit 12 obtains the formal name for an unknown word judged to be an abbreviation candidate. Here, the formal name "Matsudaira Ken" is obtained for the abbreviation candidate "Matsuken".
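
The prefix matching described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the implementation disclosed in the patent: the dictionary entries, the function name find_formal_name, and the simplification that one katakana character counts as one syllable are all hypothetical. The sketch covers only the personal name dictionary 4b; the compound word dictionary 4c would presumably be searched in the same way.

```python
from typing import List, Optional

# Hypothetical stand-ins for the surname and given-name lists of the
# personal name dictionary 4b (readings written in katakana).
SURNAMES = ["マツダイラ", "キムラ"]
GIVEN_NAMES = ["ケン", "タクヤ"]


def find_formal_name(unknown: str,
                     surnames: List[str],
                     given_names: List[str]) -> Optional[str]:
    """Return surname + given name if `unknown` could abbreviate them.

    Assumption: one katakana character counts as one syllable, so a
    candidate is 1 or 2 leading characters plus 2 trailing characters.
    """
    if len(unknown) not in (3, 4):
        return None
    head, tail = unknown[:-2], unknown[-2:]
    for surname in surnames:
        # Leading part must match the beginning of a registered surname.
        if not surname.startswith(head):
            continue
        for given in given_names:
            # Trailing two characters must match the beginning of a given name.
            if given.startswith(tail):
                return surname + given
    return None


print(find_formal_name("マツケン", SURNAMES, GIVEN_NAMES))
print(find_formal_name("サンバ", SURNAMES, GIVEN_NAMES))
```

A reading that contains small kana or a long-vowel mark would need a proper mora splitter rather than per-character slicing; that step is left out of this sketch.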

Next, based on the formal name of the unknown word judged to be an abbreviation candidate, the abbreviation determination unit 12 acquires the co-occurrence words for that abbreviation from the co-occurrence dictionary 4d. Here, the co-occurrence words "samba" and "Abarenbo Shogun" are acquired for the formal name "Matsudaira Ken". The abbreviation determination unit 12 then checks whether any co-occurrence word acquired from the co-occurrence dictionary 4d appears in the phonetic character string sent from the morphological analysis unit 11 and, if so, confirms that the unknown word it had judged to be a candidate is an abbreviation. It then sends the phonetic character string "Matsuken (abbreviation) · ga (1 mora, type 0) · samba (3 morae, type 1) · o (1 mora, type 0) · odotta (4 morae, type 0)" to the abbreviation accent assignment unit 13.

On the other hand, if it judges that the phonetic character string sent from the morphological analysis unit 11 contains none of the co-occurrence words acquired from the co-occurrence dictionary 4d, the abbreviation determination unit 12 confirms that the candidate unknown word is not an abbreviation and sends, for example, the phonetic character string "Matsuken (unknown word) · ga (1 mora, type 0) · samba (3 morae, type 1) · o (1 mora, type 0) · odotta (4 morae, type 0)" to the abbreviation accent assignment unit 13.
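
The co-occurrence check can be illustrated with a short Python sketch. The dictionary layout (formal name mapped to a set of readings), the katakana reading given for "Abarenbo Shogun", and the function name confirm_abbreviation are assumptions made for illustration; the patent does not specify how the co-occurrence dictionary 4d is stored.

```python
from typing import Dict, List, Set

# Hypothetical contents of the co-occurrence dictionary 4d, keyed by the
# formal name and holding the readings of its co-occurrence words.
COOCCURRENCE: Dict[str, Set[str]] = {
    "マツダイラケン": {"サンバ", "アバレンボウショウグン"},
}


def confirm_abbreviation(formal_name: str,
                         morphemes: List[str],
                         cooccurrence: Dict[str, Set[str]]) -> bool:
    """Return True if any co-occurrence word of `formal_name` appears
    among the morphemes of the analysed text."""
    related = cooccurrence.get(formal_name, set())
    return any(m in related for m in morphemes)


with_samba = ["マツケン", "ガ", "サンバ", "オ", "オドッタ"]
without_samba = ["マツケン", "ガ", "キタ"]
print(confirm_abbreviation("マツダイラケン", with_samba, COOCCURRENCE))
print(confirm_abbreviation("マツダイラケン", without_samba, COOCCURRENCE))
```

A formal name with no entry in the dictionary yields an empty set here, so the candidate is rejected; whether the device behaves that way for unregistered names is not stated in the text.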

As in the first embodiment, the abbreviation accent assignment unit 13 assigns the flat (type 0) accent type to each morpheme that the abbreviation determination unit 12 has determined to be an abbreviation.

As described above, for an unknown word judged to be an abbreviation candidate based on the registered contents of the personal name dictionary 4b and the compound word dictionary 4c, if the text data (document data) containing the unknown word includes no word that co-occurs with it, the unknown word is unlikely to be that abbreviation and is therefore confirmed not to be one. This prevents abbreviations from being misjudged, so that the predetermined accent type is assigned only to genuine abbreviations.

The text analysis process performed by the text analysis device 10 of the second embodiment is described in detail below with reference to a flowchart. FIG. 14 is a flowchart showing the procedure of the text analysis process. The following processing is executed by the control unit 1 in accordance with a control program stored in the ROM 2 or the HDD 4 of the text analysis device 10.

When the user of the text analysis device 10 operates the operation unit 5 to instruct execution of text analysis on a piece of text data, the control unit 1 reads the text data stored in the HDD 4 into the RAM 3 (S61). The control unit 1 (morphological analysis unit 11) divides the text data read into the RAM 3 into morphemes based on the registered contents of the language dictionary 4a, assigns an accent type to each morpheme (S62), and generates a phonetic character string in which each morpheme is associated with its accent type.

The control unit 1 (abbreviation determination unit 12) executes the abbreviation determination process based on the registered contents of the personal name dictionary 4b and the compound word dictionary 4c (S63), and determines whether each morpheme that could not be assigned an accent type in step S62 (an unknown word) is an abbreviation candidate. The abbreviation determination process of the second embodiment is the same as the process described in the first embodiment with reference to FIGS. 7 to 11, except that when the control unit 1 (abbreviation determination unit 12) determines that an unknown word is an abbreviation candidate, it acquires the formal name for that abbreviation from the registered contents of the personal name dictionary 4b or the compound word dictionary 4c (S64).

Based on the formal name acquired in step S64, the control unit 1 acquires the co-occurrence words for the abbreviation from the co-occurrence dictionary 4d (S65) and determines whether the phonetic character string generated in step S62 contains any of them (S66). If a co-occurrence word is contained (S66: YES), the control unit 1 confirms that the unknown word judged to be an abbreviation candidate in step S63 is an abbreviation (S67). The control unit 1 (abbreviation accent assignment unit 13) then assigns the flat (type 0) accent type to the morpheme confirmed as an abbreviation in step S67 (S68) and ends the text analysis process.

If, on the other hand, the phonetic character string generated in step S62 contains no co-occurrence word (S66: NO), the control unit 1 confirms that the unknown word judged to be an abbreviation candidate in step S63 is not an abbreviation (S69) and ends the text analysis process.

As described above, in the text analysis device 10 of the second embodiment, when text data (document data) containing an unknown word judged to be an abbreviation candidate based on the registered contents of the personal name dictionary 4b and the compound word dictionary 4c includes no word that co-occurs with that abbreviation, the unknown word is unlikely to be the abbreviation and is confirmed not to be one. In other words, a candidate unknown word is confirmed as an abbreviation if it is used together with a co-occurrence word that is likely to accompany the abbreviation, and is confirmed not to be an abbreviation otherwise.

This prevents abbreviations from being misjudged, so that only genuine abbreviations are determined to be abbreviations. Accordingly, by assigning the predetermined accent type to unknown words confirmed as abbreviations and assigning, for example, an accent type with the accent nucleus on the third mora from the end to unknown words confirmed not to be abbreviations, genuine abbreviations and other unknown words receive different accent types, each suited to the word.
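
As a worked illustration of the two accent rules just mentioned, the Python sketch below returns accent type 0 (flat) for a confirmed abbreviation and, for any other unknown word, the type whose nucleus falls on the third mora from the end. It relies on the usual convention that accent type k places the nucleus on the k-th mora from the start, so the third mora from the end of an n-mora word is type n - 2. The function name and the fallback to type 1 for words shorter than three morae are assumptions not taken from the patent.

```python
def accent_type_for_unknown(mora_count: int, is_abbreviation: bool) -> int:
    """Pick an accent type for an unknown word.

    Confirmed abbreviations get the flat type (0). Other unknown words
    get the nucleus on the third mora from the end, i.e. type n - 2.
    """
    if mora_count < 1:
        raise ValueError("mora_count must be at least 1")
    if is_abbreviation:
        return 0
    # Assumption: words with fewer than three morae fall back to type 1,
    # because a third-from-last mora does not exist for them.
    return max(mora_count - 2, 1)


# "Matsuken" has 4 morae (ma-tsu-ke-n).
print(accent_type_for_unknown(4, is_abbreviation=True))
print(accent_type_for_unknown(4, is_abbreviation=False))
```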

(Embodiment 3)
A text analysis device according to the present invention that includes the abbreviation determination device according to the present invention is described in detail below with reference to the drawings showing a third embodiment. The text analysis device of the third embodiment can be realized with the same configuration as the text analysis device 10 of the first embodiment, so like components are given the same reference numerals and their description is omitted.

In the text analysis device 10 of the first embodiment, the control unit 1 (abbreviation accent assignment unit 13) was configured to assign the flat accent type to unknown words determined to be abbreviations based on the registered contents of the personal name dictionary 4b and the compound word dictionary 4c.

In the text analysis device 10 of the third embodiment, an exception abbreviation dictionary 4e, in which accent types for individual abbreviations are registered, is stored in the HDD 4 in advance. For unknown words determined to be abbreviations based on the registered contents of the personal name dictionary 4b and the compound word dictionary 4c, the control unit 1 (abbreviation accent assignment unit 13) assigns the accent type registered in the exception abbreviation dictionary 4e to abbreviations that are registered there, and the flat accent type to abbreviations that are not.

FIG. 15 is a functional block diagram showing an example of the functional configuration of the text analysis device 10 of the third embodiment. As in the text analysis device 10 of the first embodiment, the control unit 1 realizes the functions of the morphological analysis unit 11, the abbreviation determination unit 12, the abbreviation accent assignment unit 13, and so on by executing a control program stored in the ROM 2 or the HDD 4.

The HDD 4 of the text analysis device 10 of the third embodiment stores an exception abbreviation dictionary 4e such as that shown in FIG. 16. FIG. 16 is a schematic diagram showing the registered contents of the exception abbreviation dictionary 4e. As shown in FIG. 16, the exception abbreviation dictionary (abbreviation storage means) 4e registers abbreviations in association with their accent types.

The text analysis process performed by the text analysis device 10 of the third embodiment, configured as described above, is explained below, taking as an example the analysis of the text data "Nakasho did cosplay."
Like the morphological analysis unit 11 of the first embodiment, the morphological analysis unit 11 of the third embodiment reads text data stored in the HDD 4 into the RAM 3, divides it into morphemes based on the registered contents of the language dictionary 4a, and assigns an accent type to each morpheme. Here, the morphological analysis unit 11 generates the phonetic character string "Nakasho (unknown word) · ga (1 mora, type 0) · cosplay (unknown word) · o (1 mora, type 0) · shita (2 morae, type 0)".

Like the abbreviation determination unit 12 of the first embodiment, the abbreviation determination unit 12 of the third embodiment determines, based on the registered contents of the personal name dictionary 4b and the compound word dictionary 4c, whether each unknown word in the phonetic character string generated by the morphological analysis unit 11 is an abbreviation. It then generates a phonetic character string in which unknown words determined to be abbreviations are tagged as abbreviations and those determined not to be abbreviations are tagged as unknown words. Here, the abbreviation determination unit 12 generates the phonetic character string "Nakasho (abbreviation) · ga (1 mora, type 0) · cosplay (abbreviation) · o (1 mora, type 0) · shita (2 morae, type 0)".

The abbreviation accent assignment unit 13 of the third embodiment extracts an abbreviation from the phonetic character string generated by the abbreviation determination unit 12 and determines whether it is registered in the exception abbreviation dictionary 4e. If it is, the unit reads the corresponding accent type from the exception abbreviation dictionary 4e and assigns it to the abbreviation. If it is not, the unit assigns the flat (type 0) accent type to the abbreviation.

Here, the abbreviation "Nakasho" is registered in the exception abbreviation dictionary 4e, so it is assigned its registered accent type, "4 morae, type 2". The abbreviation "cosplay" is not registered, so it is assigned the flat accent type. The abbreviation accent assignment unit 13 thus generates the phonetic character string "Nakasho (4 morae, type 2) · ga (1 mora, type 0) · cosplay (4 morae, type 0) · o (1 mora, type 0) · shita (2 morae, type 0)".

By executing the above processing for every abbreviation in the phonetic character string generated by the abbreviation determination unit 12, the abbreviation accent assignment unit 13 can assign the pre-registered accent type to each abbreviation registered in the exception abbreviation dictionary 4e. As a result, abbreviations whose accent is not flat, such as "Nakasho", receive a more appropriate accent type.

The text analysis process performed by the text analysis device 10 of the third embodiment is described in detail below with reference to a flowchart. FIG. 17 is a flowchart showing the procedure of the text analysis process. The following processing is executed by the control unit 1 in accordance with a control program stored in the ROM 2 or the HDD 4 of the text analysis device 10.

When the user of the text analysis device 10 operates the operation unit 5 to instruct execution of text analysis on a piece of text data, the control unit 1 reads the text data stored in the HDD 4 into the RAM 3 (S71). The control unit 1 (morphological analysis unit 11) divides the text data read into the RAM 3 into morphemes based on the registered contents of the language dictionary 4a, assigns an accent type to each morpheme (S72), and generates a phonetic character string in which each morpheme is associated with its accent type.

The control unit 1 (abbreviation determination unit 12) executes the abbreviation determination process based on the registered contents of the personal name dictionary 4b and the compound word dictionary 4c (S73), and determines whether each morpheme that could not be assigned an accent type in step S72 (an unknown word) is an abbreviation candidate. The abbreviation determination process of the third embodiment is the same as the process described in the first embodiment with reference to FIGS. 7 to 11.

The control unit 1 (abbreviation accent assignment unit 13) extracts an abbreviation from the phonetic character string generated in step S73 (S74) and determines whether the extracted abbreviation is registered in the exception abbreviation dictionary 4e (S75). If it is (S75: YES), the control unit 1 reads the accent type corresponding to the abbreviation from the exception abbreviation dictionary 4e and assigns it to the abbreviation (S76).

If the extracted abbreviation is not registered in the exception abbreviation dictionary 4e (S75: NO), the control unit 1 assigns the flat (type 0) accent type to it (S77). The control unit 1 then determines whether the above processing has been completed for all abbreviations in the phonetic character string generated in step S73 (S78). If not (S78: NO), it returns to step S74, extracts another abbreviation from the phonetic character string (S74), and repeats the check of whether the extracted abbreviation is registered in the exception abbreviation dictionary 4e. Once all abbreviations in the phonetic character string have been processed (S78: YES), the control unit 1 ends the text analysis process.
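
The loop over steps S74 to S78 can be sketched in Python as below. The Morpheme record, the function name assign_abbreviation_accents, and the single dictionary entry are hypothetical; only the accent values for the "Nakasho" example (type 2 from the exception dictionary, type 0 otherwise) come from the description.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Morpheme:
    reading: str
    morae: int
    accent: Optional[int] = None   # None until an accent type is assigned
    is_abbreviation: bool = False


# Hypothetical contents of the exception abbreviation dictionary 4e:
# abbreviation reading -> registered accent type.
EXCEPTION_DICT: Dict[str, int] = {"ナカショー": 2}


def assign_abbreviation_accents(morphemes: List[Morpheme],
                                exceptions: Dict[str, int]) -> List[Morpheme]:
    """Give every abbreviation its registered accent type, or the flat
    type (0) when it is not in the exception dictionary."""
    result = []
    for m in morphemes:                                   # S74, S78 loop
        if m.is_abbreviation:
            accent = exceptions.get(m.reading, 0)         # S75 -> S76 or S77
            m = replace(m, accent=accent)
        result.append(m)
    return result


SENTENCE = [
    Morpheme("ナカショー", 4, is_abbreviation=True),
    Morpheme("ガ", 1, accent=0),
    Morpheme("コスプレ", 4, is_abbreviation=True),
    Morpheme("オ", 1, accent=0),
    Morpheme("シタ", 2, accent=0),
]

for m in assign_abbreviation_accents(SENTENCE, EXCEPTION_DICT):
    print(m.reading, m.morae, m.accent)
```

Using a dictionary lookup with a default of 0 folds the S75 branch into one expression; the flowchart's explicit YES/NO branches behave the same way.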

As described above, in the text analysis device 10 of the third embodiment, when an accent type for an unknown word determined to be an abbreviation based on the registered contents of the personal name dictionary 4b and the compound word dictionary 4c is registered in the exception abbreviation dictionary 4e, that accent type is assigned to the abbreviation, so abbreviations whose accent is not flat receive an appropriate accent type. Even when no accent type is registered in the exception abbreviation dictionary 4e for such an abbreviation, assigning it the flat accent type gives it an accent type different from that of unknown words that are not abbreviations. Consequently, when prosody is generated for each abbreviation from these more appropriately assigned accent types and synthesized speech is generated from that prosody, both the prosody and the resulting synthesized speech are appropriate.

The third embodiment has been described as a modification of the first embodiment, in which the control unit 1 (abbreviation accent assignment unit 13) of the text analysis device 10 assigns the registered accent type to abbreviations registered in the exception abbreviation dictionary 4e and the flat accent type to abbreviations not registered there. The same modification can also be applied to the text analysis device 10 of the second embodiment. In that case, an appropriate accent type can be assigned to each word identified as an abbreviation based not only on the personal name dictionary 4b and the compound word dictionary 4c but also on the co-occurrence dictionary 4d.

(Embodiment 4)
A speech synthesizer according to the present invention is described in detail below with reference to the drawings showing a fourth embodiment. The speech synthesizer of the fourth embodiment includes the configuration of the text analysis device 10 of the first embodiment; like components are given the same reference numerals and their description is omitted. FIG. 18 is a block diagram showing a configuration example of the speech synthesizer according to the fourth embodiment. In addition to the control unit 1, ROM 2, RAM 3, HDD 4, operation unit 5, and display unit 6 shown in FIG. 1, the speech synthesizer 100 of the fourth embodiment includes an audio output unit 7, and these hardware units are connected to one another via a bus 1a.

The audio output unit 7 includes an audio amplifier circuit, a speaker, and the like, and outputs sound based on, for example, audio information (speech waveforms) stored in the HDD 4.
The HDD 4 stores in advance text data, the language dictionary 4a shown in FIG. 2, the personal name dictionary 4b shown in FIG. 3, the compound word dictionary 4c shown in FIG. 4, and screen information for presenting various information to the user, as well as the various control programs needed to operate the speech synthesizer 100 as the speech synthesizer of the present invention, a prosody generation rule dictionary 4f, a waveform dictionary 4g, and so on.

Although the prosody generation rule dictionary 4f and the waveform dictionary 4g are not illustrated in detail, the prosody generation rule dictionary 4f registers the rules used to generate prosody data from the reading and accent type of each morpheme, and the waveform dictionary 4g registers groups of speech waveforms corresponding to text made up of multiple phonemes (phoneme strings).

The functions realized when the control unit 1 of the speech synthesizer 100 configured as described above executes the control programs stored in the ROM 2 and the HDD 4 are described below. FIG. 19 is a functional block diagram showing an example of the functional configuration of the speech synthesizer 100 of the fourth embodiment. By executing the control programs stored in the ROM 2 and the HDD 4, the control unit 1 realizes the functions of the text analysis device 10 of the first embodiment, a prosody generation unit 20, a waveform generation unit 30, and so on.

The prosody generation unit 20 generates prosody data corresponding to the phonetic character string generated by the text analysis device 10, in accordance with the registered contents of the prosody generation rule dictionary 4f. Specifically, it generates prosody data corresponding to the reading and accent type of each morpheme in the phonetic character string sent from the text analysis device 10.

The waveform generation unit 30 converts the prosody data generated by the prosody generation unit 20 into a speech waveform based on the registered contents of the waveform dictionary 4g, thereby generating synthesized speech. Specifically, it extracts from the waveform dictionary 4g the speech waveform corresponding to each morpheme in the prosody data sent from the prosody generation unit 20 and generates synthesized speech from the extracted waveforms and the prosody data. The synthesized speech generated by the waveform generation unit 30 is temporarily stored in the RAM 3 or the HDD 4, then sent to the audio output unit 7 at a predetermined timing under the control of the control unit 1 and output as sound.

With this configuration, the speech synthesizer 100 of the fourth embodiment analyzes text data with the text analysis device 10 to generate a phonetic character string, and generates synthesized speech corresponding to that string. As in the text analysis device 10 of the first embodiment, each morpheme is assigned an appropriate accent type based on the language dictionary 4a, and an unknown word not registered in the language dictionary 4a is assigned an accent type suited to abbreviations (the flat accent type) if it is an abbreviation. When every morpheme has been given an appropriate accent type in this way, appropriate synthesized speech can be generated from those accent types.

The synthesized speech generation process performed by the speech synthesizer 100 of the fourth embodiment is described in detail below with reference to a flowchart. FIG. 20 is a flowchart showing the procedure of the synthesized speech generation process. The following processing is executed by the control unit 1 in accordance with a control program stored in the ROM 2 or the HDD 4 of the speech synthesizer 100.

When the user of the speech synthesizer 100 operates the operation unit 5 to instruct execution of the synthesized speech generation process for one piece of text data, the control unit 1 reads the text data stored in the HDD 4 into the RAM 3 (S81). The control unit 1 (morphological analysis unit 11) divides the text data read into the RAM 3 into morphemes based on the registered contents of the language dictionary 4a, assigns an accent type to each of the divided morphemes (S82), and generates a phonetic character string in which an accent type is associated with each morpheme.

The control unit 1 (abbreviation determination unit 12) executes the abbreviation determination process based on the registered contents of the personal name dictionary 4b and the compound word dictionary 4c (S83), and determines whether each morpheme to which no accent type could be assigned in step S82 (an unknown word) is an abbreviation. The abbreviation determination process of the fourth embodiment is the same as the process described in the first embodiment with reference to FIGS. 7 to 11.

The control unit 1 (abbreviation accent assigning unit 13) assigns the flat (type 0) accent type to each morpheme determined to be an abbreviation in step S83 (S84). The control unit 1 (prosody generation unit 20) generates prosody data corresponding to the resulting phonetic character string based on the registered contents of the prosody generation rule dictionary 4f (S85). The control unit 1 (waveform generation unit 30) generates a speech waveform from the generated prosody data based on the registered contents of the waveform dictionary 4g (S86), and ends the synthesized speech (speech waveform) generation process.
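As an illustration only, the text-analysis part of this flow (steps S82 to S84) can be sketched in Python as follows. The function name `analyze`, the assumption that the text has already been split into morphemes, the pluggable `is_abbreviation` predicate, and the sample dictionary entries are all introduced here for the sketch and do not appear in the specification. Prosody generation (S85) and waveform generation (S86) are omitted.

```python
from typing import Callable, Dict, List, Optional, Tuple

FLAT = 0  # flat (type 0) accent type assigned in step S84


def analyze(
    morphemes: List[str],
    language_dict: Dict[str, int],
    is_abbreviation: Callable[[str], bool],
) -> List[Tuple[str, Optional[int]]]:
    """Return (morpheme, accent type) pairs for steps S82 to S84.

    Assumption: `morphemes` is already segmented; real segmentation
    against the language dictionary is outside this sketch.
    """
    result: List[Tuple[str, Optional[int]]] = []
    for morpheme in morphemes:
        # S82: look up the accent type registered for this morpheme.
        accent = language_dict.get(morpheme)
        # S83: a morpheme with no registered accent type is an unknown word;
        # check whether it is an abbreviation.
        if accent is None and is_abbreviation(morpheme):
            # S84: abbreviations receive the flat accent type.
            accent = FLAT
        result.append((morpheme, accent))
    return result
```

In this sketch an unknown word that is not an abbreviation keeps `None`; a caller would then apply whatever default accent rule it uses for other unknown words.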

As described above, the speech synthesizer 100 of the fourth embodiment determines, as explained in the first embodiment, whether each morpheme to which no accent type could be assigned based on the language dictionary 4a (an unknown word) is an abbreviation, and assigns each unknown word an appropriate accent type according to the result. It can thus generate appropriate prosody from the appropriately assigned accent types, and appropriate synthesized speech from that prosody. Consequently, synthesized speech with correct accent and intonation can be generated even for unknown words that are not registered in the language dictionary 4a.

In the fourth embodiment described above, the present invention was explained using as an example the speech synthesizer 100 that includes the text analysis device 10 of the first embodiment; however, the speech synthesizer of the present invention may instead include the text analysis device 10 of the second or third embodiment. When it includes the text analysis device 10 of the third embodiment, the speech synthesizer 100 can assign a pre-registered accent type to any unknown word that the abbreviation determination process determined to be an abbreviation and that is registered in the exception abbreviation dictionary 4e. It can therefore assign a more appropriate accent type to abbreviations whose accent type is not flat, and can generate more appropriate synthesized speech based on that accent type.

As described above, when a morpheme that is not registered in the language dictionary 4a (an unknown word) is an abbreviation of a personal name or an abbreviation of a compound word, the abbreviation determination device according to the present invention assigns the flat accent type to it. To any other unknown word it assigns, as is done conventionally, an accent type that has, for example, its accent nucleus on the third mora from the end. In this way, different accent types can be assigned to abbreviations and to unknown words that are not abbreviations.
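The accent rule for unknown words can be illustrated with a short Python sketch. It assumes the common numbering in which type 0 is the flat type and type n places the accent nucleus on the n-th mora counted from the start of the word, so that a nucleus on the third mora from the end of an m-mora word is type m - 2. The function name and the fallback to type 1 for words shorter than three moras are assumptions made for this sketch; the specification does not state how such short words are handled.

```python
def unknown_word_accent(mora_count: int, is_abbreviation: bool) -> int:
    """Return an accent type for an unknown word.

    0 means the flat type; n > 0 means the accent nucleus is on the
    n-th mora from the start of the word (assumed numbering).
    """
    if mora_count < 1:
        raise ValueError("mora_count must be at least 1")
    if is_abbreviation:
        return 0  # flat type for abbreviations
    # Nucleus on the third mora from the end: position m - 2 from the start.
    # Assumption: words shorter than three moras fall back to type 1.
    return max(mora_count - 2, 1)
```

For a five-mora unknown word that is not an abbreviation this returns type 3, while the same word judged to be an abbreviation receives type 0.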

Abbreviations of personal names and abbreviations of compound words often have a flat accent type. By assigning the flat accent type to such abbreviations, synthesized speech with appropriate accents can be generated even for documents that contain them, and more natural speech can be output based on that synthesized speech. Moreover, the present invention does not rely on registering in a dictionary the new abbreviations that appear every day; instead it judges whether a word is an abbreviation based on the registered contents of the personal name dictionary 4b and the compound word dictionary 4c. It can therefore properly identify abbreviations in a document and assign them an appropriate accent type without any work of registering abbreviations in a dictionary.

(Appendix 1)
An abbreviation determination device that determines whether text data is an abbreviation, comprising:
personal name storage means for storing surnames and given names used in personal names;
first extraction means for extracting a predetermined number of leading character data from the text data;
means for judging whether a surname beginning with the character data extracted by the first extraction means is stored in the personal name storage means;
second extraction means for extracting, when it is judged that the surname is stored in the personal name storage means, a predetermined number of leading character data from the text data excluding the character data extracted by the first extraction means;
means for judging whether a given name beginning with the character data extracted by the second extraction means is stored in the personal name storage means; and
determination means for determining that the text data is an abbreviation when it is judged that the given name is stored in the personal name storage means.
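The determination recited in Appendix 1 can be sketched in Python as follows, purely as an illustration. The function name, the use of kana readings with one character counted as one unit, the default extraction lengths of two characters, and the sample names are assumptions introduced here. The guard that rejects text shorter than the two extraction lengths combined is also an assumption, added so that an empty second extraction does not match every given name.

```python
from typing import Iterable


def is_name_abbreviation(
    text: str,
    surnames: Iterable[str],
    given_names: Iterable[str],
    n1: int = 2,
    n2: int = 2,
) -> bool:
    """Judge whether `text` looks like an abbreviated personal name."""
    # Assumption: text too short for both extractions is not an abbreviation.
    if len(text) < n1 + n2:
        return False
    # First extraction: the leading n1 characters.
    head = text[:n1]
    # Is a surname beginning with those characters stored?
    if not any(s.startswith(head) for s in surnames):
        return False
    # Second extraction: the leading n2 characters of the remainder.
    second = text[n1:n1 + n2]
    # Is a given name beginning with those characters stored?
    return any(g.startswith(second) for g in given_names)
```

With the hypothetical entries `{"きむら"}` for surnames and `{"たくや"}` for given names, the text `"きむたく"` is judged to be an abbreviation, while `"きむらさ"` is not because no stored given name begins with `"らさ"`.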

(Appendix 2)
An abbreviation determination device that determines whether text data is an abbreviation, comprising:
compound word storage means for storing a plurality of compound words in association with the constituent words that make up each compound word;
first extraction means for extracting a predetermined number of leading character data from the text data;
judging means for judging whether a compound word that includes a constituent word beginning with the character data extracted by the first extraction means is stored in the compound word storage means;
second extraction means for extracting, when it is judged that the compound word is stored in the compound word storage means, a predetermined number of leading character data from the text data excluding the character data extracted by the first extraction means;
means for judging whether a constituent word beginning with the character data extracted by the second extraction means is included in the constituent words of the compound word that the judging means judged to be stored in the compound word storage means; and
determination means for determining that the text data is an abbreviation when it is judged that the constituent word is included.
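As a rough illustration, the compound-word determination recited in Appendix 2 might look like the following Python sketch. The function name, the dictionary layout (compound word mapped to its list of constituent words), the default extraction lengths, and the sample entry are assumptions made here. The length guard is likewise an assumption, added so that an empty second extraction cannot match.

```python
from typing import Dict, List


def is_compound_abbreviation(
    text: str,
    compounds: Dict[str, List[str]],
    n1: int = 2,
    n2: int = 2,
) -> bool:
    """Judge whether `text` looks like an abbreviated compound word."""
    # Assumption: text too short for both extractions is not an abbreviation.
    if len(text) < n1 + n2:
        return False
    head = text[:n1]             # first extraction
    second = text[n1:n1 + n2]    # second extraction, from the remainder
    for constituents in compounds.values():
        # A compound word that has a constituent word beginning with `head`...
        if any(c.startswith(head) for c in constituents):
            # ...must also have a constituent word beginning with `second`.
            if any(c.startswith(second) for c in constituents):
                return True
    return False
```

With the hypothetical entry `{"リモートコントローラ": ["リモート", "コントローラ"]}`, the text `"リモコン"` is judged to be an abbreviation, whereas `"リモート"` is not because no constituent word begins with `"ート"`.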

(Appendix 3)
The abbreviation determination device according to appendix 1 or 2, wherein
the first extraction means is configured to extract, from the beginning of the text data, a number of character data corresponding to two syllables, and
the second extraction means is configured to extract, from the beginning of the text data excluding the character data extracted by the first extraction means, a number of character data corresponding to two syllables.

(Appendix 4)
The abbreviation determination device according to appendix 1 or 2, wherein
the first extraction means is configured to extract, from the beginning of the text data, a number of character data corresponding to one syllable, and
the second extraction means is configured to extract, from the beginning of the text data excluding the character data extracted by the first extraction means, a number of character data corresponding to two syllables.
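One way to extract "a number of character data corresponding to" a given number of syllables from a kana string is sketched below in Python. The specification does not say how characters are grouped into syllables; treating each kana together with any small kana that follows it (for example しゃ) as one unit is an assumption of this sketch, as are the function names.

```python
from typing import List, Tuple

# Small kana that attach to the preceding character (assumed grouping rule).
SMALL_KANA = set("ぁぃぅぇぉゃゅょゎァィゥェォャュョヮ")


def split_units(kana: str) -> List[str]:
    """Split a kana string into units, merging small kana into the previous unit."""
    units: List[str] = []
    for ch in kana:
        if ch in SMALL_KANA and units:
            units[-1] += ch
        else:
            units.append(ch)
    return units


def take_units(kana: str, n: int) -> Tuple[str, str]:
    """Return (first n units, remaining text) of a kana string."""
    units = split_units(kana)
    return "".join(units[:n]), "".join(units[n:])
```

Calling `take_units` with `n=2` and then again on the remainder with `n=2` corresponds to the extraction lengths of Appendix 3, while `n=1` followed by `n=2` corresponds to those of Appendix 4.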

(Appendix 5)
The abbreviation determination device according to any one of appendices 1 to 4, further comprising dividing means for dividing document data that includes a plurality of text data into individual text data, wherein
the first extraction means is configured to extract a predetermined number of leading character data from each of the divided text data,
the second extraction means is configured to extract a predetermined number of leading character data from each of the divided text data excluding the character data extracted by the first extraction means, and
the determination means is configured to determine whether each of the divided text data is an abbreviation candidate,
the abbreviation determination device further comprising:
co-occurrence data storage means for storing a plurality of text data in association with co-occurrence data that co-occurs with each text data;
means for acquiring, from the co-occurrence data storage means, the co-occurrence data corresponding to text data that the determination means determined to be an abbreviation candidate;
means for judging whether the text data in the document data includes the co-occurrence data acquired from the co-occurrence data storage means; and
means for confirming, when it is judged that the co-occurrence data is included, that the text data determined by the determination means to be an abbreviation candidate is an abbreviation.
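The confirmation step of Appendix 5 can be illustrated with a short Python sketch. The function name, the representation of the co-occurrence data storage as a mapping from text data to a set of co-occurring items, and the sample entries are assumptions made here; the candidates are assumed to have been produced beforehand by the determination means.

```python
from typing import Dict, Iterable, List, Set


def confirm_abbreviations(
    document_items: List[str],
    candidates: Iterable[str],
    cooccurrence: Dict[str, Set[str]],
) -> List[str]:
    """Return the candidates confirmed as abbreviations.

    A candidate is confirmed when at least one of its registered
    co-occurrence items appears among the text data of the document.
    """
    items = set(document_items)
    confirmed: List[str] = []
    for candidate in candidates:
        # Acquire the co-occurrence data registered for this candidate.
        related = cooccurrence.get(candidate, set())
        # Confirm only if the document contains any of that data.
        if related & items:
            confirmed.append(candidate)
    return confirmed
```

A candidate with no entry in the co-occurrence mapping is never confirmed in this sketch, which matches the wording that confirmation occurs only when co-occurrence data is judged to be included.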

(Appendix 6)
An abbreviation determination method for determining whether text data is an abbreviation, comprising:
a first extraction step of extracting a predetermined number of leading character data from the text data;
a step of judging whether a surname beginning with the character data extracted in the first extraction step is stored in personal name storage means that stores surnames and given names used in personal names;
a second extraction step of extracting, when it is judged that the surname is stored in the personal name storage means, a predetermined number of leading character data from the text data excluding the character data extracted in the first extraction step;
a step of judging whether a given name beginning with the character data extracted in the second extraction step is stored in the personal name storage means; and
a step of determining that the text data is an abbreviation when it is judged that the given name is stored in the personal name storage means.

(Appendix 7)
An abbreviation determination method for determining whether text data is an abbreviation, comprising:
a first extraction step of extracting a predetermined number of leading character data from the text data;
a judging step of judging whether a compound word that includes a constituent word beginning with the character data extracted in the first extraction step is stored in compound word storage means that stores a plurality of compound words in association with the constituent words that make up each compound word;
a second extraction step of extracting, when it is judged that the compound word is stored in the compound word storage means, a predetermined number of leading character data from the text data excluding the character data extracted in the first extraction step;
a step of judging whether a constituent word beginning with the character data extracted in the second extraction step is included in the constituent words of the compound word judged in the judging step to be stored in the compound word storage means; and
a step of determining that the text data is an abbreviation when it is judged that the constituent word is included.

(Appendix 8)
A computer program for causing a computer to determine whether text data is an abbreviation, the computer program causing a computer that includes personal name storage means for storing surnames and given names used in personal names to execute:
a first extraction step of extracting a predetermined number of leading character data from the text data;
a step of judging whether a surname beginning with the character data extracted in the first extraction step is stored in the personal name storage means;
a second extraction step of extracting, when it is judged that the surname is stored in the personal name storage means, a predetermined number of leading character data from the text data excluding the character data extracted in the first extraction step;
a step of judging whether a given name beginning with the character data extracted in the second extraction step is stored in the personal name storage means; and
a step of determining that the text data is an abbreviation when it is judged that the given name is stored in the personal name storage means.

(Appendix 9)
A computer program for causing a computer to determine whether text data is an abbreviation, the computer program causing a computer that includes compound word storage means for storing a plurality of compound words in association with the constituent words that make up each compound word to execute:
a first extraction step of extracting a predetermined number of leading character data from the text data;
a judging step of judging whether a compound word that includes a constituent word beginning with the character data extracted in the first extraction step is stored in the compound word storage means;
a second extraction step of extracting, when it is judged that the compound word is stored in the compound word storage means, a predetermined number of leading character data from the text data excluding the character data extracted in the first extraction step;
a step of judging whether a constituent word beginning with the character data extracted in the second extraction step is included in the constituent words of the compound word judged in the judging step to be stored in the compound word storage means; and
a step of determining that the text data is an abbreviation when it is judged that the constituent word is included.

(Appendix 10)
A text analysis device that analyzes text data, comprising:
the abbreviation determination device according to any one of appendices 1 to 4;
morpheme storage means for storing morphemes and accent types in association with each other;
morpheme dividing means for dividing text data into morphemes based on the contents stored in the morpheme storage means; and
means for assigning an accent type to each of the morphemes divided by the morpheme dividing means, based on the contents stored in the morpheme storage means, wherein
the abbreviation determination device is configured to determine whether a morpheme that is not stored in the morpheme storage means is an abbreviation, and
the text analysis device further comprises accent assigning means for assigning a predetermined accent type to a morpheme determined to be an abbreviation by the abbreviation determination device.

(Appendix 11)
The text analysis device according to appendix 10, further comprising abbreviation storage means for storing abbreviations and accent types in association with each other, wherein
the accent assigning means comprises:
means for assigning, based on the contents stored in the abbreviation storage means, an accent type to each of the morphemes determined to be abbreviations by the abbreviation determination device; and
means for assigning a predetermined accent type to a morpheme that is not stored in the abbreviation storage means.
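The two-way assignment of Appendix 11 amounts to a dictionary lookup with a default, as the following minimal Python sketch shows. The function name, the representation of the abbreviation storage means as a mapping from abbreviation to accent type, and the sample entry are assumptions made here; the default of 0 corresponds to the flat accent type used as the predetermined accent type in the embodiments.

```python
from typing import Dict


def abbreviation_accent(
    morpheme: str,
    exception_dict: Dict[str, int],
    default_accent: int = 0,
) -> int:
    """Return the accent type for a morpheme already judged to be an abbreviation.

    A registered abbreviation gets its pre-registered accent type;
    any other abbreviation gets the predetermined (flat) accent type.
    """
    return exception_dict.get(morpheme, default_accent)
```

With a hypothetical entry `{"めるまが": 2}`, the abbreviation `"めるまが"` receives type 2 and an unregistered abbreviation such as `"きむたく"` receives type 0.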

(Appendix 12)
A text analysis device that analyzes text data, comprising:
the abbreviation determination device according to appendix 5; and
text storage means for storing text data and accent types in association with each other, wherein
the dividing means of the abbreviation determination device is configured to divide document data into text data based on the contents stored in the text storage means,
the text analysis device further comprises means for assigning an accent type to each of the text data divided by the dividing means, based on the contents stored in the text storage means,
the abbreviation determination device is configured to determine whether text data that is not stored in the text storage means is an abbreviation, and
the text analysis device further comprises means for assigning a predetermined accent type to text data determined to be an abbreviation by the abbreviation determination device.

(Appendix 13)
The text analysis device according to appendix 12, further comprising abbreviation storage means for storing abbreviations and accent types in association with each other, wherein
the accent assigning means comprises:
means for assigning, based on the contents stored in the abbreviation storage means, an accent type to each of the text data determined to be abbreviations by the abbreviation determination device; and
means for assigning a predetermined accent type to text data that is not stored in the abbreviation storage means.

(Appendix 14)
A speech synthesizer that generates synthesized speech from text data, comprising:
the text analysis device according to appendix 10 or 11;
prosody generation means for generating a prosody corresponding to each morpheme, based on the morphemes divided by the morpheme dividing means of the text analysis device and the accent type assigned to each morpheme; and
waveform generation means for generating synthesized speech based on the prosody generated by the prosody generation means.

(Appendix 15)
A speech synthesizer that generates synthesized speech from text data, comprising:
the text analysis device according to appendix 12 or 13;
prosody generation means for generating a prosody corresponding to each text data, based on the text data divided by the dividing means of the abbreviation determination device and the accent type assigned to each text data by the text analysis device; and
waveform generation means for generating synthesized speech based on the prosody generated by the prosody generation means.

FIG. 1 is a block diagram showing a configuration example of the text analysis device according to the first embodiment.
FIG. 2 is a schematic diagram showing the registered contents of the language dictionary.
FIG. 3 is a schematic diagram showing the registered contents of the personal name dictionary.
FIG. 4 is a schematic diagram showing the registered contents of the compound word dictionary.
FIG. 5 is a functional block diagram showing a functional configuration example of the text analysis device.
FIG. 6 is a flowchart showing the procedure of the text analysis process.
FIG. 7 is a flowchart showing the procedure of the abbreviation determination process.
FIG. 8 is a flowchart showing the procedure of the abbreviation determination process.
FIG. 9 is a flowchart showing the procedure of the abbreviation determination process.
FIG. 10 is a flowchart showing the procedure of the abbreviation determination process.
FIG. 11 is a flowchart showing the procedure of the abbreviation determination process.
FIG. 12 is a functional block diagram showing a functional configuration example of the text analysis device of the second embodiment.
FIG. 13 is a schematic diagram showing the registered contents of the co-occurrence dictionary.
FIG. 14 is a flowchart showing the procedure of the text analysis process.
FIG. 15 is a functional block diagram showing a functional configuration example of the text analysis device of the third embodiment.
FIG. 16 is a schematic diagram showing the registered contents of the exception abbreviation dictionary.
FIG. 17 is a flowchart showing the procedure of the text analysis process.
FIG. 18 is a block diagram showing a configuration example of the speech synthesizer according to the fourth embodiment.
FIG. 19 is a functional block diagram showing a functional configuration example of the speech synthesizer of the fourth embodiment.
FIG. 20 is a flowchart showing the procedure of the synthesized speech generation process.

Explanation of symbols

10 Text analysis device
1 Control unit
11 Morphological analysis unit (morpheme dividing means)
12 Abbreviation determination unit (first extraction means, second extraction means, determination means, judging means)
13 Abbreviation accent assigning unit (accent assigning means)
4a Language dictionary (morpheme storage means)
4b Personal name dictionary (personal name storage means)
4c Compound word dictionary (compound word storage means)
4d Co-occurrence dictionary (co-occurrence data storage means)
4e Exception abbreviation dictionary (abbreviation storage means)
20 Prosody generation unit (prosody generation means)
30 Waveform generation unit (waveform generation means)

Claims

An abbreviation determination device that determines whether text data is an abbreviation, comprising:
personal name storage means for storing surnames and given names used in personal names;
first extraction means for extracting a predetermined number of leading character data from the text data;
means for judging whether a surname beginning with the character data extracted by the first extraction means is stored in the personal name storage means;
second extraction means for extracting, when it is judged that the surname is stored in the personal name storage means, a predetermined number of leading character data from the text data excluding the character data extracted by the first extraction means;
means for judging whether a given name beginning with the character data extracted by the second extraction means is stored in the personal name storage means; and
determination means for determining that the text data is an abbreviation when it is judged that the given name is stored in the personal name storage means.

An abbreviation determination device that determines whether text data is an abbreviation, comprising:
compound word storage means for storing a plurality of compound words in association with the constituent words that make up each compound word;
first extraction means for extracting a predetermined number of leading character data from the text data;
judging means for judging whether a compound word that includes a constituent word beginning with the character data extracted by the first extraction means is stored in the compound word storage means;
second extraction means for extracting, when it is judged that the compound word is stored in the compound word storage means, a predetermined number of leading character data from the text data excluding the character data extracted by the first extraction means;
means for judging whether a constituent word beginning with the character data extracted by the second extraction means is included in the constituent words of the compound word that the judging means judged to be stored in the compound word storage means; and
determination means for determining that the text data is an abbreviation when it is judged that the constituent word is included.

The abbreviation determination device according to claim 1 or 2, further comprising dividing means for dividing document data that includes a plurality of text data into individual text data, wherein
the first extraction means is configured to extract a predetermined number of leading character data from each of the divided text data,
the second extraction means is configured to extract a predetermined number of leading character data from each of the divided text data excluding the character data extracted by the first extraction means, and
the determination means is configured to determine whether each of the divided text data is an abbreviation candidate,
the abbreviation determination device further comprising:
co-occurrence data storage means for storing a plurality of text data in association with co-occurrence data that co-occurs with each text data;
means for acquiring, from the co-occurrence data storage means, the co-occurrence data corresponding to text data that the determination means determined to be an abbreviation candidate;
means for judging whether the text data in the document data includes the co-occurrence data acquired from the co-occurrence data storage means; and
means for confirming, when it is judged that the co-occurrence data is included, that the text data determined by the determination means to be an abbreviation candidate is an abbreviation.

In a computer program for causing a computer to determine whether text data is an abbreviation,
In a computer equipped with personal name storage means for storing surnames and given names used in personal names,
A first extraction step of extracting a predetermined number of character data at the beginning from the text data;
A step of determining whether a surname having, at its head, the character data extracted in the first extraction step is stored in the personal name storage means;
A second extraction step of extracting a predetermined number of character data from the text data excluding the character data extracted in the first extraction step when it is determined that the surname is stored in the personal name storage means;
A step of determining whether a given name having, at its head, the character data extracted in the second extraction step is stored in the personal name storage means;
A computer program for executing a step of determining that the text data is an abbreviation when it is determined that the given name is stored in the personal name storage means.
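The steps of this claim can be sketched as follows. This is a minimal illustration rather than the implementation of the invention: the function name `is_name_abbreviation`, the prefix length `n` (standing in for the "predetermined number"), the use of two separate sets for the personal name storage means, and the sample names are assumptions. The second extraction is assumed to take the characters immediately following the first extracted prefix.

```python
def is_name_abbreviation(text, surnames, given_names, n=2):
    """Return True if text looks like surname-prefix + given-name-prefix."""
    # First extraction step: the leading n characters.
    first = text[:n]
    if len(first) < n:
        return False
    # Is a surname beginning with those characters stored?
    if not any(s.startswith(first) for s in surnames):
        return False
    # Second extraction step: n characters from the remaining text.
    second = text[n:2 * n]
    if len(second) < n:
        return False
    # Is a given name beginning with those characters stored?
    return any(g.startswith(second) for g in given_names)


# Hypothetical sample data.
surnames = {"キムラ", "サトウ"}
given_names = {"タクヤ", "ケン"}
print(is_name_abbreviation("キムタク", surnames, given_names))
```

The length guards are an addition of this sketch: without them an empty second prefix would match every stored given name.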

In a computer program for causing a computer to determine whether text data is an abbreviation,
In a computer provided with a compound word storage means for storing a plurality of compound words and constituent words constituting each compound word in association with each other,
A first extraction step of extracting a predetermined number of character data at the beginning from the text data;
A determination step of determining whether a compound word including a constituent word having, at its head, the character data extracted in the first extraction step is stored in the compound word storage means;
A second extraction step of extracting a predetermined number of character data from the text data excluding the character data extracted in the first extraction step when it is determined that the compound word is stored in the compound word storage means;
A step of determining whether a constituent word having, at its head, the character data extracted in the second extraction step is included in the constituent words of the compound word determined in the determination step to be stored in the compound word storage means;
A computer program for executing a step of determining that the text data is an abbreviation when it is determined that the constituent word is included.
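The compound-word variant can be sketched in the same way. Again this is an illustration under assumptions, not the implementation of the invention: the function name `is_compound_abbreviation`, the prefix length `n`, the dictionary layout (compound word mapped to its list of constituent words), and the sample entries are hypothetical.

```python
def is_compound_abbreviation(text, compounds, n=2):
    """Return True if both extracted prefixes head constituents of one compound.

    compounds -- dict mapping each compound word to its constituent words
    """
    first = text[:n]          # first extraction step
    second = text[n:2 * n]    # second extraction step
    if len(first) < n or len(second) < n:
        return False
    for constituents in compounds.values():
        # Determination step: does this compound contain a constituent
        # word that begins with the first prefix?
        if not any(w.startswith(first) for w in constituents):
            continue
        # Does the same compound contain a constituent word that
        # begins with the second prefix?
        if any(w.startswith(second) for w in constituents):
            return True
    return False


# Hypothetical sample data.
compounds = {
    "デジタルカメラ": ["デジタル", "カメラ"],
    "リモートコントロール": ["リモート", "コントロール"],
}
print(is_compound_abbreviation("デジカメ", compounds))  # True
print(is_compound_abbreviation("デジコン", compounds))  # False
```

The second call returns False because the two prefixes head constituents of different compound words. The sketch follows the claim wording literally and does not require the two matching constituents to be distinct or to appear in order.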

In a text analysis device that analyzes text data,
An abbreviation determination device according to claim 1 or 2,
Morpheme storage means for storing morphemes and accent types in association with each other;
Morpheme dividing means for dividing text data into morphemes based on the stored contents of the morpheme storage means;
Means for giving an accent type to each of the morphemes divided by the morpheme dividing means based on the storage contents of the morpheme storage means,
The abbreviation determination device is configured to determine whether a morpheme that is not stored in the morpheme storage means is an abbreviation,
A text analysis apparatus comprising: an accent imparting unit that imparts a predetermined accent type to a morpheme determined to be an abbreviation by the abbreviation determination apparatus.
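The accent assignment described in this claim can be sketched as follows. This is a simplified illustration, not the implementation of the invention: the function name `assign_accents`, the `is_abbreviation` callback (standing in for the abbreviation determination device), and the numeric accent type values are placeholders chosen for this sketch, and the text data is assumed to have already been divided into morphemes.

```python
ABBREVIATION_ACCENT = 0  # placeholder for the predetermined accent type of abbreviations
UNKNOWN_ACCENT = 1       # placeholder accent type for other unknown words


def assign_accents(morphemes, accent_dict, is_abbreviation):
    """Pair each morpheme with an accent type.

    accent_dict     -- dict mapping stored morphemes to accent types
    is_abbreviation -- hypothetical callback used only for unknown morphemes
    """
    result = []
    for morpheme in morphemes:
        if morpheme in accent_dict:
            # Stored morpheme: use the stored accent type.
            accent = accent_dict[morpheme]
        elif is_abbreviation(morpheme):
            # Unknown word judged to be an abbreviation.
            accent = ABBREVIATION_ACCENT
        else:
            # Unknown word judged not to be an abbreviation.
            accent = UNKNOWN_ACCENT
        result.append((morpheme, accent))
    return result


# Hypothetical sample data.
accent_dict = {"写真": 0, "撮る": 1}
print(assign_accents(["デジカメ", "写真", "撮る"], accent_dict, lambda m: m == "デジカメ"))
```

Separating the two unknown-word cases mirrors the abstract, which states that different accent types are assigned to unknown words judged to be abbreviations and to those judged not to be.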

In a text analysis device that analyzes text data,
An abbreviation determination device according to claim 3,
Text storage means for storing text data and an accent type in association with each other,
The dividing means of the abbreviation determination device is configured to divide document data into text data based on the stored contents of the text storage means,
Means for giving an accent type to each of the text data divided by the dividing means based on the stored contents of the text storing means;
The abbreviation determination device is configured to determine whether text data not stored in the text storage means is an abbreviation,
A text analysis apparatus comprising means for giving a predetermined accent type to text data determined to be an abbreviation by the abbreviation determination apparatus.

In a speech synthesizer that generates synthesized speech from text data,
A text analysis device according to claim 6;
Prosody generation means for generating prosody corresponding to each morpheme based on the morpheme divided by the morpheme dividing means of the text analysis device and the accent type assigned to each morpheme;
A speech synthesizer comprising: waveform generation means for generating synthesized speech based on the prosody generated by the prosody generation means.