JP2009169513A

JP2009169513A - Device, method and program for estimating nickname

Info

Publication number: JP2009169513A
Application number: JP2008004364A
Authority: JP
Inventors: Yumi Wakagi; 裕美若木; Kazuo Sumita; 一男住田; Miyoshi Fukui; 美佳福井; Hiroko Fujii; 寛子藤井
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2008-01-11
Filing date: 2008-01-11
Publication date: 2009-07-30
Anticipated expiration: 2028-01-11
Also published as: JP5248121B2

Abstract

PROBLEM TO BE SOLVED: To provide a nickname estimation device that more appropriately acquires a nickname from a name such as a person's name.

SOLUTION: The nickname estimation device includes: a rule storage part 121 that stores a generation rule for nickname candidates, the rule including position information indicating which character positions of the name are included in a nickname candidate, and a predetermined additional character string; a name input part 101 that inputs the name; a candidate generation part 102 that acquires, from the characters of the input name, the characters at the positions indicated by the position information of the generation rule, and generates a nickname candidate by joining the acquired characters with the additional character string of the generation rule; and an output part 103 that outputs the generated nickname candidates.

COPYRIGHT: (C)2009, JPO&INPIT

Description

The present invention relates to an apparatus, a method, and a program that take a name as input and estimate and output a nickname for it.

In recent years, information retrieval technology has become widespread and is now used in a variety of situations. In a typical search, a keyword is given as input, and documents that match or are strongly related to the keyword are returned as results. For example, a person's name may be used as the search keyword to retrieve documents related to that person from Web pages.

Besides an official name, a name such as a personal name may have other forms, including nicknames, abbreviations, aliases, and common names. If the correspondence between the official name and the nickname is not known, only one of them is used as the search keyword, which makes it difficult to find the necessary information comprehensively. For example, for a person whose name is "Ｏ田Ｙ美" and whose nickname is "Ｙりん", using only "Ｏ田Ｙ美" as the search keyword cannot retrieve information that contains only "Ｙりん".

Meanwhile, human interfaces based on speech recognition are also coming into practical use, and spoken dialogue and voice-based information retrieval are expected to follow. Nicknames and similar expressions are more likely to be used in voice-based retrieval than in text-based retrieval. To encourage users to speak freely, the system must therefore be able to recognize such expressions and associate them with the official names.

Speech recognition generally relies on a speech recognition dictionary against which the spoken input is matched, and words that are not registered in the dictionary cannot be recognized. Unless a nickname is registered as a word in the dictionary, a spoken nickname cannot even be recognized. Moreover, the names and nicknames of people who may be searched for change from day to day, so the dictionary must also be updated to recognize them correctly, and doing so is costly.

For example, the people searched for include entertainers who appear on television programs. New entertainers appear on such programs, and their nicknames often spread rapidly. The speech recognition dictionary has to be updated continually to keep up with names that change daily in this way.

Techniques proposed for acquiring the correspondence between abbreviations and official names include splitting the input into words and combining the initial characters of those words, and holding the official names of public institutions, large companies, and the like in a database together with their abbreviations (for example, Patent Document 1). However, as with the speech recognition dictionary described above, when the targets are personal names that may change daily, the cost of manually maintaining an up-to-date database grows. In addition, a simple abbreviation method that merely combines initial characters cannot generate a nickname that contains characters not found in the official name. For example, the abbreviation "Ｎ弁連" can be generated from the official name "Ｎ弁護士連盟", but a nickname containing characters other than those in "Ｎ弁護士連盟" cannot.

Patent Document 2 proposes generating an abbreviation from a name by using abbreviation generation rules. In this method, the input word is split into basic words, and each rule specifies which parts of those basic words are used for the abbreviation. Abbreviation candidates are generated from the input word according to the rules. If a candidate appears among the keywords assigned to the target documents of the word search device, it is judged to be an abbreviation, and the word search is also performed with the abbreviation.

An abbreviation is a shortened form of a long name and therefore consists only of character strings related to the official name. For example, for a person with the name "Ｐ田Ａ也", the reading "ピイタエイヤ", and the nickname "ピイエイ", the nickname is an abbreviation consisting only of characters contained in the reading of the name. On the other hand, the nickname may be an expression entirely different from the name, as with a person whose name is "Ｓ田Ｕ朗" and whose nickname is "トケイ王子". In yet another case, for a person with the name "Ｉ田Ｍ也" and the reading "アイタメツヤ", part of the reading is combined with characters unrelated to the original name to coin a new word, the nickname "メッチー". In the latter two cases, an abbreviation generation method such as that of Patent Document 2 cannot generate the nickname.

In Patent Document 2, keywords are assigned to the target documents of the word search device, and an abbreviation candidate is selected as an abbreviation if it appears among those keywords. For example, when the search keyword "生命保険" (life insurance) has the abbreviation candidates "生命保", "生保", and "生保険", and "生保" is present as a keyword assigned to a document, only "生保" is recognized as an abbreviation.

In other words, Patent Document 2 presupposes the existence of documents to which abbreviation candidates have been assigned as keywords. When no data exists in which nicknames are segmented out as keywords, as is the case when searching Web pages, it is difficult to select appropriate candidates from among the generated nickname candidates.

Non-Patent Document 1, on the other hand, proposes extracting appellations by exploiting the typical Japanese expression "(appellation) こと (official name)", which pairs an appellation with an official name. Specifically, the method runs a Web search with "こと (official name)" as the search keyword and extracts the appellation corresponding to the official name from the characters that appear before "こと". This method may be able to acquire any of the nicknames "ピイエイ", "トケイ王子", and "メッチー" mentioned above.

JP 2003-333161 A
Japanese Patent Laid-Open No. 11-25117
外間智子 et al., "Web データを用いた人物の呼称抽出" (Extraction of person appellations using Web data), DBSJ Letters Vol.5 No.2

However, even the method of Non-Patent Document 1 may fail to acquire the correct nickname for an official name. That method uses a morphological analyzer to extract the appellation from the character string that appears before "こと". If the nickname is not registered as a word in the analyzer's dictionary, as when it is a coined word derived from the name, it is difficult to segment the nickname out of the string. For example, even when the correct nickname is "Ｕーちゃん", the method may extract "ーちゃん" as the nickname.

Furthermore, as described above, Non-Patent Document 1 relies on typical expressions such as "(nickname) こと (official name)", so a nickname cannot be acquired if the documents being searched do not state it in such an expression.

The present invention has been made in view of the above, and an object thereof is to provide an apparatus, a method, and a program that can more appropriately acquire a nickname from a name such as a personal name.

To solve the problems described above and achieve the object, the present invention is a nickname estimation apparatus that estimates a nickname for a name from the name, comprising: a rule storage unit that stores a generation rule for nickname candidates, the rule including position information indicating the positions of the characters of the name to be included in a nickname candidate, and a predetermined additional character string; a name input unit that inputs the name; a generation unit that acquires, from the characters of the input name, the characters at the positions indicated by the position information of the generation rule, and generates a nickname candidate by joining the acquired characters with the additional character string of the generation rule; and an output unit that outputs the generated nickname candidate.

The present invention is also a method and a program capable of implementing the above apparatus.

According to the present invention, a nickname can be acquired more appropriately from a name such as a personal name.

Preferred embodiments of the apparatus, method, and program according to the present invention are described in detail below with reference to the accompanying drawings.

(First Embodiment)
As described above, nicknames, abbreviations, aliases, and the like have conventionally been created by hand in advance and registered in a database. For abbreviations, methods that estimate the abbreviation from the official name using predetermined abbreviation generation patterns have also been used. However, information that is updated daily, such as program guide data for television programs (EPG (Electronic Program Guide) data), requires frequent updates, for example when new entertainers appear. Simply storing entries in a dictionary is therefore not enough to keep up, and updating the dictionary by hand is costly.

The nickname estimation apparatus according to the first embodiment generates nickname candidates for an input name in accordance with predetermined generation rules for nickname candidates (nickname generation rules).

FIG. 1 is a block diagram showing the configuration of the nickname estimation apparatus 100 according to the first embodiment. As shown in FIG. 1, the nickname estimation apparatus 100 includes a rule storage unit 121, a name input unit 101, a candidate generation unit 102, and an output unit 103.

The rule storage unit 121 stores nickname generation rules for generating nickname candidates for an input name. FIG. 2 shows an example of the nickname generation rules stored in the rule storage unit 121. As shown in FIG. 2, a nickname generation rule contains information (symbols) specifying which characters of the input name, of which character type and at which position, are used to generate the nickname, together with an additional character string (such as "ちゃん") that forms part of the nickname.

In this embodiment, a three-digit numeric symbol identifies the character type and the character position. The hundreds digit represents the character type: "1" corresponds to the official name, "2" to the hiragana notation, and "3" to the katakana notation. The tens digit identifies the word unit that makes up the name, counted from the beginning of the name. For example, for a personal name whose word units are the family name and the given name, a tens digit of "1" denotes the family name and "2" the given name. Names such as group names and organization names may consist of three or more word units, so the tens digit can be 3 or greater. The following description takes the case where the tens digit is either "1" (family name) or "2" (given name). The ones digit identifies the position of the character counted from the beginning of its word unit.
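The three-digit encoding described above can be sketched in Python as follows. This is only an illustration: the names `Symbol`, `parse_symbol`, and `describe` are hypothetical, and treating positions as 1-based (so that a zero digit is invalid) is an assumption drawn from the "311" example rather than something the text states explicitly.

```python
from typing import NamedTuple

# Hundreds digit -> character type, as listed in the text.
CHAR_TYPES = {1: "official name", 2: "hiragana", 3: "katakana"}


class Symbol(NamedTuple):
    char_type: int  # hundreds digit
    unit: int       # tens digit: 1 = family name, 2 = given name, 3+ = further word units
    position: int   # ones digit: position inside the word unit (assumed 1-based)


def parse_symbol(code: str) -> Symbol:
    """Split a three-digit symbol such as "321" into its three fields."""
    if len(code) != 3 or not (code.isascii() and code.isdigit()):
        raise ValueError(f"expected three ASCII digits, got {code!r}")
    char_type, unit, position = (int(d) for d in code)
    if char_type not in CHAR_TYPES or unit == 0 or position == 0:
        raise ValueError(f"invalid symbol {code!r}")
    return Symbol(char_type, unit, position)


def describe(code: str) -> str:
    """Return a readable description of a symbol."""
    s = parse_symbol(code)
    return f"{CHAR_TYPES[s.char_type]}, word unit {s.unit}, character {s.position}"
```

For instance, `describe("321")` reads the symbol as the first character of the second word unit (the given name) in katakana.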

Although FIG. 2 shows examples of applying each rule to the right of the nickname generation rule column, only the nickname generation rules themselves are stored in the actual rule storage unit 121.

The rule storage unit 121 can be implemented with any commonly used storage medium, such as an HDD (Hard Disk Drive), an optical disc, a memory card, or a RAM (Random Access Memory).

The name input unit 101 accepts the input of a name such as a personal name. It accepts the name in a format in which the word units can be identified. For a personal name, for example, the name input unit 101 takes the family name and the given name as separate inputs.

The name input unit 101 also accepts the hiragana and katakana notations of the official name together with the official name itself. For example, when the name is "Ｐ田Ａ也", the name input unit 101 accepts the official name split into the family name "Ｐ田" and the given name "Ａ也", together with the hiragana family name "ぴいた" and given name "えいや" and the katakana family name "ピイタ" and given name "エイヤ".

The input method is not limited to this; any method that allows the word units of the name to be identified can be used. For example, the name input unit 101 may be configured to accept a name whose word units are separated by a predetermined character string such as a space.
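The delimiter-based input variant can be sketched as below. This is a minimal illustration, not the apparatus itself: `NameData`, `split_units`, and `read_name` are hypothetical names, and the check that all three notations have the same number of word units is an added assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class NameData:
    official: List[str]  # word units of the official name, e.g. ["Ｐ田", "Ａ也"]
    hiragana: List[str]  # word units of the hiragana notation
    katakana: List[str]  # word units of the katakana notation


def split_units(text: str, delimiter: str = " ") -> List[str]:
    """Split a delimited name into its word units, ignoring empty pieces."""
    units = [u for u in text.split(delimiter) if u]
    if not units:
        raise ValueError("name is empty")
    return units


def read_name(official: str, hiragana: str, katakana: str,
              delimiter: str = " ") -> NameData:
    """Build name data from three delimited notations of the same name."""
    data = NameData(split_units(official, delimiter),
                    split_units(hiragana, delimiter),
                    split_units(katakana, delimiter))
    # Assumption: every notation must describe the same word units.
    if not (len(data.official) == len(data.hiragana) == len(data.katakana)):
        raise ValueError("notations disagree on the number of word units")
    return data
```

Calling `read_name("Ｐ田 Ａ也", "ぴいた えいや", "ピイタ エイヤ")` yields the family name and given name as separate units for each notation.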

The name may also be input without being split into word units. In that case, the name input unit 101 may, for example, use a dictionary of personal names to split the input "Ｐ田Ａ也" into the family name "Ｐ田" and the given name "Ａ也". Likewise, instead of accepting the hiragana and katakana notations as input, the name input unit 101 may estimate the reading of the input official name using a dictionary of personal names or the like and thereby obtain the hiragana and katakana notations.

The candidate generation unit 102 refers to the nickname generation rules stored in the rule storage unit 121 and generates nickname candidates for the input name. Specifically, it converts the input name into symbols and then, for each symbol in a nickname generation rule that is identical to one of the converted symbols, substitutes the character corresponding to that symbol, thereby generating a nickname candidate.

A specific example of converting a name into symbols is described with reference to FIGS. 3 to 6. FIG. 3 shows an example of input name data. The following description takes the case where name 1 (the official name, containing kanji), name 2 (the hiragana notation), and name 3 (the katakana notation) are input, as shown in FIG. 3. The symbol column in FIG. 3 shows the result of converting each name into symbols. Details of the conversion are described with reference to FIGS. 4 to 6, respectively.

FIG. 4 shows an example of converting name 1 (the official name) of FIG. 3 into symbols. For each character in name 1, FIG. 4 shows the character type, the position of the word unit (family name or given name), the character position within the word unit, and the resulting symbol. Because name 1 is the official name, the character type is "1". The family/given name field of each character is set to "1" or "2" depending on whether the character comes before or after the space that serves as the delimiter. The character position of each character within the family name or given name is then set. Finally, the digits for character type, family/given name, and character position are concatenated into a three-digit number, which is the resulting symbol.

FIG. 5 shows an example of converting name 2 (the hiragana notation) of FIG. 3 into symbols. Because name 2 is written in hiragana, the character type is set to "2". FIG. 6 shows an example of converting name 3 (the katakana notation) of FIG. 3 into symbols. Because name 3 is written in katakana, the character type is set to "3".
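The per-character conversion of FIGS. 4 to 6 can be sketched as a single function that works for all three notations. The function name `symbolize` and the limit of nine word units and nine characters per unit (one digit per field) are assumptions made for this illustration.

```python
from typing import Dict, List


def symbolize(units: List[str], char_type: int) -> Dict[str, str]:
    """Map each three-digit symbol to the character it stands for.

    units     -- word units in order, e.g. [family name, given name]
    char_type -- 1 = official name, 2 = hiragana, 3 = katakana
    """
    table: Dict[str, str] = {}
    for unit_no, unit in enumerate(units, start=1):
        for pos, ch in enumerate(unit, start=1):
            # Assumption: each field is one digit, so values above 9 do not fit.
            if unit_no > 9 or pos > 9:
                raise ValueError("at most 9 word units of 9 characters each")
            table[f"{char_type}{unit_no}{pos}"] = ch
    return table
```

With the example name from the text, `symbolize(["ピイタ", "エイヤ"], 3)` assigns "311" to "ピ" and "322" to "イ", and `symbolize(["Ｐ田", "Ａ也"], 1)` assigns "111", "112", "121", and "122" to the four characters of the official name.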

In this way the candidate generation unit 102 converts each name into symbols such as those shown in FIG. 3. It then generates a nickname candidate by fitting the characters corresponding to the converted symbols into a nickname generation rule that contains the same symbols.

For example, suppose the personal name 301 ("Ｐ田Ａ也") of FIG. 3 is input as the official name together with the katakana notation 302 ("ピイタエイヤ"), and the input name is applied to the topmost nickname generation rule in FIG. 2, "311 312 321 322". Every symbol in this rule has 3 as its hundreds digit, so the candidate generation unit 102 fills in katakana characters to generate the candidate. Specifically, it substitutes "ピ" for "311", "イ" for "312", "エ" for "321", and "イ" for "322" in the rule, generating the nickname candidate 201 ("ピイエイ") of FIG. 2.
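The substitution step can be sketched as follows. The rule format used here, a whitespace-separated string in which three-digit tokens are symbols and every other token is an additional character string, is an assumption for illustration; the stored format in FIG. 2 may differ. `apply_rule` and `KATAKANA_TABLE` are hypothetical names.

```python
import re
from typing import Dict, Optional

SYMBOL = re.compile(r"[1-9]{3}")

# Symbol table for the katakana notation "ピイタ エイヤ" from the text.
KATAKANA_TABLE = {
    "311": "ピ", "312": "イ", "313": "タ",
    "321": "エ", "322": "イ", "323": "ヤ",
}


def apply_rule(rule: str, table: Dict[str, str]) -> Optional[str]:
    """Fill a nickname generation rule with characters from the symbol table.

    Returns None when the rule refers to a position the name does not have.
    """
    parts = []
    for token in rule.split():
        if SYMBOL.fullmatch(token):
            if token not in table:
                return None  # the name is too short for this rule
            parts.append(table[token])
        else:
            parts.append(token)  # additional character string, copied as-is
    return "".join(parts)
```

`apply_rule("311 312 321 322", KATAKANA_TABLE)` reproduces the candidate "ピイエイ" from the text. A rule with an additional string, such as the hypothetical "321 ッチー", would turn the reading "アイタ メツヤ" into "メッチー".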

The output unit 103 outputs the one or more nickname candidates generated by the candidate generation unit 102. If no applicable nickname generation rule exists and no candidate is generated, the output unit 103 may output a result indicating that there are no candidates.

Next, the nickname estimation process performed by the nickname estimation apparatus 100 according to the first embodiment configured as described above is explained with reference to FIG. 7. FIG. 7 is a flowchart showing the overall flow of the nickname estimation process in the first embodiment.

First, the name input unit 101 inputs name data containing the official name, the hiragana notation, and the katakana notation (step S701). Next, the candidate generation unit 102 converts the official name, the hiragana notation, and the katakana notation in the name data into symbols (step S702).

The candidate generation unit 102 then acquires the nickname generation rules from the rule storage unit 121 (step S703) and applies the symbolized name to the rules to generate nickname candidates (step S704). Specifically, for each acquired rule, it generates a candidate by replacing every symbol in the rule that is identical to a symbol converted from the name in step S702 with the character from which that symbol was converted. Finally, the output unit 103 outputs the generated nickname candidates (step S705), and the nickname estimation process ends.
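Steps S702 to S705 can be put together in a short end-to-end sketch. It assumes the same whitespace-separated rule format as an illustration, skips rules that refer to positions the name lacks, and drops duplicate candidates; those choices, the function names, and every rule other than "311 312 321 322" are hypothetical.

```python
import re
from typing import Dict, List, Optional, Sequence

SYMBOL = re.compile(r"[1-9]{3}")


def symbol_table(notations: Dict[int, Sequence[str]]) -> Dict[str, str]:
    """S702: symbolize every notation; keys of `notations` are character types."""
    return {
        f"{ctype}{u}{p}": ch
        for ctype, units in notations.items()
        for u, unit in enumerate(units, start=1)
        for p, ch in enumerate(unit, start=1)
    }


def apply_rule(rule: str, table: Dict[str, str]) -> Optional[str]:
    """S704 for one rule: replace each symbol token with its character."""
    try:
        return "".join(table[t] if SYMBOL.fullmatch(t) else t for t in rule.split())
    except KeyError:
        return None  # the rule refers to a position the name does not have


def estimate_nicknames(notations: Dict[int, Sequence[str]],
                       rules: Sequence[str]) -> List[str]:
    table = symbol_table(notations)          # S702
    candidates: List[str] = []
    for rule in rules:                       # S703: rules from the rule store
        candidate = apply_rule(rule, table)  # S704
        if candidate is not None and candidate not in candidates:
            candidates.append(candidate)
    return candidates                        # S705: an empty list means "no candidates"


if __name__ == "__main__":
    name = {1: ["Ｐ田", "Ａ也"], 2: ["ぴいた", "えいや"], 3: ["ピイタ", "エイヤ"]}
    print(estimate_nicknames(name, ["311 312 321 322", "221 ちゃん"]) or "no candidates")
```

Returning an empty list leaves it to the caller to report that no candidate was found, which matches the optional "no candidates" output described for the output unit 103.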

As described above, the nickname estimation apparatus according to the first embodiment can generate nickname candidates for an input name in accordance with predetermined nickname generation rules, and can therefore acquire nicknames more appropriately from names such as personal names.

Because the generated nickname candidates can be used to update speech recognition dictionaries and nickname dictionaries, the cost of creating dictionaries by hand can be reduced. In addition, by using the correspondence between the official name and the generated nickname candidates during information retrieval, a search can be run with both the nickname and the official name even when only one of them is input.

In spoken dialogue, if nicknames are added to the speech recognition dictionary on the basis of the estimation results, a nickname can be recognized correctly even when the user refers to a person by nickname. Moreover, even when a person is referred to by nickname, converting the nickname into the official name makes it possible to correctly understand who is being referred to.
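One way to use the correspondence between official names and nickname candidates in both directions is a reverse index, sketched below. The text does not prescribe a data structure; `build_reverse_index` and `expand_query` are hypothetical names, and the dictionary contents reuse the example names from the description.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def build_reverse_index(nicknames: Dict[str, Iterable[str]]) -> Dict[str, Set[str]]:
    """Map each nickname candidate back to the official names it came from."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for official, candidates in nicknames.items():
        for candidate in candidates:
            index[candidate].add(official)
    return dict(index)


def expand_query(term: str,
                 nicknames: Dict[str, Iterable[str]],
                 index: Dict[str, Set[str]]) -> Set[str]:
    """Return the term plus every name linked to it, in either direction."""
    expanded = {term}
    expanded.update(nicknames.get(term, ()))   # official name -> nicknames
    for official in index.get(term, ()):       # nickname -> official names
        expanded.add(official)
        expanded.update(nicknames[official])
    return expanded


if __name__ == "__main__":
    nicknames = {"Ｐ田Ａ也": ["ピイエイ"], "Ｉ田Ｍ也": ["メッチー"]}
    index = build_reverse_index(nicknames)
    print(expand_query("ピイエイ", nicknames, index))
```

A search front end could issue a query for every element of the expanded set, and a dialogue system could look up `index` to resolve a recognized nickname to the official name.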

(Second Embodiment)
The nickname estimation apparatus according to the second embodiment searches documents acquired from the Web or elsewhere for the generated nickname candidates and selects the candidates that appear in those documents, thereby obtaining more appropriate nickname candidates.

FIG. 8 is a block diagram showing the configuration of the nickname estimation apparatus 800 according to the second embodiment. As shown in FIG. 8, the nickname estimation apparatus 800 includes a rule storage unit 121, a document storage unit 822, a name input unit 101, a candidate generation unit 102, an output unit 103, and a selection unit 804.

The second embodiment differs from the first in that the document storage unit 822 and the selection unit 804 are added. The other components and functions are the same as in FIG. 1, the block diagram of the nickname estimation apparatus 100 according to the first embodiment; they are given the same reference numerals and their description is omitted here.

The document storage unit 822 stores documents obtained as the result of a search executed over a predetermined document collection, such as Web pages, with the official name as the search keyword. For example, from the search results ranked by a measure such as the number of occurrences of the search keyword, the document storage unit 822 stores a predetermined number of the top-ranked documents.

The selection unit 804 selects more appropriate candidates from the nickname candidates generated by the candidate generation unit 102. Specifically, for each generated candidate, the selection unit 804 first determines whether the candidate appears in the documents stored in the document storage unit 822. For each candidate that does appear, it then runs a search over Web pages using the phrase "(nickname candidate) こと (official name)" as the search keyword. Finally, it ranks the candidates in descending order of the number of Web pages found and selects a predetermined number of top-ranked candidates.

Next, the nickname estimation process performed by the nickname estimation apparatus 800 according to the second embodiment, configured as described above, will be described with reference to FIG. 9. FIG. 9 is a flowchart showing the overall flow of the nickname estimation process in the second embodiment.

The name input process, symbolization process, rule acquisition process, and candidate generation process in steps S901 to S904 are the same as steps S701 to S704 in the nickname estimation apparatus 100 according to the first embodiment, and their description is therefore omitted.

After the nickname candidates have been generated, the selection unit 804 searches a predetermined document collection, such as Web pages, using the official name as the search keyword, and saves a predetermined number of top-ranked result documents in the document storage unit 822 (step S905).

Next, the selection unit 804 determines whether each nickname candidate appears in the saved documents and selects only the candidates that do (step S906). For each selected nickname candidate, the selection unit 804 then searches the Web using the phrase "(nickname candidate) こと (official name)" as the search keyword (step S907). The selection unit 804 then sorts the nickname candidates by number of hits and selects the top N candidates (step S908).

Finally, the output unit 103 outputs the selected nickname candidates (step S909), and the nickname estimation process ends.
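Steps S906 to S908 can be sketched as follows. This is a minimal illustration, not the implementation described in the specification: the function name `select_candidates` and the `hit_count` callback (which stands in for whatever Web search service returns a hit count for a query) are assumptions introduced here.

```python
from typing import Callable, Iterable, List


def select_candidates(
    candidates: Iterable[str],
    official_name: str,
    saved_documents: List[str],
    hit_count: Callable[[str], int],  # hypothetical: returns Web hits for a query
    top_n: int = 3,
) -> List[str]:
    # Step S906: keep only candidates that occur in at least one saved document.
    present = [
        c for c in dict.fromkeys(candidates)  # dict.fromkeys drops duplicates, keeps order
        if any(c in doc for doc in saved_documents)
    ]
    # Step S907: query "<candidate>こと<official name>" for each surviving candidate.
    scored = [(hit_count(f"{c}こと{official_name}"), c) for c in present]
    # Step S908: sort by hit count in descending order and keep the top N.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:top_n]]
```

Passing the search backend in as a callback keeps the sketch independent of any particular search API; in a test it can be replaced by a dictionary lookup.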

The selection unit 804 may also be configured to select, from the nickname candidates output by the output unit 103, a nickname candidate designated by the user. In this case, for example, the output unit 103 displays the generated nickname candidates on a display device such as a monitor, and the user designates the desired candidate among those displayed through an interface such as a keyboard or a mouse. Alternatively, for example, the output unit 103 may synthesize a speech signal from the text data of the generated nickname candidates and output it from a speaker, so that the user can designate the desired candidate by voice through an interface such as a microphone.

When this embodiment is applied to a dialogue apparatus, such as a robot that addresses a user, the name of the user or the like may be input through the name input unit 101, and the nickname output from the output unit 103 may be used as the form of address. In this case, the dialogue apparatus may be configured so that, for example, the selection unit 804 selects an arbitrary nickname candidate and the apparatus asks the user by voice whether it may address the user by the selected nickname.

As described above, the nickname estimation apparatus according to the second embodiment searches documents acquired from a predetermined document collection, such as the Web, for the generated nickname candidates and selects the candidates contained in those documents, and can thereby obtain more appropriate nickname candidates.

(Third embodiment)
The nickname estimation apparatus according to the third embodiment learns nickname generation rules from input pairs, each consisting of a name and a nickname already known for that name.

FIG. 10 is a block diagram showing the configuration of a nickname estimation apparatus 1000 according to the third embodiment. As shown in FIG. 10, the nickname estimation apparatus 1000 includes a rule storage unit 121, a name input unit 101, a candidate generation unit 102, an output unit 103, a learning data input unit 1005, and a learning unit 1006.

The third embodiment differs from the first embodiment in that the learning data input unit 1005 and the learning unit 1006 are added. The other components and functions are the same as those in FIG. 1, the block diagram showing the configuration of the nickname estimation apparatus 100 according to the first embodiment; they are therefore given the same reference numerals, and their description is omitted here.

The learning data input unit 1005 inputs learning data in which a name is associated with a nickname already known for that name. The learning data input unit 1005 inputs, as learning data, pairs of a nickname and one of several notations of the name, such as the official name divided into family name and given name, the hiragana notation of the official name, and the katakana notation of the official name. The apparatus may also be configured to accept a pair consisting of a name that has not been divided into family name and given name and its nickname.

The learning unit 1006 generates a new nickname generation rule from the input learning data and stores the generated rule in the rule storage unit 121. Specifically, the learning unit 1006 first obtains the common characters, that is, the characters contained in both the name and the nickname of the input learning data. The learning unit 1006 then symbolizes the obtained common characters by the same method as the candidate generation unit 102. Finally, the learning unit 1006 creates a nickname generation rule by replacing the common characters in the nickname of the learning data with the symbols of the name data, as shown in FIG. 3.

Next, the learning process performed by the nickname estimation apparatus 1000 according to the third embodiment, configured as described above, will be described with reference to FIGS. 11 and 12. FIG. 11 is a flowchart showing the overall flow of the learning process in the third embodiment. FIG. 12 is a diagram showing an example of generated nickname generation rules.

The nickname estimation process that uses the learned nickname generation rules, or nickname generation rules stored in advance, is the same as the nickname estimation process of the first embodiment shown in FIG. 7, and its description is therefore omitted.

First, the learning data input unit 1005 inputs learning data, which is a pair of a name and a nickname (step S1101). Next, the learning unit 1006 obtains the common characters shared by the name and the nickname (step S1102). The learning unit 1006 then symbolizes the obtained common characters (step S1103).

For example, suppose learning data is input that associates the official name 1201 in FIG. 12 ("H田I美"), a personal name divided into family name and given name, with the nickname 1202 ("Iちゃん"). The learning unit 1006 obtains "I" as the common character. Because "I" is the first character of the given name in the official name, the learning unit 1006 symbolizes this common character as "121".

Next, the learning unit 1006 creates a nickname generation rule by concatenating the symbols of the common characters with the characters of the nickname other than the common characters (step S1104). In the example above, the symbol "121" is concatenated with the characters of the nickname other than the common character "I", creating the nickname generation rule 1203 ("121ちゃん").

FIG. 12 also shows an example of the nickname generation rule 1213 ("211212ちゃん"), which is generated when the input learning data is the pair of the hiragana notation 1211 ("ぴいのえすや") of the personal name "P野S也" and its nickname 1212 ("ぴいちゃん").
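The two examples in FIG. 12 can be reproduced with a short sketch. The symbol layout used here is inferred from those examples and is an assumption, since the symbolization itself is defined in the first embodiment: each symbol is three digits giving the notation (1 = official name, 2 = hiragana, 3 = katakana), the name part (1 = family name, 2 = given name), and the 1-based character position, with positions limited to a single digit. The function names `symbolize` and `learn_rule` are hypothetical, and matching characters one by one is a simplification of step S1102.

```python
from typing import Dict


def symbolize(notation: int, family: str, given: str) -> Dict[str, str]:
    """Map each character of a name to its three-digit symbol (first occurrence wins)."""
    table: Dict[str, str] = {}
    for part_id, part in ((1, family), (2, given)):
        for pos, ch in enumerate(part, start=1):
            table.setdefault(ch, f"{notation}{part_id}{pos}")
    return table


def learn_rule(notation: int, family: str, given: str, nickname: str) -> str:
    """Steps S1102 to S1104: replace common characters in the nickname by symbols."""
    table = symbolize(notation, family, given)
    # Characters not found in the name are kept literally (the additional string).
    return "".join(table.get(ch, ch) for ch in nickname)


print(learn_rule(1, "H田", "I美", "Iちゃん"))          # 121ちゃん
print(learn_rule(2, "ぴいの", "えすや", "ぴいちゃん"))  # 211212ちゃん
```

Note that the small "ゃ" in "ちゃん" and the full-size "や" in "えすや" are different characters, so the suffix is not mistaken for part of the name.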

The learning unit 1006 further generates new nickname generation rules by comparing nickname generation rules with one another. Specifically, for a given nickname generation rule (rule 1), the learning unit 1006 first searches the rule storage unit 121 for another nickname generation rule (rule 2) that differs from rule 1 only in its symbol part. If such a rule 2 exists, the learning unit 1006 further searches for another nickname generation rule (rule 3) that has the same symbol part as rule 1, and generates a new nickname generation rule by replacing the symbol part of rule 3 with the symbol part of rule 2.

For example, suppose that, as shown in FIG. 12, three nickname generation rules have been learned directly from the learning data: the nickname generation rule 1203 ("121ちゃん"), the nickname generation rule 1213 ("211212ちゃん"), and the nickname generation rule 1223 ("121やん"). Of these, the rules 1203 ("121ちゃん") and 1213 ("211212ちゃん") differ only in their symbol parts, "121" and "211212". In addition, for the rule 1203 ("121ちゃん") there is another rule, 1223 ("121やん"), that contains the same symbol "121". The learning unit 1006 can therefore generate a new nickname generation rule ("211212やん") by replacing the symbol "121" in the rule 1223 ("121やん") with "211212".

In this way, even when the learning data contains no example from which a rule could be learned directly, a further nickname generation rule ("211212やん") can be learned by analogy from the nickname generation rules already learned.
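The analogy step can be sketched as below. The sketch assumes, for illustration only, that every rule consists of a leading run of digit symbols followed by a literal additional string, which holds for the rules in FIG. 12 but is not stated as a general property; `split_rule` and `infer_rules` are hypothetical names.

```python
import re
from typing import List, Tuple


def split_rule(rule: str) -> Tuple[str, str]:
    """Split a rule into its leading symbol part (digits) and the literal remainder."""
    match = re.match(r"(\d*)(.*)", rule, re.S)
    return match.group(1), match.group(2)


def infer_rules(rules: List[str]) -> List[str]:
    """Derive new rules by swapping symbol parts between comparable rules."""
    parsed = [split_rule(r) for r in rules]
    known = set(rules)
    inferred: List[str] = []
    for sym1, add1 in parsed:                # rule 1
        for sym2, add2 in parsed:            # rule 2: same literal part, other symbols
            if add2 != add1 or sym2 == sym1:
                continue
            for sym3, add3 in parsed:        # rule 3: same symbols as rule 1
                if sym3 != sym1 or add3 == add1:
                    continue
                new_rule = sym2 + add3       # rule 3 with rule 2's symbol part
                if new_rule not in known:
                    known.add(new_rule)
                    inferred.append(new_rule)
    return inferred


print(infer_rules(["121ちゃん", "211212ちゃん", "121やん"]))  # ['211212やん']
```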

Returning to FIG. 11, the learning unit 1006 stores the generated nickname generation rules in the rule storage unit 121 (step S1105), and the learning process ends.

FIG. 13 is a diagram showing an example of the use of learned nickname generation rules. FIG. 13 shows an example in which the official name 1301 ("P田Y子"), the hiragana notation 1302 ("ぴいたわいこ"), which is the reading of the official name, and the katakana notation 1303 ("ピイタワイコ") are input. In this case, the three directly learned nickname generation rules ("121ちゃん", "211212ちゃん", and "121やん") generate the nickname candidates 1311, 1312, and 1313 ("Yちゃん", "ぴいちゃん", and "Yやん"), respectively. In addition, the nickname generation rule learned by analogy from the directly learned rules ("211212やん") generates a further nickname candidate 1314 ("ぴいやん").
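The candidate generation in FIG. 13 can be illustrated with the following sketch. It assumes the same inferred symbol layout as the examples suggest (notation digit, name-part digit, single-digit position); the function name `apply_rule` and the `names` dictionary layout are hypothetical.

```python
from typing import Dict, Optional, Tuple


def apply_rule(rule: str, names: Dict[int, Tuple[str, str]]) -> Optional[str]:
    """Expand a rule into a nickname candidate.

    `names` maps a notation id (assumed: 1 official, 2 hiragana, 3 katakana)
    to a (family name, given name) pair. Returns None if a symbol cannot be resolved.
    """
    out = []
    i = 0
    while i < len(rule):
        chunk = rule[i:i + 3]
        if len(chunk) == 3 and chunk.isdecimal():
            notation, part, pos = (int(d) for d in chunk)
            parts = names.get(notation)
            if parts is None or part not in (1, 2) or not 1 <= pos <= len(parts[part - 1]):
                return None
            out.append(parts[part - 1][pos - 1])  # character referenced by the symbol
            i += 3
        else:
            out.append(rule[i])                   # literal character of the rule
            i += 1
    return "".join(out)


names = {1: ("P田", "Y子"), 2: ("ぴいた", "わいこ"), 3: ("ピイタ", "ワイコ")}
for rule in ("121ちゃん", "211212ちゃん", "121やん", "211212やん"):
    print(rule, "->", apply_rule(rule, names))
```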

As described above, the nickname estimation apparatus according to the third embodiment can learn nickname generation rules from learning data in which names are associated with nicknames.

As described above, an abbreviation consists only of character strings related to the original name, so it is relatively easy to write down abbreviation generation rules for generating abbreviations, as in Patent Document 2. A nickname, by contrast, may have characters unrelated to the name inserted into it, and its possible forms are highly varied. Writing down the rules by hand can therefore be difficult. Making nickname generation rules learnable by the method of this embodiment resolves this problem.

(Fourth embodiment)
In addition to generating nickname candidates with nickname generation rules, the nickname estimation apparatus according to the fourth embodiment extracts nickname candidates from external data such as the Web, and selects appropriate nickname candidates from among the generated and extracted candidates.

FIG. 14 is a block diagram showing the configuration of a nickname estimation apparatus 1400 according to the fourth embodiment. As shown in FIG. 14, the nickname estimation apparatus 1400 includes a rule storage unit 121, a document storage unit 822, a name input unit 101, a candidate generation unit 102, an output unit 103, a selection unit 1404, and a candidate extraction unit 1407.

The fourth embodiment differs from the second embodiment in that the candidate extraction unit 1407 is added and in the function of the selection unit 1404. The other components and functions are the same as those in FIG. 8, the block diagram showing the configuration of the nickname estimation apparatus 800 according to the second embodiment; they are therefore given the same reference numerals, and their description is omitted here.

The candidate extraction unit 1407 extracts character strings that are nickname candidates from external data, such as data on the Web. The candidate extraction unit 1407 searches the external data for character strings containing nickname candidates by using a typical expression such as "(nickname) こと (official name)". Specifically, for a given name, the candidate extraction unit 1407 searches the external data using "こと (official name)" as the search keyword. From the documents obtained, the candidate extraction unit 1407 then acquires the character string consisting of a predetermined number of characters preceding "こと (official name)" and extracts nickname candidates from the acquired character string. The method of acquiring the character string and the method of extracting nickname candidates from it are described in detail later.

The selection unit 1404 performs the nickname candidate selection process on the nickname candidates extracted by the candidate extraction unit 1407 in addition to those generated by the candidate generation unit 102. For each generated nickname candidate, the selection unit 1404 determines whether the candidate appears within a predetermined number of characters before or after the official name in the documents stored in the document storage unit 822. For each nickname candidate that does, the selection unit 1404 runs a search over Web pages using the phrase "(nickname candidate) こと (official name)" as the search keyword. The selection unit 1404 then ranks the nickname candidates in descending order of the number of Web pages retrieved and selects a predetermined number of top-ranked candidates.

Next, the nickname estimation process performed by the nickname estimation apparatus 1400 according to the fourth embodiment, configured as described above, will be described with reference to FIG. 15. FIG. 15 is a flowchart showing the overall flow of the nickname estimation process in the fourth embodiment.

The name input process, symbolization process, rule acquisition process, and candidate generation process in steps S1501 to S1504 are the same as steps S901 to S904 in the nickname estimation apparatus 800 according to the second embodiment, and their description is therefore omitted.

After the nickname candidates have been generated, the selection unit 1404 searches a predetermined document collection, such as Web pages, using the official name as the search keyword, and saves a predetermined number of top-ranked result documents (hereinafter referred to as documents page) in the document storage unit 822 (step S1505).

Next, the selection unit 1404 acquires, from the saved documents page, the character strings consisting of the s characters before and after the official name (s is an integer of 1 or more) (step S1506). The selection unit 1404 then selects only the nickname candidates that appear in the acquired character strings (step S1507).
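Steps S1506 and S1507 amount to a windowed substring test, sketched below. The function name `nearby_candidates` is hypothetical, and checking the text before and after the official name separately (rather than as one joined string) is a design choice of this sketch, made so that a candidate is never matched across the gap left by the name.

```python
from typing import List


def nearby_candidates(page: str, official_name: str,
                      candidates: List[str], s: int) -> List[str]:
    """Return the candidates found within s characters before or after the official name."""
    found: List[str] = []
    start = page.find(official_name)
    while start != -1:
        end = start + len(official_name)
        before = page[max(0, start - s):start]   # up to s characters before the name
        after = page[end:end + s]                # up to s characters after the name
        for c in candidates:
            if c not in found and (c in before or c in after):
                found.append(c)
        start = page.find(official_name, start + 1)
    return found


page = "人気歌手のP田Y子（愛称はぴいちゃん）が新曲を発表した。遠く離れた段落でYやんという別人の話。"
print(nearby_candidates(page, "P田Y子", ["Yちゃん", "ぴいちゃん", "Yやん"], s=10))
# ['ぴいちゃん']
```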

Next, the candidate extraction unit 1407 executes the nickname candidate extraction process, which extracts nickname candidates from the documents page (step S1508). The nickname candidate extraction process is described in detail later.

Next, for each of the nickname candidates selected in step S1507 and the nickname candidates extracted in step S1508, the selection unit 1404 searches the Web using the phrase "(nickname candidate) こと (official name)" as the search keyword (step S1509). The selection unit 1404 then sorts the nickname candidates by number of hits and selects the top N candidates (step S1510).

Finally, the output unit 103 outputs the selected nickname candidates (step S1511), and the nickname estimation process ends.

Next, the nickname candidate extraction process in step S1508 will be described in detail with reference to FIG. 16. FIG. 16 is a flowchart showing the overall flow of the nickname candidate extraction process in the fourth embodiment.

First, the candidate extraction unit 1407 executes a Web search using "こと (official name)" as the search keyword and acquires the top N documents (step S1601). Next, from the acquired documents, the candidate extraction unit 1407 acquires the character strings str, each consisting of the t characters (t is an integer of 1 or more) preceding the search keyword "こと (official name)" (step S1602).
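Step S1602 can be sketched as a simple scan over the retrieved documents. The function name `strings_before_marker` is hypothetical, and the sketch assumes that the documents are already available as plain strings; how they are fetched in step S1601 is outside its scope.

```python
from typing import List


def strings_before_marker(documents: List[str], official_name: str, t: int) -> List[str]:
    """Collect the (up to) t characters preceding each "こと<official name>" occurrence."""
    marker = "こと" + official_name
    found: List[str] = []
    for doc in documents:
        pos = doc.find(marker)
        while pos != -1:
            if pos > 0:                           # skip occurrences with nothing before them
                found.append(doc[max(0, pos - t):pos])
            pos = doc.find(marker, pos + 1)
    return found


docs = ["人気歌手のUーちゃんことP田Y子が来日した"]
print(strings_before_marker(docs, "P田Y子", t=5))  # ['Uーちゃん']
```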

Next, for each acquired character string str, the candidate extraction unit 1407 generates suffixes, each of which is the substring running from a character at some position in str to the last character of str (step S1603).

FIG. 17 is a diagram showing an example of suffixes. FIG. 17 shows the eight suffixes 1711 to 1718 created from the character string 1701 ("今日は良い天気だ", "The weather is nice today"): "だ", "気だ", "天気だ", "い天気だ", "良い天気だ", "は良い天気だ", "日は良い天気だ", and "今日は良い天気だ".
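The suffix generation of step S1603 is a one-line operation in most languages. The following sketch (the function name `suffixes` is an assumption) lists the suffixes from shortest to longest, matching the order used in FIG. 17.

```python
from typing import List


def suffixes(text: str) -> List[str]:
    """All non-empty suffixes of `text`, shortest first."""
    return [text[i:] for i in range(len(text) - 1, -1, -1)]


for suffix in suffixes("今日は良い天気だ"):
    print(suffix)
```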

Returning to FIG. 16, for each suffix, the candidate extraction unit 1407 searches the saved documents page and the character strings str, obtains the single character immediately preceding each occurrence of the suffix, and counts the number of distinct characters obtained (hereinafter, the number of types) (step S1604).

FIG. 18 is a schematic diagram showing an example of how the number of types is obtained. For the suffixes of FIG. 17, FIG. 18 shows the number of types of the character preceding each suffix when only three character strings, "今日は良い天気だ", "明日は良い天気だ", and "気持ちの良い天気だ", exist in the documents page and the character strings str. The numerical values in FIG. 18 are the numbers of types of the character immediately preceding each suffix. When the types are counted, katakana in the documents page, the character strings str, and the suffixes is replaced with hiragana.

In this example, for the suffix 1715 in FIG. 17 ("良い天気だ"), the character 1801 ("は") and the character 1802 ("の") are obtained as the immediately preceding characters. The number of types for the suffix 1715 is therefore 2.
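The count described for FIG. 18 can be sketched as follows. The function names are hypothetical. The katakana-to-hiragana conversion relies on the fact that the Unicode katakana letters U+30A1 to U+30F6 sit exactly 0x60 above their hiragana counterparts; treating an occurrence at the very start of a text as having no preceding character is an assumption of this sketch, since the specification does not cover that case.

```python
from typing import Iterable


def to_hiragana(text: str) -> str:
    """Replace katakana letters (U+30A1..U+30F6) with the corresponding hiragana."""
    return "".join(chr(ord(c) - 0x60) if "ァ" <= c <= "ヶ" else c for c in text)


def preceding_kinds(suffix: str, texts: Iterable[str]) -> int:
    """Number of distinct characters that immediately precede `suffix` in `texts`."""
    suffix = to_hiragana(suffix)
    kinds = set()
    for text in texts:
        text = to_hiragana(text)
        pos = text.find(suffix)
        while pos != -1:
            if pos > 0:                 # assumption: start of text contributes nothing
                kinds.add(text[pos - 1])
            pos = text.find(suffix, pos + 1)
    return len(kinds)


texts = ["今日は良い天気だ", "明日は良い天気だ", "気持ちの良い天気だ"]
print(preceding_kinds("良い天気だ", texts))    # 2 ("は" and "の")
print(preceding_kinds("は良い天気だ", texts))  # 1 ("日")
```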

Returning to FIG. 16, the candidate extraction unit 1407 takes an unprocessed suffix from among the suffixes (step S1605). The candidate extraction unit 1407 then determines whether the number of types obtained in step S1604 for that suffix is greater than 1 and the number of types obtained in step S1604 for the suffix with its first character removed is 1 (step S1606).

If the number of types obtained for the suffix is greater than 1 and the number of types obtained for the suffix with its first character removed is 1 (step S1606: YES), the candidate extraction unit 1407 extracts the suffix as a nickname candidate (step S1607).

If that condition does not hold (step S1606: NO), the candidate extraction unit 1407 further determines whether the number of types obtained for the suffix is 1 and the number of types obtained for the suffix with its first character removed is greater than 1 (step S1608).

If the number of types obtained for the suffix is 1 and the number of types obtained for the suffix with its first character removed is greater than 1 (step S1608: YES), the candidate extraction unit 1407 extracts the suffix with its first character removed as a nickname candidate (step S1609).

Next, the candidate extraction unit 1407 determines whether all suffixes have been processed (step S1610). If not all suffixes have been processed (step S1610: NO), the candidate extraction unit 1407 takes the next unprocessed suffix and repeats the process (step S1605).

If all suffixes have been processed (step S1610: YES), the candidate extraction unit 1407 deletes every nickname candidate that is a substring of another nickname candidate having the same frequency in the character strings str (step S1611).
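The loop of steps S1605 to S1611 can be sketched as follows. All function names are hypothetical. For brevity the sketch omits the katakana-to-hiragana normalization, counts an empty string as having zero preceding types, and uses only the strings passed in `texts` as the corpus for counting; these are simplifying assumptions rather than details taken from the specification.

```python
from typing import List


def kinds(suffix: str, texts: List[str]) -> int:
    """Number of distinct characters found immediately before `suffix` in `texts`."""
    if not suffix:
        return 0
    seen = set()
    for text in texts:
        pos = text.find(suffix)
        while pos != -1:
            if pos > 0:
                seen.add(text[pos - 1])
            pos = text.find(suffix, pos + 1)
    return len(seen)


def prune(candidates: List[str], strs: List[str]) -> List[str]:
    """Step S1611: drop candidates contained in another candidate of equal frequency."""
    def freq(c: str) -> int:
        return sum(s.count(c) for s in strs)
    return [c for c in candidates
            if not any(c != o and c in o and freq(c) == freq(o) for o in candidates)]


def extract_candidates(strs: List[str], texts: List[str]) -> List[str]:
    candidates: List[str] = []
    for s in strs:
        for i in range(len(s) - 1, -1, -1):      # steps S1605/S1610: visit every suffix
            suffix, shorter = s[i:], s[i + 1:]
            n, n_short = kinds(suffix, texts), kinds(shorter, texts)
            if n > 1 and n_short == 1:           # step S1606: YES, so step S1607
                picked = suffix
            elif n == 1 and n_short > 1:         # step S1608: YES, so step S1609
                picked = shorter
            else:
                continue
            if picked not in candidates:
                candidates.append(picked)
    return prune(candidates, strs)


strs = ["歌手のUーちゃん", "人気者Uーちゃん", "大好きUーちゃん"]
print(extract_candidates(strs, strs))  # ['Uーちゃん']
```

In this toy input the character before "Uーちゃん" varies ("の", "者", "き") while the character before "ーちゃん" is always "U", so the boundary of the nickname is placed in front of "U".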

Through this process, appropriate nickname candidates can be extracted from the documents retrieved using "こと (official name)" as the search keyword. For example, compared with using a morphological analyzer as in Non-Patent Document 1, this reduces the possibility that, when the correct nickname is "Uーちゃん", the string "ーちゃん" is erroneously extracted as the nickname.

As described above, the nickname estimation apparatus according to the fourth embodiment can extract nickname candidates from external data such as the Web, and can therefore select even more appropriate nickname candidates. For example, with nickname generation rules alone it is difficult to estimate a nickname that contains none of the characters in the name, but referring to external data makes it possible to extract such nicknames as well.

Next, the hardware configuration of the nickname estimation apparatus according to the first to fourth embodiments will be described with reference to FIG. 19. FIG. 19 is a hardware configuration diagram of the nickname estimation apparatus according to the first to fourth embodiments.

The nickname estimation apparatus according to the first to fourth embodiments includes a control device such as a CPU (Central Processing Unit) 51; storage devices such as a ROM (Read Only Memory) 52 and a RAM 53; a communication I/F 54 that connects to a network and performs communication; external storage devices such as an HDD and a CD (Compact Disc) drive; a display device such as a monitor; input devices such as a keyboard and a mouse; and a bus 61 that connects these components. This is the hardware configuration of an ordinary computer.

The nickname estimation program executed by the nickname estimation apparatus according to the first to fourth embodiments is provided as a file in an installable or executable format, recorded on a computer-readable recording medium such as a CD-ROM (Compact Disk Read Only Memory), a flexible disk (FD), a CD-R (Compact Disk Recordable), or a DVD (Digital Versatile Disk).

The nickname estimation program executed by the nickname estimation apparatus according to the first to fourth embodiments may also be stored on a computer connected to a network such as the Internet and provided by being downloaded over the network. The program may likewise be provided or distributed over a network such as the Internet.

The nickname estimation program of the first to fourth embodiments may also be provided by being incorporated in advance in a ROM or the like.

The nickname estimation program executed by the nickname estimation apparatus according to the first to fourth embodiments has a module configuration that includes the units described above (the name input unit, candidate generation unit, output unit, selection unit, learning data input unit, learning unit, and candidate extraction unit). In terms of actual hardware, the CPU 51 (processor) reads the nickname estimation program from the storage medium and executes it, whereby the units are loaded onto, and generated on, the main storage device.

As described above, the apparatus, method, and program according to the present invention are suitable for information retrieval apparatuses, speech recognition apparatuses, spoken dialogue apparatuses, and the like, in which not only names but also nicknames for those names may be subject to processing.

A block diagram of the nickname estimation device according to the first embodiment.
A diagram showing an example of a nickname generation rule.
A diagram showing an example of input name data.
A diagram showing an example of converting a formal name into symbols.
A diagram showing an example of converting hiragana notation into symbols.
A diagram showing an example of converting katakana notation into symbols.
A flowchart showing the overall flow of the nickname estimation process in the first embodiment.
A block diagram of the nickname estimation device according to the second embodiment.
A flowchart showing the overall flow of the nickname estimation process in the second embodiment.
A block diagram of the nickname estimation device according to the third embodiment.
A flowchart showing the overall flow of the learning process in the third embodiment.
A diagram showing an example of a generated nickname generation rule.
A diagram showing a usage example of learned nickname generation rules.
A block diagram of the nickname estimation device according to the fourth embodiment.
A flowchart showing the overall flow of the nickname estimation process in the fourth embodiment.
A flowchart showing the overall flow of the nickname candidate extraction process in the fourth embodiment.
A diagram showing an example of suffixes.
A schematic diagram showing an example of a method of acquiring the number of types.
A hardware configuration diagram of the nickname estimation device according to the first to fourth embodiments.

Explanation of symbols

51 CPU
52 ROM
53 RAM
54 Communication I/F
61 Bus
100 Nickname estimation device
101 Name input unit
102 Candidate generation unit
103 Output unit
121 Rule storage unit
201 Nickname candidate
301 Person name
302 Katakana notation
800 Nickname estimation device
804 Selection unit
822 Document storage unit
1000 Nickname estimation device
1005 Learning data input unit
1006 Learning unit
1201 Formal name
1202, 1212 Nickname
1203, 1213, 1223 Nickname generation rule
1211 Hiragana notation
1301 Formal name
1302 Hiragana notation
1303 Katakana notation
1311 to 1314 Nickname candidate
1400 Nickname estimation device
1404 Selection unit
1407 Candidate extraction unit
1701 Character string
1711 to 1718 Suffix
1801, 1802 Character

Claims

A nickname estimation device that estimates, from a name, a nickname for the name, the device comprising:
a rule storage unit that stores a generation rule for nickname candidates, the generation rule including position information indicating the positions, among the characters included in the name, of the characters to be included in a nickname candidate, and a predetermined additional character string;
a name input unit that inputs the name;
a generation unit that acquires, from among the characters included in the input name, the characters at the positions indicated by the position information of the generation rule, and generates the nickname candidate by combining the acquired characters with the additional character string of the generation rule; and
an output unit that outputs the generated nickname candidate.
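
As an illustration only, the generation step recited above can be sketched in Python. The names `GenerationRule`, `generate_candidate`, and `generate_candidates` are hypothetical, and two conventions are assumptions not fixed by the claim: positions are 0-based indices, and the additional character string is appended after the acquired characters.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class GenerationRule:
    """Hypothetical rule record: character positions plus an additional string."""
    positions: Tuple[int, ...]  # 0-based indices into the name (assumed convention)
    addition: str               # predetermined additional character string


def generate_candidate(name: str, rule: GenerationRule) -> Optional[str]:
    """Apply one rule to a name; return None if the name is too short for the rule."""
    if any(p < 0 or p >= len(name) for p in rule.positions):
        return None
    picked = "".join(name[p] for p in rule.positions)
    # Assumption: "combining" is modeled as appending the additional string.
    return picked + rule.addition


def generate_candidates(name: str, rules: Iterable[GenerationRule]) -> List[str]:
    """Apply every stored rule and collect distinct candidates in rule order."""
    candidates: List[str] = []
    for rule in rules:
        candidate = generate_candidate(name, rule)
        if candidate is not None and candidate not in candidates:
            candidates.append(candidate)
    return candidates
```

For example, a rule with positions (0, 1, 2) and the additional string "by" turns "Robert" into "Robby" under these assumptions.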

The nickname estimation device according to claim 1, further comprising:
a learning data input unit that inputs learning data in which a name is associated with an already known nickname for the name; and
a learning unit that acquires the common characters included in both the name and the nickname of the learning data, generates position information indicating the positions of the common characters in the name of the learning data, generates a character string by deleting the common characters from the nickname of the learning data, and learns the generated position information and the generated character string as a generation rule for nickname candidates.
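
A minimal sketch of such a learning step follows, under a stated simplification: common characters are found by a greedy left-to-right scan that matches each nickname character against the remaining part of the name. The claim does not prescribe this matching strategy; the function name `learn_rule` and the 0-based positions are likewise assumptions made for illustration.

```python
from typing import List, Tuple


def learn_rule(name: str, nickname: str) -> Tuple[Tuple[int, ...], str]:
    """Derive (positions, additional string) from one name/nickname pair.

    Assumption: characters are matched greedily, left to right, and each
    matched name position is used at most once.
    """
    positions: List[int] = []
    leftover: List[str] = []
    search_from = 0
    for ch in nickname:
        idx = name.find(ch, search_from)
        if idx == -1:
            # Not a common character: it stays in the additional string.
            leftover.append(ch)
        else:
            # Common character: record where it sits in the name.
            positions.append(idx)
            search_from = idx + 1
    return tuple(positions), "".join(leftover)
```

With the pair ("Robert", "Robby") this sketch yields positions (0, 1, 2) and the remaining string "by". It is adequate only when the non-common characters form a suffix of the nickname; other layouts would need a richer rule representation.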

The nickname estimation device according to claim 1, further comprising a selection unit that determines whether the nickname candidate is included in a predetermined first document and selects the nickname candidate included in the first document.

The nickname estimation device according to claim 3, wherein the selection unit further searches predetermined second documents for the second documents that include the selected nickname candidate, and selects a predetermined number of the nickname candidates in descending order of the number of second documents retrieved.
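
The two-stage selection described in this claim and the one it depends on can be sketched as follows. This is an illustrative sketch only: documents are modeled as plain strings, "included in a document" is approximated by substring containment, and the name `select_candidates` and its parameters are hypothetical.

```python
from typing import Dict, List, Sequence


def select_candidates(
    candidates: Sequence[str],
    first_docs: Sequence[str],
    second_docs: Sequence[str],
    top_n: int,
) -> List[str]:
    """Keep candidates attested in the first documents, then rank by document count."""
    # Stage 1: keep only candidates that occur in at least one first document.
    present = [c for c in candidates if any(c in doc for doc in first_docs)]
    # Stage 2: count the second documents that contain each remaining candidate.
    counts: Dict[str, int] = {
        c: sum(1 for doc in second_docs if c in doc) for c in present
    }
    # sorted() is stable, so ties keep their original candidate order.
    ranked = sorted(present, key=lambda c: counts[c], reverse=True)
    return ranked[:top_n]
```

Substring matching will also fire inside longer words, so a real system would presumably use tokenization or an index; that refinement is outside what the claim states.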

The nickname estimation device according to claim 3, wherein the selection unit determines whether the nickname candidate is included in those first documents that include the input name, and selects the nickname candidate included in such a first document.

The nickname estimation device according to claim 5, wherein the selection unit acquires, from the first document including the input name, a character string of a predetermined number of characters before and after the name, determines whether the nickname candidate is included in the acquired character string, and selects the nickname candidate included in the acquired character string.
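
One way to picture this context-window check is the sketch below. The function names `contexts_around` and `select_near_name`, the `width` parameter, and the use of substring containment are all assumptions for illustration. The text before and after the name is kept as two separate strings so that a candidate cannot match across the name itself.

```python
from typing import List, Sequence, Tuple


def contexts_around(document: str, name: str, width: int) -> List[Tuple[str, str]]:
    """Return (before, after) strings of up to `width` characters per occurrence of name."""
    if not name:
        return []
    contexts: List[Tuple[str, str]] = []
    start = document.find(name)
    while start != -1:
        end = start + len(name)
        before = document[max(0, start - width):start]
        after = document[end:end + width]
        contexts.append((before, after))
        # Continue searching after this occurrence.
        start = document.find(name, end)
    return contexts


def select_near_name(
    candidates: Sequence[str], documents: Sequence[str], name: str, width: int
) -> List[str]:
    """Select candidates that appear within `width` characters of the name."""
    selected: List[str] = []
    for candidate in candidates:
        for doc in documents:
            windows = contexts_around(doc, name, width)
            if any(candidate in before or candidate in after for before, after in windows):
                selected.append(candidate)
                break
    return selected
```

Documents that do not contain the name produce no windows, so they never contribute a selection, which matches the restriction to documents including the input name.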

The nickname estimation device according to claim 1, further comprising an extraction unit that searches predetermined third documents for the third documents that include the input name, acquires from each retrieved third document at least one of the character strings of a predetermined number of characters before and after the name, and extracts the nickname candidate from the acquired character string.

The nickname estimation device according to claim 1, further comprising a selection unit that selects, from the output nickname candidates, the nickname candidate designated by a user.

A nickname estimation method executed by a nickname estimation device that estimates, from a name, a nickname for the name,
the nickname estimation device comprising a rule storage unit that stores a generation rule for nickname candidates, the generation rule including position information indicating the positions, among the characters included in the name, of the characters to be included in a nickname candidate, and a predetermined additional character string, the method comprising:
a name input step in which a name input unit inputs the name;
a generation step in which a generation unit acquires, from among the characters included in the input name, the characters at the positions indicated by the position information of the generation rule, and generates the nickname candidate by combining the acquired characters with the additional character string of the generation rule; and
an output step in which an output unit outputs the generated nickname candidate.

A nickname estimation program executed by a nickname estimation device that estimates, from a name, a nickname for the name,
the nickname estimation device comprising a rule storage unit that stores a generation rule for nickname candidates, the generation rule including position information indicating the positions, among the characters included in the name, of the characters to be included in a nickname candidate, and a predetermined additional character string, the program causing the nickname estimation device to execute:
a name input procedure of inputting the name;
a generation procedure of acquiring, from among the characters included in the input name, the characters at the positions indicated by the position information of the generation rule, and generating the nickname candidate by combining the acquired characters with the additional character string of the generation rule; and
an output procedure of outputting the generated nickname candidate.