JP2010102444A

JP2010102444A - Character string conversion device with dictionary search function

Info

Publication number: JP2010102444A
Application number: JP2008272228A
Authority: JP
Inventors: Hisaaki Matsuo; 久顕松尾
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 2008-10-22
Filing date: 2008-10-22
Publication date: 2010-05-06
Anticipated expiration: 2028-10-22
Also published as: JP5261133B2

Abstract

PROBLEM TO BE SOLVED: To provide a character string conversion device that registers extended dictionaries for use without requiring the user to compare the words registered in each extended dictionary against the words contained in text the user has written or viewed in the past.

SOLUTION: An unknown word information extraction unit (11) extracts from text information, as unknown words, the words that are not registered in a basic dictionary stored in a basic dictionary storage unit (16). A dictionary selection unit (13) calculates the number of unknown words registered in each extended dictionary, compares these counts, and selects a predetermined number of extended dictionaries in descending order of the count. A character string conversion unit (10) then performs character string conversion based on the selected extended dictionaries.

COPYRIGHT: (C)2010, JPO&INPIT

Description

The present invention relates to a character string conversion device, and more particularly to a character string conversion device with a dictionary search function.

Many mobile communication terminals, such as mobile phones, provide functions such as e-mail for sending and receiving character string information. Terminals with such functions generally also provide a text editor function for composing the character string information to be sent.

The text editor function converts input signals from a keyboard or similar device into character codes and displays the corresponding character images on the screen of the mobile communication terminal. It also includes a character string conversion function, which displays candidates for converting the entered character code string (hereinafter, the character string) into other character strings, lets the user select an appropriate candidate by operating the keyboard or the like, confirms the selected candidate as the conversion result, and inserts it into the character string information being edited.

The character string conversion function may include a kana-kanji conversion function, in which the input character string is a reading written in hiragana or katakana and the conversion candidates are mixed kana-kanji text corresponding to that reading, and a predictive conversion function, which displays conversion candidates whose readings begin with the entered reading (prefix match).

The character string conversion function generally operates by consulting a dictionary database that records the relationships between input character strings and conversion candidate character strings.

For the user's convenience, the character string conversion function normally allows a number of pairs of input character string and conversion candidate character string to be registered in the dictionary database in advance.

One way to register new relation data in the dictionary database is for the user to enter the entries one at a time with a keyboard or the like. With this method, however, a user who wants to register several entries must repeat the operation for each of them.

Alternatively, extended dictionaries in which multiple relation data entries have been registered in advance may be distributed from a server on a network. The user downloads an extended dictionary and registers it in the character string conversion device, which can then output conversion candidates by referring to both the basic dictionary database and the extended dictionary. An extended dictionary typically contains a set of words related to a particular field. Patent Document 1 describes a device and method that, when a user replies to an existing e-mail, analyzes the content of the e-mail being replied to, determines the type of the text, and decides which kind of Japanese conversion dictionary to use when composing the reply.

Patent Document 1: JP 2008-186312 A

An extended dictionary usually contains many words. In character string conversion, when several relation data entries share the same input character string, it is customary to display either all of them or only a predetermined number in descending order of priority as conversion candidates, and when there are many candidates they may not all fit on the screen at once. Consequently, if an extended dictionary database contains words the user never uses, those words occupy the limited display area and can prevent other conversion candidates from being displayed.

Therefore, to avoid this problem when registering extended dictionaries for use, the user has to consider how often the words registered in each extended dictionary would actually be used and choose which dictionaries to register accordingly. A list of the words registered in an extended dictionary can generally be viewed on the terminal device or the like. However, to judge objectively how often the registered words would be used in character string conversion, the user must compare them against the words contained in text the user has written or viewed in the past, which is very laborious.

Furthermore, the method described in Patent Document 1 requires document types, and the way they relate to text, to be defined in advance. For any extended dictionary the user newly adds, the user must therefore associate that dictionary with a text type.

The present invention has been made in view of these circumstances, and provides a character string conversion device that can register extended dictionaries for use without requiring the user to compare the words registered in the extended dictionaries against the words contained in text written or viewed in the past.

The character string conversion device of the present invention evaluates how frequently the words registered in each extended dictionary are used in text the user has written or viewed in the past, and registers extended dictionaries for use on that basis.

The character string conversion device of the present invention comprises a text information storage unit that stores text information, an unknown word information extraction unit that extracts unknown word information from the text information, a character string conversion unit that performs character string conversion, a basic dictionary storage unit that stores a basic dictionary referred to by the character string conversion unit, an extended dictionary storage unit that stores extended dictionaries referred to by the character string conversion unit, and a dictionary selection unit that selects, from the extended dictionaries stored in the extended dictionary storage unit, the extended dictionaries to be used in character string conversion. The unknown word information extraction unit extracts from the text information, as unknown words, the words that are not registered in the basic dictionary stored in the basic dictionary storage unit. The dictionary selection unit calculates the number of unknown words registered in each extended dictionary, compares these counts, and selects a predetermined number of extended dictionaries in descending order of the count. The character string conversion unit performs character string conversion based on the selected extended dictionaries.
Here, the predetermined number is the number of extended dictionaries that can be registered in the character string conversion device; it is either fixed in advance in the device or selectable by the user.
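
The selection rule described here can be sketched in Python as follows. This is only an illustration under assumptions that do not come from the source: the function name `select_extended_dictionaries`, the representation of each extended dictionary as a set of registered words, the sample dictionary names, and the tie-breaking behavior (dictionaries with equal counts keep their input order) are all hypothetical.

```python
def select_extended_dictionaries(unknown_words, extended_dictionaries, limit):
    """Pick up to `limit` dictionaries, largest unknown-word count first.

    unknown_words: words that are not registered in the basic dictionary.
    extended_dictionaries: hypothetical mapping of dictionary name -> set of words.
    limit: the "predetermined number" of dictionaries that may be registered.
    """
    counts = {
        name: sum(1 for word in unknown_words if word in words)
        for name, words in extended_dictionaries.items()
    }
    # sorted() is stable, so dictionaries with equal counts keep their input order.
    ranked = sorted(counts, key=lambda name: counts[name], reverse=True)
    return ranked[:limit]


# Hypothetical sample data for illustration only.
dictionaries = {
    "sports": {"ゴール"},
    "computing": {"アーカイヴ", "ブログ", "サーバ"},
    "food": {"パスタ"},
}
selected = select_extended_dictionaries(
    ["アーカイヴ", "ブログ", "パスタ"], dictionaries, limit=2
)
print(selected)
```

With this sample data the computing dictionary contains two of the unknown words, the food dictionary one, and the sports dictionary none, so the first two are selected when the limit is 2.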

When extracting unknown words from the text information, the unknown word information extraction unit may extract them with duplicates allowed.

The text information storage unit may store text information that includes sender information and/or recipient information, and the unknown word information extraction unit may extract unknown words from text information that includes predetermined sender information and/or recipient information.

The text information storage unit may store text information that includes transmission time information and/or reception time information, and the unknown word information extraction unit may extract unknown words from text information whose transmission time and/or reception time falls within a predetermined period.

The device may further comprise a display unit, and during character string conversion the display unit may display a character string identifying the selected extended dictionary.

Among the conversion candidate character strings output by the character string conversion unit, the display unit may display those that are registered in the selected extended dictionary and are also included in the extracted unknown words in a manner different from the other conversion candidate character strings.

With the character string conversion device of the present invention, the user does not need to compare the words registered in extended dictionaries against the words contained in text written or viewed in the past.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.
FIG. 1 is a block diagram showing an example of the configuration of the character string conversion device of the present invention. The device comprises a CPU (Central Processing Unit) 1, a storage device 2, an input interface 3, an output interface 4, and an external communication interface 5. These elements are interconnected by a system bus and exchange and process data.

The storage device 2 includes a work area 6, a program storage area 7, and a data storage area 8.

The work area 6 is the storage area the CPU 1 needs when executing the programs stored in the program storage area 7.

The program storage area 7 stores a control program 9, a character string conversion program 10, an unknown word extraction program 11, a text data management program 12, an extended dictionary evaluation program 13, and a screen display program 14. The CPU 1 executes these programs to perform dictionary search processing and character string conversion processing.

The data storage area 8 includes a text data storage area 15, a basic dictionary data storage area 16, and an extended dictionary data storage area 17.

The text data storage area 15 stores one or more items of text data. Each item of text data contains character string information and may additionally contain sender information, recipient information, transmission time information, reception time information, and the like.

The basic dictionary data storage area 16 stores one or more sets of basic dictionary data. Each set of basic dictionary data consists of entries that associate an input character string with the conversion candidate character strings for that input.

The extended dictionary data storage area 17 stores one or more sets of extended dictionary data. Each set of extended dictionary data consists of entries that associate an input character string with the conversion candidate character strings for that input.

At power-on, the CPU 1 executes the control program 9 stored in the program storage area 7 of the storage device 2, and then executes the other programs stored in the program storage area 7 in response to input from the input interface 3.

The storage device 2 consists of semiconductor memory, a magnetic disk and magnetic disk drive, an optical disc and optical disc drive, or the like.

The input interface 3 is connected to external devices, such as a keyboard, touch panel, or mouse, that are used to operate the terminal and enter character strings.

The output interface 4 is connected to a display device such as a CRT (Cathode Ray Tube) or a liquid crystal display.

The external communication interface 5 communicates with external equipment wirelessly or by wire, and sends and receives text data such as e-mail.

The character string conversion program 10 creates input character string information from the information entered through the input interface 3 and performs character string conversion on it. The conversion refers to the basic dictionary database and to the extended dictionary databases that have been registered for use, and produces one or more conversion candidates for the input character string.
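
As a rough illustration of this lookup, the sketch below gathers candidates from a basic dictionary and from the extended dictionaries registered for use. The source does not specify data formats or candidate ordering, so the dict-of-lists representation, the function name `convert`, the choice to list basic-dictionary candidates first, and the removal of repeated candidates are all assumptions made for this example.

```python
def convert(reading, basic_dictionary, registered_extended_dictionaries):
    """Collect conversion candidates for `reading` from all active dictionaries.

    Each dictionary is assumed to map an input string to a list of candidates.
    """
    candidates = list(basic_dictionary.get(reading, []))
    for dictionary in registered_extended_dictionaries:
        for candidate in dictionary.get(reading, []):
            if candidate not in candidates:  # assumption: show each candidate once
                candidates.append(candidate)
    return candidates


# Hypothetical sample dictionaries.
basic = {"きしゃ": ["記者", "汽車"]}
business = {"きしゃ": ["貴社", "記者"]}
print(convert("きしゃ", basic, [business]))
```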

The unknown word extraction program 11 reads the text data stored in the text data storage area 15 and splits the character strings in the text data into a sequence of words by morphological analysis. It then compares the character string of each word in the sequence with the conversion candidate character strings in the basic dictionary. If no conversion candidate character string matches, the word is judged not to be registered in the basic dictionary and is added to the unknown word list. When every word in the text data has been compared, the program outputs the unknown word list.
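
A minimal sketch of this extraction step follows. It makes two simplifying assumptions that are not in the source: the text is taken to be already segmented into space-separated words, with `str.split()` standing in for a real morphological analyzer, and each unknown word is added to the list only once. The function name and the sample data are hypothetical.

```python
def extract_unknown_words(text, basic_dictionary):
    """Return the words in `text` that match no candidate in the basic dictionary."""
    # All conversion candidate strings registered in the basic dictionary.
    known = {
        candidate
        for candidates in basic_dictionary.values()
        for candidate in candidates
    }
    unknown = []
    for word in text.split():  # stand-in for morphological analysis
        if word not in known and word not in unknown:
            unknown.append(word)
    return unknown


# Hypothetical basic dictionary and pre-segmented text.
basic = {"きょう": ["今日"], "よむ": ["読む"]}
print(extract_unknown_words("今日 ブログ 読む ブログ アーカイヴ", basic))
```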

When adding words to the unknown word list, the unknown word extraction program 11 may allow the same word to be added more than once. Suppose, for example, that the text data contains the character string 「アーカイヴ」 ("archive") several times and that the word is not registered in the basic dictionary. In this case, the unknown word extraction program 11 may add the word to the unknown word list once for each occurrence in the text data. Each unknown word is then registered in the list as many times as it appears in the text, so it is weighted by its number of occurrences. As a result, even when the text data contains only a few distinct unknown words but those words occur many times, the extended dictionary evaluation program 13 can select extended dictionaries so that those words are more likely to appear as conversion candidates during character string conversion.
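
The effect of allowing duplicates can be seen in a small sketch. The counting function and the two sample dictionaries below are hypothetical and only illustrate the weighting idea; they are not taken from the source.

```python
def count_registered(unknown_words, dictionary_words):
    """Count unknown-word list entries that appear in one extended dictionary."""
    return sum(1 for word in unknown_words if word in dictionary_words)


# Hypothetical extended dictionaries, each represented as a set of words.
computing = {"アーカイヴ"}
food = {"パスタ", "ピザ"}

# The same text summarized without and with duplicates.
unique_list = ["アーカイヴ", "パスタ", "ピザ"]
weighted_list = ["アーカイヴ", "アーカイヴ", "アーカイヴ", "パスタ", "ピザ"]

print(count_registered(unique_list, computing), count_registered(unique_list, food))
print(count_registered(weighted_list, computing), count_registered(weighted_list, food))
```

Without duplicates the food dictionary scores higher (2 against 1). With duplicates the computing dictionary scores higher (3 against 2), because the repeated word counts once per occurrence.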

When reading text data, the unknown word extraction program 11 may also refer to the sender and recipient information contained in the text data and read only the text data that includes predetermined sender or recipient information. This makes it possible, when composing text for a particular correspondent, to select extended dictionaries containing candidates that are more likely to be used. In e-mail exchanges with a particular person, for example, the user often reuses words that appeared in e-mails previously sent to or received from that person. Therefore, if the extended dictionary evaluation program 13 selects and registers extended dictionaries based on an unknown word list generated from the e-mails received from or sent to that person, the character string conversion performed while composing a message to that person can present words with a higher probability of being used.
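
A possible shape for this filtering step is sketched below. The `Message` record, its field names, and the sample addresses are hypothetical; the source only states that text data may carry sender and recipient information.

```python
from dataclasses import dataclass


@dataclass
class Message:
    """Hypothetical text data record with sender and recipient information."""
    sender: str
    recipient: str
    body: str


def messages_exchanged_with(partner, messages):
    """Keep only the messages sent to or received from `partner`."""
    return [m for m in messages if partner in (m.sender, m.recipient)]


mailbox = [
    Message("me@example.com", "sato@example.com", "アーカイヴ を 送ります"),
    Message("tanaka@example.com", "me@example.com", "パスタ の 店"),
    Message("sato@example.com", "me@example.com", "ブログ を 更新"),
]
for message in messages_exchanged_with("sato@example.com", mailbox):
    print(message.body)
```

Only the filtered messages would then be passed on to unknown word extraction.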

When reading text data, the unknown word extraction program 11 may also refer to the transmission or reception date and time contained in the text data and read only the text data for which these fall within a predetermined period. When composing text such as e-mail, for example, the user may use words tied to a passing trend. In that case, if the unknown word list is extracted from the e-mails sent and received within a particular period and the extended dictionary evaluation program 13 selects and registers extended dictionaries based on that list, words related to the trends of that period can be presented as conversion candidates with higher probability.
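
The period filter can be sketched in the same way. The `Message` record with a single `timestamp` field, the inclusive bounds, and the sample dates are assumptions for illustration; the source distinguishes transmission and reception times but does not define a data format.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Message:
    """Hypothetical text data record with one sent-or-received timestamp."""
    timestamp: datetime
    body: str


def messages_in_period(messages, start, end):
    """Keep only the messages whose timestamp lies within [start, end]."""
    return [m for m in messages if start <= m.timestamp <= end]


mailbox = [
    Message(datetime(2008, 8, 15, 9, 0), "古い メール"),
    Message(datetime(2008, 10, 1, 12, 30), "最近 の メール"),
    Message(datetime(2008, 10, 20, 18, 0), "今週 の メール"),
]
recent = messages_in_period(mailbox, datetime(2008, 10, 1), datetime(2008, 10, 31))
for message in recent:
    print(message.body)
```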

The text data management program 12 stores text data received via the external communication interface 5 in the text data storage area 15, and retrieves text data from the text data storage area 15.

The screen display program 14 displays conversion candidate character strings on the screen during character string conversion.

During character string conversion, the screen display program 14 may display on the screen a character string identifying each extended dictionary that is registered for use. This has the advantage that the user can see which dictionaries the extended dictionary evaluation program 13 has registered.

In addition, when conversion candidates from an extended dictionary are displayed during character string conversion, the candidates that correspond to unknown words extracted from the text may be displayed so that they can be distinguished from the other candidates. Specific methods include changing the typeface of the candidate character string, underlining it, enclosing it in a frame, or giving the candidate character string or its background a different display color from the other candidates. This lets the user see at a glance which of the unknown words used in the text were registered in the extended dictionary.
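
The marking rule can be sketched as follows. Square brackets are used here purely as a plain-text stand-in for the frame, underline, typeface, or color change mentioned in the text; the function name and sample data are hypothetical.

```python
def mark_candidates(candidates, selected_dictionary_words, unknown_words):
    """Mark candidates that are in the selected dictionary and in the unknown word list."""
    marked = []
    for candidate in candidates:
        if candidate in selected_dictionary_words and candidate in unknown_words:
            marked.append(f"[{candidate}]")  # stand-in for a frame or highlight
        else:
            marked.append(candidate)
    return marked


# Hypothetical candidates for one reading, with one word found in past text.
candidates = ["アーカイブ", "アーカイヴ"]
print(mark_candidates(candidates, {"アーカイヴ", "サーバ"}, ["アーカイヴ"]))
```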

FIG. 2 is a flowchart showing the processing flow of the programs stored in the program storage area 7 of the character string conversion device shown in FIG. 1.

First, in step S101, the text data management program 12 reads text data from the text data storage area 15.

Next, in step S102, the unknown word extraction program 11 extracts from the text data that was read, as an unknown word list, the words that are not registered in the basic dictionary stored in the basic dictionary data storage area 16.

Next, in step S103, the extended dictionary evaluation program 13 determines whether any unchecked extended dictionary remains.

If an unchecked extended dictionary remains, the extended dictionary evaluation program 13 selects one of the unchecked extended dictionaries in step S104.

Next, in step S105, the extended dictionary evaluation program 13 calculates how many of the words in the unknown word list are registered in the selected extended dictionary.

Next, in step S106, the number of unknown words registered in the current registration candidate is compared with the number registered in the extended dictionary currently being checked. If the dictionary being checked contains more of the unknown words in the unknown word list than the current registration candidate, the registration candidate is replaced with the dictionary being checked in step S107. The dictionary being checked is then marked as checked, and processing returns to step S103.

Steps S103 to S107 are then repeated. When it is determined in step S103 that no unchecked dictionary remains, processing moves to step S108, where the current registration candidate is registered for use.
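
The loop of steps S103 to S108 can be sketched as follows, with the step numbers shown as comments. The source does not say how the registration candidate is initialized, so starting with no candidate and a count of -1 is an assumption, as are the function name and the representation of each extended dictionary as a set of words.

```python
def choose_dictionary(unknown_words, extended_dictionaries):
    """Return the name of the extended dictionary containing the most unknown words."""
    candidate_name = None   # assumption: no registration candidate at the start
    candidate_count = -1    # assumption: any real count beats the initial value
    for name, words in extended_dictionaries.items():              # S103, S104
        count = sum(1 for word in unknown_words if word in words)  # S105
        if count > candidate_count:                                 # S106
            candidate_name, candidate_count = name, count           # S107
    return candidate_name                                           # S108


# Hypothetical sample data.
dictionaries = {
    "sports": {"ゴール"},
    "computing": {"アーカイヴ", "ブログ"},
    "food": {"パスタ"},
}
print(choose_dictionary(["アーカイヴ", "ブログ", "パスタ"], dictionaries))
```

Because the comparison is strict, a later dictionary replaces the candidate only when it contains more unknown words, which matches the replacement condition described for step S106.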

FIG. 3 is a flowchart explaining the processing by which, in step S105 of FIG. 2, the extended dictionary evaluation program 13 calculates the number of unknown words in the unknown word list that are registered in the dictionary being checked.

First, in step S201, the count cnt of registered unknown words is initialized to 0, and i, the index of the unknown word currently being checked, is also initialized to 0. As shown in FIG. 4, the unknown word list is assumed to assign each unknown word an index from 0 to (n-1), where 0 is the index of the first entry and n is the number of entries.

In step S202, it is determined whether i is smaller than the number of entries in the unknown word list.

If i is smaller than the number of entries, processing moves to step S203, where the character string of the i-th unknown word in the list is compared with every character string registered in the extended dictionary to determine whether the unknown word is registered there.

If the unknown word is registered, cnt is incremented by 1 in step S204, and the index i is then incremented by 1 in step S205. If it is not registered, processing moves directly to step S205.

Steps S202 to S205 are then repeated until i reaches the number of entries in the unknown word list. At that point, in step S206, cnt is returned as the number of unknown word list entries registered in the dictionary being checked.
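
Steps S201 to S206 translate almost directly into code. In the sketch below the variable names `cnt` and `i` follow the text, while the function name and the representation of the extended dictionary as a list of registered character strings are assumptions.

```python
def count_registered_unknown_words(unknown_words, dictionary_entries):
    """Count the unknown-word list entries registered in one extended dictionary."""
    cnt = 0                                      # S201: count of registered unknown words
    i = 0                                        # S201: index of the word being checked
    while i < len(unknown_words):                # S202
        # S203: compare against every string registered in the dictionary.
        if unknown_words[i] in dictionary_entries:
            cnt += 1                             # S204
        i += 1                                   # S205
    return cnt                                   # S206


# Hypothetical sample data; the repeated word is counted once per occurrence.
entries = ["アーカイヴ", "ブログ", "サーバ"]
print(count_registered_unknown_words(["アーカイヴ", "パスタ", "アーカイヴ"], entries))
```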

As described above, the character string conversion device of the present invention evaluates how frequently the words registered in each extended dictionary are used in text the user has written or viewed in the past, and registers extended dictionaries for use on that basis. The user therefore does not need to compare the words registered in the extended dictionaries against the words contained in text written or viewed in the past.

In addition, the extended dictionary evaluation process does not require extended dictionaries to be associated with document types, so no such association work is needed when the user downloads an extended dictionary.

The present invention is applicable to character string conversion devices.

FIG. 1 is a block diagram showing an example of the configuration of the character string conversion device of the present invention.
FIG. 2 is a flowchart showing the processing flow of the programs stored in the program storage area 7 of the character string conversion device.
FIG. 3 is a flowchart explaining the processing by which the extended dictionary evaluation program 13 calculates the number of unknown words in the unknown word list that are registered in the dictionary being checked.
FIG. 4 is a diagram showing an example of the unknown word list.

Explanation of symbols

1 CPU
2 Storage device
3 Input interface
4 Output interface
5 External communication interface
6 Work area
7 Program storage area
8 Data storage area
9 Control program
10 Character string conversion program
11 Unknown word extraction program
12 Text data management program
13 Extended dictionary evaluation program
14 Screen display program
15 Text data storage area
16 Basic dictionary data storage area
17 Extended dictionary data storage area

Claims

A character string conversion device comprising:
a text information storage unit that stores text information;
an unknown word information extraction unit that extracts unknown word information from the text information;
a character string conversion unit that performs character string conversion;
a basic dictionary storage unit that stores a basic dictionary referred to by the character string conversion unit;
an extended dictionary storage unit that stores extended dictionaries referred to by the character string conversion unit; and
a dictionary selection unit that selects, from the extended dictionaries stored in the extended dictionary storage unit, the extended dictionaries to be used in character string conversion,
wherein the unknown word information extraction unit extracts from the text information, as unknown words, words that are not registered in the basic dictionary stored in the basic dictionary storage unit,
the dictionary selection unit calculates the number of unknown words registered in each extended dictionary, compares these numbers, and selects a predetermined number of extended dictionaries in descending order of the number of registered unknown words, and
the character string conversion unit performs character string conversion based on the selected extended dictionaries.

The character string conversion device according to claim 1, wherein the unknown word information extraction unit allows duplicates when extracting unknown words from the text information.

The character string conversion device according to claim 1, wherein the text information storage unit stores text information including sender information and/or recipient information, and
the unknown word information extraction unit extracts unknown words from text information including predetermined sender information and/or recipient information.

The character string conversion device according to claim 1, wherein the text information storage unit stores text information including transmission time information and/or reception time information, and
the unknown word information extraction unit extracts unknown words from text information whose transmission time and/or reception time falls within a predetermined period.

The character string conversion device according to claim 1, further comprising a display unit,
wherein, during character string conversion, the display unit displays a character string representing the selected extended dictionary.

The character string conversion device according to claim 5, wherein, among the conversion candidate character strings output by the character string conversion unit, the display unit displays those that are registered in the selected extended dictionary and are included in the extracted unknown words using a display method different from that used for the other conversion candidate character strings.