JP4004060B1

JP4004060B1 - Character search method

Info

Publication number: JP4004060B1
Application number: JP2007097568A
Authority: JP
Inventors: 徹也加賀美
Original assignee: 加賀美徹也
Priority date: 2007-03-19
Filing date: 2007-04-03
Publication date: 2007-11-07
Anticipated expiration: 2027-04-03
Also published as: WO2008114618A1; JP2008262248A

Abstract

PROBLEM TO BE SOLVED: To provide a character search method that finds characters such as kanji and Hangul accurately and quickly by a simple procedure, without presupposing character knowledge or a special device.
SOLUTION: The gaps between the constituent elements of a search character are tested for division in the vertical and horizontal directions, and whether each division is possible is replaced with a code, so that the character is encoded and classified. Entering the classification code retrieves the character, and the meaning of the retrieved character is then displayed in multiple languages, as video, or the like so that the character can be understood.
[Selected drawing] FIG. 3

Description

The present invention relates to a search method for finding characters such as kanji and Hangul accurately and quickly by a simple procedure.

Several patents have been proposed for methods of searching for characters, kanji in particular. The method shown in Patent Document 1 searches for a character by one or more of the constituent elements that make up the character.
According to that invention, the character parts contained in a character logical expression entered from the character logical expression input means 10 are identified by the character part specifying means 11 and substituted into the character logical expression to create a part logical expression. The part logical expression calculation means 12 evaluates the part logical expression, and the corresponding character specifying means 13 uses the resulting set of parts as a search condition, refers to the character part database 15, and identifies the matching character.
Patent Document 1: JP 2003-30183 A

The invention described in Patent Document 1 presupposes knowledge of kanji. A character must be divided into several parts, and the divided parts must be added, subtracted, multiplied, and so on part by part, which is why an arithmetic character search device is used.

Operating the arithmetic character search device presupposes knowledge of kanji and requires means such as the character logical expression input means 10 and the character part specifying means 11, so the configuration of the search device is complicated and the search method is complicated as well.

The present invention has been made in view of the above points. It provides a character search method that does not presuppose knowledge of kanji or Hangul or a special device, that can be carried out by a simple procedure, and that finds characters accurately and quickly.

The gist of the present invention is a character search method in a search device comprising input means 10 for receiving input such as a character four-division code, storage means 30 for storing a character search program and a character database corresponding to character four-division codes that encode the gaps between the constituent elements of a search character, calculation means 20 for collating input information against the character database, and display means 40 for displaying search results.
When the character database is stored in the storage means 30, each character is examined under the criterion that a dividing line can be drawn where there is a gap between constituent elements and cannot be drawn where there is no gap. Whether dividing lines can be drawn through the character in a roughly cross-shaped pattern, vertical direction first and horizontal direction second, is judged in turn for each of the four parts of the character: top, bottom, left, and right. A part where the line can be drawn is represented by the digit 1, and a part where it cannot is represented by the digit 0. These digits are assigned, in the order top, bottom, left, right, to the first, second, third, and fourth digits of a four-digit number, which encodes the character as a character four-division code. Each character four-division code is classified and stored as a database together with the corresponding character or character image, multilingual data, and video file.
The method consists of a step in which the input means 10 accepts input of a character four-division code, and
a step in which the calculation means 20 collates the entered character four-division code against the character four-division codes stored in the storage means 30 and, when the codes match, displays on the display means 40 the character or character image, multilingual data, and video file corresponding to that code.
The method thereby makes it possible to search for and display the corresponding character or character image from a character four-division code entered by a person with no knowledge of the characters or their constituent elements, and to understand the meaning of the retrieved character or character image through multiple languages or video.
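The storage and collation steps above can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the record layout and the names `RECORDS`, `build_index`, and `search` are hypothetical, and the code/character pairs are the examples this description attributes to FIG. 3 and FIG. 4.

```python
from collections import defaultdict

# Sample records of (four-division code, character). The pairs follow the
# examples given for FIG. 3 / FIG. 4; the list layout itself is an assumption.
RECORDS = [
    ("1111", "語"), ("1111", "競"),
    ("1011", "啓"), ("1101", "仁"),
    ("0111", "六"), ("1110", "部"),
    ("1100", "北"), ("0011", "豆"),
]


def build_index(records):
    """Group characters by code (stands in for the database in storage means 30)."""
    index = defaultdict(list)
    for code, char in records:
        index[code].append(char)
    return dict(index)


def search(index, code):
    """Collate an entered code against the stored codes (role of calculation means 20)."""
    return index.get(code, [])


INDEX = build_index(RECORDS)
print(search(INDEX, "1111"))  # characters filed under code 1111
```

Because many characters share one code, a lookup returns a list of candidates rather than a single character; the user then picks the intended one from the displayed results.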

The gist of the present invention also lies in a character search method in a search device comprising input means 10 for receiving input of a character four-division code and character pronunciation information, storage means 30 for storing a character search program and a character database corresponding to combinations of the character pronunciation information and a character four-division code that encodes the gaps between the constituent elements of a search character, calculation means 20 for collating input information against the character database, and display means 40 for displaying search results.
When the character database is stored in the storage means 30, each character is examined under the criterion that a dividing line can be drawn where there is a gap between constituent elements and cannot be drawn where there is no gap. Whether dividing lines can be drawn through the character in a roughly cross-shaped pattern, vertical direction first and horizontal direction second, is judged in turn for each of the four parts of the character: top, bottom, left, and right. A part where the line can be drawn is represented by the digit 1, and a part where it cannot is represented by the digit 0. These digits are assigned, in the order top, bottom, left, right, to the first, second, third, and fourth digits of a four-digit number, which encodes the character as a character four-division code. The character pronunciation information is written in alphabetic letters immediately after the character four-division code, and the character or character image, multilingual data, and video file corresponding to the combination of the character four-division code and the character pronunciation information are classified and stored as a database.
The method consists of a step in which the input means 10 accepts input of a combination of a character four-division code and character pronunciation information, and
a step in which the calculation means 20 collates the entered combination against the combinations of character four-division code and character pronunciation information stored in the storage means 30 and, when the combinations match, displays on the display means 40 the character or character image, multilingual data, and video file corresponding to that combination.
The method thereby makes it possible to search for and display the corresponding character or character image from a combination of character four-division code and character pronunciation information entered by a person who knows the characters and their pronunciation, and to understand the meaning of the retrieved character or character image through multiple languages or video.
The gist of the present invention further lies in a character search method in a search device comprising input means 10 for receiving input of a character four-division code and character grammar and meaning information, storage means 30 for storing a character search program and a character database corresponding to the character grammar and meaning information and a character four-division code that encodes the gaps between the constituent elements of a search character, calculation means 20 for collating input information against the character database, and display means 40 for displaying search results.
When the character database is stored in the storage means 30, each character is examined under the criterion that a dividing line can be drawn where there is a gap between constituent elements and cannot be drawn where there is no gap. Whether dividing lines can be drawn through the character in a roughly cross-shaped pattern, vertical direction first and horizontal direction second, is judged in turn for each of the four parts of the character: top, bottom, left, and right. A part where the line can be drawn is represented by the digit 1, and a part where it cannot is represented by the digit 0. These digits are assigned, in the order top, bottom, left, right, to the first, second, third, and fourth digits of an eight-digit number, which encodes the character as a character four-division code. Whether there is a further divisible gap between constituent elements at a location other than the dividing lines of the character is represented by a digit in the fifth position, and whether the character has many or few constituent elements is represented by a digit in the sixth position. Following this, grammatical information on whether the character is a noun or not is represented by a digit in the seventh position, and semantic information on whether the character denotes a human or not is represented by a digit in the eighth position. The character four-division code and the codes for the grammatical and semantic information of the character are combined into an eight-digit code, and the corresponding character or character image, multilingual data, and video file are classified and stored as a database.
The method consists of a step in which the input means 10 accepts the eight-digit code, and
a step in which the calculation means 20 collates the entered eight-digit code against the eight-digit codes stored in the storage means 30 and, when the codes match, displays on the display means 40 the character or character image, multilingual data, and video file corresponding to that eight-digit code.
The method thereby makes it possible to search for and display the corresponding character or character image from an eight-digit code entered by a person who knows the grammar and meaning of the characters, and to understand the meaning of the retrieved character or character image through multiple languages or video.
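The eight-digit code described above can be assembled as in the following Python sketch. The text only says that digits five to eight are each "a digit"; using 1 for yes and 0 for no in those positions is an assumption made here, and the function name `eight_digit_code` and its parameter names are hypothetical.

```python
def eight_digit_code(four_division, extra_gap, many_elements, is_noun, is_human):
    """Append digits 5 to 8 to a four-division code.

    Assumption: each extra digit is 1 for "yes" and 0 for "no".
      digit 5: a further divisible gap exists away from the dividing lines
      digit 6: the character has many constituent elements
      digit 7: the character is a noun
      digit 8: the character denotes a human
    """
    if len(four_division) != 4 or set(four_division) - {"0", "1"}:
        raise ValueError("four-division code must be four digits of 0 or 1")
    flags = (extra_gap, many_elements, is_noun, is_human)
    return four_division + "".join("1" if flag else "0" for flag in flags)


# Hypothetical attribute values, chosen only to show the digit layout.
print(eight_digit_code("1111", False, True, True, False))  # 11110110
```

The eight-digit string can then be used as a lookup key in exactly the same way as the four-digit code.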

The present invention is effective, for example on a website on the Internet, in situations where a wide range of users is expected: Westerners with no knowledge of the characters being searched for, such as kanji or Hangul; learners whose character knowledge is incomplete; and people who have already acquired character knowledge and are proficient in character conversion using a front-end processor or the like.
In particular, when Westerners or others who have neither knowledge of characters such as kanji nor a character processing system for them search for kanji, they can enter the code of the present invention, obtained by dividing the character, as digits or the like and search for the character even on a single-byte character processing device such as one using ASCII. The meaning of the retrieved character can also be understood easily through multiple languages or video.

Furthermore, Japanese, Chinese, and other users who have knowledge of the characters being searched for, such as kanji or Hangul, and who have a character processing system can improve conversion efficiency by entering the code of the present invention together with the conventional pronunciation information. This applies to character conversion, which is a kind of character search, performed with a front-end processor or the like on a double-byte character processing device using Shift JIS, GB, or a similar code.

In the best mode of the present invention, the gaps between the constituent elements of a search character are tested for division in the vertical and horizontal directions, and whether each division is possible is replaced with a code, so that the character is encoded. The codes and their corresponding characters are classified and stored in the storage means, a code is entered from the input means, and the calculation means retrieves the character from the storage means. This allows even a person without character knowledge such as kanji radicals or stroke order to search for a character, and displaying the meaning of the retrieved character in multiple languages, as video, or the like then allows the character to be understood.
In addition, entering the pronunciation information of the search character together with the code and classifying characters by both gives a person with character knowledge such as kanji pronunciation better search efficiency than the conventional search method that uses pronunciation information alone.
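A rough Python sketch of the combined key follows, with the alphabetic reading written immediately after the four-division code. The romanized readings ("go", "kyou"), the dictionary `ENTRIES`, and the helper names are illustrative assumptions; the description does not give concrete readings.

```python
# Keys are the four-division code followed directly by an alphabetic reading.
# The readings below are illustrative assumptions, not taken from the patent.
ENTRIES = {
    "1111go": "語",
    "1111kyou": "競",
}


def combined_key(code, reading):
    """Build the lookup key: four-division code, then the reading in lowercase."""
    return code + reading.lower()


def lookup(entries, code, reading):
    """Return the character filed under the combined key, or None if absent."""
    return entries.get(combined_key(code, reading))


print(lookup(ENTRIES, "1111", "go"))  # 語
```

Code 1111 on its own matches both sample characters, and a reading on its own can match many homophones, so combining the two narrows the candidates.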

Embodiments of the present invention are described below with reference to the drawings.
With the present invention, simply storing the character database corresponding to the codes and the processing program in the storage means of FIG. 1 allows the program shown in FIG. 2 to be executed and a search to be performed. Likewise, simply adding the pronunciation information and classification codes to the user dictionary of kana-kanji conversion software (a front-end processor) improves character search efficiency.

FIG. 1 is a functional block diagram showing one embodiment of the present invention.
Suppose, for example, that a Japanese kanji dictionary website published on the Internet is downloaded and the dictionary is searched on an information device such as a personal computer or mobile phone. A kanji dictionary website in which character four-division codes and kanji are classified and can be browsed in table form, as shown in FIG. 4, is downloaded as a file compressed in ZIP or a similar format, decompressed, and stored in advance in the storage means 30 of the personal computer or the like.

The storage means 30 of FIG. 1 stores the data as a database arranged by code, in which a character four-division code and its corresponding kanji form one record, as shown in FIG. 4. What the user sees also takes the form of FIG. 4, but because websites are often written in Shift JIS, the data is actually stored in the storage means 30 of FIG. 1 in the code form shown in FIG. 5, expressed in hexadecimal Shift JIS.

A character four-division code entered through the input means 10 of FIG. 1 (for character codes and the like), for example with the numeric keys of a personal computer or mobile phone, is collated one entry at a time by the calculation means 20 of FIG. 1 (a CPU or the like) against the character four-division codes of the character database stored in advance in the storage means 30. The actual collation is performed on the Shift JIS representation of the four-division code shown in FIG. 5.

The program needed for the collation is provided by invoking the search function of web browsing software (a browser) or word processing software installed in advance in the storage means 30.

When the entered four-division code matches a four-division code in the database, the matching four-division code nearest the cursor in the table on the page being viewed is highlighted on the display means 40 of FIG. 1, for example a liquid crystal screen, by showing that string in a color different from the background or the like.
For example, suppose the four-division code 1111 is entered from the input means 10 of FIG. 1 and the calculation means 20 of FIG. 1 collates it against the character database of FIG. 4 stored in advance in the storage means 30. If the cursor was on the row immediately before "語" in FIG. 4, the "1111" to the left of "語" is highlighted. To continue searching, the user presses the "Find Next" button of the search function in the web browsing or word processing software, and the "1111" to the left of "競" in the next row is highlighted. The target character can be found by stepping through the matches in this way.
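The "Find Next" behavior described above amounts to scanning the table forward from the cursor position. The Python sketch below assumes a table of (code, character) rows; the names `ROWS` and `find_next` are hypothetical, and only the adjacency of "語" and "競" under code 1111 comes from the description of FIG. 4.

```python
# Rows of the browsed table as (code, character). Only the two adjacent 1111
# rows are from the description; the surrounding rows are made up for the demo.
ROWS = [
    ("1011", "啓"),
    ("1111", "語"),
    ("1111", "競"),
    ("1100", "北"),
]


def find_next(rows, code, cursor):
    """Return the index of the first row after `cursor` whose code matches, else None."""
    for i in range(cursor + 1, len(rows)):
        if rows[i][0] == code:
            return i
    return None


first = find_next(ROWS, "1111", 0)       # cursor on the row just before "語"
second = find_next(ROWS, "1111", first)  # user presses "Find Next"
print(ROWS[first][1], ROWS[second][1])
```

Each call corresponds to one highlight; a return value of `None` means there are no further matches below the cursor.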

The option buttons of the same software can also be used to list together only the rows that match the entered four-division code 1111.

The digits 1 and 0 used in FIG. 4 are represented in Shift JIS as 31 and 30, as shown in FIG. 5. Even if the personal computer has no kanji processing capability such as Shift JIS, the ASCII code common in Europe and America uses the same 31 and 30. Therefore, if only the character portions such as "語" and "競" are saved in advance as image files in GIF or JPG format when the website is created, the search result can still be highlighted as the "1111" displayed to the left of the image of "語" or the "1111" displayed to the left of the image of "競", and the user sees the characters without garbling.
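The byte-level point made above can be checked with Python's built-in codecs. This is only a demonstration of the encodings, not part of the patented method, and the helper names `code_hex` and `displayable_in_ascii` are hypothetical.

```python
def code_hex(code, encoding):
    """Return the hexadecimal byte representation of a code string."""
    return code.encode(encoding).hex()


# The digits of a four-division code have identical bytes in both encodings.
print(code_hex("1111", "shift_jis"))  # 31313131
print(code_hex("1111", "ascii"))      # 31313131


def displayable_in_ascii(text):
    """True if the text can be represented on an ASCII-only system."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


print(displayable_in_ascii("1111"))  # True
print(displayable_in_ascii("語"))    # False
```

The kanji itself cannot be encoded in ASCII, which is why the description suggests serving the character as an image while keeping the code as plain digits.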

The method described above, in which the compressed website is downloaded, decompressed, and then searched, has the advantage that the dictionary can be browsed without an Internet connection. It is equally possible to search online, staying connected to the Internet and using the search function of the web browsing software or the like.
In an online search, the HTML file being viewed is held only temporarily in the storage means 30 of FIG. 1 or the like, so it cannot be used continuously as a downloaded copy can once the personal computer is switched off and the cache is cleared. On the other hand, there is no download step, and capacity in the storage means 30 or the like does not have to be reserved permanently.

When a very large dictionary is searched online, a CGI search program on the web server, written in advance in Perl or a similar language, can be used: entering a character four-division code in the input field lists only the matching characters. Database search CGI programs of this kind are generally easy to obtain as free software, and they offer fast online search without relying on the search function of the web browsing software installed on the user's personal computer or the like.

FIG. 2 is a flowchart of the search processing of the present invention.
S100 represents the start of the work of searching, for example, a kanji dictionary in website form. S200 represents entering a character four-division code from the input means 10 of FIG. 1. S300 represents deciding whether the format of the character four-division code, described later, needs to be converted for collation. If conversion is needed, format conversion is performed in S400, for example by using a script such as JavaScript (JAVA is a registered trademark) written into the website in advance, and S500 then collates the entered character four-division code against the character four-division codes in the database using the calculation means 20 of FIG. 1. If S300 decides that no format conversion is needed, the collation of S500 is performed on the entered character four-division code in its original format.
Format conversion means, for example, the following. The character four-division code of FIG. 4 uses a format in which each of the four digits is 1 or 0. If the code is instead entered in a format such as 1234, in which every digit is a different number, a simple script on the website replaces every digit other than 1 with 1.
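The format conversion can be sketched in Python as below. The description mentions a script on the website; Python is used here only for illustration, and the function name `normalize_code` is hypothetical. One assumption is made explicit: the digit 0 is kept as 0, since replacing it as well would turn every entry into 1111.

```python
def normalize_code(entered):
    """Convert an entry such as "1234" to the database format of 1s and 0s.

    Assumption: 0 means "cannot divide" and is preserved; any other digit
    means "can divide" and becomes 1.
    """
    if len(entered) != 4 or any(d not in "0123456789" for d in entered):
        raise ValueError("expected exactly four ASCII digits")
    return "".join("0" if d == "0" else "1" for d in entered)


print(normalize_code("1234"))  # 1111
print(normalize_code("1030"))  # 1010
```

An entry already in the database format passes through unchanged, so the conversion is safe to apply even when it is not strictly needed.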

However, S300 and S400 are needed only when the input format differs from the database format, and they may be omitted in other uses.

S600 represents, for example, highlighting on the display means 40 of FIG. 1 the portion of the website containing the character four-division code that matched in the collation.

S700 decides whether to display a screen explaining the meaning of a retrieved character when the user wants to look into it further. A link is set in advance on the character or its image, and clicking the character, for example, jumps to another location on the website.

If the user clicks a character to display its meaning, S800 may display linguistic information, for example in multiple languages (parallel translations), or non-linguistic information such as video. Multilingual (parallel translation) information means, for example, displaying the Chinese "詞 (Ci)" and the English "Word" together with the Japanese kanji "語". A user who understands one of these languages can infer the meaning of the Japanese character "語" by linguistic analogy.

For a user who cannot understand the linguistic information, clicking the character "競", for example, displays a video (animation or the like) of people competing. Processing of this kind is called display of semantic information by non-linguistic information.

If the user decides that displaying the semantic information of the retrieved character is unnecessary, processing proceeds to the re-entry step S900. If the user goes on to enter a different character four-division code, processing continues again from S200; if the search is to end, processing reaches the end step S1000. The search ends, for example, when the user closes the window on the personal computer.
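The flow from S100 to S1000 can be summarized as a loop, as in the following Python sketch. It is a simplified model under stated assumptions: user interaction is replaced by a list of entries and a callback, display is replaced by an event log, and all names (`run_search`, `normalize`, `wants_meaning`) are hypothetical.

```python
def normalize(entered):
    # S400 (assumed rule): any digit other than "0" becomes "1".
    return "".join("0" if d == "0" else "1" for d in entered)


def run_search(entries, index, meanings, wants_meaning):
    """Walk the FIG. 2 steps for each entered code and log what would be shown."""
    events = []                                           # S100: start
    for entered in entries:                               # S200, and S900 on later passes
        needs_conversion = bool(set(entered) - {"0", "1"})  # S300
        code = normalize(entered) if needs_conversion else entered
        hits = index.get(code, [])                        # S500: collate
        events.append(("highlight", code, hits))          # S600: highlight matches
        for char in hits:
            if wants_meaning(char):                       # S700: user clicks the character
                events.append(("meaning", char, meanings.get(char)))  # S800
    return events                                         # S1000: end


index = {"1111": ["語", "競"]}
meanings = {"語": {"zh": "詞 (Ci)", "en": "Word"}}  # parallel translations from the text
log = run_search(["1234"], index, meanings, lambda ch: ch == "語")
for event in log:
    print(event)
```

The callback stands in for the user's decision at S700; returning `False` for every character reproduces the path that skips S800 and goes straight to re-entry.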

FIG. 3 is an explanatory diagram showing how a character is divided.

By analogy, the character search method of this embodiment cuts a cake with a character on top into four from above with a knife. The knife may cut into the gaps between the lines that make up the character but must not touch a line. Here, "line" is a collective term for the shapes that are constituent elements of a character, such as straight lines, curves, and dots.

Likening the cake to a clock face, the directions in which the knife cuts, seen from the center of the clock, are named as follows: the 12 o'clock direction is the "upper half of the vertical direction" (abbreviated "top"), the 6 o'clock direction is the "lower half of the vertical direction" ("bottom"), the 9 o'clock direction is the "left half of the horizontal direction" ("left"), and the 3 o'clock direction is the "right half of the horizontal direction" ("right").

The order in which the character is divided into four can be set arbitrarily, but in this embodiment the order is top first, then bottom, left third, and right last.
A place where the character can be divided is represented by the digit 1 and a place where it cannot by the digit 0. The character being searched for is represented and classified by a four-digit number made up of 1s and 0s in the order top, bottom, left, right, which is called the "character four-division code" or "code" for short. Using a nickname such as the "cake cut method" also makes the rules for applying the code easier to understand through metaphor.
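The encoding rule can be written as a small Python function. Deciding whether each cut is possible is left to the person looking at the character, so the function takes the four judgments as inputs; the name `four_division_code` is hypothetical.

```python
def four_division_code(top, bottom, left, right):
    """Map the four cut judgments to a code, in the order top, bottom, left, right.

    Each argument is True if a dividing line can be drawn through a gap in
    that part of the character without touching a line, False otherwise.
    """
    return "".join("1" if can_cut else "0" for can_cut in (top, bottom, left, right))


# Top, left, and right can be cut but bottom cannot: the pattern given for "啓".
print(four_division_code(True, False, True, True))  # 1011
```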

In addition to referring to the four dividing lines by the ordinary names top, bottom, left, and right, the lines may be drawn in colors and matched with color names such as red, green, blue, and yellow.

There are 16 possible four-division codes, from 0000 to 1111. When a single character can be divided in more than one way among these 16 codes, a priority rule is applied according to how well each code satisfies the condition "prefer the code that divides the character into as many parts as possible and as evenly as possible." The metaphor of sharing a cake equally makes this easier to understand.
In processing such as search and display, codes with higher priority can be applied ahead of codes with lower priority as needed.
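One way to model the priority rule is to rank codes by how many cuts they contain, as in the Python sketch below. This captures only the "as many as possible" half of the condition; the "as evenly as possible" half is not modeled, and the names `priority` and `preferred` are hypothetical.

```python
from itertools import product

# All 16 possible codes, 0000 through 1111.
ALL_CODES = ["".join(bits) for bits in product("01", repeat=4)]


def priority(code):
    """Priority level by number of cuts: four cuts is level 1, three is level 2, ..."""
    return 5 - code.count("1")


def preferred(candidates):
    """Pick the candidate code with the most cuts; ties keep the first one listed."""
    return min(candidates, key=priority)


print(len(ALL_CODES))                        # 16
print(preferred(["1100", "1011", "1111"]))   # 1111
```

Codes with the same number of cuts receive the same level here, so choosing between them would need an additional rule.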

The character 語 ("word") in FIG. 3 illustrates 1111, the first and highest-priority code. This code means that the character is divided at all four positions, in the order up → down → left → right; the divisible positions are shown as solid lines.

The characters 啓, 仁, 六 and 部 in FIG. 3 illustrate the second-priority codes. These codes classify characters that divide like a cake cut into three pieces, and they all have the same priority.
As shown in FIG. 3, code 1011 means division at up → left → right; the character 啓 is an example.
As shown in FIG. 3, code 1101 means division at up → down → right; the character 仁 is an example.
As shown in FIG. 3, code 0111 means division at down → left → right; the character 六 is an example.
As shown in FIG. 3, code 1110 means division at up → down → left; the character 部 is an example.

The characters 北 ("north") and 豆 ("bean") in FIG. 3 illustrate the third-priority codes.
These codes classify characters that divide like a cake cut into two pieces, and they have the same priority.
As shown in FIG. 3, code 1100 means division at up → down; the character 北 is an example.
Code 0011 means division at left → right; the character 豆 is an example.

The characters 犬 ("dog"), 庁 ("agency"), 寸 ("sun", a unit of length) and 火 ("fire") in FIG. 3 illustrate the fourth-priority codes.
These codes also classify characters that divide like a cake cut into two pieces, and they have the same priority.
The two-piece condition is the same as for the third-priority codes, but the condition "divide as evenly as possible" cannot be met, so the third rule takes precedence over the fourth.
Code 1001 means division at up → right; the character 犬 is an example.
Code 0101 means division at down → right; the character 庁 is an example.
Code 0110 means division at down → left; the character 寸 is an example.
Code 1010 means division at up → left; the character 火 is an example.
The character 火 can also be divided as 1001. In the present invention the database may be built with duplicate entries, allowing redundancy so that the target character is found whichever code is entered.

In theory, the fifth-priority codes would be the four codes 1000, 0001, 0100 and 0010. However, cutting the knife into only one part of the cake does not divide it, so these codes are excluded from the division rules.
Consequently, only the indivisible code 0000 is adopted as the lowest-priority code. An example of a character with this code is 口 ("mouth") in FIG. 3.

Thus, although there are theoretically 16 code combinations, the handling of the fifth-priority codes leaves 12 combinations that are finally adopted.
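The count of 12 adopted codes and their priority ranks can be checked with a short sketch. The function name `priority`, the constant names, and the choice to label the indivisible code 0000 as rank 5 are assumptions made for illustration; the text itself only calls 0000 the lowest-priority code.

```python
from itertools import product
from typing import Dict, Optional

# Two-cut codes whose cut line runs straight through the center (even split).
EVEN_TWO_CUT = {"1100", "0011"}


def priority(code: str) -> Optional[int]:
    """Return the priority rank of a 4-digit code, or None if it is excluded."""
    cuts = code.count("1")
    if cuts == 4:
        return 1                      # four pieces
    if cuts == 3:
        return 2                      # three pieces
    if cuts == 2:
        return 3 if code in EVEN_TWO_CUT else 4
    if cuts == 0:
        return 5                      # indivisible, lowest priority
    return None                       # a single cut divides nothing: excluded


ALL_CODES = ["".join(bits) for bits in product("01", repeat=4)]
ADOPTED: Dict[str, int] = {}
for code in ALL_CODES:
    rank = priority(code)
    if rank is not None:
        ADOPTED[code] = rank

print(len(ALL_CODES), len(ADOPTED))  # 16 12
```

When a character admits several codes, `min(candidates, key=priority)` would select the preferred one under these assumptions.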

The treatment of the fifth-priority codes mainly reflects the shapes of kanji: hardly any characters can be cut at exactly one position, whereas quite a few characters cannot be cut at any position. For characters other than kanji, for example Hangul, all 16 codes may be used without excluding the four single-cut codes.

When a character has more than one possible code, the database may be built with redundancy so that processing works whichever code the user of the rules specifies. For example, the character 火 may be coded as either 1010 or 1001.
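A redundant database of this kind can be sketched as an inverted index from code to characters. The sample entries reuse codes given in the description; the names `build_index` and `entries` and the choice of an in-memory dictionary are assumptions for illustration only.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def build_index(entries: Dict[str, Iterable[str]]) -> Dict[str, Set[str]]:
    """Map every code to the set of characters registered under it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for char, codes in entries.items():
        for code in codes:          # a character may appear under several codes
            index[code].add(char)
    return index


entries = {
    "火": ["1010", "1001"],  # registered twice on purpose (redundancy)
    "犬": ["1001"],
    "北": ["1100"],
    "口": ["0000"],
}
index = build_index(entries)
print(sorted(index["1001"]))
```

Either `index["1010"]` or `index["1001"]` returns a set containing 火, so the lookup succeeds whichever code the user enters.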

Examples of formats for the character four-division code are described next.
There are five kinds of format for the code of a single character.

The "binary (bit) format" (commonly the "four-digit binary format") is a four-digit (four-bit) format in which each of the four division lines of a character (up, down, left and right, each covering one quarter) is written as a binary digit, 0 (not divided) or 1 (divided). FIG. 3 is an example of this format. The positions are fixed in the order up, down, left, right, and the code is never abbreviated to fewer than four digits.

The "decimal format" comprises the "non-abbreviated decimal format" and the "abbreviated decimal format".

The "non-abbreviated decimal format" (commonly the "four-digit decimal format") uses five digits: 0 (not divided), 1 (upper vertical half divided), 2 (lower vertical half divided), 3 (left horizontal half divided) and 4 (right horizontal half divided). The order of the positions is not fixed, but the code is never abbreviated to fewer than four digits.
Examples in ascending order are 1234 and 1204; in descending order, 4321 and 4021; in arbitrary order, 2341 and 2401; the undivided example is 0000.

The "abbreviated decimal format" follows the same rules as the non-abbreviated decimal format, except that 0 is written only when none of the four division lines can be divided; otherwise up to two zeros may be omitted.
Rewriting the non-abbreviated examples in the abbreviated format gives 1234 and 124 (ascending order), 4321 and 421 (descending order), 2341 and 241 (arbitrary order), and 0 (undivided).
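The relationship between the binary format and the two decimal formats can be made concrete with a small conversion sketch. The function names `to_decimal` and `from_decimal` are hypothetical, and the sketch emits digits in ascending order only, although the description allows any order.

```python
def to_decimal(binary: str, abbreviated: bool = False) -> str:
    """Convert a 4-digit binary code to the decimal format (ascending order)."""
    if len(binary) != 4 or set(binary) - {"0", "1"}:
        raise ValueError("expected four binary digits")
    # Position 1 = up, 2 = down, 3 = left, 4 = right; 0 marks "not divided".
    digits = [str(pos) if bit == "1" else "0"
              for pos, bit in enumerate(binary, start=1)]
    if not abbreviated:
        return "".join(digits)
    kept = "".join(d for d in digits if d != "0")
    return kept or "0"   # a lone 0 stands for the undivided code


def from_decimal(code: str) -> str:
    """Convert a decimal code (any digit order, either variant) to binary."""
    return "".join("1" if str(pos) in code else "0" for pos in range(1, 5))


print(to_decimal("1101"), to_decimal("1101", abbreviated=True))  # 1204 124
print(from_decimal("4021"), from_decimal("241"))                 # 1101 1101
```

Because `from_decimal` only tests which digits are present, ascending, descending and arbitrary orders all map to the same binary code.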

The "everyday-word format" comprises the "non-abbreviated everyday-word format" and the "abbreviated everyday-word format".

The "non-abbreviated everyday-word format" uses 上下左右 (up, down, left, right) or UDLR (the initials of Up, Down, Left, Right) in place of the digits 1234 of the non-abbreviated decimal format. Color names such as red, green, blue and yellow may also be used. The formatting rules are the same as for the non-abbreviated decimal format.

For example, the ascending code 1234 becomes 上下左右 or UDLR, and 1204 becomes 上下0右 or UD0R; the descending code 4321 becomes 右左下上 or RLDU, and 4021 becomes 右0下上 or R0DU; the arbitrary-order code 2341 becomes 下左右上 or DLRU, and 2401 becomes 下右0上 or DR0U; the undivided code 0000 remains 0000.

The "abbreviated everyday-word format" uses 上下左右 or UDLR in place of the digits 1234 of the abbreviated decimal format. The formatting rules are the same as for the abbreviated decimal format. Because the codes no longer have a regular length of four digits per character, a delimiter such as a hyphen must be inserted at each character boundary when codes are written in sequence.

For example, the ascending code 1234 becomes 上下左右 or UDLR, and 124 becomes 上下右 or UDR; the descending code 4321 becomes 右左下上 or RLDU, and 421 becomes 右下上 or RDU; the arbitrary-order code 2341 becomes 下左右上 or DLRU, and 241 becomes 下右上 or DRU; the undivided code 0 remains 0.
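Since the everyday-word formats are a symbol-for-symbol substitution of the decimal formats, a translation table is enough to sketch them. The table and function names are hypothetical, and the sketch uses a half-width 0 where the original examples print a full-width one.

```python
# Substitution tables: digit 1..4 -> direction letter or direction kanji.
DIGIT_TO_LETTER = str.maketrans("1234", "UDLR")
DIGIT_TO_KANJI = str.maketrans("1234", "上下左右")


def to_everyday(decimal_code: str, table=DIGIT_TO_LETTER) -> str:
    """Rewrite a decimal-format code with direction names; 0 is left as is."""
    return decimal_code.translate(table)


print(to_everyday("1204"))                  # UD0R
print(to_everyday("4021"))                  # R0DU
print(to_everyday("241"))                   # DRU
print(to_everyday("1234", DIGIT_TO_KANJI))  # 上下左右
```

The same function covers both the non-abbreviated and abbreviated variants, because the only difference between them is whether zeros were dropped before the substitution.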

The "hexadecimal compressed format" (commonly the "one-digit hexadecimal format") converts the binary (bit) format to hexadecimal and represents each character's code with a single symbol.
For example, binary 0000 is written as hexadecimal 0, binary 0101 (decimal 5) as 5, binary 1010 (decimal 10) as A, and binary 1111 (decimal 15) as F. With practice, input becomes considerably more efficient.
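The hexadecimal compression is an ordinary base conversion, which can be sketched with Python's built-in `int` and `format`. The function names `to_hex` and `from_hex` are hypothetical.

```python
def to_hex(binary: str) -> str:
    """Compress a 4-digit binary code into one uppercase hexadecimal digit."""
    return format(int(binary, 2), "X")


def from_hex(digit: str) -> str:
    """Expand one hexadecimal digit back into a 4-digit binary code."""
    return format(int(digit, 16), "04b")


for code in ("0000", "0101", "1010", "1111"):
    print(code, "->", to_hex(code))
# 0000 -> 0
# 0101 -> 5
# 1010 -> A
# 1111 -> F
```

The `"04b"` format keeps leading zeros, so `from_hex("5")` yields `0101` rather than `101`.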

The "associative-character compressed format" (commonly the "one-digit associative format") replaces the binary (bit) format with an easily associated letter or similar symbol, as shown in FIG. 6, so that each code is a single character. FIG. 6 sets the abbreviated decimal format side by side with the associative-character compressed format. Both are one-digit formats, but the hexadecimal compressed format, while logical, is hard to memorize; for beginners the associative format has the advantage of being easier to remember and therefore more efficient.

Next, the four-division code formats for strings of two or more characters are described.
For example, to write the surname 北山 (Kitayama), which consists of the two characters 北 and 山 shown in FIG. 4, in the binary format, the four-division codes 1100 and 0000 of FIG. 4 can be concatenated into the eight-digit number 11000000. When compiling a name list, for instance, 11000000 is entered as the four-division code in FIG. 4 and the characters 北山 are written to its right.
However, long runs of digits are hard to tell apart and entering many zeros is laborious, among other reasons, so format rules are also needed for character strings.

In the binary (bit) format, a delimiter such as a hyphen (-) between the codes of adjacent characters may be inserted or omitted. Because the codes follow one another in regular groups of four digits, they are easy to tell apart either way.

Internally, the data is stored without hyphens, as a list of four-digit (four-bit) codes, one per character of the string. The binary format without delimiters therefore has the advantage that the input format and the stored data format are identical, which reduces processing errors.

With delimiters, the user has to type hyphens in addition to the 0s and 1s, but the boundaries between character codes are easier for the user to see.

In the non-abbreviated decimal format, delimiters are likewise optional, for the same reason as in the binary (bit) format.

In the abbreviated decimal format, a delimiter such as a hyphen (-) is inserted between the codes of adjacent characters. The code for one character varies in length from one to four digits, so without a delimiter the individual codes cannot be identified.
For example, with delimiters, 1234 (the code of one character) can be distinguished from 12-34 (the codes of a two-character string).

In the non-abbreviated everyday-word format, a delimiter such as a hyphen (-) between the codes of adjacent characters may be inserted or omitted, because the codes follow one another in regular groups of four symbols and are easy to tell apart.

In the abbreviated everyday-word format, a delimiter such as a hyphen (-) is inserted between the codes of adjacent characters. The code for one character varies in length from one to four symbols, so without a delimiter the individual codes cannot be identified.

For example, with delimiters, 上下左右 (UDLR), the code of one character, can be distinguished from 上下-左右 (UD-LR), the codes of a two-character string.
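The difference between fixed-width and delimited code strings can be sketched as two small parsers. The function names are hypothetical, and an ASCII hyphen is assumed as the delimiter; the description only says "a hyphen or the like".

```python
from typing import List


def split_fixed(codes: str, width: int = 4) -> List[str]:
    """Split a fixed-width code string; hyphens are optional and ignored."""
    codes = codes.replace("-", "")
    if len(codes) % width != 0:
        raise ValueError("length is not a multiple of the code width")
    return [codes[i:i + width] for i in range(0, len(codes), width)]


def split_delimited(codes: str) -> List[str]:
    """Split a variable-length code string; the hyphen is mandatory here."""
    return codes.split("-")


print(split_fixed("11000000"))    # ['1100', '0000']
print(split_fixed("1100-0000"))   # ['1100', '0000']
print(split_delimited("1234"))    # ['1234']
print(split_delimited("12-34"))   # ['12', '34']
```

Without the hyphen, `1234` in an abbreviated format could be one code or two, which is why only the fixed-width formats can make the delimiter optional.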

The hexadecimal compressed format needs no delimiter between codes, but a symbol such as # is inserted at the beginning and end of the code string. The hexadecimal format uses the digits 0 to 9 and the letters A to F (F corresponding to 15), mixing digits with some letters; the # symbols mark the code so that it is not confused with other code formats or with an ordinary run of digits and letters. When the symbol is entered or displayed explicitly, it may be full-width or half-width.

For example, the hexadecimal compressed code of 大 is 0 (zero); to distinguish it from the 0 of the abbreviated decimal format, the hexadecimal compressed code is written explicitly as #0#.

In this embodiment the letter D occurs in both the everyday-word format and the hexadecimal compressed format. The latter can be identified explicitly by the # symbols, and even if a # is dropped the two can be told apart because D in the everyday-word format is never used alone. Apart from D, no letter is shared between the formats that use letters. These properties may be used to add a checking routine to the software that guards against erroneous processing.

The main purpose of the hexadecimal compressed format is fast, efficient input with as few keystrokes as possible. Once # symbols mark the beginning and end of the code string, it is clear that each hexadecimal digit in between corresponds to one character, so a delimiter such as a hyphen (-) is unnecessary.

The associative-character compressed format uses no digits and writes everything with letters or similar symbols (0 is written Z), so no delimiter is needed. Even where it uses the same letters as the hexadecimal compressed format, the two can be told apart by the # symbols.
For example, the hexadecimal compressed code of 大 is #0# and its associative-character compressed code is Z.
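The #-framed hexadecimal string can be sketched as a pack/unpack pair. The function names are hypothetical, a half-width # is assumed, and the sample codes 1100 and 0000 are the ones the description gives for 北 and 山.

```python
from typing import List


def pack_hex(binary_codes: List[str]) -> str:
    """Join one hex digit per character and frame the result with #."""
    return "#" + "".join(format(int(c, 2), "X") for c in binary_codes) + "#"


def unpack_hex(text: str) -> List[str]:
    """Recover the 4-digit binary codes from a #-framed hex string."""
    if len(text) < 3 or not (text.startswith("#") and text.endswith("#")):
        raise ValueError("hex-compressed codes must be framed by #")
    # One hex digit corresponds to one character, so no delimiter is needed.
    return [format(int(ch, 16), "04b") for ch in text[1:-1]]


print(pack_hex(["1100", "0000"]))  # #C0#
print(unpack_hex("#0#"))           # ['0000']
```

Requiring the frame in `unpack_hex` is one way to keep a bare `0` or `D` from being misread as a hexadecimal code.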

With these formats, even a person with no knowledge of kanji can look up kanji using the character four-division code alone; the code is also useful to people who know kanji and to kanji learners.

For example, when entering kanji on a personal computer or mobile phone with Japanese or Chinese kanji-conversion software (a front-end processor), the code makes input easier to handle and more efficient than before.

FIG. 7 shows the dedicated keyboard for the kanji input method called 五筆字型 (Wubizixing, "five-stroke character form") used in China. FIG. 8 shows an ordinary ASCII-layout keyboard. On the Wubizixing keyboard of FIG. 7, each letter key except Z is assigned simplified components derived from kanji radicals.

This input method does not use pronunciation; it combines knowledge of character shape, such as components and stroke order. FIG. 9 shows how the kanji 程 is entered with Wubizixing.

The traditional kanji knowledge that 程 is composed, in stroke order, of the components 禾 → 口 → 王 is replaced by the numbers 31 → 23 → 11, which denote key positions on the keyboard.
Alternatively, the keys to which the three components are assigned are given by the letters T → K → G.

However, mastering Wubizixing requires kanji knowledge of more than 100 components and of stroke order, as well as knowledge of, and training on, the dedicated layout that determines which component is assigned to which key. It is therefore poorly suited to operating personal computers and mobile phones.

By contrast, front-end processors that take the pronunciation (reading) of kanji entered in Latin letters, as romaji or pinyin, and convert it to kanji are in widespread use on personal computers and mobile phones.

However, Chinese and Japanese contain many homophones, that is, kanji sharing the same pronunciation. During conversion the user therefore sometimes has to press the space bar repeatedly and work exhaustively through the listed homophone candidates to select the desired kanji, which is tedious.

For example, the Chinese homophones shown in FIG. 10 are very numerous, and the user had to press the space bar again and again to find the desired character among the conversion candidates.
Part 1 of FIG. 10 lists the single kanji pronounced "YI" in the word dictionary 『現代漢語詞典』 (Xiandai Hanyu Cidian), 109 characters in all.
Part 2 of FIG. 10 lists the single kanji pronounced "SHI" in the character dictionary 『新華字典』 (Xinhua Zidian), 67 characters in all.
Part 3 of FIG. 10 lists the single kanji pronounced "LI" in the Chinese national standard (GB) code, the counterpart of Japan's JIS, 75 characters in all.
If the front-end processor displays 10 homophone candidates at a time, then converting "YI", for example, can require as many as 11 presses of the space bar, which is inconvenient.
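The figure of about 11 presses follows from dividing the group size by the page size and rounding up. The sketch below assumes one space-bar press per page of candidates, which is a simplification of how a real front-end processor behaves; the names are hypothetical.

```python
import math

PAGE_SIZE = 10  # candidates shown at a time, as assumed in the text

# Homophone group sizes quoted from FIG. 10.
GROUP_SIZES = {"YI": 109, "SHI": 67, "LI": 75}


def max_pages(candidates: int, page_size: int = PAGE_SIZE) -> int:
    """Worst-case number of candidate pages to step through."""
    return math.ceil(candidates / page_size)


for reading, size in GROUP_SIZES.items():
    print(reading, max_pages(size))
# YI 11
# SHI 7
# LI 8
```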

To solve this problem, the present invention improves kanji-conversion efficiency by combining conventional pronunciation information with the character four-division code, which is shape information that requires no knowledge of kanji.

FIG. 11 is a table and graph showing, for the three homophone groups "YI", "SHI" and "LI" of FIG. 10, the number and proportion of characters in each subgroup obtained by further classifying the groups with the four-division code.

For example, 伊 in the "YI" group has the four-division code 1100; 使 in the "SHI" group and 礼 in the "LI" group also have the code 1100.

As a result, the 1100 subgroups of "YI", "SHI" and "LI" contain 27, 21 and 26 characters respectively, which is 25%, 31% and 35% of each homophone group. The candidates are thus reduced to roughly one quarter to one third.

Since there are 12 four-division codes, a simple calculation shows that each code could in theory account for about 8 percent of the homophones on average. This share is referred to here as the homophone dispersion rate.
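The percentages quoted for the 1100 subgroups and the theoretical average of about 8 percent can be reproduced from the counts given in the text. The variable names below are hypothetical; the numbers are those stated for FIG. 10 and FIG. 11.

```python
# (characters in the 1100 subgroup, characters in the whole homophone group)
COUNTS = {"YI": (27, 109), "SHI": (21, 67), "LI": (26, 75)}

shares = {reading: round(100 * part / whole)
          for reading, (part, whole) in COUNTS.items()}
print(shares)  # {'YI': 25, 'SHI': 31, 'LI': 35}

# If homophones were spread evenly over the 12 adopted codes:
ideal_rate = 100 / 12
print(round(ideal_rate, 1))  # 8.3
```

The gap between the ideal 8.3 percent and the observed 25 to 35 percent is what motivates extending the 1100 code with further digits.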

The graph in FIG. 11 shows that almost all four-division codes other than 1100 have a homophone dispersion rate of 10% or less, so that only one to three presses of the space bar are needed during kanji conversion.

Combining the four-division code with pronunciation thus makes kanji conversion far more convenient.

However, the 1100 group in FIG. 11 contains relatively more homophones than the other code groups. FIG. 12 therefore shows the character counts and dispersion rates of the 1100 group when the four-digit division code is extended to five digits.

The rule for extending the code to five digits is simple. For example, the kanji 例 ("example") has two positions at which it can be divided vertically, so after being divided at one position it can be "re-divided". Accordingly, a fifth digit of 1 is appended for kanji that can be re-divided, and a fifth digit of 0 for kanji that cannot.

As a result, in FIG. 12, extending the four-digit code 1100 to five digits improved the dispersion rate to the 10% range.

Comparing this result with the dispersion rates in FIG. 11 also indicates that not every kanji needs to be classified with a five-digit code; the extension need only be applied to certain codes such as 1100.

FIG. 13 shows the dispersion rates when the four-digit code 1100 is extended to six digits.
Conventional shape information in the form of stroke counts assumed fairly strict application and was therefore hard for foreigners and beginners to master. The present invention instead introduces, as a sixth digit following the five-digit code, stroke information coarse enough that a kanji can be classified by an at-a-glance impression of whether it looks complicated or simple.

Specifically, in the leftmost column of the table in FIG. 13, "complicated" versus "simple" is made quantitative as nine or more strokes versus seven or fewer: characters with nine or more strokes are classified as, for example, 110011 and those with seven or fewer as 110010. Characters with eight strokes are placed in both groups, which provides redundancy.
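The five- and six-digit extensions can be sketched as a function that appends the re-division digit and then the stroke digit. The function name and parameters are hypothetical, and the stroke counts in the example calls are illustrative inputs, not data from the figures.

```python
from typing import List


def six_digit_codes(base: str, redividable: bool, strokes: int) -> List[str]:
    """Extend a 4-digit code with a re-division digit and a stroke digit."""
    five = base + ("1" if redividable else "0")
    if strokes >= 9:
        return [five + "1"]            # looks complicated
    if strokes <= 7:
        return [five + "0"]            # looks simple
    # Exactly eight strokes: registered under both codes for redundancy.
    return [five + "0", five + "1"]


print(six_digit_codes("1100", True, 12))  # ['110011']
print(six_digit_codes("1100", True, 5))   # ['110010']
print(six_digit_codes("1100", True, 8))   # ['110010', '110011']
```

Returning a list lets the eight-stroke case feed directly into a redundant index, so a user who misjudges a borderline character still finds it.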

The second column from the left in the table of FIG. 13 shows the totals obtained by splitting the duplicated eight-stroke characters in two and allotting them to an "eight or more strokes" group and an "eight or fewer strokes" group; the results appear in the columns from the center to the right. For example, the figure for 110011 (eight or more strokes) under "YI" is the total 6 + 2 = 8 characters.

The lower graph in FIG. 13 shows that, with this six-digit extended code, most groups achieved a dispersion rate of 8% or less.

In this way, using the four-division code together with the pronunciation of kanji solves the problems of conventional kanji conversion.

The character four-division code search method of this embodiment narrows down single kanji effectively, and it is also practical when applied to narrowing down words (combinations of two or more kanji).

With the five-digit code, 24 classes × 24 classes = 576 combinations, that is, word classes, become available.
For example, dividing the 6,892 words of the common vocabulary covered by HSK, the Chinese proficiency test for foreigners, by 576 classes gives about 12, so each class contains about 12 words on average. At this size, when looking up a common Chinese word whose reading is unknown, entering the five-digit code twice (once per character) yields around 12 candidate words, hardly more than a single page of kanji-conversion candidates in a word processor, which is adequate for practical use.
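The arithmetic behind the 576 word classes can be checked directly. The derivation of 24 as 12 adopted four-digit codes times 2 values of the fifth digit is inferred from the preceding description; the variable names are hypothetical.

```python
ADOPTED_CODES = 12          # four-digit codes in use
FIFTH_DIGIT_VALUES = 2      # re-dividable or not
HSK_VOCABULARY = 6892       # word count quoted in the text

classes_per_char = ADOPTED_CODES * FIFTH_DIGIT_VALUES
word_classes = classes_per_char ** 2      # two-character words
avg_words = HSK_VOCABULARY / word_classes

print(classes_per_char, word_classes, round(avg_words, 1))  # 24 576 12.0
```

This is an average under the assumption of an even spread; real vocabulary will cluster in some classes and leave others nearly empty.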

Next, the input format used with kanji conversion, a kind of kanji search software that operates at input time, is described. When only the character four-division code is entered into a front-end processor or the like and converted to kanji, a symbol such as the full-width ＠ (at sign) is inserted at the beginning and end of the code to distinguish it from an ordinary digit string that does not need conversion.
In the combined format of pronunciation and character four-division code, likewise, a symbol such as the full-width equals sign (＝) is inserted between the pronunciation and the code to mark them as combined information, and a symbol such as the full-width ＠ is inserted at the beginning and end of the combined information.

For example, to convert and retrieve the string 昭和 (Showa), a key combining pronunciation and code, such as ＠しょうわ＝１２０４−１２００＠, is registered in advance, together with the string 昭和, in the front-end processor's user dictionary or the like, at the head of the record corresponding to the four-division code of FIG. 4. At conversion time, the user types ＠しょうわ＝１２０４−１２００＠ on the keyboard and presses the space bar or a similar key to call up the string 昭和.

The @ sign normally used in e-mail addresses is half-width, so symbols such as the full-width ＠ are seldom used otherwise. Explicitly adding such symbols to delimit the code-related part prevents erroneous conversion by the front-end processor.

Entries may be registered in the front-end processor's user dictionary one word at a time with the operating tool called the language bar, or a registration list may be prepared in advance as a text file with the fields "reading" (tab) "word" (tab) "part of speech" (tab) "user comment" in that order, as in the following line, and registered in one batch.
＠しょうわ＝１２０４−１２００＠ (tab) 昭和 (tab) 名詞 [noun] (tab) リンク [link]

Some front-end processors not only allow an explanatory string to be entered and displayed in the user comment field but also allow a link to be set, so that, for example, multilingual information or non-linguistic information such as video can be displayed on a separate web page. This also helps the user understand the meaning of the displayed phrase.

＠ may also be used outside kanji conversion to indicate that a notation concerns the quadrant character code, and its half-width form is also permitted.

By using a symbol such as ＃ rather than ＠ for the hexadecimal compression format, it is possible to tell, for example, that the letter 「Ｄ」 belongs to the hexadecimal compression format rather than to the everyday-word format.

In conventional kanji conversion from pronunciation input alone, when an input such as 「きしゃ」 (kisha) yields many homophone candidates, erroneous conversions had to be corrected according to context.
For example, the input 「きしゃのきしゃはいいとおもう。」 can be converted in several ways. The four examples below are all grammatically valid conversion candidates, yet most front-end processors can produce only one of them.
貴社の記者はいいと思う。 (I think your company's reporter is good.)
汽車の記者はいいと思う。 (I think the train's reporter is good.)
記者の喜捨はいいと思う。 (I think the reporter's almsgiving is good.)
貴社の汽車はいいと思う。 (I think your company's train is good.)

However, if the homophones read 「きしゃ」 are registered in the user dictionary beforehand in the format of the present invention, the desired kanji can be obtained in a single conversion by entering text such as the following.
＠００３４−１２００＝きしゃ＠の＠１２３０−００００＝きしゃ＠はいいとおもう。
＠１２３４−００００＝きしゃ＠の＠１２３０−００００＝きしゃ＠はいいとおもう。
＠１２３０−００００＝きしゃ＠の＠００３４−１２０４＝きしゃ＠はいいとおもう。
＠００３４−１２００＝きしゃ＠の＠１２３４−００００＝きしゃ＠はいいとおもう。

Single-step conversion is possible because, although the pronunciation is the same, each word has a different character quadrant code.
The order of the division code and the pronunciation may be reversed.
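As a non-limiting illustration, the homophone lookup described above can be sketched in Python. The dictionary contents are taken from the four example inputs; the names USER_DICT, normalize, and convert are assumptions introduced for this sketch and do not appear in the disclosure, and a real front-end processor would hold such entries in its own user dictionary.

```python
import re
import unicodedata

# Illustrative user-dictionary entries: (reading, quadrant code) -> kanji.
USER_DICT = {
    ("きしゃ", "0034-1200"): "貴社",
    ("きしゃ", "1230-0000"): "記者",
    ("きしゃ", "1234-0000"): "汽車",
    ("きしゃ", "0034-1204"): "喜捨",
}

CODE = re.compile(r"[0-4]{4}(?:-[0-4]{4})*")   # e.g. 0034-1200
TOKEN = re.compile(r"@([^@=]+)=([^@=]+)@")     # @left=right@


def normalize(text):
    """Fold full-width symbols and digits to ASCII; map U+2212 to '-'."""
    return unicodedata.normalize("NFKC", text).replace("\u2212", "-")


def convert(text):
    """Replace each @a=b@ token with its registered kanji, if any."""
    def lookup(match):
        left, right = match.group(1), match.group(2)
        # The code and the reading may come in either order.
        if CODE.fullmatch(left):
            code, reading = left, right
        else:
            reading, code = left, right
        # Unregistered tokens are left as they were entered.
        return USER_DICT.get((reading, code), match.group(0))

    return TOKEN.sub(lookup, normalize(text))
```

Because the key is the pair of reading and code, the four words read 「きしゃ」 never collide, which is the reason given above for single-step conversion.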

The detailed formats of the character code are described next.
Using the field-specific format enables an even more precise kanji search.
For example, to restrict a search to a small technical-term dictionary or the like, the name of the dictionary is entered, for example in kana, immediately after the leading ＠, followed by a colon (：) or the like, which limits the search range.

For example, to enter 「堅田」 (Katada) from a surname dictionary (abbreviated 「みょうじ」):
If ＠みょうじ：かただ＝１０３４−００００＠ is registered beforehand in the quadrant code column of FIG. 4, with 堅田 registered in the character column of the same row, and the entry is registered in the user dictionary, the search and conversion take place only within the surname dictionary. Homophones that are not registered in that specialized dictionary, such as the following, are not displayed as conversion candidates, which improves accuracy.
＠かただ＝１０３４−１２０４＠ 型だ
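The field-specific format can be illustrated with a short Python sketch. It assumes that each named dictionary is a separate mapping and that an entry without a dictionary name falls back to a general dictionary; DICTIONARIES, parse_entry, and search are hypothetical names chosen for this sketch only.

```python
import unicodedata

# Hypothetical dictionaries keyed by name; the empty name stands for the
# general dictionary. Entries map (reading, code) -> word.
DICTIONARIES = {
    "みょうじ": {("かただ", "1034-0000"): "堅田"},
    "": {("かただ", "1034-1204"): "型だ"},
}


def parse_entry(token):
    """Split '@name:reading=code@' into (name, reading, code)."""
    text = unicodedata.normalize("NFKC", token).replace("\u2212", "-")
    if len(text) < 2 or not (text.startswith("@") and text.endswith("@")):
        raise ValueError("token must be wrapped in @ ... @")
    body = text[1:-1]
    name, colon, rest = body.partition(":")
    if not colon:                 # no dictionary name: unrestricted search
        name, rest = "", body
    reading, equals, code = rest.partition("=")
    if not equals:
        raise ValueError("missing '=' between reading and code")
    return name, reading, code


def search(token):
    """Look the entry up only in the dictionary named in the token."""
    name, reading, code = parse_entry(token)
    return DICTIONARIES.get(name, {}).get((reading, code))
```

With this arrangement a query prefixed with 「みょうじ：」 never sees 「型だ」, because that homophone is held only in the general dictionary.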

The quotation format can improve conversion efficiency in some cases. For example, when kana are expressed as division codes, several kana fall under a single code, so narrowing a search down to one particular kana can take effort. The same applies to letters and digits.

In such cases, when common, simple characters such as kana, letters, digits, and symbols are entered directly, mixed with character codes, rather than as character codes, they are distinguished from the character codes by placing quotation marks such as double quotation marks (””) before and after the characters themselves.

For example, the 「３８」 in 「昭和３８年」 (Showa 38) is expressed in the quotation format as follows.
＠しょうわ”３８”ねん＝１２０４−１２００−”３８”−００００＠
The quotation marks in the pronunciation part may be omitted.

Because the quoted part may be a variable, the program should be able to perform partial matching that excludes the quoted part.
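One possible way to perform partial matching that excludes the quoted part is sketched below in Python. The sketch assumes that the input has already been folded to half-width characters with ASCII double quotation marks, and that registered entries store a placeholder where the quoted span goes; TEMPLATES and convert_quoted are hypothetical names.

```python
import re

QUOTED = re.compile(r'"([^"]+)"')

# Illustrative registered template; "{}" marks the quoted (variable) span.
TEMPLATES = {
    ("しょうわ{}ねん", "1204-1200-{}-0000"): "昭和{}年",
}


def convert_quoted(token):
    """Match a token against the templates while ignoring quoted spans."""
    reading, _, code = token.strip("@").partition("=")
    literals = QUOTED.findall(code)          # e.g. ["38"]
    code_key = QUOTED.sub("{}", code)
    # Quotation marks in the reading are optional, so drop them and
    # mask the literal text itself.
    reading_key = reading.replace('"', "")
    for literal in literals:
        reading_key = reading_key.replace(literal, "{}", 1)
    template = TEMPLATES.get((reading_key, code_key))
    return None if template is None else template.format(*literals)
```

Since the quoted digits are masked before the lookup, a single registered entry serves every year number rather than one entry per year.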

The component format can be useful for people who are still learning.
If the pronunciation of the kanji to be searched for is unknown but the pronunciations (on or kun readings) of its component elements are known, those pronunciations can be listed, separated by semicolons (；) or the like, and combined with the character code. A semicolon (；) or the like is also added before the first and after the last component reading.

For example, the component format for 「魏」 (pronounced 「ギ」, gi) is expressed as follows.
＠；い；おに；き；＝１２３０＠

「魏」 consists of two components, 「委」 and 「鬼」, so the example lists the reading 「い」 (on reading) of 「委」 and the readings 「おに」 (kun reading) and 「き」 (on reading) of 「鬼」. Only some of the components need be listed, so registering such entries in the dictionary beforehand helps learners and others search for difficult kanji.
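The component format could be parsed and matched as in the following Python sketch. It assumes half-width delimiters and treats a query as matching when its readings are a subset of the registered component readings, which is one reading of the statement that only some of the components need be listed; ENTRIES and both function names are hypothetical.

```python
# Illustrative entries: (quadrant code, known component readings, character).
ENTRIES = [
    ("1230", {"い", "おに", "き"}, "魏"),
]


def parse_component_token(token):
    """Split '@;r1;r2;...;=code@' into (list of readings, code)."""
    readings_part, _, code = token.strip("@").rpartition("=")
    if not (readings_part.startswith(";") and readings_part.endswith(";")):
        raise ValueError("component readings must be wrapped in semicolons")
    readings = [r for r in readings_part.split(";") if r]
    return readings, code


def search_by_components(token):
    """Return characters whose code matches and whose readings cover the query."""
    readings, code = parse_component_token(token)
    return [char for entry_code, known, char in ENTRIES
            if entry_code == code and set(readings) <= known]
```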

The simple format of the character code is described next.
When the quadrant character code is used in folder names or file names on information equipment that can process only single-byte characters, the simple format of the character code is used. Under the ISO-9660 standard, which applies when data files are saved to a CD-R, a file name is limited to eight half-width upper-case letters and the extension to three characters, and the underscore ( _ ) is available as a symbol. The following abbreviations are therefore appended to the end of the file name after an underscore ( _ ). The underscore together with the alphabetic abbreviation is called the "identifier".

The following are examples of the simple format.
Binary format: _B (abbreviation of Binary Number)
Decimal format: _D (abbreviation of Decimal Number)
Everyday-word format: _C (abbreviation of a Commonly used Word)
Hexadecimal format: _H (abbreviation of Hexadecimal Number)
Association format: _A (abbreviation of Association of Idea)

For example, when the pronunciation of 「語」 is entered as 「GO」 in Kunrei-shiki romanization, followed by the character code 「1234」 and finally the identifier for the decimal format, the simple format is as follows (all half-width characters).
GO1234_D.HTM
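A file name in the simple format could be assembled and checked as in the following Python sketch. The 8-character base name and 3-character extension limits follow the constraints stated above; the dictionary keys and the function name simple_filename are assumptions made for this illustration.

```python
import re

# Identifier suffixes listed above (underscore plus abbreviation).
IDENTIFIERS = {
    "binary": "_B",
    "decimal": "_D",
    "common_word": "_C",
    "hexadecimal": "_H",
    "association": "_A",
}


def simple_filename(reading, code, fmt, ext="HTM"):
    """Build READING + CODE + identifier as an upper-case 8.3 file name."""
    base = f"{reading}{code}{IDENTIFIERS[fmt]}".upper()
    ext = ext.upper()
    if not re.fullmatch(r"[A-Z0-9_]{1,8}", base):
        raise ValueError(f"base name {base!r} does not fit 8 characters")
    if not re.fullmatch(r"[A-Z0-9_]{1,3}", ext):
        raise ValueError(f"extension {ext!r} does not fit 3 characters")
    return f"{base}.{ext}"
```

Calling simple_filename("go", "1234", "decimal") reproduces the GO1234_D.HTM example, while a longer reading is rejected because the base name would exceed eight characters.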

The simple format is also useful, for example, for making file and folder names easy to search in a uniform way on a portable music player or the like.

The types and features of the characters used to enter and display the character quadrant code are described below in relation to common devices.

The digits-only formats have the widest practical range.
The binary (bit) format uses only 0 and 1, so codes can be entered and displayed with the bare minimum of input means on most existing devices, for example the numeric buttons of a mobile phone, the numeric keys of a personal computer, the left and right buttons of a mouse, the left and right buttons of a game controller, or a TV remote control with an input function.

The decimal format uses the five digits 0 through 4, so codes can be entered and displayed on common existing information equipment, for example the numeric buttons of a mobile phone, the numeric keys of a personal computer, or a TV remote control with an input function.

Letters, digits, the hyphen (−), ＠, ＃, the double quotation mark (”), the colon (：), the semicolon (；), the underscore ( _ ), and the like are all available through the ordinary character input means of personal computers and mobile phones.

As described above, the characters needed for all of the character code formats are limited to these few kinds, so codes can be entered and displayed on common existing information equipment, for example the numeric buttons of a mobile phone (other than a voice-only telephone), the numeric keys of a personal computer, or a TV remote control with an input function.

The everyday-word format, for example 「上下左右」 (up, down, left, right), takes conversion effort to enter and is restricted on some devices, but because it uses everyday vocabulary it is easy to understand when communicated, even for people with no knowledge of the quadrant character code. Displaying button switches in the everyday-word format, with icons or the like, also has the advantage of being easy to operate for people unaccustomed to operating equipment, such as the elderly.

If icons (graphics) can stand in for this display, a wide range of uses becomes possible. Twelve to at most sixteen quadrant character code icons are displayed on the screen of an input device such as a personal computer or a ticket vending machine, and merely selecting one with a mouse pointer, a touch-panel pen, a finger, or the like allows input and display on common existing devices. The advantage of this approach is that users can enter and display codes intuitively on the spot without learning the character code formats.

The format conversion method shown at S400 in FIG. 2 is described next. The processing may be performed by a simple script such as JavaScript (JAVA is a registered trademark) in a web page in HTML format, or the user may perform a simple format conversion with the replace function of word-processing software.

In the binary (bit) format without hyphens (−), the input has the same form as the data in the storage means of FIG. 1, so it is collated as is.

In the binary (bit) format with hyphens (−) inserted, the hyphens are deleted before collation, and the result is then collated with the internal data.

In the non-abbreviated decimal format without hyphens (−), a four-digit template string in the order 「１２３４」 is stored in memory beforehand as the reference. The input character code is collated against it four digits at a time from the head, and the digits that match are rearranged in ascending order. A 0 in the input character code matches no template digit, so the template digit left unmatched is replaced with 0. This processing is repeated for every four digits.
For example, ＠４１２０１０４３＠ → １２０４１０３４.

In the non-abbreviated decimal format with hyphens (−) inserted, the hyphens are deleted before collation and the above processing is then performed.
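The template collation for the non-abbreviated decimal format can be sketched in Python as follows. The sketch assumes half-width input; TEMPLATE, normalize_group, and normalize_decimal are names introduced here for illustration, and the original processing may equally be done with a script or a word processor's replace function as stated above.

```python
TEMPLATE = "1234"  # reference digit order held in memory


def normalize_group(group):
    """Put each digit 1-4 found in the group at its own position, else 0."""
    return "".join(d if d in group else "0" for d in TEMPLATE)


def normalize_decimal(code):
    """Normalize a non-abbreviated decimal code, four digits per character."""
    digits = code.strip("@").replace("-", "")   # hyphens are dropped first
    if len(digits) % 4 != 0 or set(digits) - set("01234"):
        raise ValueError(f"not a sequence of 4-digit groups: {code!r}")
    groups = (digits[i:i + 4] for i in range(0, len(digits), 4))
    return "".join(normalize_group(g) for g in groups)
```

For the input @41201043@, the groups 4120 and 1043 become 1204 and 1034, matching the example above; the hyphenated input @4120-1043@ gives the same result.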

In the abbreviated decimal format with hyphens (−) inserted, a four-digit template string in the order 「１２３４」 is stored in memory beforehand as the reference, and each hyphen-delimited part of the input character code string is collated against it. The digits that match are rearranged in ascending order, and where a digit has been omitted from the input so that a template digit is left unmatched, 0 is inserted in its place. The hyphens are then deleted. This processing is repeated for each hyphen-delimited part of the input character code string.
For example, ＠４１２−１４３＠ → １２４−１３４ → １２０４１０３４.
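A minimal Python sketch of the abbreviated-format expansion follows, assuming half-width input in which each hyphen-delimited group stands for one character. The function name expand_abbreviated is hypothetical, and accepting a lone 0 as a group is an assumption of this sketch rather than something stated above.

```python
def expand_abbreviated(code):
    """Expand hyphen-separated abbreviated groups to four digits each."""
    result = []
    for group in code.strip("@").split("-"):
        if not 1 <= len(group) <= 4 or set(group) - set("01234"):
            raise ValueError(f"bad group {group!r} in {code!r}")
        # Keep digit d at position d; fill omitted positions with 0.
        result.append("".join(d if d in group else "0" for d in "1234"))
    return "".join(result)
```

For @412-143@ the groups 412 and 143 expand to 1204 and 1034, giving 12041034 as in the example above.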

In the hexadecimal compression format, the input character code string is converted either by collating it against a correspondence table of hexadecimal and binary numbers or by a function built into the processing device.
For example, (hexadecimal) ＤＢ → (binary) １１０１１０１１.
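Using built-in functions, the hexadecimal conversion can be sketched in Python as below. Each hexadecimal digit is expanded separately so that leading zeros are preserved; the function names and the stripping of an optional # marker are assumptions of this sketch.

```python
def hex_to_bits(code):
    """Expand each hexadecimal digit into four binary digits."""
    return "".join(format(int(ch, 16), "04b") for ch in code.strip("#"))


def bits_to_hex(bits):
    """Compress a binary code whose length is a multiple of four."""
    if len(bits) % 4 != 0 or set(bits) - set("01"):
        raise ValueError(f"not a binary code of 4-bit groups: {bits!r}")
    return "".join(format(int(bits[i:i + 4], 2), "X")
                   for i in range(0, len(bits), 4))
```

Here hex_to_bits("DB") yields 11011011, as in the example above, and bits_to_hex reverses the conversion.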

In the associative character compression format, the input character code string is converted by collating it against a correspondence table of associative character codes and binary numbers. For example, (associative character code) ＫＷ → (binary) １１０１１０１１.

By encoding grammatical information such as parts of speech and semantic information such as word-meaning classes and storing them in the code column of FIG. 4, which represents the character database, characters stored with the same code can also be searched for by a combination of grammatical and semantic information.

For example, following the six-digit division code described above, a grammatical information code is appended as the seventh digit and a semantic information code as the eighth digit.

Grammatical information refers to part-of-speech information such as noun, verb, and adjective.
In the embodiment described next, grammatical information indicates whether a word is a noun or another part of speech, but it is not limited to this; for example, syntactic information such as subject and predicate may be used as grammatical information.

When words are classified into nouns and everything else as above, the seventh-digit grammatical information is expressed in decimal format: 7 for a noun, 8 for a part of speech other than a noun, such as a verb or adjective, and 0 when no part of speech is specified.

For example, the character-shape information up to the sixth digit of 「北」 (north) in the character database of FIG. 4 is １１００００ in non-abbreviated binary format. Since 「北」 is a noun, the code with the seventh-digit grammatical information appended is １１００００７.

Likewise, the character-shape information up to the sixth digit of 「打」 (to hit) in the character database of FIG. 4 is １１００００ in non-abbreviated binary format. Since 「打」 is a verb, that is, a part of speech other than a noun, the code with the seventh-digit grammatical information appended is １１００００８.

Semantic information refers to the classification of word meaning, such as words relating to humans and words relating to nature.
In the embodiment described next, semantic information indicates whether a word relates to humans or not, but the classification method is arbitrary; for example, words may be classified into animals, plants, and everything else.

For example, 9 is appended as the eighth digit, in decimal format, when the search character relates to humans, and 0 when it does not. 「北」 in the character database of FIG. 4 is a word about nature rather than humans, so it is expressed as １１００００７０, while 「打」 is a word relating to humans, so it is expressed as １１００００８９, which allows more detailed identification at search time.

Thus, when a search is made with a code containing only the character-shape information up to the sixth digit, the two kanji 「北」 and 「打」 have the same code and cannot be told apart; once the seventh-digit grammatical information and the eighth-digit semantic information are appended, their codes differ and they can be distinguished at search time.
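The effect of the seventh and eighth digits can be illustrated with the following Python sketch, which uses the two example characters and the digit values given above. GRAMMAR, extended_code, DATABASE, and search are hypothetical names, and prefix matching is only one possible way to realize the collation.

```python
# Seventh digit: 7 = noun, 8 = other part of speech, 0 = unspecified.
GRAMMAR = {"noun": "7", "other": "8", None: "0"}


def extended_code(shape, part_of_speech=None, human=False):
    """Append the grammar digit and the semantic digit (9 = human, 0 = not)."""
    if len(shape) != 6 or not shape.isdigit():
        raise ValueError(f"shape code must be six digits: {shape!r}")
    return shape + GRAMMAR[part_of_speech] + ("9" if human else "0")


# Illustrative database holding the two example characters.
DATABASE = {
    extended_code("110000", "noun", human=False): "北",
    extended_code("110000", "other", human=True): "打",
}


def search(prefix):
    """Return every character whose stored code starts with the input."""
    return [char for code, char in DATABASE.items() if code.startswith(prefix)]
```

Searching with the six-digit shape code 110000 returns both characters, whereas the full eight-digit code 11000089 returns only 「打」.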

FIG. 1 is a functional block diagram of the present invention.
FIG. 2 is a search processing flowchart of the present invention.
FIG. 3 is an explanatory diagram showing the character division method of the present invention.
FIG. 4 is a character database table of the present invention.
FIG. 5 is the Shift JIS notation of the character database of the present invention.
FIG. 6 is a comparison table of the abbreviated decimal format and the associative character compression format of the present invention.
FIG. 7 is the key layout of the Wubi (five-stroke) input method.
FIG. 8 is the key layout of the ASCII arrangement.
FIG. 9 is an input example for the Wubi (five-stroke) input method.
FIG. 10 is an example of Chinese homophonic characters.
FIG. 11 shows the distribution rate of Chinese homophonic characters.
FIG. 12 shows the distribution rate of Chinese homophonic characters under the 4-digit and 5-digit codes.
FIG. 13 shows the distribution rate of Chinese homophonic characters under the 6-digit code.

Explanation of symbols

10 Input means
20 Calculation means
30 Storage means
40 Display means

Claims

A character search method in a search device comprising: an input means 10 for receiving input of a character quadrant code and the like; a storage means 30 for storing a character search program and a character database corresponding to the character quadrant code, in which gaps between the constituent elements of a search character are encoded; a calculation means 20 for collating input information with the character database; and a display means 40 for displaying a search result, the method comprising:
a step in which, when the character database is stored in the storage means 30, a dividing line is regarded as drawable where the character has a gap between its constituent elements and as not drawable where there is no gap; whether the dividing line can be drawn in the horizontal direction is judged for the four parts of the character in the order top, bottom, left, and right; a case where the dividing line can be drawn is represented by the number 1 and a case where it cannot be drawn by the number 0; these numbers are assigned to the first, second, third, and fourth digits of a four-digit number in the order top, bottom, left, and right, whereby the character is encoded into a character quadrant code; and the character quadrant code is classified and stored as a database together with the corresponding character or character image and multilingual and video files;
a step in which the input means 10 accepts input of the character quadrant code; and
a step in which the calculation means 20 collates the input character quadrant code with the character quadrant code stored in the storage means 30 and, when the codes match, displays on the display means 40 the character or character image and the multilingual and video files corresponding to the character quadrant code,
whereby the corresponding character or character image can be searched for and displayed from a character quadrant code entered by a person who has no knowledge of characters or their constituent elements, and the meaning of the obtained character or character image can be understood through multiple languages and videos.

A character search method in a search device comprising: an input means 10 for receiving input of a combination of a character quadrant code and character pronunciation information; a storage means 30 for storing a character search program and a character database corresponding to the combination of the character quadrant code, in which gaps between the constituent elements of a search character are encoded, and the character pronunciation information; a calculation means 20 for collating input information with the character database; and a display means 40 for displaying a search result, the method comprising:
a step in which, when the character database is stored in the storage means 30, a dividing line is regarded as drawable where the character has a gap between its constituent elements and as not drawable where there is no gap; whether the dividing line can be drawn in the horizontal direction is judged for the four parts of the character in the order top, bottom, left, and right; a case where the dividing line can be drawn is represented by the number 1 and a case where it cannot be drawn by the number 0; these numbers are assigned to the first, second, third, and fourth digits of a four-digit number in the order top, bottom, left, and right, whereby the character is encoded into a character quadrant code; the character pronunciation information is encoded in alphabetic letters immediately after the character quadrant code; and the combination of the character quadrant code and the character pronunciation information is classified and stored as a database together with the corresponding character or character image and multilingual and video files;
a step in which the input means 10 accepts input of the combination of the character quadrant code and the character pronunciation information; and
a step in which the calculation means 20 collates the input combination of the character quadrant code and the character pronunciation information with the combination stored in the storage means 30 and, when the combinations match, displays on the display means 40 the character or character image and the multilingual and video files corresponding to the combination,
whereby the corresponding character or character image can be searched for and displayed from a combination of a character quadrant code and character pronunciation information entered by a person who has knowledge of characters and their pronunciation, and the meaning of the obtained character or character image can be understood through multiple languages and videos.

A character search method in a search device comprising: an input means 10 for receiving input of a character quadrant code and grammatical and semantic information of a character; a storage means 30 for storing a character search program and a character database corresponding to the character quadrant code and the encoded grammatical and semantic information of the character; a calculation means 20 for collating input information with the character database; and a display means 40 for displaying a search result, the method comprising:
a step in which, when the character database is stored in the storage means 30, a dividing line is regarded as drawable where the character has a gap between its constituent elements and as not drawable where there is no gap; whether the dividing line can be drawn in the horizontal direction is judged for the four parts of the character in the order top, bottom, left, and right; a case where the dividing line can be drawn is represented by the number 1 and a case where it cannot be drawn by the number 0; these numbers are assigned to the first, second, third, and fourth digits of an eight-digit number in the order top, bottom, left, and right, whereby the character is encoded into a character quadrant code; whether there is a gap between constituent elements that allows further division at a location different from the dividing lines of the character is encoded as a number in the fifth digit, and the number of characters is encoded as a number in the sixth digit; following this code, grammatical information indicating whether the character is a noun or otherwise is encoded in the seventh digit and semantic information indicating whether the character relates to humans or otherwise is encoded in the eighth digit; and the eight-digit code combining the character quadrant code with the grammatical and semantic information codes is classified and stored as a database together with the corresponding character or character image and multilingual and video files;
a step in which the input means 10 accepts input of the eight-digit code; and
a step in which the calculation means 20 collates the input eight-digit code with the eight-digit code stored in the storage means 30 and, when the codes match, displays on the display means 40 the character or character image and the multilingual and video files corresponding to the eight-digit code,
whereby the corresponding character or character image can be searched for and displayed from an eight-digit code entered by a person who has knowledge of the grammar and meaning of characters, and the meaning of the obtained character or character image can be understood through multiple languages and videos.