JP6277655B2

JP6277655B2 - Character string search program, character string search method, and character string search device

Info

Publication number: JP6277655B2
Application number: JP2013208505A
Authority: JP
Inventors: 伸次殿川
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2013-10-03
Filing date: 2013-10-03
Publication date: 2018-02-14
Anticipated expiration: 2033-10-03
Also published as: JP2015072630A

Description

The present invention relates to a character string search program, a character string search method, and a character string search device for searching for character strings.

Conventional search systems are known to support, for example, prefix-match and suffix-match searches, in which a part of the target character string is entered in order to retrieve it. Search systems are also known that output a character string in a unified notation even when the notation of the input varies.

JP 2008-59389 A
JP-A-5-108004

To search for a target character string in a conventional search system, at least a part of the target string must be entered accurately. A search is therefore difficult when, for example, the user's memory of the target string is vague and no part of it can be entered accurately.

In one aspect, an object is to provide a character string search program, a character string search method, and a character string search device capable of finding a character string related to a vaguely remembered character string.

The disclosed technique causes a computer to execute processing that, upon receiving a first character string, identifies, in the order of the characters contained in the first character string, a sequence of vectors each indicating the distance and direction between characters in an arrangement table in which each character is placed at a predetermined position based on the relationship between vowels and consonants; identifies a second character string whose vector sequence satisfies a predetermined similarity relationship; and either outputs the second character string or executes a search using the second character string as a search key.

The above processes may also be realized as functional units that implement each process, as a method in which a computer executes each process as a procedure, or as a computer-readable storage medium storing the program.

A character string related to a vaguely remembered character string can be found.

FIG. 1 is a diagram showing an example of the hardware configuration of the character string search device.
FIG. 2 is a diagram illustrating the functional configuration of the character string search device.
FIG. 3 is a diagram showing an example of the character coordinate sequence table.
FIG. 4 is a diagram showing an example of how coordinates are held in the character coordinate sequence table.
FIG. 5 is a diagram illustrating the processing of the distance direction calculation unit.
FIG. 6 is a flowchart illustrating the population registration process.
FIG. 7 is a diagram showing an example of the population database.
FIG. 8 is a flowchart illustrating the search process in the character string search device.
FIG. 9 is a diagram showing an example of the directions and distances between the characters contained in a character string used as a search key.
FIG. 10 is a diagram showing an example of the search key input screen.
FIG. 11 is a diagram showing an example of the output screen on which search results are displayed.

This embodiment is described below with reference to the drawings. FIG. 1 is a diagram showing an example of the hardware configuration of the character string search device.

The character string search device 100 includes an input device 11, an output device 12, a drive device 13, an auxiliary storage device 14, a memory device 15, an arithmetic processing device 16, and an interface device 17, which are interconnected by a bus B.

The input device 11 includes a keyboard, a mouse, and the like, and is used to input various signals. The output device 12 includes a display device and the like, and is used to display various windows, data, and so on. The interface device 17 includes a modem, a LAN card, and the like, and is used to connect to a network N.

The character string search program is at least a part of the various programs that control the character string search device 100. The program is provided, for example, by distributing a recording medium 18 or by downloading from a network. Various types of recording media can be used as the recording medium 18 on which the character string search program is recorded: media that record information optically, electrically, or magnetically, such as a CD-ROM, a flexible disk, or a magneto-optical disk, and semiconductor memories that record information electrically, such as a ROM or a flash memory.

When the recording medium 18 on which the character string search program is recorded is set in the drive device 13, the program is installed from the recording medium 18 into the auxiliary storage device 14 via the drive device 13. A character string search program downloaded from a network is installed in the auxiliary storage device 14 via the interface device 17.

The auxiliary storage device 14 stores the installed character string search program together with necessary files, data, and the like. When the computer starts up, the memory device 15 reads the character string search program from the auxiliary storage device 14 and stores it. The arithmetic processing device 16 then carries out the various processes described later in accordance with the character string search program stored in the memory device 15.

The character string search device 100 of this embodiment may be, for example, a tablet computer. It may also be, for example, a multifunctional mobile phone, including a smartphone.

Next, the functions of the character string search device 100 of this embodiment are described with reference to FIG. 2. FIG. 2 is a diagram illustrating the functional configuration of the character string search device.

The character string search device 100 of this embodiment has an input reception unit 110, a distance direction calculation unit 120, a population registration unit 130, an allowable range setting unit 140, a search unit 150, a character string extraction unit 160, and an output unit 170. The functions of these units, described later, are realized by the arithmetic processing device 16 executing the character string search program.

The character string search device 100 of this embodiment also has a character coordinate sequence table 210 and a population database 220. The character coordinate sequence table 210 and the population database 220 may be stored in a predetermined storage area, for example in the auxiliary storage device 14.

When a character string serving as a search key (hereinafter simply called the search key) is input, the character string search device 100 of this embodiment calculates, for the characters contained in the string, the distance and direction between characters in the character coordinate sequence table 210. The device then searches the population database 220 using the calculated distances and directions and outputs the extracted character strings as character strings related to the search key. Details of the character coordinate sequence table 210 and the population database 220 are described later.

The input reception unit 110 of the character string search device 100 receives data input from the input device 11. In this embodiment, the data input from the input device 11 is, for example, a search key or a parameter relating to the allowable range described later.

The distance direction calculation unit 120 calculates the distance and direction, in the character coordinate sequence table 210, between adjacent characters in a character string. Details of the distance direction calculation unit 120 are described later.

The population registration unit 130 registers entries in the population database 220. In this embodiment, when a group of character strings to be stored in the population database 220 is input, the distance and direction between adjacent characters are calculated for each character string and registered in the population database 220 as one record associated with that character string. The distance and direction between adjacent characters are calculated by the distance direction calculation unit 120.

The allowable range setting unit 140 sets parameters that turn the distance and direction calculated by the distance direction calculation unit 120 into values spanning a predetermined range. Specifically, the parameters of this embodiment include a distance parameter, which turns a distance into a range of values, and a direction parameter, which turns a direction into a range of values. Several kinds of parameters may be stored in advance, for example in the memory device 15. The allowable range setting unit 140 may acquire the parameters corresponding to the population database 220 from the memory device 15 and set them. Details of the parameters are described later.

For the search key received by the input reception unit 110, the search unit 150 searches the population database 220 based on the distances and directions calculated by the distance direction calculation unit 120 and on the parameters that have been set. Details of the processing of the search unit 150 are described later.

The character string extraction unit 160 extracts from the population database 220 the character strings that match as a result of the search by the search unit 150. The output unit 170 outputs the extracted character strings, via the output device 12 or the like, as related character strings associated with the search key.

Note that the character string search device 100 of this embodiment may be connected to a terminal device via a network, for example. In that case, the character string search device 100 may receive a search key entered on the terminal device together with a search request, perform a search based on the search key, and output the result to the terminal device.

The character coordinate sequence table 210 of this embodiment is described below with reference to FIG. 3. FIG. 3 is a diagram showing an example of the character coordinate sequence table.

The character coordinate sequence table 210 of this embodiment is a table in which characters are arranged based on the gojūon table (the Japanese syllabary table).

The gojūon table arranges the Japanese kana (hiragana and katakana) five characters down according to vowel and ten characters across according to consonant. Its arrangement is based on phonetics, the field that deals with the physical properties of speech sounds. Speech sounds are those sounds that are used in language, and they are divided into consonants and vowels.

A vowel is a speech sound produced with the flow of breath through the oral cavity largely unobstructed; a consonant is a speech sound produced with that flow obstructed to some degree.

Consonants are classified by place of articulation and manner of articulation. More specifically, the manner of articulation classifies consonants according to how the breath flows, or does not flow, at the place of articulation.

The place of articulation is the part of the vocal tract, from the lungs to the lips, that contributes most to distinguishing a sound. The manner of articulation is the way in which various vowels and consonants are produced: by controlling the airflow in the vocal tract through the shape and movement of the articulators above the larynx, by changing how sound resonates within the vocal tract, or by generating or adding new sounds.

When phonetic symbols are placed in a table that arranges the main consonants by place and manner of articulation, the order "a, ka, sa, ta, na, ..." of the gojūon table is considered to run from places of articulation at the back of the throat toward the front of the mouth. The arrangement of characters in the gojūon table can therefore be said to be determined from a phonetic viewpoint (reference: "Basics of Linguistics / The Mystery of a-ka-sa-ta-na: Basics of Phonetics", http://culture.cc.hirosaki-u.ac.jp/english/utsumi/linguistics/lingusitics_c2_ja.html).

The inventor therefore focused on the possibility that a vaguely remembered character string is remembered as sound information rather than as character information, and conceived of using, for search, the positional relationship between characters in the character coordinate sequence table 210, which is based on the gojūon table. In this embodiment, the positional relationship between two characters is expressed by the distance and direction between them in the character coordinate sequence table 210, as calculated by the distance direction calculation unit 120.

In the character coordinate sequence table 210 of this embodiment, consonants are arranged along the X axis and vowels along the Y axis. A change in the Y coordinate therefore represents a vowel transition, and a change in the X coordinate represents a consonant transition. In addition, the voiced sounds (dakuon) are placed after the consonants in the table.

The character coordinate sequence table 210 of this embodiment may hold the coordinate values of each character, for example as shown in FIG. 4. FIG. 4 is a diagram showing an example of how coordinates are held in the character coordinate sequence table.

In this embodiment, as shown in FIG. 4, a database that stores each character in association with its X coordinate and Y coordinate may be held as the character coordinate sequence table 210.
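As a rough illustration of such a character-to-coordinate mapping, the following Python sketch builds a lookup table from gojūon columns. The column order, the `GAP` marker, and the names `COLUMNS`, `build_coordinate_table`, and `CHAR_COORDS` are assumptions made for this example; FIG. 3 and FIG. 4 are not reproduced here, and only the coordinates of "デ", "イ", and "コ" are stated in the description.

```python
# Hypothetical layout: X is the 1-based consonant column, Y is the 1-based vowel row.
# Voiced columns are placed after the unvoiced ones, as the description states.
GAP = "-"  # marks positions that have no kana in this simplified layout

COLUMNS = [
    "アイウエオ", "カキクケコ", "サシスセソ", "タチツテト", "ナニヌネノ",
    "ハヒフヘホ", "マミムメモ", "ヤ-ユ-ヨ", "ラリルレロ", "ワ---ヲ",
    "ガギグゲゴ", "ザジズゼゾ", "ダヂヅデド", "バビブベボ",
]


def build_coordinate_table(columns):
    """Map each character to its (X, Y) position in the table."""
    table = {}
    for x, column in enumerate(columns, start=1):
        for y, ch in enumerate(column, start=1):
            if ch != GAP:
                table[ch] = (x, y)
    return table


CHAR_COORDS = build_coordinate_table(COLUMNS)

# These three values are the ones quoted in the description.
print(CHAR_COORDS["デ"], CHAR_COORDS["イ"], CHAR_COORDS["コ"])  # (13, 4) (1, 2) (2, 5)
```

A plain dictionary is enough here because the table is small and only ever read by character; a database table with character, X, and Y columns would serve the same purpose.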

The character coordinate sequence table 210 of this embodiment may also be provided in correspondence with the population database 220.

Next, the processing of the distance direction calculation unit 120 of this embodiment is described with reference to FIG. 5. FIG. 5 is a diagram illustrating the processing of the distance direction calculation unit.

FIG. 5 illustrates the case where, for example, the input reception unit 110 has received the character string "デイデイコ" as a search key.

The distance direction calculation unit 120 of this embodiment acquires the coordinates, in the character coordinate sequence table 210, of each character contained in the character string "デイデイコ".

In the character coordinate sequence table 210, the coordinates (X1, Y1) of "デ", the first character of the string "デイデイコ", are (13, 4). The coordinates (X2, Y2) of the character "イ" are (1, 2). The coordinates (X3, Y3) of the character "コ" are (2, 5).

The distance direction calculation unit 120 of this embodiment uses the coordinates of each character to calculate the distance and direction between characters. The calculation of the distance between characters is described first.

In this embodiment, the distance L between the character "デ" at coordinates (X1, Y1) and the character "イ" at coordinates (X2, Y2) is calculated by the following equation (1).

$$L = \sqrt{(X_1 - X_2)^2 + (Y_1 - Y_2)^2} \qquad \text{(1)}$$

The distance between the character "デ" and the character "イ" is therefore $L = \sqrt{(13 - 1)^2 + (4 - 2)^2} = 12.16$ (digits from the third decimal place onward are truncated).

The distance direction calculation unit 120 of this embodiment calculates the distance between each pair of adjacent characters in this way.
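The distance calculation of equation (1), including the truncation to two decimal places, can be sketched in Python as follows. The function names `truncate` and `char_distance` are illustrative and do not appear in the description.

```python
import math


def truncate(value: float, places: int = 2) -> float:
    """Drop digits beyond `places` decimals (truncate toward zero, no rounding)."""
    factor = 10 ** places
    return math.trunc(value * factor) / factor


def char_distance(p1: tuple[int, int], p2: tuple[int, int]) -> float:
    """Equation (1): Euclidean distance between two table coordinates."""
    (x1, y1), (x2, y2) = p1, p2
    return math.sqrt((x1 - x2) ** 2 + (y1 - y2) ** 2)


# Coordinates quoted in the description: "デ" = (13, 4), "イ" = (1, 2).
print(truncate(char_distance((13, 4), (1, 2))))  # 12.16
```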

Next, the calculation of the direction between characters by the distance direction calculation unit 120 is described. In this embodiment, the direction between characters is the angle, in the character coordinate sequence table 210, of the next input character relative to the previously input character.

The direction of the character "イ" relative to the character "デ" is described below. In the character string "デイデイコ", the character input after "デ" is "イ". The distance direction calculation unit 120 of this embodiment therefore calculates the angle of "イ" relative to "デ" in the character coordinate sequence table 210 as the value indicating the direction of "イ" relative to "デ". In the following description, a value indicating a direction is simply called a direction.

The angle θ of the character "イ" relative to the character "デ" in the character coordinate sequence table 210 is calculated from the coordinates of the two characters by the following equation (2).

$$\theta = \tan^{-1}\left\{\frac{Y_1 - Y_2}{X_1 - X_2}\right\} \times \frac{180}{\pi} \qquad \text{(2)}$$

The direction of the character "イ" relative to the character "デ" is therefore $\theta = \tan^{-1}\{(4 - 2)/(13 - 1)\} \times 180/\pi = 9.46$ (digits from the third decimal place onward are truncated).

In this embodiment, the distance between "デ" and "イ" and the direction of "イ" relative to "デ", calculated as described above, together form a vector representing the positional relationship between "デ" and "イ" in the character coordinate sequence table 210.
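A minimal Python sketch of the (distance, direction) vector follows. It applies equation (1) and equation (2) and truncates both values to two decimal places. The name `char_vector` is illustrative, and the handling of `X1 == X2` is an assumption, since the description does not say what happens when the denominator of equation (2) is zero.

```python
import math


def truncate(value, places=2):
    """Drop digits beyond `places` decimals without rounding."""
    factor = 10 ** places
    return math.trunc(value * factor) / factor


def char_vector(p1, p2):
    """Return (distance, direction in degrees) between a character and the next one."""
    (x1, y1), (x2, y2) = p1, p2
    dx, dy = x1 - x2, y1 - y2
    dist = math.hypot(dx, dy)                     # equation (1)
    if dx == 0:
        # Assumption: not specified in the description. Use 0 for identical
        # positions and +/-90 degrees for a purely vertical move.
        angle = 0.0 if dy == 0 else math.copysign(90.0, dy)
    else:
        angle = math.degrees(math.atan(dy / dx))  # equation (2)
    return truncate(dist), truncate(angle)


# "デ" = (13, 4) followed by "イ" = (1, 2), as in the description.
print(char_vector((13, 4), (1, 2)))  # (12.16, 9.46)
```

Because equation (2) takes the arctangent of a ratio rather than a two-argument arctangent, the result always lies between -90 and 90 degrees, so two displacements pointing in exactly opposite directions yield the same angle.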

Next, the population database 220 of this embodiment is described.

When a group of character strings forming the population is input, for example by an administrator, the population registration unit 130 of this embodiment calculates, by the method described above, the vectors indicating the positional relationship between adjacent characters in each character string, and registers each character string in the population database 220 in association with its vectors.

The population registration process of this embodiment is described below with reference to FIG. 6. FIG. 6 is a flowchart illustrating the population registration process.

When the character string search device 100 of this embodiment receives input of a group of character strings that forms the population to be searched (step S601), the population registration unit 130 acquires the first character string input in the group (step S602).

Next, the distance direction calculation unit 120 sets a variable n = 0 (step S603). The distance direction calculation unit 120 then sets n = n + 1 and acquires the nth character from the character string acquired in step S602 (step S604). It then determines whether the nth character is the last character of the character string (step S605).

If the character is the last character in step S605, the process proceeds to step S608, described later. If it is not the last character in step S605, the distance direction calculation unit 120 acquires the coordinates of the nth and (n+1)th characters in the character coordinate sequence table 210 and calculates the vector indicating the positional relationship between the nth and (n+1)th characters (step S606). Specifically, the distance direction calculation unit 120 calculates the distance between the nth and (n+1)th characters and the direction, in the character coordinate sequence table 210, of the (n+1)th character relative to the nth character. The calculation method is as described above.

Next, the population registration unit 130 stores the character string acquired in step S602 in the population database 220 in association with the vector between the nth and (n+1)th characters (step S607), and the process returns to step S604.

By repeating steps S604 to S607, the population registration unit 130 of this embodiment can obtain vectors indicating the positional relationship between adjacent characters for all the characters contained in the character string.

If the nth character is the last character of the character string in step S605, the population registration unit 130 determines whether steps S604 to S607 have been executed for all the character strings input in step S601 (step S608). If the processing has not been executed for all character strings in step S608, the population registration unit 130 returns to step S602. If the processing has been executed for all character strings in step S608, the population registration unit 130 ends the population registration process.
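The registration flow of FIG. 6 can be sketched in Python roughly as follows. The names `register_population` and `char_vector`, the use of a dictionary as the population database, and the treatment of `X1 == X2` are assumptions for illustration; the step labels in the comments refer to the flowchart steps described above.

```python
import math


def truncate(value, places=2):
    factor = 10 ** places
    return math.trunc(value * factor) / factor


def char_vector(p1, p2):
    """(distance, direction in degrees) per equations (1) and (2)."""
    (x1, y1), (x2, y2) = p1, p2
    dx, dy = x1 - x2, y1 - y2
    if dx == 0:  # assumption: this case is not covered by the description
        angle = 0.0 if dy == 0 else math.copysign(90.0, dy)
    else:
        angle = math.degrees(math.atan(dy / dx))
    return truncate(math.hypot(dx, dy)), truncate(angle)


def register_population(strings, coords):
    """Build {string: [vector, ...]} for every string in the population (S601)."""
    database = {}
    for text in strings:                         # S602, repeated via S608
        vectors = database[text] = []
        n = 0                                    # S603
        while True:
            n += 1                               # S604: take the nth character
            if n >= len(text):                   # S605: nth character is the last
                break
            p1, p2 = coords[text[n - 1]], coords[text[n]]
            vectors.append(char_vector(p1, p2))  # S606, stored with the string (S607)
    return database


# Only the coordinates quoted in the description are used here.
COORDS = {"デ": (13, 4), "イ": (1, 2), "コ": (2, 5)}
print(register_population(["デイデイコ"], COORDS))
```

For "デイデイコ" this yields four vectors, one per adjacent character pair, so a string of k characters always produces k - 1 vectors.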

図７は、母集団データベースの一例を示す図である。 FIG. 7 is a diagram illustrating an example of a population database.

図７に示す母集団データベース２２０は、文字列と、文字列に含まれる文字と文字の位置関係を示すベクトルとが対応付けられて格納されている。図７に示す母集団データベース２２０は、母集団として入力された文字列群が例えば星座の名前であった場合を示している。 The population database 220 shown in FIG. 7 stores a character string and a vector indicating the positional relationship between the character and the character included in the character string in association with each other. The population database 220 shown in FIG. 7 shows a case where the character string group input as the population is, for example, the name of a constellation.

図７に示す母集団データベース２２０は、情報の項目として、星座名を示す文字列、星座の英語名を示す文字列、英語名の発音を示す文字列、英語名の発音を示す文字列に含まれる各文字、各文字の文字座標列表２１０における座標を含む。また母集団データベース２２０は、情報の項目として、英語名の発音を示す文字列の文字間の距離と方向、すなわち文字と文字の位置関係を示すベクトルを含む。 The population database 220 shown in FIG. 7 includes, as information items, a character string indicating the constellation name, a character string indicating the English name of the constellation, a character string indicating the pronunciation of the English name, and a character string indicating the pronunciation of the English name. And the coordinates of each character in the character coordinate sequence table 210. The population database 220 includes, as information items, a distance and direction between characters of a character string indicating pronunciation of an English name, that is, a vector indicating a positional relationship between characters.

具体的には、項目「距離１」は文字列に含まれる１番目の文字と２番目の文字との間の距離を示し、項目「方向１」は文字列に含まれる１番目の文字に対する２番目の文字の方向を示す。したがって、１番目の文字と２番目の文字の位置関係は、ベクトル（距離１，方向１）と表すことができる。 Specifically, the item “distance 1” indicates the distance between the first character and the second character included in the character string, and the item “direction 1” indicates 2 for the first character included in the character string. Indicates the direction of the second character. Therefore, the positional relationship between the first character and the second character can be expressed as a vector (distance 1, direction 1).

同様に項目「距離２」は、文字列に含まれる２番目の文字と３番目の文字との間の距離を示し、項目「方向２」は文字列に含まれる２番目の文字に対する３番目の文字の方向を示す。したがって、２番目の文字と３番目の文字の位置関係は、ベクトル（距離２，方向２）と表すことができる。 Similarly, the item “distance 2” indicates the distance between the second character and the third character included in the character string, and the item “direction 2” indicates the third character with respect to the second character included in the character string. Indicates the direction of characters. Therefore, the positional relationship between the second character and the third character can be expressed as a vector (distance 2, direction 2).

具体的には例えば、文字列「バランス」において、文字「バ」と文字「ラ」の関係は、ベクトル（５．００，０．００）で示すことができる。また文字「ラ」と文字「ン」の関係は、ベクトル（９．００，０．００）で示すことができる。また文字「ン」と文字「ス」の関係は、ベクトル（１５．１３，−７．５９）で示すことができる。よって文字列「バランス」に含まれる各文字間の関係は、ベクトルの列（５．００，０．００），（９．００，０．００），（１５．１３，−７．５９）で示すことができる。すなわち本実施例のベクトルの列は、文字列「バランス」に含まれる各文字の文字座標列表２１０における遷移の方向を順に示している。言い換えれば、本実施例のベクトルの列は、文字列「バランス」に含まれる各文字の文字座標列表２１０における遷移パターンを特定する値の組みの列である。 Specifically, for example, in the character string “balance”, the relationship between the character “B” and the character “La” can be represented by a vector (5.00, 0.00). The relationship between the character “La” and the character “N” can be represented by a vector (9.00, 0.00). The relationship between the character “n” and the character “su” can be represented by a vector (15.13, −7.59). Therefore, the relationship between the characters included in the character string “balance” is a vector sequence (5.00, 0.00), (9.00, 0.00), (15.13, −7.59). Can show. That is, the vector sequence of this embodiment sequentially indicates the transition direction in the character coordinate sequence table 210 of each character included in the character sequence “balance”. In other words, the vector sequence in this embodiment is a set of values specifying a transition pattern in the character coordinate sequence table 210 of each character included in the character sequence “balance”.

In the population database 220 of this embodiment, the vector sequence calculated from a character string is stored as a single record in association with that character string.

Although the population database 220 shown in FIG. 7 stores constellation names, it is not limited to this. A population database 220 may be provided for each of various categories. For example, the character string search device 100 of this embodiment may have a population database storing drug names, a population database storing character names, and so on.

Next, the search performed by the character string search device 100 of this embodiment is described with reference to FIG. 8. FIG. 8 is a flowchart illustrating the search processing in the character string search device.

In the character string search device 100 of this embodiment, when the input receiving unit 110 receives a character string serving as the search key (step S801), the distance direction calculation unit 120 sets the variable n to 0 (step S802).

The processing from step S803 to step S805 in FIG. 8 is the same as the processing from step S604 to step S606 in FIG. 6, and its description is therefore omitted.

Through the processing from step S803 to step S805, the vector sequence indicating the positional relationships between the characters of the search key is calculated.

In the character string search device 100 of this embodiment, the allowable range setting unit 140 acquires the direction value from each vector in the vector sequence and applies a predetermined direction parameter to each direction value (step S806). That is, in step S806 the sequence of direction values is taken from the vector sequence and the direction parameter is applied to each value, which yields a sequence of direction-value ranges.

The search unit 150 then searches the population database 220 using the sequence of direction-value ranges acquired in step S806 (step S807).

If the search in step S807 finds no character string whose direction-value sequence falls within the sequence of direction-value ranges (step S808), the character string search device 100 proceeds to step S814, described later. If the search in step S807 finds matching character strings, the character string extraction unit 160 extracts them (step S809).

The search unit 150 then acquires the distance value from each vector in the vector sequence and applies a predetermined distance parameter to each distance value (step S810). That is, in step S810 the sequence of distance values is taken from the vectors and the distance parameter is applied to each value, which yields a sequence of distance-value ranges.

The search unit 150 then searches the character strings extracted in step S809 using the sequence of distance-value ranges acquired in step S810 (step S811).

If the search in step S811 finds, among the character strings extracted in step S809, no character string whose distance-value sequence falls within the sequence of distance-value ranges (step S812), the character string search device 100 proceeds to step S814, described later. If the search in step S811 finds matching character strings, the character string extraction unit 160 extracts them, and the output unit 170 displays the extracted character strings on the output device 12 as the search result (step S813). The character strings output in step S813 are character strings related to the search key.

The character string search device 100 then determines whether the input receiving unit 110 has received a parameter adjustment (step S814).

If a parameter was adjusted in step S814, the character string search device 100 returns to step S806. If no parameter was adjusted in step S814, the character string search device 100 ends the processing.
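The flow from step S806 to step S814 can be sketched roughly as follows. This is an illustration under stated assumptions, not the implementation of the embodiment: the function names are hypothetical, the population database is modeled as a plain dictionary, the parameter adjustments received in step S814 are modeled as a list of (direction, distance) tolerances, and the two stages are matched independently without requiring that they agree on the same position inside a record.

```python
def window_matches(key_values, record_values, tolerance):
    """True if some contiguous run of record_values lies within +/- tolerance of key_values."""
    n = len(key_values)
    return any(
        all(abs(r - k) <= tolerance for k, r in zip(key_values, record_values[i:i + n]))
        for i in range(len(record_values) - n + 1)
    )

def search_once(key_vectors, records, direction_tol, distance_tol):
    key_distances = [d for d, _ in key_vectors]
    key_directions = [a for _, a in key_vectors]
    # Steps S806 to S809: narrow the population down by direction.
    by_direction = [name for name, vecs in records.items()
                    if window_matches(key_directions, [a for _, a in vecs], direction_tol)]
    # Steps S810 to S813: narrow the extracted strings down by distance.
    return [name for name in by_direction
            if window_matches(key_distances, [d for d, _ in records[name]], distance_tol)]

def search_with_adjustment(key_vectors, records, settings):
    """Run one search per (direction_tol, distance_tol) pair, standing in for step S814."""
    return [search_once(key_vectors, records, direction_tol, distance_tol)
            for direction_tol, distance_tol in settings]

records = {"バランス": [(5.00, 0.00), (9.00, 0.00), (15.13, -7.59)]}
key = [(10.00, 0.00), (15.13, -7.59)]  # vector sequence of the search key "ヤンス"
history = search_with_adjustment(key, records, [(2.00, 0.50), (2.00, 1.00)])
```

In this sketch the first pass, with an assumed distance tolerance of ±0.50, finds nothing, and the second pass, widened to ±1.00, returns "バランス".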

In this embodiment, if the character strings related to the search key that were output in step S813 include the target character string, the search processing may be ended. If they include only part of the target character string, the search may be performed again using that part as the search key.

The search processing described with FIG. 8 is now explained concretely. FIG. 9 is a diagram showing an example of the directions and distances between the characters of a character string serving as the search key.

FIG. 9 covers the case where the character string "ヤンス" is input as the search key. The distance direction calculation unit 120 of this embodiment refers to the character coordinate sequence table 210 and calculates the vector indicating the positional relationship between the characters "ヤ" and "ン".

In the example of FIG. 9, the distance between the characters "ヤ" and "ン" is 10.00, and the value indicating the direction of "ン" relative to "ヤ" is 0.00. The positional relationship between "ヤ" and "ン" in the character coordinate sequence table 210 is therefore expressed by the vector (10.00, 0.00). Similarly, the distance between "ン" and "ス" is 15.13, and the value indicating the direction of "ス" relative to "ン" is -7.59. The positional relationship between "ン" and "ス" in the character coordinate sequence table 210 is therefore expressed by the vector (15.13, -7.59).

The positional relationships between the characters of the search key "ヤンス" are thus expressed by the vector sequence (10.00, 0.00), (15.13, -7.59). In this embodiment, the processing up to step S805 in FIG. 8 yields the vector sequence indicating the positional relationships between the characters of the search key.

Next, the processing of step S806 is described concretely.

The allowable range setting unit 140 of this embodiment acquires the sequence of direction values from the vector sequence (10.00, 0.00), (15.13, -7.59). The direction values of the vectors are 0.00 and -7.59, so the sequence of direction values acquired here is 0.00, -7.59.

Next, the allowable range setting unit 140 of this embodiment applies the direction parameter to each direction value. In this embodiment the direction parameter is, for example, ±2.00. By applying the direction parameter ±2.00 to each direction value, the allowable range setting unit 140 turns each direction value into a range of direction values.

With the direction parameter ±2.00 applied, the direction value 0.00 becomes the range -2.00 to 2.00, and the direction value -7.59 becomes the range -9.59 to -5.59.

The sequence of direction values therefore becomes the sequence of direction-value ranges (-2.00 to 2.00), (-9.59 to -5.59). This sequence of ranges is what step S806 acquires.
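The range calculation of step S806 amounts to simple arithmetic, which the following sketch reproduces. The helper name `to_ranges` is an assumption for illustration. Results are rounded to two decimal places to match the notation used in this description.

```python
def to_ranges(values, parameter):
    """Turn each value v into the range (v - parameter, v + parameter)."""
    return [(round(v - parameter, 2), round(v + parameter, 2)) for v in values]

# Direction values of the search key "ヤンス" with the direction parameter ±2.00.
direction_ranges = to_ranges([0.00, -7.59], 2.00)
```

Here `direction_ranges` holds the two ranges (-2.00 to 2.00) and (-9.59 to -5.59) stated above.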

Next, the search unit 150 searches the population database 220 for character strings whose direction-value sequence falls within the sequence of direction-value ranges (-2.00 to 2.00), (-9.59 to -5.59).

In the population database 220 of this embodiment, the vector sequence obtained from the character string "バランス", which represents the pronunciation of the English name, is (5.00, 0.00), (9.00, 0.00), (15.13, -7.59). The sequence of direction values obtained from this vector sequence is 0.00, 0.00, -7.59.

The sub-sequence 0.00, -7.59 contained in this sequence of direction values falls within the sequence of direction-value ranges (-2.00 to 2.00), (-9.59 to -5.59). In step S809, the character string extraction unit 160 therefore extracts the character string "バランス".
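One way to read this comparison is as a sliding-window test: the sequence of ranges is slid along the direction values of a record, and the record matches if every value of some contiguous window falls inside the corresponding range. The sketch below assumes that reading; the function name `find_window` is hypothetical.

```python
def find_window(ranges, values):
    """Return the offset of the first contiguous run of values inside ranges, or None."""
    n = len(ranges)
    for offset in range(len(values) - n + 1):
        window = values[offset:offset + n]
        if all(low <= v <= high for (low, high), v in zip(ranges, window)):
            return offset
    return None

direction_ranges = [(-2.00, 2.00), (-9.59, -5.59)]
balance_directions = [0.00, 0.00, -7.59]  # direction values of "バランス"
offset = find_window(direction_ranges, balance_directions)
```

For "バランス" the window starting at offset 0 (0.00, 0.00) fails on its second value, while the window at offset 1 (0.00, -7.59) lies inside both ranges, so the record is treated as a match.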

As described above, in this embodiment the vectors obtained from the characters of the search key are turned into pairs of value ranges, which makes it possible to extract character strings whose positional relationships in the character coordinate sequence table 210 have a predetermined similarity to those of the characters of the search key.

Now consider the case where no matching character string exists in the population database 220. In this case, the direction parameter of this embodiment can be adjusted (changed) so as to widen the range.

For example, changing the direction parameter from ±2.00 to ±3.00 makes it possible to extract, as matching character strings, character strings whose angles between characters in the character coordinate sequence table 210 lie in a wider range. More candidate character strings can thus be extracted from the population database 220.

Once the character string "バランス" has been extracted in step S809, the search unit 150 acquires the sequence of distance values from the vector sequence (10.00, 0.00), (15.13, -7.59). The distance values of the vectors are 10.00 and 15.13, so the sequence of distance values acquired here is 10.00, 15.13.

Next, the allowable range setting unit 140 of this embodiment applies the distance parameter to each distance value. In this embodiment the distance parameter is, for example, ±1.00. By applying the distance parameter ±1.00 to each distance value, the allowable range setting unit 140 turns each distance value into a range of distance values.

With the distance parameter ±1.00 applied, the distance value 10.00 becomes the range 9.00 to 11.00, and the distance value 15.13 becomes the range 14.13 to 16.13.

The sequence of distance values therefore becomes the sequence of distance-value ranges (9.00 to 11.00), (14.13 to 16.13). This sequence of ranges is what step S810 acquires.

Next, the search unit 150 searches the character strings extracted in step S809 for character strings whose distance-value sequence falls within the sequence of distance-value ranges (9.00 to 11.00), (14.13 to 16.13).

The character string extracted in step S809 is "バランス". The distance-value sequence of "バランス" is 5.00, 9.00, 15.13, and its sub-sequence 9.00, 15.13 falls within the sequence of distance-value ranges (9.00 to 11.00), (14.13 to 16.13).

The output unit 170 therefore displays "バランス" on the output device 12 as a character string related to the search key "ヤンス".
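The two stages of the worked example, direction first and distance second, can be put together as in the sketch below. The record for "バランス" uses the vector sequence given earlier in this description. The second record is a made-up entry, included only to show a string that passes the direction stage and is then rejected by the distance stage. All names are assumptions for illustration.

```python
def first_window(ranges, values):
    """Offset of the first contiguous run of values inside ranges, or None."""
    n = len(ranges)
    for offset in range(len(values) - n + 1):
        if all(low <= v <= high
               for (low, high), v in zip(ranges, values[offset:offset + n])):
            return offset
    return None

direction_ranges = [(-2.00, 2.00), (-9.59, -5.59)]  # step S806
distance_ranges = [(9.00, 11.00), (14.13, 16.13)]   # step S810

records = {
    "バランス": [(5.00, 0.00), (9.00, 0.00), (15.13, -7.59)],
    "(made-up entry)": [(3.00, 0.00), (15.13, -7.59)],  # hypothetical record
}

# Steps S807 to S809: extract by direction.
candidates = [name for name, vecs in records.items()
              if first_window(direction_ranges, [a for _, a in vecs]) is not None]
# Steps S811 to S813: keep only those that also match by distance.
results = [name for name in candidates
           if first_window(distance_ranges, [d for d, _ in records[name]]) is not None]
```

Both records survive the direction stage, but only "バランス" remains after the distance stage, because its distances 9.00 and 15.13 lie inside the two ranges while the made-up entry's distance of 3.00 does not.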

Now consider the case where none of the character strings extracted in step S809 matches. In this case, the distance parameter of this embodiment may be adjusted (changed) so as to widen the range. Widening the range of the distance parameter makes it possible to extract, as matching character strings, character strings whose distances between characters in the character coordinate sequence table 210 lie in a wider range.

In this embodiment, the ranges of the direction parameter and the distance parameter can also be adjusted to be narrower.

For example, when multiple character strings have been extracted as character strings related to the search key, narrowing the ranges of the direction parameter and the distance parameter in step S814 makes it possible to extract character strings whose positional relationships between characters are more similar to those of the search key.

In this embodiment the search by the search unit 150 is performed after the direction parameter and the distance parameter have been set, but the embodiment is not limited to this.

For example, the character string search device 100 may first search the population database 220 using the vector sequence obtained from the search key as it is. The character strings extracted by this search contain the same character string as the search key. If no matching character string is found in the search using the vector sequence obtained from the search key, the character string search device 100 may set the direction parameter and the distance parameter and search again. The parameter that is set may be only one of the direction parameter and the distance parameter.
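This variation, an exact search followed by a tolerant search only when nothing is found, might look like the sketch below. It is an assumption-laden illustration: the function names are hypothetical, and unlike the two-stage flow of FIG. 8 it checks distance and direction together on the same window of a record.

```python
def matches(key_vectors, record_vectors, distance_tol, direction_tol):
    """True if some contiguous run of record vectors is within the tolerances of the key."""
    n = len(key_vectors)
    for offset in range(len(record_vectors) - n + 1):
        window = record_vectors[offset:offset + n]
        if all(abs(rd - kd) <= distance_tol and abs(ra - ka) <= direction_tol
               for (kd, ka), (rd, ra) in zip(key_vectors, window)):
            return True
    return False

def search(key_vectors, records, distance_tol=1.00, direction_tol=2.00):
    # First pass: the key's vector sequence as it is (tolerances of zero).
    exact = [name for name, vecs in records.items()
             if matches(key_vectors, vecs, 0.0, 0.0)]
    if exact:
        return "exact", exact
    # Second pass: only when nothing was found, apply the parameters and search again.
    relaxed = [name for name, vecs in records.items()
               if matches(key_vectors, vecs, distance_tol, direction_tol)]
    return "relaxed", relaxed

records = {"バランス": [(5.00, 0.00), (9.00, 0.00), (15.13, -7.59)]}
```

Passing `direction_tol=0.0` or `distance_tol=0.0` to `search` corresponds to setting only one of the two parameters.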

This embodiment has been described on the assumption that the vector sequences of the character string group to be searched are registered in the population database 220 in advance by the population registration unit 130, but the embodiment is not limited to this. For example, only the character string group may be stored in the population database 220 in advance, and the character string search device 100 may calculate the vector sequence of each character string in the group in parallel with the search processing.

The search key input screen and the search result output screen of this embodiment are described below with reference to FIGS. 10 and 11.

FIG. 10 is a diagram showing an example of the search key input screen. The input screen 101 shown in FIG. 10 displays an input field 102 for the character string serving as the search key, a setting bar 103 for setting the parameter range, and instruction buttons 104 and 105 for instructing execution of a search.

In this embodiment, the parameter value may be adjusted, for example, by moving the slider 103a on the setting bar 103 up or down. Although the example of FIG. 10 displays only one setting bar 103, two setting bars 103 may be displayed, one for the direction parameter and one for the distance parameter. Alternatively, the range set with the setting bar 103 shown in FIG. 10 may be applied to both the direction parameter and the distance parameter.

When the instruction button 104 is operated on the input screen 101, the character string search device 100 of this embodiment may perform the search without setting any parameter. When the instruction button 105 is operated on the input screen 101, it may perform the search using the parameter set with the setting bar 103.

FIG. 11 is a diagram showing an example of an output screen on which search results are displayed.

The output screen 111 of this embodiment displays a display field 112 showing the character strings related to the search key and a message 113 showing the number of character strings extracted by the search. The output screen 111 also displays an instruction button 106 for instructing execution of a re-search. The character string search device 100 of this embodiment may execute a re-search when, for example, the instruction button 106 is operated after the parameter range has been adjusted with the setting bar 103. It may also execute a search when, for example, the instruction button 106 is operated after a new search key has been entered in the input field 102.

As described above, the character string search device 100 of this embodiment calculates in advance, for the character string group to be searched, the vectors indicating the positional relationships between the characters of each character string based on the character coordinate sequence table 210, and registers them in the population database 220.

When a search key is input, the character string search device 100 of this embodiment calculates the vectors indicating the positional relationships between the characters of the search key, based on the same character coordinate sequence table 210 that was used when the character string group to be searched was registered in the population database 220.

The character string search device 100 of this embodiment then searches the population database 220 using the vectors obtained from the search key.

That is, by using the positional relationships between characters in the character coordinate sequence table 210, which is based on the Japanese syllabary table, this embodiment can find character strings related to the search key even when the search key does not match the target character string. A search can therefore be performed even when part of the target character string cannot be input accurately.

The character string search device 100 of this embodiment can be used, for example, to search a library's holdings, or to search for a desired character string in a character string group (database) that contains many similar katakana names, such as drug names or foreign place names. Using the character string search device 100 of this embodiment for such searches makes it possible to extract candidates for the correct name or place name as character strings related to the search key, even when the title of a book, the name of a drug, a place name, or the like is not remembered accurately.

In the character string search device 100 of this embodiment, the population database may also be, for example, a database storing information on teaching materials. In this case, a question may be input as the search key, and the character strings related to the search key that are output as the search result may be provided as hints for the question.

In the disclosed technology, forms such as the following supplementary notes are conceivable.
(Appendix 1)
Identify the transition pattern of the vowel component of each character in the first string,
Identify a second character string in which the vowel component of each character transitions in a transition pattern that satisfies a predetermined similarity relationship with the identified transition pattern;
Outputting the second character string as a related character string of the first character string, or executing a search using the second character string as a search key;
A string search program that causes a computer to execute processing.
(Appendix 2)
Identify a vector sequence indicating the direction and order of the transitions, in a character array based on the Japanese syllabary table, between the characters included in the first character string,
Identify a second character string whose direction and order of transitions in the array are indicated by a vector sequence that satisfies a predetermined similarity relationship with that vector sequence;
Outputting the second character string as a related character string of the first character string, or executing a search using the second character string as a search key;
A string search program that causes a computer to execute processing.
(Appendix 3)
The vector is
a pair consisting of a value indicating the distance between a first character and a second character in the array and a value indicating the angle of the second character with respect to the first character;
the sequence of vectors satisfying the predetermined similarity relationship is
a sequence of vectors whose distance values and angle values differ from those of the vectors obtained from the characters included in the first character string by amounts within a predetermined range; the character string search program according to Appendix 2.
(Appendix 4)
The character string search program according to Appendix 2 or 3, wherein the character array based on the Japanese syllabary table includes unvoiced sounds (seion) and voiced sounds (dakuon).
(Appendix 5)
The process of identifying the second character string
identifies as the second character string, in a storage unit that stores, for each character string of a character string group, the vector sequence in the array obtained from the characters included in that character string, the character string corresponding to a vector sequence that satisfies the predetermined similarity relationship with the vector sequence identified from the first character string; the character string search program according to any one of Appendices 2 to 4.
(Appendix 6)
For each character string of the input character string group, calculate a sequence of vectors in the array of each character included in the character string,
The character string search program according to appendix 5, which causes a computer to execute a process of storing the character string and the vector string in association with each other in the storage unit.
(Appendix 7)
A character string search method by a computer, wherein the computer
Identify the transition pattern of the vowel component of each character in the first string,
Identify a second character string in which the vowel component of each character transitions in a transition pattern that satisfies a predetermined similarity relationship with the identified transition pattern;
A character string search method for outputting the second character string as a related character string of the first character string or executing a search using the second character string as a search key.
(Appendix 8)
A character string search method by a computer, wherein the computer
Identify a vector sequence indicating the direction and order of the transitions, in a character array based on the Japanese syllabary table, between the characters included in the first character string,
Identify a second character string whose direction and order of transitions in the array are indicated by a vector sequence that satisfies a predetermined similarity relationship with that vector sequence;
A character string search method for outputting the second character string as a related character string of the first character string or executing a search using the second character string as a search key.
(Appendix 9)
A first specifying unit that specifies a transition pattern of a vowel component of each character included in the first character string;
A second specifying unit that specifies a second character string in which a vowel component of each character transitions in a transition pattern that satisfies a predetermined similarity relationship with the specified transition pattern;
A character string search device comprising: a control unit that outputs the second character string as a related character string of the first character string or executes a search using the second character string as a search key.
(Appendix 10)
A first specifying unit that identifies a vector sequence indicating the direction and order of the transitions, in a character array based on the Japanese syllabary table, between the characters included in the first character string;
A second specifying unit that identifies a second character string whose direction and order of transitions in the array are indicated by a vector sequence that satisfies a predetermined similarity relationship with that vector sequence;
A character string search device comprising: a control unit that outputs the second character string as a related character string of the first character string or executes a search using the second character string as a search key.

The present invention is not limited to the specifically disclosed embodiments, and various modifications and changes can be made without departing from the scope of the claims.

DESCRIPTION OF SYMBOLS
100 Character string search device
110 Input reception unit
120 Distance and direction calculation unit
130 Population registration unit
140 Permissible range setting unit
150 Search unit
160 Character string extraction unit
170 Output unit
210 Character coordinate sequence table
220 Population database

Claims

A character string search program that causes a computer to execute processing comprising:
upon receiving a first character string, identifying a vector sequence that indicates, in the order of the characters included in the first character string, the distance and direction between those characters in an arrangement table in which each character is placed at a predetermined position based on the relationship between vowels and consonants, and identifying a second character string that yields a vector sequence satisfying a predetermined similarity relationship; and
outputting the second character string, or executing a search using the second character string as a search key.

A character string search program that causes a computer to execute processing comprising:
identifying a vector sequence indicating the direction and order of the transitions, within a character arrangement based on the Japanese syllabary table, between the characters included in a first character string;
identifying a second character string whose direction and order of transitions within the arrangement are indicated by a vector sequence satisfying a predetermined similarity relationship with the identified vector sequence; and
outputting the second character string as a related character string of the first character string, or executing a search using the second character string as a search key.

The character string search program according to claim 2, wherein
each vector is a combination of a value indicating the distance between a first character and a second character in the arrangement and a value indicating the angle of the second character with respect to the first character, and
the vector sequence satisfying the predetermined similarity relationship is a vector sequence whose distance values and angle values each differ, by an amount within a predetermined range, from those of the vectors obtained from the characters included in the first character string.
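As an illustration of the distance-and-angle vectors and the tolerance check recited above, the following is a minimal sketch rather than the claimed implementation. It assumes hypothetical coordinates in which x is the consonant row index and y is the vowel column index of the basic syllabary table; the actual character coordinate table, the tolerance values `dist_tol` and `angle_tol`, and all function names are assumptions, and the sample strings are chosen only to exercise the geometry.

```python
import math

# Basic syllabary rows in a-i-u-e-o order; an ASCII space marks an empty cell.
GOJUON_ROWS = [
    "あいうえお", "かきくけこ", "さしすせそ", "たちつてと", "なにぬねの",
    "はひふへほ", "まみむめも", "や ゆ よ", "らりるれろ", "わ   を",
]

# Hypothetical coordinates: x = consonant row index, y = vowel column index.
COORD = {
    ch: (x, y)
    for x, row in enumerate(GOJUON_ROWS)
    for y, ch in enumerate(row)
    if ch != " "
}


def vector_sequence(text):
    """Return [(distance, angle_in_degrees), ...] for each consecutive character pair."""
    points = [COORD[ch] for ch in text]  # raises KeyError for characters outside the table
    vectors = []
    for (x1, y1), (x2, y2) in zip(points, points[1:]):
        dx, dy = x2 - x1, y2 - y1
        vectors.append((math.hypot(dx, dy), math.degrees(math.atan2(dy, dx))))
    return vectors


def angle_gap(a, b):
    """Smallest absolute difference between two angles given in degrees."""
    diff = abs(a - b) % 360.0
    return min(diff, 360.0 - diff)


def is_similar(seq_a, seq_b, dist_tol=0.5, angle_tol=15.0):
    """True if both sequences have equal length and every vector pair is within tolerance."""
    if len(seq_a) != len(seq_b):
        return False
    return all(
        abs(da - db) <= dist_tol and angle_gap(aa, ab) <= angle_tol
        for (da, aa), (db, ab) in zip(seq_a, seq_b)
    )


def find_related(first, population, dist_tol=0.5, angle_tol=15.0):
    """Return population entries whose vector sequence is within tolerance of that of first."""
    target = vector_sequence(first)
    return [
        s for s in population
        if s != first and is_similar(target, vector_sequence(s), dist_tol, angle_tol)
    ]


print(find_related("さくら", ["かうや", "ことり"]))
```

Because only differences between coordinates enter the vectors, a string shifted as a whole within the table produces the same vector sequence; widening `dist_tol` and `angle_tol` admits strings that follow a roughly similar path.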

The character string search program according to claim 2 or 3, wherein the character arrangement based on the Japanese syllabary table includes unvoiced sounds (seion) and voiced sounds (dakuon).

A character string search method performed by a computer, wherein the computer:
upon receiving a first character string, identifies a vector sequence that indicates, in the order of the characters included in the first character string, the distance and direction between those characters in an arrangement table in which each character is placed at a predetermined position based on the relationship between vowels and consonants, and identifies a second character string that yields a vector sequence satisfying a predetermined similarity relationship; and
outputs the second character string, or executes a search using the second character string as a search key.

A character string search method performed by a computer, wherein the computer:
identifies a vector sequence indicating the direction and order of the transitions, within a character arrangement based on the Japanese syllabary table, between the characters included in a first character string;
identifies a second character string whose direction and order of transitions within the arrangement are indicated by a vector sequence satisfying a predetermined similarity relationship with the identified vector sequence; and
outputs the second character string as a related character string of the first character string, or executes a search using the second character string as a search key.

A character string search device comprising:
a specifying unit that, upon receiving a first character string, specifies a vector sequence that indicates, in the order of the characters included in the first character string, the distance and direction between those characters in an arrangement table in which each character is placed at a predetermined position based on the relationship between vowels and consonants, and specifies a second character string that yields a vector sequence satisfying a predetermined similarity relationship; and
a control unit that outputs the second character string or executes a search using the second character string as a search key.

A character string search device comprising:
a first specifying unit that specifies a vector sequence indicating the direction and order of the transitions, within a character arrangement based on the Japanese syllabary table, between the characters included in a first character string;
a second specifying unit that specifies a second character string whose direction and order of transitions within the arrangement are indicated by a vector sequence satisfying a predetermined similarity relationship with the specified vector sequence; and
a control unit that outputs the second character string as a related character string of the first character string or executes a search using the second character string as a search key.