JP2011090352A - Retrieval data management device

Info

Publication number: JP2011090352A
Application number: JP2009240992A
Authority: JP
Inventors: Masajiro Iwasaki; 雅二郎岩崎
Original assignee: Yahoo Japan Corp
Current assignee: Yahoo Japan Corp
Priority date: 2009-10-20
Filing date: 2009-10-20
Publication date: 2011-05-06
Anticipated expiration: 2029-10-20
Also published as: JP5014399B2

Abstract

PROBLEM TO BE SOLVED: To quickly register data in a data group that is searched for similar data, without complicating the search index.

SOLUTION: When a plurality of pieces of registration data are input, a similar data acquisition part 51 in a data registration part 50 inputs each piece of registration data to a graph retrieval part 40, whose graph retrieval processing quickly returns a retrieval result (similar data) for that registration data. A merge part 53 then generates links between the registration data and the similar data and links between the pieces of registration data, reads the link information of the similar data from a graph index DB 70, and merges it all in a temporary memory 80. A link optimization part 55 reduces the links on the basis of the number of links and the length of the links. The data is then registered in the database with the number of links generated by the additional registration optimized.

COPYRIGHT: (C)2011, JPO&INPIT

Description

The present invention relates to an apparatus for retrieving similar data from a database and for registering new data in that database.

Feature amounts extracted from images, audio, and the like are represented as multidimensional vector data, and similar data is retrieved using, for example, the distances between the vectors. When the number of data items to be searched is large, the distance between the search key data (search data) and every search target must be calculated one by one, so the search takes a very long time.
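The cost described above can be made concrete with a minimal sketch of an exhaustive search. The Euclidean metric, the function names, and the toy vectors are assumptions for illustration only; the text does not fix a particular distance measure.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def linear_scan(query, vectors, radius):
    """Return the IDs of all vectors within `radius` of `query`.

    Every stored vector is compared with the query, so the work grows
    in proportion to the number of stored items.
    """
    return sorted(vid for vid, vec in vectors.items()
                  if euclidean(query, vec) <= radius)

# Hypothetical toy data: three 2-dimensional feature vectors.
vectors = {"a": (0.0, 0.0), "b": (1.0, 0.0), "c": (5.0, 5.0)}
print(linear_scan((0.9, 0.1), vectors, radius=1.0))  # ['a', 'b']
```

Each query touches every stored vector, which is the behavior a link-based index is meant to avoid.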

Patent Document 1 describes the following technique. For every feature amount in the database, the similarity to the other feature amounts is calculated in advance, and highly similar data items are stored in association with each other. The nearest neighbor of a given search key is then found, and the similar data associated with that nearest neighbor is returned as the search result.

Patent Document 1: JP 2001-52024 A

When new data is to be registered in the data group (database) to be searched, search speed can be ensured by assigning in advance an index that takes the similarity between data items into account. To do so, data similar to the registration data must be retrieved from the database.

However, to speed up the search, the technique of Patent Document 1 must store, for each data item, all the other data in order of similarity, which requires calculating in advance the similarity to every item in the database. When the data to be searched becomes enormous, the time required grows with the number of data items. In addition, every time data is registered, the similarities between data items must be calculated and the index revised, so building the index becomes more cumbersome as the number of data items increases.

The present invention has been made in view of the above problems, and its object is to register data at high speed in a data group that is searched for similar data, without complicating the search index.

To achieve the above object, the search data management device according to the first invention comprises:
a database storing a plurality of data items, with links set between the data items for tracing from one data item to another;
graph search means for selecting, for input search target data, one of the data items stored in the database as a search start point and, by sequentially following the links from the search start point, retrieving and outputting as similar data those data items on the traversed path that lie within a predetermined distance of the search target data;
similar data acquisition means for inputting data to be registered in the database to the graph search means as the search target, thereby acquiring similar data for the registration target data, and for generating links between the registration target data and the similar data;
merge means for merging the links that the similar data acquisition means generated for each of a plurality of registration target data items;
link optimization means for performing a reduction process on the links merged by the merge means and on the links already set in the database for the data for which those links were generated; and
data update means for updating the link information of the database on the basis of the result of the reduction process and for storing the registration target data in the database.

In the second invention, the link optimization means performs the reduction process on the basis of the total number of the links merged by the merge means and the links already set in the database for the data for which those links were generated.

In the third invention, the link optimization means performs the reduction process on the basis of the lengths of the links merged by the merge means and of the links already set in the database for the data for which those links were generated.

In the fourth invention, the link generation means further generates links between the plurality of registration target data items.

The fifth invention is a search data management method in which a computer manages, as a data group to be searched, a database storing a plurality of data items with links set between the data items for tracing from one data item to another, the method comprising:
a graph search step of selecting, for input search target data, one of the data items stored in the database as a search start point and, by sequentially following the links from the search start point, retrieving and outputting as similar data those data items on the traversed path that lie within a predetermined distance of the search target data;
a similar data acquisition step of inputting data to be registered in the database to the graph search means as the search target, thereby acquiring similar data for the registration target data, and of generating links between the registration target data and the similar data;
a merge step of merging the links generated in the similar data acquisition step for each of a plurality of registration target data items;
a link optimization step of performing a reduction process on the links merged in the merge step and on the links already set in the database for the data for which those links were generated; and
a data update step of updating the link information of the database on the basis of the result of the reduction process and of storing the registration target data in the database.

The program of the sixth invention causes a computer to execute the processing of the search data management method according to the fifth invention.

According to the present invention, when a plurality of data items are registered in the database, they are registered while the links are reduced using the links generated to their similar data. Data can therefore be registered at high speed in a data group that is searched for similar data, without complicating the search index. Moreover, because the link structure between data items is kept from becoming complicated each time data is registered, a drop in search speed caused by data registration can be prevented.

[Apparatus Configuration of First Embodiment]
A search device according to a first embodiment to which the present invention is applied will be described with reference to the accompanying drawings. The present invention is not limited to this embodiment, and various modifications can be made to the specific configuration within the scope of the claims.

In this embodiment, an example is described in which the vector data is multidimensional vector data representing the feature amounts of images. However, the vector data may instead represent the feature amounts of audio or other multimedia data, or may be another kind of multidimensional data.

FIG. 1 is a block diagram showing the functional configuration of a search device 1, which is an example of the present invention. The search device 1 need not consist of a single piece of hardware or software; where necessary, the functions of the search device 1 can be provided by a combination of multiple pieces of hardware or software. The search device 1 of this embodiment can also be implemented by a plurality of servers distributed over a network.

The search device 1 processes data in response to various requests sent from a client terminal and returns the processing results to the client terminal. Specifically, on receiving a search query, which is a search request, the search device 1 performs a search according to the query and returns the search results to the client terminal. On receiving a request to register new data in the database (hereinafter abbreviated as "DB") that stores the data to be searched, it registers the data in the DB.

[Client terminal configuration]
The client terminal is not illustrated, but it is a personal computer equipped with input means such as a keyboard, mouse, or touch pad, output means such as a display or printer, a CPU, and so on. The client terminal is connected to the search device 1 via a network and sends the search data and registration data specified by the user to the search device 1.

Here, search data is the data used as a key for retrieving similar data from the data stored in the database of the search device 1, and registration data is data to be newly registered in that database. The vector data extracted from search data and registration data is hereinafter also referred to as "search data" and "registration data", respectively.

[Configuration of search device]
As shown in FIG. 1, the search device 1 includes a registration data input unit 10, a search data input unit 20, a search result output unit 30, a graph search unit 40, a data registration unit 50, an image DB 60, a graph index DB 70, and a temporary memory 80.

The registration data input unit 10 is a functional unit that accepts registration data sent from the client terminal. When it receives registration data together with a data registration request, it inputs the registration data to the data registration unit 50.

The search data input unit 20 is a functional unit that accepts search data sent from the client terminal. When it receives search data together with a search request, it inputs the search data to the graph search unit 40.

The search result output unit 30 is a functional unit that sends the client terminal the result of the search that the graph search unit 40 performs on the input from the search data input unit 20. It generates display data for presenting the search result and sends that data to the client terminal.

The image DB 60 is a database that accumulates the image data to be searched and, as shown in FIG. 1, stores an image ID, image data, and vector data in association with one another. The graph search unit 40 extracts multidimensional vector data, that is, a plurality of feature amounts, from the image data stored in the image DB 60 and stores it. The number of dimensions of the vector data is not particularly restricted, but a high number of dimensions (for example, 10 or more) is preferable for better search accuracy.

The graph index DB 70 is a database that stores a graph-structured index and holds the vector data representing the feature amounts of the data to be searched. The data registration unit 50 generates the graph-structured index on the basis of the vector data extracted from the image data.

The graph index DB 70 also stores information on the links set for tracing from one vector data item to another. Specifically, as shown in FIG. 1, the graph index DB 70 stores each link-source image ID, which corresponds to an entry in the image DB 60, in association with the image IDs of one or more link destinations linked to that vector data. Setting links between vector data items forms a graph structure among the vector data a to f in the vector space, as shown in FIG. 4(a).

The graph structure formed by the link information in the graph index DB 70, that is, the link information itself, is referred to as the "graph index" where appropriate. A link is information that allows one data item to be reached from another. Links may be unidirectional, but, as shown in FIG. 1, storing the data indicating a link destination together with the data indicating the link in the reverse direction, so that links can be followed in both directions, improves the data search speed.
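As a rough illustration of the paired, bidirectional link storage described above, the following sketch keeps the link table in memory as a mapping from a link-source image ID to its link-destination image IDs. The class name `GraphIndex` and its methods are hypothetical and stand in for the graph index DB 70 only for the purpose of the example.

```python
from collections import defaultdict

class GraphIndex:
    """In-memory stand-in for the link table: image ID -> linked image IDs."""

    def __init__(self):
        self.links = defaultdict(set)

    def add_link(self, src, dst):
        """Store the link and its reverse so it can be followed from either end."""
        self.links[src].add(dst)
        self.links[dst].add(src)

    def neighbors(self, image_id):
        """Return the link destinations of `image_id` in a stable order."""
        return sorted(self.links.get(image_id, ()))

index = GraphIndex()
index.add_link("a", "c")
index.add_link("a", "d")
print(index.neighbors("a"))  # ['c', 'd']
print(index.neighbors("c"))  # ['a']
```

Storing both directions doubles the number of entries but lets a search step away from any point without scanning the whole table for incoming links.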

In FIG. 4(a), the vector data items are represented by points (a to f), showing how the vector data is distributed in a multidimensional feature space. Links are set between the vector data items and are drawn as solid lines. In the following description, vector data is referred to as "points" where appropriate.

The graph search unit 40 searches for data similar to the search data by traversing the links set in the graph index DB 70. As shown in FIG. 1, the graph search unit 40 includes a vector data generation unit 41, a search start point determination unit 43, a graph traversal unit 45, and a similar data specifying unit 47.

The vector data generation unit 41 extracts, from the data input from the search data input unit 20 or the data registration unit 50, the feature amounts that form the multidimensional vector data.

The search start point determination unit 43 determines, as the search start point, an existing vector data item on the graph index that is close to the vector data extracted from the search data. The graph search unit 40 determines one of the plurality of vector data items as the search start point and begins traversing the links from that point.

The graph traversal unit 45 follows the links set in the graph index DB 70 one after another from the search start point until a predetermined search end condition is satisfied.

The similar data specifying unit 47 identifies as similar data those vector data items on the path of links followed by the graph traversal unit 45 that lie within a predetermined distance of the search data. The identified similar data, or information related to it, is sent to the client terminal as the search result.

The data registration unit 50 registers the registration data input from the registration data input unit 10 in the image DB 60 and the graph index DB 70. It assigns a new image ID to the image data input as registration data and stores the image data in the image DB 60. It also stores the vector data extracted from the image data in the image DB 60 in association with the image ID.

As shown in FIG. 1, the data registration unit 50 includes a similar data acquisition unit 51, a merge unit 53, a link optimization unit 55, and a DB update unit 57.

The similar data acquisition unit 51 acquires data similar to the registration data by sending the graph search unit 40 a request to search for data similar to the registration data input from the registration data input unit 10. The similar data is held temporarily in memory, associated with the registration data as appropriate.

The similar data acquisition unit 51 also generates links between the registration data and the similar data retrieved for it. These links are generated provisionally, before the data is formally registered in the graph index DB 70, and are stored in a data table created in the temporary memory 80 with the same data structure as the graph index DB 70.

When a plurality of registration data items are input, the merge unit 53 merges, in the temporary memory 80, the link information generated between each registration data item and the similar data retrieved for it. For similar data stored as link destinations in the temporary memory 80 that already has links set in the graph index DB 70, the merge unit 53 reads the link destination information for that similar data from the graph index DB 70 and stores it in the temporary memory 80. It also generates links between the registration data items themselves and stores them in the temporary memory 80. As a result, the temporary memory 80 holds both the links generated between the registration data and the similar data and the links already set for the similar data.
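The merge described above might be sketched as follows. The function name `merge_links`, the dictionary layout, and the choice to link every pair of registration data items are assumptions made for illustration; the text only states that links are generated between the registration data items.

```python
from itertools import combinations

def merge_links(similar_by_new, graph_index):
    """Build a temporary link table (ID -> set of linked IDs) for one batch.

    similar_by_new: new ID -> list of similar existing IDs from the graph search.
    graph_index:    existing ID -> set of linked IDs (stand-in for the index DB).
    """
    temp = {}
    for new_id, similar_ids in similar_by_new.items():
        for sim_id in similar_ids:
            # Link from the registration data to its similar data.
            temp.setdefault(new_id, set()).add(sim_id)
            # Copy the similar data's existing links, then add the reverse link.
            temp.setdefault(sim_id, set(graph_index.get(sim_id, ()))).add(new_id)
    # Links between the registration data items (assumed: every pair).
    for p, q in combinations(similar_by_new, 2):
        temp.setdefault(p, set()).add(q)
        temp.setdefault(q, set()).add(p)
    return temp

# Hypothetical batch: g and h are new; c and d already exist in the index.
graph_index = {"c": {"a", "b"}, "d": {"a"}}
temp = merge_links({"g": ["c", "d"], "h": ["d"]}, graph_index)
print(sorted(temp["d"]))  # ['a', 'g', 'h']
```

Because the existing links are copied into the temporary table, the stored index is left untouched until the reduction step has decided which links to keep.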

The link optimization unit 55 is a functional unit that optimizes the links so that the graph structure of the graph index does not become complicated. Each link destination added to a similar data item increases the number of links from that item, which increases the number of links the graph search unit 40 must traverse and can slow down the search. Therefore, when link destinations are added to similar data, the number of links is reviewed so that it stays at an appropriate level.
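This passage does not spell out the reduction rule, so the following is only one plausible reading, based on the statement elsewhere in the document that links are reduced according to their number and length: when a point has more than a threshold number of links, keep the shortest ones. The function name `prune_links`, the `max_links` parameter, and the Euclidean link length are assumptions.

```python
import math

def prune_links(node_id, linked_ids, vectors, max_links):
    """Keep at most `max_links` links for `node_id`, preferring the shortest.

    Link length is taken as the Euclidean distance between the two points
    (an assumption); ties are broken by ID so the result is deterministic.
    """
    if len(linked_ids) <= max_links:
        return set(linked_ids)
    by_length = sorted(
        linked_ids,
        key=lambda other: (math.dist(vectors[node_id], vectors[other]), other),
    )
    return set(by_length[:max_links])

# Hypothetical coordinates: d has four candidate links, threshold is three.
vectors = {"d": (0, 0), "a": (1, 0), "b": (3, 0), "g": (0.5, 0), "h": (2, 0)}
print(sorted(prune_links("d", {"a", "b", "g", "h"}, vectors, max_links=3)))
# ['a', 'g', 'h']
```

A real implementation would also have to remove the reverse entry of each dropped link so that the stored pairs stay consistent.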

On the basis of the processing result of the link optimization unit 55, the DB update unit 57 stores the registration data in the image DB 60 and the graph index DB 70 and updates the link information in the graph index DB 70. Specifically, when registering the image ID and vector data of the registration data in the graph index DB 70, it stores the image IDs of the similar data in association as link destinations. It also adds the image ID of the registration data to the link destinations of each similar data item's image ID, so that mutual links are formed between the registration data and the similar data.

The search device 1 generates the graph index DB 70, a search index with a graph structure, and searches for data similar to the search data by following the links of the graph structure. When registration data is input, the search device 1 adds it to the search index that is traversed at search time, making the registration data part of the traversal paths so that it too can be retrieved. In this way the graph index is built up in a self-generating manner.

[Operation of search device]
Next, the graph self-generation process that the search device 1 of this embodiment performs when registering data is described in detail with reference to FIGS. 2 to 8. The process is executed according to a program stored in advance in the memory of the search device 1. FIGS. 2, 3, and 6 are flowcharts showing an example of the operation of the search device 1, and FIGS. 4, 5, 7, and 8 illustrate data search and link optimization on the graph index.

[Graph self-generation processing]
First, the search device 1 performs initial setting (step S1). This sets in advance the number of links to give each newly registered data item and the threshold used when reducing links (the link reduction threshold).

Next, when registration data is input from the registration data input unit 10 to the data registration unit 50 (step S2), the vector data generation unit 41 generates vector data by extracting feature amounts from the registration data (step S3). If a plurality of registration data items are input, vector data is generated for a predetermined number of them (for example, three).

The similar data acquisition unit 51 then outputs the registration data to the graph search unit 40 and has it perform the graph search process (step S4). If a plurality of registration data items are input, the graph search process may be run in parallel on multiple CPUs. FIG. 2 shows an example in which three registration data items are input and graph search processes A to C run in parallel as three processes, but the number of parallel processes is a design choice that can be changed as appropriate.
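A minimal sketch of running one search per registration data item concurrently is shown below. It uses Python's standard `concurrent.futures` with a thread pool for brevity; `find_similar` is a placeholder that scans all vectors rather than following links, and every name and value here is an assumption for illustration.

```python
import math
from concurrent.futures import ThreadPoolExecutor

def find_similar(query, vectors, radius):
    """Placeholder for the graph search: a plain scan, only to keep this runnable."""
    return sorted(vid for vid, vec in vectors.items()
                  if math.dist(query, vec) <= radius)

def search_batch(batch, vectors, radius, workers=3):
    """Run one similar-data search per registration item concurrently.

    batch: registration ID -> vector. Returns registration ID -> similar IDs.
    """
    ids = list(batch)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda rid: find_similar(batch[rid], vectors, radius), ids)
        return dict(zip(ids, results))

# Hypothetical data: three registration items searched against three stored points.
vectors = {"a": (0, 0), "b": (1, 0), "c": (5, 5)}
batch = {"g": (0.5, 0), "h": (5, 4.5), "i": (9, 9)}
print(search_batch(batch, vectors, radius=1.0))
# {'g': ['a', 'b'], 'h': ['c'], 'i': []}
```

To spread the work over several CPUs as the text describes, `ProcessPoolExecutor` from the same module could be used instead, with a module-level function in place of the lambda.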

[Graph search processing]
[Search start point determination processing]
When the graph search unit 40 starts the graph search process, the search start point determination unit 43 first determines a vector data item close to the registration data as the search start point (step S41). FIG. 3 is a flowchart showing an example of the process for determining the search start point.

The search start point determination unit 43 first selects an arbitrary point from the graph index (for example, point a in FIG. 4(b)) (step S430) and obtains, from the graph index DB 70, the points linked to the selected point (for example, points c and d in FIG. 4(c)) (step S432).

Next, it calculates the distance in the feature space between each linked point and the registration data (step S434). It extracts the linked point with the shortest distance to the registration data (step S436) and determines whether the distance D2 between that point and the registration data is smaller than the distance D1 between the registration data and the selected point (steps S438, S440). In FIG. 4(d), comparing the distance D1 between the selected point a and the registration data g with the distance D2 between the link destination c and the registration data g shows that D2 is smaller.

If the search start point determination unit 43 determines that the distance D2 is smaller than the distance D1 (step S440; Yes), it makes the point extracted in step S436 the new selected point (step S442) and returns to step S432.

In other words, the loop of steps S432 to S442 follows the links from the initially chosen arbitrary point, at each step moving to a point on the graph index that is closer to the registration data.

If the search start point determination unit 43 determines in step S440 that the distance D2 is greater than the distance D1 (step S440; No), it determines the currently selected point as the search start point (step S444).

In FIG. 4(e), starting from the initially selected point a, points closer to the registration data g are selected one after another; the links are followed in the order point c → point b → point d, and point d, the closest to the registration data, is determined as the search start point.
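The loop of steps S432 to S444 can be sketched as a greedy walk over the links. The coordinates and link table below are invented so that the walk happens to pass through a, c, b, and d; they are not taken from FIG. 4, and the Euclidean metric and the function name `find_start_point` are assumptions.

```python
import math

def find_start_point(query, start, links, vectors):
    """Move to the nearest linked point while it is closer to `query`."""
    current = start
    while True:
        d1 = math.dist(query, vectors[current])        # distance D1
        neighbors = links.get(current, ())              # step S432
        if not neighbors:
            return current
        # Steps S434 and S436: the linked point nearest to the query.
        nearest = min(neighbors, key=lambda n: (math.dist(query, vectors[n]), n))
        d2 = math.dist(query, vectors[nearest])         # distance D2
        if d2 < d1:
            current = nearest                           # step S442
        else:
            return current                              # step S444

# Hypothetical toy graph; g is the registration data used as the query.
vectors = {"a": (0, 0), "c": (2, 0), "b": (4, 0), "d": (6, 0), "e": (-1, 0)}
links = {"a": ["c", "e"], "c": ["a", "b"], "b": ["c", "d"], "d": ["b"], "e": ["a"]}
g = (7, 0)
print(find_start_point(g, "a", links, vectors))  # d
```

The walk stops at the first point none of whose linked points is closer to the query, so the result is a local optimum of the graph rather than a guaranteed nearest neighbor.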

Even when the search start point has been determined in this way because the distance D2 is greater than the distance D1, that point may lie outside the search range centered on the registration data. In that case the search cannot be started. Therefore, if the search start point determined in step S444 is outside the search range, a new arbitrary point is selected from the graph index and the search start point determination process is repeated, up to a predetermined number of times, to obtain a search start point within the search range.

If the arbitrary point in step S430 is chosen at random, repeated runs of the search start point determination process may select a point close to one chosen before. Following links from such a nearby point is likely to retrace the same path, in which case a search start point within the search range cannot be determined.

For this reason, when the search start point determination process is repeated, it is preferable to select a new point that is a predetermined distance away from the previously selected points. Alternatively, a plurality of points dispersed in the vector space may be identified in advance, and the arbitrary point may be selected from among them. This allows a search start point within the search range to be determined efficiently.
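The start point determination described above can be sketched as a greedy descent with restarts. This is a minimal illustration, not the patent's implementation: it assumes D1 is the distance from the currently selected point to the registration data and D2 is the distance from the nearest linked point, and the names `descend`, `find_start_point`, `graph`, `vectors`, and `seeds` are hypothetical.

```python
import math

def descend(graph, vectors, query, start):
    """Greedy descent: move to the linked point nearest the query while it is closer."""
    current = start
    d1 = math.dist(vectors[current], query)  # assumed meaning of distance D1
    while True:
        neighbors = graph.get(current, [])
        if not neighbors:
            return current, d1
        nearest = min(neighbors, key=lambda p: math.dist(vectors[p], query))
        d2 = math.dist(vectors[nearest], query)  # assumed meaning of distance D2
        if d2 < d1:
            current, d1 = nearest, d2  # step S442: select the closer point and loop
        else:
            return current, d1  # step S444: no linked point is closer

def find_start_point(graph, vectors, query, radius, seeds):
    """Try each seed in turn; return the first descent result inside the search range."""
    for seed in seeds:
        point, d = descend(graph, vectors, query, seed)
        if d <= radius:
            return point
    return None  # no start point inside the search range after all retries

# Hypothetical toy data mirroring the a -> c -> b -> d walk of FIG. 4(e).
vectors = {"a": (0.0, 0.0), "c": (1.0, 0.0), "b": (2.0, 0.0), "d": (3.0, 0.0)}
graph = {"a": ["c"], "c": ["a", "b"], "b": ["c", "d"], "d": ["b"]}
start = find_start_point(graph, vectors, (3.2, 0.0), radius=0.5, seeds=["a"])
```

Passing `seeds` as a list of pre-selected, well-separated points corresponds to the option of identifying dispersed points in advance rather than drawing each restart at random.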

[Graph traversal]
Once the search start point has been determined as described above, the graph traversal unit 45 follows the links in sequence, starting from the search start point (step S43). A single item of vector data may have branching links (that is, several possible paths). In that case, the vector data on the path already followed, or information identifying it, is stored in the memory of the search device 1 by an appropriate method.

[Search for similar data]
While following the links on a path, the graph traversal unit 45 determines whether a prescribed search end condition is satisfied (step S45). Various search end conditions are conceivable; for example, either of the following may be used.

(Search end condition 1)
The path followed from the search start point goes beyond the range (the search limit range) obtained by multiplying a predetermined search range centered on the registration data (for example, the search range in FIG. 5(a)) by α (α > 1). Any suitable value, such as 1.5, may be used for α. A larger α reduces search omissions, whereas a smaller α shortens the search time.

When the number of data items visited by following links grows, the search can be ended early by decreasing the value of α. Search results can thus be obtained early even when the number of data items is very large. To maintain search accuracy, α is preferably kept greater than 1; however, α < 1 may also be used when ending the search early is the priority.

(Search end condition 2)
The path followed from the search start point has traversed a predetermined number of links outside the search range centered on the registration data.

The predetermined number is, for example, five, but any suitable value may be used. For example, in FIG. 5(a), following the link from point b to point a leaves the search range, but following one more link after leaving the search range reaches point c. Setting the number of links that may be followed outside the search range in this way reduces search omissions.

When the search end condition is satisfied for one path, the links on another path are followed. When the search end condition has been satisfied for every path, processing moves to the next step.
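Search end condition 2 can be sketched as a traversal that gives each path a budget of hops outside the search range. This is an illustrative sketch only: it assumes the budget is restored when a path re-enters the search range, which the text does not specify, and the function and parameter names are hypothetical.

```python
import math

def traverse_with_hop_budget(graph, vectors, query, start, radius, max_outside_hops=5):
    """Follow every path until it has spent its hop budget outside the search range."""
    # Largest remaining budget with which each point has been reached so far.
    best_budget = {start: max_outside_hops}
    stack = [(start, max_outside_hops)]
    while stack:
        point, budget = stack.pop()
        for nxt in graph.get(point, []):
            inside = math.dist(vectors[nxt], query) <= radius
            # Assumption: re-entering the search range restores the full budget.
            nxt_budget = max_outside_hops if inside else budget - 1
            if nxt_budget < 0:
                continue  # end condition met for this path
            if best_budget.get(nxt, -1) >= nxt_budget:
                continue  # already reached with at least as much budget
            best_budget[nxt] = nxt_budget
            stack.append((nxt, nxt_budget))
    return set(best_budget)  # every point visited on any path
```

With a budget of one hop, a path of the kind described for FIG. 5(a) (point b, then point a outside the range, then point c) still reaches point c; with a budget of zero it stops at point b.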

The similar data specifying unit 47 continues the graph traversal until the search end condition is satisfied (step S45 → S43). When the condition is satisfied (step S45; Yes), it calculates the distance between each traversed point (vector data) and the registration data, and outputs the similar data ranked by that distance (step S47).

For example, when point d is the search start point in FIG. 5(a), the links to its link destinations, points b, e, and f, are followed, and then, as shown in FIG. 5(b), the link to point c, which is linked to point e, is followed. Following the link from point c to point a, as shown in FIG. 5(c), leaves the search range. If leaving the search range is the search end condition, the link traversal ends here.

A ranking of similar data is then obtained based on the distances between the registration data g and each of the points b to f reached from the search start point d. In FIG. 5(c), the ranking is obtained in the order of points d, b, c, e, and f.
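The traversal and ranking under search end condition 1 can be sketched as follows. This is a sketch under stated assumptions: breadth-first order and the final filter to points inside the search range are choices made here for illustration, and `graph_search`, `radius`, and `alpha` are hypothetical names.

```python
import math
from collections import deque

def graph_search(graph, vectors, query, start, radius, alpha=1.5):
    """Traverse links from `start`, stopping each path at the search limit range."""
    limit = radius * alpha  # search limit range: search range multiplied by alpha
    visited = {start}
    queue = deque([start])
    while queue:
        point = queue.popleft()
        for nxt in graph.get(point, []):
            if nxt in visited:
                continue
            visited.add(nxt)
            # End condition 1: do not continue a path that left the limit range.
            if math.dist(vectors[nxt], query) <= limit:
                queue.append(nxt)
    ranked = sorted(visited, key=lambda p: math.dist(vectors[p], query))
    # Assumption: only points inside the search range are reported as similar data.
    return [p for p in ranked if math.dist(vectors[p], query) <= radius]
```

On a toy graph shaped like FIG. 5, with point a outside the limit range, the function returns the other points ordered by distance to the registration data.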

[Link generation processing]
When the graph search unit 40 has finished searching for similar data, the similar data acquisition unit 51 of the data registration unit 50 generates links between the registration data and the similar data (flowchart of FIG. 6(a)) (step S5).

The similar data acquisition unit 51 first acquires the similar data obtained as the search result for each item of registration data (step S51), and generates links between that similar data and the registration data (step S52). When there are multiple items of registration data, links are generated between each item and the similar data retrieved for it.

The similar data acquisition unit 51 further generates links among the items of registration data themselves (step S53). It then reads the link information for the similar data in the search results from the graph index DB 70 (step S54). The similar data acquisition unit 51 stores the links generated or read in steps S52 to S54 in the temporary memory 80 and merges them (step S55).

For example, when registration data X to Z are registered in a data group having the graph structure shown in FIG. 7(a), the graph search process is performed for each item of registration data. For registration data X, points a to c are obtained as the search result, as shown in FIG. 7(b), and links associating the image ID with the link destinations are generated. For registration data Y, points a, b, and d are obtained, as shown in FIG. 7(c). For registration data Z, points b, d, and f are obtained as the search result, as shown in FIG. 7(d).

The link information for the similar data in these search results is read from the graph index DB 70. In FIGS. 7(a) to 7(e), broken lines indicate links newly generated for the registration data, and solid lines indicate links already stored in the graph index DB 70. For example, for point a, obtained as a search result for registration data X, link information whose link destinations are points b, c, and d is read. Links are also generated among the registration data X to Z.

FIG. 7(e) shows an example of the data in the temporary memory 80. As shown, the temporary memory 80 stores and merges the links between registration data and similar data, the links among registration data, and the links already set for the similar data.
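The link generation and merge of steps S52 to S55 can be sketched as building one adjacency map. This sketch assumes links are undirected and stored as sets, which the text does not state; `build_merged_links`, `search_results`, and `index_links` are hypothetical names, with `index_links` standing in for the link information read from the graph index DB 70.

```python
from itertools import combinations

def build_merged_links(search_results, index_links):
    """Merge new and existing links into one adjacency map (the temporary memory)."""
    merged = {}

    def link(u, v):
        # Assumption: links are undirected, so both endpoints record the link.
        merged.setdefault(u, set()).add(v)
        merged.setdefault(v, set()).add(u)

    # Step S52: link each item of registration data to its similar data.
    for new_id, similar in search_results.items():
        for s in similar:
            link(new_id, s)
    # Step S53: link the items of registration data to one another.
    for u, v in combinations(search_results, 2):
        link(u, v)
    # Step S54: add the links already stored for the similar data.
    for similar in search_results.values():
        for s in similar:
            for t in index_links.get(s, ()):
                link(s, t)
    return merged  # step S55: everything merged in one structure
```

For the example of FIG. 7, `search_results` would map X to points a, b, c, Y to points a, b, d, and Z to points b, d, f.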

[Link optimization processing]
Next, the link optimization unit 55 performs the link optimization process shown in FIG. 6(b) (step S6). In this embodiment, the link optimization process treats the similar data (search results) for which links were generated as the targets of link reduction. That is, the unit acquires the data targeted for link reduction (step S61) and, for that data, obtains the number of links in the temporary memory 80 and the length of each link, based on the graph index DB 70 and the vector data in the image DB 60 (step S62). If the number of links is equal to or greater than the link reduction threshold, links are removed based on their length (step S63).

Links from the similar data for which links were generated are removed starting with those to the most distant data. For example, with a link reduction threshold of 5, in FIG. 8(a), among the points a to d and f for which links were generated, points c and d each have six or more links, which meets the link reduction threshold. Accordingly, for point c, the link to point b, the most distant of its linked points, is removed. For point d, the links to point a and to point c are removed.
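The reduction step can be sketched as trimming the longest links of each target point. The text does not say how many links to remove once the threshold is met; this sketch assumes each target keeps only its `threshold` shortest links, which is one reading of the FIG. 8(a) example. The names are hypothetical.

```python
import math

def reduce_links(links, vectors, targets, threshold=5):
    """Trim the longest links of each target point, in place."""
    for point in targets:
        neighbors = links.get(point, set())
        if len(neighbors) <= threshold:
            continue  # nothing to remove under the assumed rule
        # Shortest links first; everything past `threshold` is removed.
        by_distance = sorted(
            neighbors, key=lambda q: math.dist(vectors[point], vectors[q])
        )
        for far in by_distance[threshold:]:
            links[point].discard(far)
            links[far].discard(point)  # links are assumed to be undirected
    return links
```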

[Database registration]
The DB update unit 57 updates the link information in the graph index DB 70 based on the result of the link reduction performed by the link optimization unit 55, and also adds the link destinations of the registration data to the graph index DB 70. It also stores the registration data input from the registration data input unit 10, together with its vector data, in the image DB 60 (step S7). If unprocessed data remains among the data input from the registration data input unit 10 (step S8; Yes), processing returns to step S3 and the graph search, link generation, and link optimization are performed again. If there is no new data (step S8; No), the graph self-generation process ends.

Although a detailed description is omitted, when only a search is performed, that is, when data similar to search data is retrieved and the result is returned to a client terminal, the graph search process by the graph search unit 40 described above is applied to the search data input from the search data input unit 20. The search result output unit 30 then generates display data from the search result in which the similar data obtained by the similar data specifying unit 47 is ranked by distance, and returns it to the client terminal.

As described above, according to the first embodiment, similar data for the registration data is retrieved with a search algorithm that traverses the links between data in the graph index, and new links are generated between that similar data and the registration data. The data is thereby registered in the very graph index that the search device 1 searches by following links, and links are formed so that the data can be traversed after registration. In this way, the search device 1 registers data on the graph index it traverses during search, and so generates the graph index by itself.

Furthermore, when multiple items of registration data are input, similar data for each item is retrieved at high speed, and the number of links generated by the additional registration of each item is optimized before database registration. The search device 1 thus optimizes the link structure of the graph index whenever data is registered, repairing the index by itself.

Because the search algorithm using the graph index can retrieve similar data without calculating the distance to every item in the database, data can be registered at high speed in a database storing multidimensional vector data. In addition, because the number of links in the graph index is kept optimal, the link structure is prevented from becoming complicated as data increases, which would otherwise lower the search speed.

[Second Embodiment]
Next, a second embodiment of the search device 1 will be described. Descriptions of functional configurations and processing steps identical to those of the first embodiment are omitted. The search device 1 of the second embodiment is realized by replacing the link optimization process in FIG. 2 with the link optimization process of the flowchart shown in FIG. 9. The link optimization process is described below with reference to FIG. 10, which illustrates the process. In the illustration of the link structure in FIG. 9, links between data from FIG. 10(b) onward are abbreviated as appropriate, using broken lines, to simplify the description.

As described above, a search algorithm using a graph index finds similar data by traversing the links between vector data. If a link is broken, vector data becomes isolated and can no longer be reached. The search device 1 of the second embodiment therefore repairs links when it performs link reduction.

First, for a point whose links are to be reduced, the link optimization unit 55 identifies, from among all the points linked to it, the points that will retain their links (link holding points) (step S61). The link holding points may be, for example, a predetermined number of points taken in order of increasing distance from the reduction target point, or the points within a predetermined distance. The reduction target point itself may also be included among the link holding points.

In FIG. 10(a), the five points b, c, d, e, and f closest to the reduction target point a are identified as link holding points. Among the points linked to the reduction target point, those other than the link holding points are treated as repair targets, and the repair target points are selected one at a time (step S62). In FIG. 10(a), point g is selected as the repair target point.

Next, the link optimization unit 55 calculates the distance between the repair target point and each link holding point (step S63), and sets a new link between the repair target point and the link holding point with the shortest of the calculated distances (step S64).

For example, the distance between the repair target point g and each of the link holding points b to f is calculated as shown in FIG. 10(b), and a new link is set between the repair target point g and the link holding point e, which has the shortest distance, as shown in FIG. 10(c). If a link already exists between the repair target point and the link holding point, no new link needs to be set.

The link optimization unit 55 then removes the link between the reduction target point and the repair target point for which the new link was set (step S65). In FIG. 10(d), the link between point a and point g is removed.

Next, if the repair process of steps S62 to S65 has been performed for all repair target points, that is, all points linked to the reduction target point other than the link holding points (step S66; No), the link optimization process ends. If any point remains unrepaired (step S66; Yes), processing returns to step S62.

Even though the links to the reduction target point are removed for the points other than points b to f, which were identified as link holding points in FIG. 10(a), each of those points is given a link to a link holding point, as shown in FIG. 10(e), so broken links are prevented.

As described above, according to the second embodiment, even when the link between the reduction target point and a repair target point is removed, a link from the repair target point to a link holding point is secured, so broken links are prevented. Accordingly, a decline in search performance can be prevented even when links are deleted from the graph index.
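The repair procedure of the second embodiment can be sketched as follows. This sketch picks the link holding points as the `keep` nearest neighbors of the reduction target point, one of the options the text allows, and assumes undirected links stored as sets with `keep` of at least 1; the function and parameter names are hypothetical.

```python
import math

def repair_and_reduce(links, vectors, target, keep=5):
    """Reduce the links of `target` while re-linking every point that loses one."""
    by_distance = sorted(
        links[target], key=lambda q: math.dist(vectors[target], vectors[q])
    )
    holders = by_distance[:keep]         # link holding points (step S61)
    for point in by_distance[keep:]:     # repair target points (step S62)
        # Steps S63 and S64: link the repair target to its nearest holding point.
        nearest = min(holders, key=lambda h: math.dist(vectors[point], vectors[h]))
        links[point].add(nearest)        # adding to a set is a no-op if already linked
        links[nearest].add(point)
        # Step S65: remove the link between the repair target and the reduction target.
        links[point].discard(target)
        links[target].discard(point)
    return links
```

Because every removed link is replaced by a link to a holding point, no point that was reachable through the reduction target becomes isolated.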

[Third Embodiment]
Next, a third embodiment of the search device 1 will be described. Descriptions of functional configurations and processing steps identical to those of the first embodiment are omitted. FIG. 11 is a block diagram showing the functional configuration of the search device 1 in the third embodiment.

The search device 1 in the third embodiment includes a main index 72 and a provisional index 74 as the graph index DB 70. The main index 72 and the provisional index 74 have the same data structure as shown in FIG. 1, and store image IDs in association with link destinations as link information.

The provisional index 74 is a database that temporarily stores link information before it is formally registered in the main index 72, and it is configured to hold far fewer data items than the main index 72.

As the number of items registered in the graph index DB 70 becomes very large, its graph structure grows more complex and the number of links to traverse increases, so the processing time for searching similar data gradually increases. Consequently, in the search device 1, which has the two-stage process of searching for similar data and then registering data, a wait for the data search may occur when multiple data items are input.

The search device 1 in the third embodiment therefore provides the provisional index 74, an index holding a more limited number of data items than the main index 72, and, when multiple items of registration data are input, first registers them in the provisional index 74. The items registered in the provisional index 74 are then registered in the main index 72 at a suitable time, using the method described above.

In the third embodiment, the data registration unit 50 further includes an index switching unit 52 and an additional registration determination unit 54. The index switching unit 52 selects either the main index 72 or the provisional index 74, thereby switching the database in which the DB update unit 57 updates and registers link information. Specifically, registration data newly input from the registration data input unit 10 is registered in the provisional index 74, whereas multiple items of registration data input from the provisional index 74 are registered in the main index 72.

The additional registration determination unit 54 determines whether the data registered in the provisional index 74 should be registered in the main index 72. The criterion may be, for example, that the number of data items registered in the provisional index 74 has reached a specified number (for example, 100), or that a predetermined time (for example, midnight) has been reached.

When registration data is input, the search device 1 of the third embodiment first performs the graph self-generation process described above and stores the link information in the provisional index 74. Because the provisional index holds fewer data items than the main index, registration is fast.

When registration data is input and, for example, the number of data items in the provisional index 74 exceeds the specified number set in advance, the vector data stored in the provisional index 74 is input as registration data and transferred to the main index 72.

That is, similar data for each item of vector data (registration data) in the provisional index 74 is retrieved from the main index 72, and the vector data to be linked to that similar data is identified. Links are then generated between the vector data of the provisional index 74 and the similar data retrieved for it, the link merging and reduction described above are performed, the vector data and its link destinations are stored in the main index 72, and the data in the provisional index 74 is deleted.

When data is stored in the provisional index 74, the graph search unit 40 searches both the main index 72 and the provisional index 74 and merges the two search results. Although two indexes are searched, the provisional index is small, so the drop in search speed is slight.

Pre-registering data in the provisional index 74 before registration in the main index 72 thus makes newly registered data immediately visible in the graph index DB 70 and therefore searchable. In addition, performing the registration from the provisional index 74 to the main index 72 as a batch process allows a graph index DB 70 in which all data is linked to be created periodically. Data registration can be made faster by performing the graph searches for the individual items of registration data in parallel at graph registration time.
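The two-index arrangement can be sketched as a small buffer in front of a large index. This is only a structural illustration: each index is a plain dictionary searched by brute force as a stand-in for a real graph index, the flush simply moves the entries instead of running the graph registration, and `TwoTierIndex` and `flush_at` are hypothetical names.

```python
import math

class TwoTierIndex:
    """Provisional buffer in front of a main index (both are brute-force stand-ins)."""

    def __init__(self, flush_at=100):
        self.main = {}          # stand-in for the main index 72
        self.provisional = {}   # stand-in for the provisional index 74
        self.flush_at = flush_at

    def register(self, item_id, vector):
        # New data always goes to the small provisional index first.
        self.provisional[item_id] = vector
        if len(self.provisional) >= self.flush_at:
            self.flush()

    def flush(self):
        # Batch step: a real system would run the graph registration here.
        self.main.update(self.provisional)
        self.provisional.clear()

    def search(self, query, k):
        # Search both indexes and merge the two result lists by distance.
        hits = []
        for index in (self.main, self.provisional):
            hits.extend(self._search_one(index, query, k))
        hits.sort()
        return [item_id for _, item_id in hits[:k]]

    @staticmethod
    def _search_one(index, query, k):
        scored = sorted((math.dist(v, query), i) for i, v in index.items())
        return scored[:k]
```

A time-based criterion, such as flushing at a fixed hour, could call `flush()` from a scheduler instead of from `register()`.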

Next, modifications of the embodiments described above will be described.
[Modification 1]
When vector data identical to the vector data of the registration data is already stored in the graph index DB 70, the registration data may be stored only in the image DB 60, without being stored in the graph index DB 70, and associated with the identical vector data already stored. In this case, the graph index DB 70 is given a new data item that stores, in addition to the image ID, the image IDs of image data having the same vector data as related IDs.

During a data search, when a related ID is associated with data retrieved as similar data, the image data indicated by that related ID can also be output as a search result. This prevents registration of identical vector data from complicating the structure of the graph index and lowering the search speed.
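Modification 1 can be sketched with a lookup from vector to the image ID that already represents it. The record layout (`links`, `related_ids`) and the helper names are assumptions made for illustration; exact equality of vectors is checked through a tuple key.

```python
def register_with_dedup(index, vector_to_id, image_id, vector):
    """Register `image_id`; identical vectors share one graph node via related IDs."""
    key = tuple(vector)
    existing = vector_to_id.get(key)
    if existing is not None:
        # Same vector already indexed: record a related ID, add no new node.
        index[existing]["related_ids"].append(image_id)
        return existing
    index[image_id] = {"links": set(), "related_ids": []}
    vector_to_id[key] = image_id
    return image_id

def expand_results(index, result_ids):
    """Add the related IDs of each retrieved item to the search result."""
    expanded = []
    for rid in result_ids:
        expanded.append(rid)
        expanded.extend(index[rid]["related_ids"])
    return expanded
```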

[Modification 2]
In the above description, the search range of the graph search process is defined by a prescribed distance centered on the registration data, but it may instead be defined by the number of search results to collect while traversing from the search start point. That is, the search start point is determined by the method described above, and links are followed from it. During this traversal, a predetermined number of data items (the search count; for example, 10) is collected. The collected items are sorted by their distance to the registration data to obtain a list of traversed data. The list holds as many items of vector data as the search count, and the distance to the item in the list farthest from the registration data is used as the search range, which is updated as the list changes.

That is, links are followed as long as the search end condition is not satisfied, and when data closer to the registration data than the data in the list is found, it is added to the list in distance order. The search range is then updated according to the farthest item in the list after the addition. As the search count's worth of data is collected and the list is updated with data closer to the registration data, the search range narrows. The search range can therefore be determined dynamically from the specified search count. Moreover, because the search range narrows each time closer data is found, search speed can be improved while search accuracy is maintained.
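The dynamically shrinking search range of Modification 2 can be sketched with two heaps: one holding the `k` nearest items found so far, and one holding the points whose links are still to be followed. Combining it with the α factor of search end condition 1 and expanding the nearest unexplored point first are assumptions of this sketch, and all names are hypothetical.

```python
import heapq
import math

def knn_graph_search(graph, vectors, query, start, k=10, alpha=1.5):
    """Collect the k nearest traversed points; the search range shrinks as they improve."""
    d0 = math.dist(vectors[start], query)
    best = [(-d0, start)]      # max-heap (negated distances) of the k nearest so far
    frontier = [(d0, start)]   # min-heap of points whose links are still to follow
    visited = {start}

    def radius():
        # Search range: distance of the farthest kept item once k items are held.
        return -best[0][0] if len(best) >= k else math.inf

    while frontier:
        d, point = heapq.heappop(frontier)
        if d > alpha * radius():
            break  # every remaining point lies outside the search limit range
        for nxt in graph.get(point, []):
            if nxt in visited:
                continue
            visited.add(nxt)
            dn = math.dist(vectors[nxt], query)
            r = radius()
            if dn <= alpha * r:
                heapq.heappush(frontier, (dn, nxt))
            if len(best) < k:
                heapq.heappush(best, (-dn, nxt))
            elif dn < r:
                heapq.heapreplace(best, (-dn, nxt))  # drop the farthest kept item
    return [p for _, p in sorted((-nd, p) for nd, p in best)]
```

Setting `k=1` gives the nearest-point search that Modification 3 uses to determine the search start point.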

[Modification 3]
The method of Modification 2 may also be used to define the range of link traversal for determining the search start point. That is, the process of selecting an arbitrary point from the graph index and identifying the nearest point is regarded as a graph search process with a search count of one: the single item closest to the registration data found during link traversal is held and updated, and the distance between that item and the registration data is used as a dynamically determined radius of the search range.

During link traversal, data is selected so as not to go beyond the dynamically determined search range. The search range can thus be narrowed while links are followed to identify the search start point, shortening the time needed to identify it.

Furthermore, by traversing links within the range (the search limit range) obtained by multiplying the search range by α (α > 1), the traversal covers a wider area, so a suitable search start point can be identified even when data close to the registration data is not linked.

[Modification 4]
Because data traversed while determining the search start point may end up in the final search result, the distances calculated in step S434 of FIG. 3 may be kept in memory as a traversal history. Referring to this history during the graph search allows the corresponding distance calculations to be skipped.
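The traversal history of Modification 4 amounts to memoizing distances per query. A minimal sketch, assuming one cache object is shared between start point determination and the subsequent graph search; `DistanceCache` and its `computed` counter are hypothetical names.

```python
import math

class DistanceCache:
    """Remember distances to one query so later stages can reuse them."""

    def __init__(self, vectors, query):
        self.vectors = vectors
        self.query = query
        self._cache = {}     # traversal history: point id -> distance
        self.computed = 0    # number of distance calculations actually performed

    def __call__(self, point_id):
        if point_id not in self._cache:
            self._cache[point_id] = math.dist(self.vectors[point_id], self.query)
            self.computed += 1
        return self._cache[point_id]
```

Both stages would call the same cache object in place of a direct distance computation, so a point visited during start point determination costs nothing when the graph search reaches it again.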

The operations of the embodiments described above can be implemented by installing appropriate computer software on a computer. For example, each of the components described above need only exist as a functional block and need not exist as independent hardware. They may be implemented using either hardware or computer software. Furthermore, one functional element of the present invention may be realized by a set of multiple functional elements, and multiple functional elements of the present invention may be realized by a single functional element.

The functional elements may also be placed at physically separate locations. In this case, the functional elements may be connected to one another via a network. The functions may also be realized, or the functional elements configured, by grid computing.

A block diagram showing an example of the functional configuration of the search device in the first embodiment.
A flowchart for explaining the graph self-generation process in the first embodiment.
A flowchart for explaining the processing flow of search start point determination in the first embodiment.
A schematic diagram showing a specific example of the search start point determination process in the first embodiment.
A schematic diagram showing a specific example of the similar data search process in the first embodiment.
A schematic diagram showing a specific example of the temporary link generation process and the link optimization process in the first embodiment.
A schematic diagram showing a specific example of the link generation process in the first embodiment.
A schematic diagram showing a specific example of the link optimization process in the first embodiment.
A schematic diagram showing a specific example of the processing content of the link optimization process in the second embodiment.
A schematic diagram showing a specific example of the link optimization process in the second embodiment.
A block diagram showing the functional configuration of the search device in the third embodiment.

DESCRIPTION OF SYMBOLS
1 Search device
10 Registration data input unit
20 Search data input unit
30 Search result output unit
40 Graph search unit
41 Vector data generation unit
43 Search start point determination unit
45 Graph traversal unit
47 Similar data identification unit
50 Data registration unit
51 Similar data acquisition unit
53 Merge unit
55 Link optimization unit
57 DB update unit
60 Image DB
70 Graph index DB
72 Main index
74 Provisional index

Claims

A search data management device comprising:
a database in which a plurality of data are stored and in which links for tracing from one data item to another are set between the data;
graph search means for selecting, for input search target data, any of the data stored in the database as a search start point, sequentially following the links from the search start point, and searching for and outputting, as similar data, data in the database that lies within a predetermined distance from the search target data;
similar data acquisition means for inputting registration target data to be registered in the database into the graph search means as the search target, thereby acquiring the similar data for the registration target data, and generating links between the registration target data and the similar data;
merge means for merging the links generated by the similar data acquisition means for each of a plurality of registration target data;
link optimization means for performing a reduction process on the links merged by the merge means and the links already set in the database for the data for which the links were generated; and
means for updating the link information of the database based on the result of the reduction process and storing the registration target data in the database.

The search data management device according to claim 1, wherein the link optimization means performs the reduction process based on the total number of the links merged by the merge means and the links already set in the database for the data for which the links were generated.

3. The described search data management device, wherein the link optimization means performs the reduction process based on the lengths of the links merged by the merge means and of the links already set in the database for the data for which the links were generated.

The search data management device according to any one of claims 1 to 3, wherein the link generation means further generates links between the plurality of registration target data.

A search data management method in which a computer manages, as a data group to be searched, a database in which a plurality of data are stored and in which links for tracing from one data item to another are set between the data, the method comprising:
a graph search step of selecting, for input search target data, any of the data stored in the database as a search start point, sequentially following the links from the search start point, and searching for and outputting, as similar data, data in the database that lies within a predetermined distance from the search target data;
a similar data acquisition step of inputting registration target data to be registered in the database into the graph search means as the search target, thereby acquiring the similar data for the registration target data, and generating links between the registration target data and the similar data;
a merge step of merging the links generated in the similar data acquisition step for each of a plurality of registration target data;
a link optimization step of performing a reduction process on the links merged in the merge step and the links already set in the database for the data for which the links were generated; and
a data update step of updating the link information of the database based on the result of the reduction process and storing the registration target data in the database.

A program for causing a computer to execute the processes of the search data management method according to claim 5.
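For illustration only, the registration flow recited in the claims (similar data acquisition, merging, link reduction, and database update) can be sketched in Python as follows. Everything concrete here is an assumption rather than the claimed implementation: the function name `register_batch`, the `search` callback standing in for the graph search means, the in-memory dictionaries standing in for the database, Euclidean distance, and the reduction rule of keeping only the `max_links` shortest links per data item. Links between the registration target data themselves are not generated in this sketch.

```python
import math

def register_batch(vectors, links, new_items, search, max_links=3):
    """Register `new_items` (id -> vector); `search(vector)` returns ids of similar data."""
    temp = {}   # temporary memory: node id -> set of linked node ids
    # Similar data acquisition: link each registration item with its similar data.
    for nid, vec in new_items.items():
        temp.setdefault(nid, set())
        for sid in search(vec):
            temp[nid].add(sid)
            temp.setdefault(sid, set()).add(nid)
    # Merge: combine the generated links with links already set in the database.
    for node in temp:
        temp[node] |= set(links.get(node, ()))
    # Link optimization: reduce by number and length (keep the shortest max_links).
    all_vectors = {**vectors, **new_items}
    for node in list(temp):
        ranked = sorted(temp[node],
                        key=lambda n: math.dist(all_vectors[node], all_vectors[n]))
        temp[node] = set(ranked[:max_links])
    # Database update: store the new data and the reduced link information.
    vectors.update(new_items)
    for node, neighbours in temp.items():
        links[node] = sorted(neighbours)
    return links

vectors = {0: (0.0,), 1: (1.0,), 2: (2.5,)}
links = {0: [1], 1: [0, 2], 2: [1]}

def search(vec):
    # Stand-in for the graph search means: the two nearest stored items (brute force).
    return sorted(vectors, key=lambda n: math.dist(vectors[n], vec))[:2]

register_batch(vectors, links, {3: (1.2,)}, search, max_links=2)
print(links)
```

In this toy run, data item 1 ends up with three candidate links after merging, and the reduction drops its longest link so that the per-item link count stays bounded.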