JP5014398B2

JP5014398B2 - Search data management device

Info

Publication number: JP5014398B2
Application number: JP2009240991A
Authority: JP
Inventors: Masajiro Iwasaki (雅二郎岩崎)
Original assignee: Yahoo Japan Corp
Current assignee: Yahoo Japan Corp
Priority date: 2009-10-20
Filing date: 2009-10-20
Publication date: 2012-08-29
Anticipated expiration: 2029-10-20
Also published as: JP2011090351A

Description

The present invention relates to an apparatus for retrieving similar data from a database and for registering new data in that database.

Feature values extracted from images, audio, and similar media are represented as multidimensional vector data, and similar data is retrieved using the distances between these vectors. When the number of data items to be searched is large, the distance between the search key (the search data) and every search target must be computed one by one, so each search takes a very long time.
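The following is a minimal sketch of the exhaustive approach. It assumes Euclidean distance and keeps the vectors in a dict keyed by data ID; both choices are for illustration only. Each query needs one distance computation per stored item.

```python
import math


def linear_range_search(query, vectors, radius):
    """Exhaustive search: compare the query with every stored vector.
    Returns (ids within radius, number of distance computations)."""
    hits = []
    computations = 0
    for vid, vec in vectors.items():
        computations += 1  # one distance evaluation per stored vector
        if math.dist(query, vec) <= radius:
            hits.append(vid)
    return hits, computations


vectors = {"a": (0.0, 0.0), "b": (1.0, 0.0), "c": (5.0, 5.0)}
hits, n = linear_range_search((0.2, 0.0), vectors, radius=1.0)
print(hits, n)  # ['a', 'b'] 3
```

The count grows linearly with the number of stored items, which is the cost the graph-based approach tries to avoid.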

Patent Document 1 describes the following technique. For every feature value in the database, the similarity to every other feature value is calculated in advance, and highly similar data items are stored in association with one another. Given a search key, the system finds the nearest neighbor of the key and returns the similar data associated with that neighbor as the search result.
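Below is a rough sketch of the precomputed-list idea summarized above. It assumes Euclidean distance and uses a plain linear scan to find the nearest stored item. This section does not describe how Patent Document 1 performs that step, so the scan is a simplification. Function names are hypothetical.

```python
import math


def build_similarity_lists(vectors):
    """Precompute, for every item, all other items sorted by distance (O(N^2) distances)."""
    lists = {}
    for vid, vec in vectors.items():
        others = [o for o in vectors if o != vid]
        others.sort(key=lambda o: math.dist(vec, vectors[o]))
        lists[vid] = others
    return lists


def search(query, vectors, lists, k):
    """Find the stored item nearest to the query, then answer from its precomputed list."""
    nearest = min(vectors, key=lambda v: math.dist(query, vectors[v]))
    return [nearest] + lists[nearest][: k - 1]


vectors = {"a": (0.0, 0.0), "b": (1.0, 0.0), "c": (3.0, 0.0), "d": (10.0, 0.0)}
lists = build_similarity_lists(vectors)
print(search((0.1, 0.0), vectors, lists, k=3))  # ['a', 'b', 'c']
```

Adding one item in this scheme means computing its distance to every existing item and re-sorting every list. The following paragraphs identify this as the registration cost of the approach.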

Patent Document 1: JP 2001-52024 A

When new data is registered in the group of data to be searched (the database), fast search can be maintained by assigning the new data, in advance, an index that reflects the similarity between data items. This requires searching the database for data similar to the data being registered.

However, to speed up search, the technique of Patent Document 1 must store, for each data item, all other data in order of similarity. This means precomputing the similarity to every item in the database, so as the amount of data to be searched grows, the time required grows with the number of items. In addition, every registration requires computing similarities between data items and revising the index, so the more data there is, the more cumbersome it becomes to build the index.

The present invention has been made in view of the above problems. Its object is to register data quickly in a data group that is searched for similar data, without making the search index cumbersome.

To achieve the above object, the search data management device of the first invention comprises:
a database storing a plurality of data items, with links set between the data items so that one data item can be traced to another;
graph search means that, for input search target data, selects one of the data items stored in the database as a search start point, successively follows the links from the search start point, and searches for and outputs, as similar data, those data items on the path followed that lie within a predetermined distance of the search target data;
similar data acquisition means that acquires similar data for data to be registered in the database by inputting that data to the graph search means; and
link optimization means that sets links between the data to be registered and the similar data acquired by the similar data acquisition means, stores the links in the database, and reduces the links set on the similar data.
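The registration flow of the first invention can be sketched in Python as follows. This is a minimal illustration that rests on several assumptions not stated in the text:

- distance is Euclidean;
- links are kept as sets in both directions;
- the first stored item serves as the search start point;
- the breadth-first walk stops expanding past `alpha * radius`;
- link reduction keeps the `max_links` nearest links.

`GraphIndex`, `max_links`, and `alpha` are hypothetical names.

```python
import math
from collections import deque


class GraphIndex:
    """Toy graph index: vectors plus links stored in both directions."""

    def __init__(self, max_links=2, alpha=1.5):
        self.vectors = {}           # data id -> feature vector
        self.links = {}             # data id -> set of linked data ids
        self.max_links = max_links  # hypothetical link-reduction threshold
        self.alpha = alpha          # hypothetical search-limit factor

    def _dist(self, u, v):
        return math.dist(self.vectors[u], self.vectors[v])

    def range_search(self, query, start, radius):
        """Graph search: follow links from `start`, return ids within `radius` of `query`."""
        seen, found = {start}, []
        frontier = deque([start])
        while frontier:
            node = frontier.popleft()
            d = math.dist(query, self.vectors[node])
            if d <= radius:
                found.append(node)
            # Do not continue through points beyond the search-limit range
            # (the start point itself is always expanded).
            if d > self.alpha * radius and node != start:
                continue
            for nxt in self.links[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append(nxt)
        return found

    def reduce_links(self, node):
        """Link optimization: keep the max_links nearest links of `node`, release the rest."""
        if len(self.links[node]) <= self.max_links:
            return
        ranked = sorted(self.links[node], key=lambda o: (self._dist(node, o), o))
        for dropped in ranked[self.max_links:]:
            self.links[node].discard(dropped)
            self.links[dropped].discard(node)

    def register(self, new_id, vec, radius):
        """Similar data acquisition, link setting, and link reduction for one new item."""
        start = next(iter(self.vectors), None)  # simplification: first stored item
        similar = [] if start is None else self.range_search(vec, start, radius)
        self.vectors[new_id] = vec
        self.links[new_id] = set()
        for s in similar:
            self.links[new_id].add(s)
            self.links[s].add(new_id)
            self.reduce_links(s)


g = GraphIndex(max_links=2)
for name, vec in [("a", (0.0, 0.0)), ("b", (1.0, 0.0)), ("c", (2.5, 0.0)), ("d", (1.2, 0.0))]:
    g.register(name, vec, radius=3.0)
print(sorted(g.links["a"]), sorted(g.links["c"]))  # ['b', 'd'] ['d']
```

The start point here is simply the first stored item, so the sketch relies on the graph staying connected. The start point determination described for the embodiment is meant to begin the walk near the new data instead.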

In the second invention, when the number of links set on the similar data is equal to or greater than a predetermined number, the link optimization means keeps that predetermined number of links, chosen in ascending order of the distance between the linked data items, and releases the other links.
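A minimal sketch of the second invention's reduction rule follows. It assumes Euclidean distance and a `links` dict of sets that stores every link in both directions. The function name and data layout are illustrative.

```python
import math


def reduce_links(node, links, vectors, max_links):
    """Keep at most `max_links` links of `node`, choosing the shortest, and release
    the others in both directions. Returns the released ids, nearest first."""
    if len(links[node]) <= max_links:
        return []
    # Sort by (distance, id) so that ties are broken deterministically.
    ranked = sorted(links[node], key=lambda o: (math.dist(vectors[node], vectors[o]), o))
    released = ranked[max_links:]
    for r in released:
        links[node].discard(r)
        links[r].discard(node)
    return released


vectors = {"p": (0.0, 0.0), "q1": (1.0, 0.0), "q2": (2.0, 0.0), "q3": (3.0, 0.0)}
links = {"p": {"q1", "q2", "q3"}, "q1": {"p"}, "q2": {"p"}, "q3": {"p"}}
print(reduce_links("p", links, vectors, max_links=2))  # ['q3']
```

Ties are broken by ID only to make the result deterministic. The text does not say how ties are handled.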

In the third invention, the link optimization means does two things. First, among the data items linked to the data item whose links are being reduced, it identifies retained data, whose links are kept, and released data, whose links are released. Second, based on the distance between the released data and the retained data, it sets a link between them and stores that link in the database.
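The third invention says only that released and retained data are linked "based on the distance" between them. One plausible reading, sketched below as an assumption rather than the patented rule, links each released item to its nearest retained neighbor. Euclidean distance and bidirectional set-based links are also assumed.

```python
import math


def reduce_and_relink(node, links, vectors, max_links):
    """Release the far links of `node`, then link each released item to the retained
    neighbor nearest to it so it stays reachable.
    Assumes max_links >= 1 so that at least one retained neighbor exists."""
    def dist(u, v):
        return math.dist(vectors[u], vectors[v])

    if len(links[node]) <= max_links:
        return
    ranked = sorted(links[node], key=lambda o: (dist(node, o), o))
    retained, released = ranked[:max_links], ranked[max_links:]
    for r in released:
        links[node].discard(r)
        links[r].discard(node)
        partner = min(retained, key=lambda k: (dist(r, k), k))
        links[r].add(partner)      # new link between released and retained data
        links[partner].add(r)


vectors = {"p": (0.0, 0.0), "q1": (1.0, 0.0), "q2": (0.0, 1.0), "q3": (3.0, 0.0)}
links = {"p": {"q1", "q2", "q3"}, "q1": {"p"}, "q2": {"p"}, "q3": {"p"}}
reduce_and_relink("p", links, vectors, max_links=2)
print(sorted(links["q3"]), sorted(links["q1"]))  # ['q1'] ['p', 'q3']
```

Adding these links can push a retained item over the threshold again. This section does not specify whether the reduction should then be reapplied.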

In the fourth invention, the graph search means selects an arbitrary data item from the database and successively follows the links from that item. In this way it searches for data within the predetermined distance of the data to be registered, and it selects that data as the search start point.

In the fifth invention, the graph search means may follow the links from the arbitrary data item and find that none of the data on the path lies within the predetermined distance. In that case, it newly selects a data item different from the one already selected and follows the links again to search for the search start point.

The sixth invention further comprises a hierarchical database that organizes the data stored in the database into a hierarchy by space partitioning and stores each data item under the partition of the hierarchy to which it belongs. The graph search means identifies the partition to which the data to be registered belongs and selects another data item in that partition as the search start point.

In the seventh invention, the search start point selected using the hierarchical database may be farther from the data to be registered than the predetermined distance. In that case, the graph search means selects an arbitrary data item from the database and follows links from it to search for data within the predetermined distance. It then selects that data as the search start point.
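The text does not specify the space-partitioning scheme for the sixth and seventh inventions. The sketch below assumes a stack of uniform grids whose cell size halves at each level; `GridHierarchy` is a hypothetical stand-in for the hierarchical database. When the grid's candidate is farther than `radius` from the query, the sketch falls back to a greedy link walk from the first stored item. Euclidean distance is assumed throughout.

```python
import math


class GridHierarchy:
    """Hypothetical hierarchical DB: uniform grids whose cell size halves at each level."""

    def __init__(self, top_cell=8.0, levels=3):
        self.sizes = [top_cell / 2 ** i for i in range(levels)]
        self.cells = [{} for _ in self.sizes]  # per level: cell key -> list of ids

    @staticmethod
    def _key(vec, size):
        return tuple(math.floor(x / size) for x in vec)

    def add(self, vid, vec):
        for level, size in enumerate(self.sizes):
            self.cells[level].setdefault(self._key(vec, size), []).append(vid)

    def seed(self, vec):
        """Sixth invention: an item from the finest non-empty cell containing `vec`."""
        for level in reversed(range(len(self.sizes))):
            members = self.cells[level].get(self._key(vec, self.sizes[level]))
            if members:
                return members[0]
        return None


def choose_start(query, hierarchy, vectors, links, radius):
    start = hierarchy.seed(query)
    if start is not None and math.dist(query, vectors[start]) <= radius:
        return start
    # Seventh invention: the hierarchy's candidate is too far, so fall back to
    # following links greedily from an arbitrary item (here, the first stored one).
    current = next(iter(vectors))
    while True:
        best = min(links[current], key=lambda o: math.dist(query, vectors[o]), default=None)
        if best is None or math.dist(query, vectors[best]) >= math.dist(query, vectors[current]):
            return current  # may still be outside `radius`; the fifth invention would restart
        current = best


vectors = {"a": (0.5, 0.5), "b": (6.0, 6.0), "c": (6.5, 6.2)}
links = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
hierarchy = GridHierarchy()
for vid, vec in vectors.items():
    hierarchy.add(vid, vec)
print(choose_start((6.3, 6.1), hierarchy, vectors, links, radius=1.0))  # b (from the grid)
print(choose_start((4.5, 4.5), hierarchy, vectors, links, radius=1.0))  # b (via the fallback)
```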

The eighth invention is a search data management method. In it, a computer manages, as the data group to be searched, a database storing a plurality of data items with links set between them so that one data item can be traced to another. The computer performs the following steps:
a graph search step of, for input search target data, selecting one of the data items stored in the database as a search start point, successively following the links from the search start point, and searching for and outputting, as similar data, those data items on the path followed that lie within a predetermined distance of the search target data;
a similar data acquisition step of having the graph search step search for the data to be registered in the database as the search target, and acquiring similar data for the data to be registered; and
a link optimization step of setting links between the data to be registered and the similar data acquired in the similar data acquisition step, storing the links in the database, and reducing the links set on the similar data.

The program of the ninth invention causes a computer to execute the processing of the search data management method.

According to the present invention, when data is registered in the database, similar data within the predetermined distance is searched for only among the data on the paths followed through the links set between data items. Distances therefore need not be computed against all data in the database, similar data is found quickly, and registration is fast. In addition, the links set on the similar data are reduced at each registration. This keeps the link structure between data items from becoming cumbersome as data is added, so registrations do not degrade search speed.

[Apparatus Configuration of First Embodiment]
A search device according to a first embodiment of the present invention is described below with reference to the accompanying drawings. The present invention is not limited to this embodiment, and various modifications may be made to the specific configuration within the scope of the claims.

This embodiment describes an example that handles multidimensional vector data representing image feature values. The vector data may instead represent feature values of audio or other multimedia data, or may be another kind of multidimensional data.

FIG. 1 is a block diagram showing the functional configuration of a search device 1, which is an example of the present invention. The search device 1 need not consist of a single piece of hardware or software. Its functions may be provided by a combination of multiple pieces of hardware or software as needed, and the search device 1 of this embodiment may also be implemented by multiple servers distributed across a network.

The search device 1 processes requests sent from client terminals and returns the results to them. On receiving a search query (a search request), it performs a search according to the query and returns the search result to the client terminal. It also maintains a database (hereinafter "DB") that stores the data to be searched. On receiving a request to register new data in that DB, it registers the data in the DB.

[Client terminal configuration]
The client terminal (not shown) is a personal computer with a CPU, input means such as a keyboard, mouse, and touch pad, and output means such as a display and printer. It is connected to the search device 1 via a network and sends the search data and registration data specified by the user to the search device 1.

Here, search data is the data used as a key when searching the database of the search device 1 for similar data. Registration data is data to be newly registered in that database. The vector data extracted from search data and registration data are also referred to below as "search data" and "registration data".

[Configuration of search device]
As shown in FIG. 1, the search device 1 comprises a registration data input unit 10, a search data input unit 20, a search result output unit 30, a graph search unit 40, a data registration unit 50, an image DB 60, and a graph index DB 70.

The registration data input unit 10 receives registration data sent from a client terminal. When it receives registration data together with a registration request, it passes the registration data to the data registration unit 50.

The search data input unit 20 receives search data sent from a client terminal. When it receives search data together with a search request, it passes the search data to the graph search unit 40.

The search result output unit 30 sends the client terminal the result of the search that the graph search unit 40 performed on the input from the search data input unit 20. It generates display data for presenting the search result and sends that data to the client terminal.

The image DB 60 stores the image data to be searched. As shown in FIG. 1, it stores image IDs, image data, and vector data in association with one another. The graph search unit 40 extracts multidimensional vector data, consisting of multiple feature values, from the image data stored in the image DB 60 and stores it. The number of dimensions is not particularly limited, but a high dimensionality (for example, 10 or more dimensions) is preferable for search accuracy.

The graph index DB 70 stores a graph-structured index and holds the vector data representing the feature values of the data to be searched. The data registration unit 50 builds this graph-structured index from the vector data extracted from the image data.

The graph index DB 70 also stores the links that let one vector data item be traced to another. As shown in FIG. 1, it stores the image ID of each link source, matching an entry in the image DB 60, together with the image IDs of the one or more link destinations linked to that vector data. Setting links between vector data items forms a graph structure over the vector data a to f in the vector space, as shown in FIG. 4(a).

The graph structure formed by the link information in the graph index DB 70, or equivalently that link information itself, is referred to below as the "graph index". A link is information that allows one data item to be traced to another. Links may be unidirectional. However, as shown in FIG. 1, storing each link together with its reverse, so that links can be traced in both directions, improves search speed.
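To illustrate the paired storage, the following sketch models the link table as a mapping from each image ID to a list of linked image IDs. The IDs and function names are hypothetical. With one-way entries, a traversal can move only in the stored direction. With paired entries, it can move either way.

```python
from collections import defaultdict


def add_link(table, src, dst, bidirectional=True):
    """Record src -> dst in the link table; optionally also the reverse entry dst -> src."""
    table[src].append(dst)
    if bidirectional:
        table[dst].append(src)


def reachable(table, start):
    """All image IDs that can be reached from `start` by following stored links."""
    seen, stack = {start}, [start]
    while stack:
        for nxt in table[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


one_way, two_way = defaultdict(list), defaultdict(list)
for src, dst in [("img1", "img2"), ("img2", "img3")]:
    add_link(one_way, src, dst, bidirectional=False)
    add_link(two_way, src, dst)
print(sorted(reachable(one_way, "img3")))  # ['img3']
print(sorted(reachable(two_way, "img3")))  # ['img1', 'img2', 'img3']
```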

FIG. 4(a) draws the vector data as points a to f distributed in the multidimensional feature space, with the links set between them drawn as solid lines. In the following description, vector data items are often called "points".

The graph search unit 40 searches for data similar to the search data by traversing the links set in the graph index DB 70. As shown in FIG. 1, it comprises a vector data generation unit 41, a search start point determination unit 43, a graph traversal unit 45, and a similar data identification unit 47.

The vector data generation unit 41 extracts feature values from the data input by the search data input unit 20 or the data registration unit 50, producing multidimensional vector data.

The search start point determination unit 43 selects, as the search start point, an existing vector data item in the graph index that is close to the vector data extracted from the search data. The graph search unit 40 thus takes one of the stored vector data items as the search start point and begins traversing the links from it.

Starting from the search start point, the graph traversal unit 45 successively follows the links set in the graph index DB 70 until a predetermined search end condition is satisfied.

The similar data identification unit 47 examines the vector data on the link paths followed by the graph traversal unit 45. It identifies as similar data those items within a predetermined distance of the search data. The identified similar data, or information related to it, is sent to the client terminal as the search result.

The data registration unit 50 registers the registration data input from the registration data input unit 10 in the image DB 60 and the graph index DB 70. It assigns a new image ID to the image data input as registration data and stores the image data in the image DB 60. It also stores the vector data extracted from the image data in the image DB 60, associated with that image ID.

As shown in FIG. 1, the data registration unit 50 comprises a similar data acquisition unit 51, a link registration unit 53, and a link optimization unit 55.

The similar data acquisition unit 51 obtains data similar to the registration data input from the registration data input unit 10. It does so by sending the graph search unit 40 a request to search for such data.

The link registration unit 53 sets links between the registration data and the similar data acquired by the similar data acquisition unit 51 and stores them in the graph index DB 70. When the image ID and vector data of the registration data are registered in the graph index DB 70, the image IDs of the similar data are stored as its link destinations. The image ID of the registration data is also added to the link destinations of each similar data item, so mutual links are formed between the registration data and the similar data.

The link optimization unit 55 optimizes the links so that the new links registered by the link registration unit 53 do not make the graph structure of the graph index cumbersome. Details are given later. In brief, each time a link destination is added to a similar data item, the number of links from that item grows. This increases the number of links the graph search unit 40 must traverse and may slow down search. Therefore, whenever a link destination is added to similar data, its links are reviewed to keep their number appropriate.

The search device 1 generates the graph index DB 70, a search index with a graph structure, and searches for data similar to the search data by following the links in that graph. When registration data is input, the device adds it to the search index traversed during search. The new data becomes part of the traversal paths and can itself be found, so the graph index is built up in a self-generating manner.

[Operation of search device]
Next, the graph self-generation process that the search device 1 of this embodiment performs when registering data is described in detail with reference to FIGS. 2 to 6. This process is executed according to a program stored in advance in the memory of the search device 1. FIGS. 2 and 3 are flowcharts showing an example of the operation of the search device 1, and FIGS. 4 to 6 illustrate data search and link optimization on the graph index.

[Graph self-generation processing]
First, the search device 1 performs initial setting (step S1). This step sets, in advance, the number of links to be set on newly registered data and the threshold used for link reduction (the link reduction threshold).

Next, when registration data is input from the registration data input unit 10 to the data registration unit 50 (step S2), the similar data acquisition unit 51 passes the registration data to the graph search unit 40 and has it perform graph search processing (step S3).

[Graph search processing]
When the graph search unit 40 starts graph search processing, the vector data generation unit 41 first generates vector data by extracting feature values from the registration data (step S41).

[Search start point determination processing]
Next, the search start point determination unit 43 selects vector data close to the registration data as the search start point (step S43). FIG. 3 is a flowchart showing an example of the process for determining the search start point.

The search start point determination unit 43 first selects an arbitrary point in the graph index, for example point a in FIG. 4(b) (step S430). Using the graph index DB 70, it then obtains the points linked to the selected point, for example points b and c in FIG. 4(c) (step S432).

Next, for each linked point (link destination), the unit calculates the distance to the registration data in the feature space (step S434). It extracts the point with the shortest of these distances (step S436). It then determines whether the distance D2 between that point and the registration data is smaller than the distance D1 between the selected point and the registration data (steps S438 and S440). In FIG. 4(d), D1 is the distance between selected point a and registration data g, and D2 is the distance between link destination c and registration data g. Comparing them shows that D2 is smaller.

If the search start point determination unit 43 determines that D2 is smaller than D1 (step S440; Yes), it selects the point extracted in step S436 (step S442) and returns to step S432.

In other words, the loop of steps S432 to S442 follows links from the initially chosen arbitrary point and at each step moves to a point on the graph index that is closer to the registration data.

If the search start point determination unit 43 determines in step S440 that D2 is not smaller than D1 (step S440; No), it takes the currently selected point as the search start point (step S444).

In FIG. 4(e), the walk starts at the initially selected point a and repeatedly moves to points closer to the registration data g, passing through points c, b, and d in turn. Point d, the closest to the registration data, is determined as the search start point.
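Steps S430 to S444 amount to a greedy descent over the graph index. The sketch below follows the flowchart as described, assuming Euclidean distance and links stored as a dict of sets. The coordinates in the usage lines are invented and do not correspond to FIG. 4.

```python
import math


def find_start_point(query, vectors, links, first):
    """Greedy walk from `first` toward `query` along links (steps S430 to S444)."""
    selected = first                                    # S430: arbitrary point
    d1 = math.dist(query, vectors[selected])
    while True:
        linked = links[selected]                        # S432: linked points
        if not linked:
            return selected
        # S434-S436: distance from each linked point to the query; take the nearest.
        nearest = min(linked, key=lambda p: (math.dist(query, vectors[p]), p))
        d2 = math.dist(query, vectors[nearest])         # S438
        if d2 < d1:                                     # S440: Yes
            selected, d1 = nearest, d2                  # S442, then back to S432
        else:                                           # S440: No
            return selected                             # S444: search start point


vectors = {"a": (0.0, 0.0), "c": (2.0, 0.0), "e": (1.0, 3.0), "d": (4.0, 1.0)}
links = {"a": {"c", "e"}, "c": {"a", "d"}, "e": {"a"}, "d": {"c"}}
print(find_start_point((5.0, 1.0), vectors, links, first="a"))  # d
```

Like any greedy descent, the walk can stop at a local minimum that is not the globally closest point. The restart rule described next targets that case.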

Even after the search start point has been determined in this way (D2 not smaller than D1), it may lie outside the search range centered on the registration data. The search cannot start from such a point. Therefore, if the search start point determined in step S444 is outside the search range, a new arbitrary point is selected from the graph index and the determination process is repeated, a predetermined number of times, to find a start point inside the search range.

If the arbitrary point in step S430 is chosen at random, repeating the search start point determination may select a point close to one chosen before. Following links from such a point is likely to retrace the same path, so no start point inside the search range is found.

Therefore, when the search start point determination is repeated a predetermined number of times, each new point should preferably be at least a predetermined distance away from previously selected points. Alternatively, several points spread across the vector space can be identified in advance and the arbitrary point selected from among them. Either approach finds a start point inside the search range efficiently.
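The restart rule can be sketched as a loop that draws each new seed at least `min_gap` away from every seed already used. The greedy link walk is passed in as the callable `descend`. The names `pick_seed`, `min_gap`, and `max_tries` are illustrative assumptions, as is Euclidean distance. Scanning stored items in insertion order is a simplification of choosing from pre-dispersed points.

```python
import math


def pick_seed(vectors, previous, min_gap):
    """Return an item at least `min_gap` away from every previously used seed, or None."""
    for vid, vec in vectors.items():
        if vid in previous:
            continue
        if all(math.dist(vec, vectors[p]) >= min_gap for p in previous):
            return vid
    return None


def start_point_with_restarts(query, vectors, descend, radius, min_gap, max_tries):
    """Retry the start-point search from well-separated seeds until the result lies
    inside the search range; return None if every try fails."""
    used = []
    for _ in range(max_tries):
        seed = pick_seed(vectors, used, min_gap)
        if seed is None:
            break
        used.append(seed)
        start = descend(seed)  # greedy link walk from the seed
        if math.dist(query, vectors[start]) <= radius:
            return start
    return None


vectors = {"a": (0.0, 0.0), "a2": (0.5, 0.0), "z": (10.0, 0.0)}
walk = {"a": "a2", "a2": "a2", "z": "z"}  # stub: where a greedy walk from each seed ends
print(start_point_with_restarts((10.0, 0.5), vectors, walk.get, 1.0, 3.0, 3))  # z
```

Point `a2` is never tried as a seed because it lies within `min_gap` of `a`. Retrying from it would most likely retrace the same path.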

[Graph traversal]
Once the search start point has been determined, the graph traversal unit 45 successively follows links starting from it (step S45). A vector data item may have several outgoing links, meaning the path branches. In that case, the vector data on the paths followed, or information identifying it, is stored in the memory of the search device 1 by a suitable method.

[Search for similar data]
While following links along the path, the graph traversal unit 45 checks whether a prescribed search end condition is satisfied. Various search end conditions are possible, for example either of the following.

(Search end condition 1)
The links followed from the search start point lead beyond the search limit range. This is the range obtained by multiplying the predetermined search range centered on the registration data (for example, the search range in FIG. 5A) by α (α > 1). Any suitable value, such as 1.5, may be used for α. A larger α reduces missed results, whereas a smaller α shortens the search time.

As the number of data items visited by following links grows, the search can be ended early by decreasing α. Search results can then be obtained quickly even when the amount of data is very large. To keep search accuracy high, α should preferably not fall to 1 or below. If the search must end early, however, α < 1 may also be used.
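End condition 1 and the shrinking-α rule can be written as two small Python helpers. The schedule used here is purely an assumption chosen for illustration: start at 1.5, subtract 0.1 for every 1,000 visited items, and stop at a floor of 1.1. The source only says that α may be reduced as the visited count grows and that keeping α above 1 favours accuracy. Function names are hypothetical.

```python
import math
from typing import Sequence


def alpha_for(visited: int, alpha0: float = 1.5, step: float = 0.1,
              every: int = 1000, floor: float = 1.1) -> float:
    """Hypothetical schedule: lower alpha by `step` per `every` visited items, down to `floor`.

    Pass floor < 1 to allow the early-termination setting alpha < 1 mentioned in the text.
    """
    return max(floor, alpha0 - step * (visited // every))


def beyond_search_limit(point: Sequence[float], query: Sequence[float],
                        radius: float, alpha: float) -> bool:
    """End condition 1: the point lies outside alpha times the search range around the query."""
    return math.dist(point, query) > alpha * radius
```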

(Search end condition 2)
Starting from the search start point, links have been followed a predetermined number of times outside the search range centered on the registration data.

Here, the predetermined number of times is, for example, five, though any suitable value may be used. For example, in FIG. 5A, following the link from point b to point a leaves the search range, but following one more link after leaving the range reaches point c. Allowing a set number of hops outside the search range in this way reduces missed results.
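One way to implement end condition 2 is to count, for each path, how many consecutive links have been followed outside the search range, and to stop extending a path once that count would exceed the limit. Resetting the count when a path re-enters the range is an interpretation; the source does not specify it. A node reached again with a smaller count is re-expanded, so an earlier, worse path does not block a better one. The Python function name and the dict-based graph and point layout are assumptions.

```python
import math
from collections import deque
from typing import Dict, List, Set, Tuple

Point = Tuple[float, ...]


def traverse_with_outside_budget(graph: Dict[int, List[int]], points: Dict[int, Point],
                                 start: int, query: Point, radius: float,
                                 max_outside_hops: int = 5) -> Set[int]:
    """Visit data reachable from `start`, ending a path once it has made more than
    `max_outside_hops` consecutive hops outside the search range (end condition 2)."""
    def outside(i: int) -> bool:
        return math.dist(points[i], query) > radius

    best_hops: Dict[int, int] = {start: 1 if outside(start) else 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nb in graph.get(node, []):
            hops = best_hops[node] + 1 if outside(nb) else 0
            if hops > max_outside_hops:
                continue  # this path has used up its hops outside the range
            if hops < best_hops.get(nb, max_outside_hops + 1):
                best_hops[nb] = hops  # reached with fewer outside hops: (re)expand it
                queue.append(nb)
    return set(best_hops)
```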

When the search end condition is satisfied on one path, the links on another path are followed. Once the search end condition is satisfied on every path, the process moves on to the next step.

The similar data identification unit 47 continues the graph traversal until the search end condition is satisfied (step S47 → S45). Once the condition is satisfied (step S47: Yes), it calculates the distance between each visited point (vector data) and the registration data, and outputs the similar data ranked by that distance (step S49).

For example, when point d in FIG. 5A is the search start point, the links to its linked points b, e, and f are followed first. Then, as shown in FIG. 5B, the link from point e to point c is followed. Following the link from point c to point a, as shown in FIG. 5C, leaves the search range. If leaving the search range is the search end condition, link traversal ends at this point.

The similar data are then ranked by the distance between the registration data g and each of the points b to f reached from search start point d. In FIG. 5C, the resulting ranking is d, b, c, e, f.

When the graph search unit 40 has finished searching for similar data, the link registration unit 53 of the data registration unit 50 sets links between the registration data and the similar data (step S5). Alternatively, the link registration unit 53 may generate temporary links between the registration data and the similar data, and store them in the graph index DB 70 only after the link optimization process described later. A temporary link likewise serves for tracing from one vector data item to the other, and is held temporarily in memory.

Starting from the top of the ranking of the retrieved similar data, as many items as the number of links set at initialization are selected, and temporary links to them are set. For example, when the number of links is 3, similar data points b, c, and d are linked to the registration data g, as shown in FIG. 5D.
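Steps S49 and S5 amount to sorting the visited points by distance and linking the new item to the first few. In this Python sketch, temporary links are kept in a separate dict of sets so they can be committed after link optimization. That data layout, the treatment of links as undirected, and the function names are assumptions.

```python
import math
from typing import Dict, Iterable, List, Sequence, Set, Tuple

Point = Tuple[float, ...]


def rank_by_distance(visited: Iterable[int], points: Dict[int, Point],
                     query: Point) -> List[int]:
    """Step S49: order visited data IDs from nearest to farthest from the query."""
    return sorted(visited, key=lambda i: math.dist(points[i], query))


def add_temporary_links(temp_links: Dict[int, Set[int]], new_id: int,
                        ranked: Sequence[int], num_links: int) -> List[int]:
    """Step S5: link the new item to the top `num_links` similar data (both directions)."""
    chosen = list(ranked[:num_links])
    temp_links.setdefault(new_id, set())
    for j in chosen:
        temp_links[new_id].add(j)
        temp_links.setdefault(j, set()).add(new_id)
    return chosen
```

With the FIG. 5C ranking d, b, c, e, f and `num_links=3`, this would link the new item to d, b, and c, matching FIG. 5D.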

[Link optimization process]
Next, the link optimization unit 55 performs the link optimization process (step S6). In this embodiment, the process targets for link reduction the similar data to which temporary links from the registration data were generated. For each such data item, the number of links is obtained from the graph index DB 70. If that number is equal to or greater than the link reduction threshold, the item's links are reduced.

Links to the similar data linked to the target item (including temporary links) are removed farthest first. For example, when the link reduction threshold is 4, in FIG. 6A the points c, b, and d for which links were generated each have 4 links, which meets the threshold. For point c, the link to point b, the farthest of its linked points, is therefore removed. For point d, the link to point e is removed, resulting in the graph structure shown in FIG. 6B. Point b, although also a reduction target, needs no link removed: once the link from point c is removed, its link count falls below the threshold.
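The farthest-first reduction can be sketched in Python as follows. It follows the worked example, in which a point with 4 links and a threshold of 4 loses exactly one link, so the loop removes links until the count drops below the threshold. Links are modelled as undirected adjacency sets; the data layout and the name `reduce_links` are assumptions.

```python
import math
from typing import Dict, Iterable, Set, Tuple

Point = Tuple[float, ...]


def reduce_links(links: Dict[int, Set[int]], points: Dict[int, Point],
                 targets: Iterable[int], threshold: int) -> None:
    """Drop each target's farthest links while its link count is >= `threshold`.

    Targets are processed in order, so a removal made for one target can bring a
    later target below the threshold (as with point b after point c in FIG. 6).
    """
    if threshold < 1:
        raise ValueError("threshold must be at least 1")
    for t in targets:
        while len(links[t]) >= threshold:
            farthest = max(links[t], key=lambda j: math.dist(points[t], points[j]))
            links[t].discard(farthest)
            links[farthest].discard(t)  # links are undirected in this sketch
```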

Based on the result of the link reduction, the link optimization unit 55 updates the link information in the graph index DB 70, and the registration data is stored in the image DB 60 (step S7). If further registration data is input from the registration data input unit 10 (step S8: Yes), the process returns to step S2 and repeats the graph search, link generation, and link optimization. If there is no new data (step S8: No), the graph self-generation process ends.

Although a detailed description is omitted, a plain search works as follows: data similar to the search data is found and the result is returned to a client terminal. In this case, the graph search unit 40 performs the graph search process described above on the search data input from the search data input unit 20. The search result output unit 30 then generates display data in which the similar data obtained by the similar data identification unit 47 are ranked by distance, and returns it to the client terminal.

As described above, in the first embodiment, similar data for the registration data are found with a search algorithm that traverses the links between data in the graph index. New links are then created between those similar data and the registration data. In other words, the search device 1 registers each new item in the same graph index it traverses when searching, linking the item in so that later traversals can reach it. The search device 1 thereby self-generates the graph index.

In addition, each time data is registered, the search device 1 optimizes the number of links of the data that received new links. The link structure of the graph index is thus optimized during registration, so the index repairs itself.

Consequently, the graph-index search algorithm can find similar data without calculating the distance to every data item in the database. Data can therefore be registered quickly in a database that stores multidimensional vector data. In addition, keeping the number of links in the graph index optimal prevents the link structure from becoming cluttered as data grows, which would otherwise slow down the search.

[Second Embodiment]
Next, a second embodiment of the search device 1 is described. Descriptions of functional components and processing steps identical to those of the first embodiment are omitted. The search device 1 of the second embodiment replaces the link optimization process of FIG. 2 with the link optimization process shown in the flowchart of FIG. 7. That process is described below with reference to FIG. 8, which illustrates its progress. For simplicity, some links between data in FIG. 8B onward are partly omitted and drawn as broken lines.

As described above, a graph-index search algorithm finds similar data by traversing the links between vector data. If a link break occurs, however, vector data become isolated and can no longer be reached by traversal. The search device 1 of the second embodiment therefore repairs links whenever it performs link reduction.

First, the link optimization unit 55 takes the point at which a link to be removed is set (the reduction target point). From among all the points linked to it, the unit identifies the points that will keep their links, called link holding points (step S61). The link holding points may be, for example, a predetermined number of points closest to the reduction target point, or the points within a predetermined distance of it. The reduction target point itself may also be counted among the link holding points.

In FIG. 8A, the five points b, c, d, e, and f closest to the reduction target point a are identified as link holding points. The points linked to the reduction target point that are not link holding points become repair targets, and they are selected one at a time (step S62). In FIG. 8A, point g is selected as the repair target point.

Next, the link optimization unit 55 calculates the distance between the repair target point and each link holding point (step S63). It then sets a new link between the repair target point and the link holding point at the shortest of these distances (step S64).

For example, as shown in FIG. 8B, the distances between repair target point g and each of link holding points b to f are calculated. Then, as shown in FIG. 8C, a new link is set between point g and link holding point e, which is the closest. If a link already exists between the repair target point and that link holding point, no new link needs to be set.

The link optimization unit 55 then releases the link between the reduction target point and the repair target point that has just received its new link (step S65). In FIG. 8D, the link between points a and g is released.

The repair targets are all points linked to the reduction target point other than the link holding points. If the repair process of steps S62 to S65 has been performed for all of them (step S66: No), the link optimization process ends. If any point remains unrepaired (step S66: Yes), the process returns to step S62.

Thus, even though the links from the reduction target point to points other than link holding points b to f (identified in FIG. 8A) are released, each of those points receives a link to a link holding point, as shown in FIG. 8E. This prevents link breaks.
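Steps S61 to S66 can be condensed into the following Python sketch. It assumes the link holding points are the `keep` neighbours nearest to the reduction target, which is one of the two options given above. It also assumes that links are undirected adjacency sets and that `keep` is at least 1. The function name is hypothetical.

```python
import math
from typing import Dict, List, Set, Tuple

Point = Tuple[float, ...]


def reduce_with_repair(links: Dict[int, Set[int]], points: Dict[int, Point],
                       target: int, keep: int) -> List[int]:
    """Keep the `keep` nearest neighbours of `target`; re-attach every other neighbour
    to its nearest holding point before cutting its link to `target`."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    neighbours = sorted(links[target], key=lambda j: math.dist(points[target], points[j]))
    holders = neighbours[:keep]                     # S61: link holding points
    for r in neighbours[keep:]:                     # S62/S66: each repair target in turn
        nearest = min(holders, key=lambda h: math.dist(points[r], points[h]))  # S63
        links[r].add(nearest)                       # S64 (no-op if already linked)
        links[nearest].add(r)
        links[r].discard(target)                    # S65: release link to the target
        links[target].discard(r)
    return holders
```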

As described above, in the second embodiment, releasing the link between a reduction target point and a repair target point still leaves the repair target point linked to a link holding point. Link breaks are therefore prevented, and search performance does not degrade even when links are deleted from the graph index.

[Third Embodiment]
Next, a third embodiment of the search device 1 is described. In the embodiments above, the search device 1 identifies the search start point by following links from an arbitrarily selected point on the graph index. In the third embodiment, the vector data are instead clustered in the vector space and indexed in advance, and this index is used to identify the search start point. The index is generated with a known vector-space partitioning technique and, briefly, is built as follows.

First, an arbitrary vector data item (point A) in the graph index DB 70 is selected, and the point B farthest from point A is found by calculating the distance from A to each point. Next, the distances from point B to all points in the graph index DB 70 are calculated and sorted. The difference between each pair of adjacent distances in the sorted order is then calculated. For example, if the sorted distances are d1, d2, d3, ..., dn, the differences |d1 - d2|, |d2 - d3|, ... are calculated in turn.

The cluster is then split between the two adjacent distances whose difference is largest. For example, if the difference between d1 and d2 is the largest, the split is made between point P1 (distance d1) and point P2 (distance d2). In this way, similar data can be divided into tree-structured clusters.
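A single split step of this partitioning can be sketched in Python as below. It takes the first ID as the arbitrary point A, assumes Euclidean distance, and returns the two sides of the largest gap. Applying it recursively to each side would yield the cluster tree. Names and the dict-based point layout are illustrative assumptions.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def split_at_max_gap(ids: Sequence[int],
                     points: Dict[int, Point]) -> Tuple[List[int], List[int]]:
    """One partitioning step: sort by distance from the point B farthest from A = ids[0],
    then cut where consecutive distances differ the most."""
    if len(ids) < 2:
        raise ValueError("need at least two points to split")
    a = ids[0]
    b = max(ids, key=lambda i: math.dist(points[a], points[i]))
    order = sorted(ids, key=lambda i: math.dist(points[b], points[i]))
    dists = [math.dist(points[b], points[i]) for i in order]
    # Distances are sorted ascending, so each difference equals |d_k - d_{k+1}|.
    gaps = [dists[k + 1] - dists[k] for k in range(len(dists) - 1)]
    cut = gaps.index(max(gaps)) + 1
    return order[:cut], order[cut:]
```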

The tree-structured cluster index is stored as a database. For example, each cluster's center point is associated with its radius and with the IDs of its child clusters. Any known technique may be applied, as appropriate, to the structure of this cluster index.

To find the search start point with the tree-structured index, the cluster containing the registration data is first identified at the top level, based on each cluster's center and radius. Next, the cluster containing the registration data is identified among that cluster's children. Proceeding down the tree in this way, the lowest-level cluster containing the registration data is reached, and the data belonging to it are used as search start points. If several data items belong to that lowest-level cluster, each can serve as a search start point. In short, following the tree-structured index toward the registration data yields data relatively close to the registration data as search start points.
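The top-down descent can be sketched in Python with a minimal cluster node that mirrors the fields listed above: a center, a radius, child clusters, and, for leaves, member data IDs. The source does not cover the case where the query falls inside none of the children. As an assumption, this sketch then falls back to the child with the nearest center.

```python
import math
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[float, ...]


@dataclass
class Cluster:
    center: Point
    radius: float
    children: List["Cluster"] = field(default_factory=list)
    members: List[int] = field(default_factory=list)  # data IDs, filled in leaf clusters


def find_start_points(root: Cluster, query: Point) -> List[int]:
    """Descend from the root to the leaf cluster containing `query`; its members are start points."""
    node = root
    while node.children:
        inside = [c for c in node.children if math.dist(c.center, query) <= c.radius]
        pool = inside or node.children  # assumption: fall back to all children
        node = min(pool, key=lambda c: math.dist(c.center, query))
    return node.members
```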

FIG. 9 shows a group of vector data divided into clusters by space partitioning. Generating, in advance, an index that associates vector data with each resulting cluster allows vector data in the same hierarchy level to be found quickly.

Suppose, for example, that registration data is input as shown in FIG. 9, and vector data in the same hierarchy level is found by searching the tree-structured index. That vector data may nevertheless lie outside the search range centered on the registration data. Using it as the search start point would then reduce search efficiency.

Therefore, the vector data in the same hierarchy level is first found using the clustering index, and it is checked whether that data lies within the search range. If it does, that vector data is used as the search start point.

If it lies outside the search range, the search start point is determined by the search start point determination process described above (see FIG. 3). That vector data in the same hierarchy level is used as the arbitrarily selected point from which the process starts. The search start point can thus be determined quickly while the accuracy of the search results is maintained.
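The check-then-fallback logic might look like the following Python sketch. The FIG. 3 start point procedure is not reproduced in this section, so a simple greedy walk stands in for it: move to whichever linked point is nearest the registration data, until no neighbour is closer. That substitution, and all function names, are assumptions.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, ...]


def greedy_walk(graph: Dict[int, List[int]], points: Dict[int, Point],
                start: int, query: Point) -> int:
    """Stand-in for the FIG. 3 procedure: move to the nearest neighbour while it is closer."""
    current = start
    while True:
        best = min(graph.get(current, []), key=lambda j: math.dist(points[j], query),
                   default=current)
        if math.dist(points[best], query) >= math.dist(points[current], query):
            return current
        current = best


def choose_start(tree_hit: int, graph: Dict[int, List[int]], points: Dict[int, Point],
                 query: Point, radius: float) -> int:
    """Use the tree-index hit directly if it lies inside the search range, else walk links from it."""
    if math.dist(points[tree_hit], query) <= radius:
        return tree_hit
    return greedy_walk(graph, points, tree_hit, query)
```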

Next, modifications of the embodiments described above are described.
[Modification 1]
If vector data identical to that of the registration data is already stored in the graph index DB 70, the registration data may be stored only in the image DB 60, not in the graph index DB 70. It is then associated with the identical vector data already stored. For this purpose, the graph index DB 70 is given a new data item that stores, besides the image ID, the image IDs of image data having the same vector data as related IDs.

During a data search, if related IDs are associated with data found as similar data, the image data indicated by those related IDs can also be output as search results. This prevents repeated registrations of identical vector data from cluttering the graph index structure and slowing down the search.
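Modification 1 can be sketched in Python as a lookup from vector to a representative image ID, with related IDs attached to that representative. Two assumptions are made: vectors are compared by exact tuple equality, as the text speaks of identical vector data, and the class and method names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]


class GraphIndexEntries:
    """Maps each distinct vector to one representative image ID plus its related IDs."""

    def __init__(self) -> None:
        self.by_vector: Dict[Vector, str] = {}
        self.related: Dict[str, List[str]] = {}

    def register(self, image_id: str, vector: Vector) -> bool:
        """Return True if a new graph node is needed, False if stored as a related ID."""
        rep = self.by_vector.get(vector)
        if rep is not None:
            self.related[rep].append(image_id)  # image DB only; no new graph node
            return False
        self.by_vector[vector] = image_id
        self.related[image_id] = []
        return True

    def expand(self, hit_ids: Sequence[str]) -> List[str]:
        """Add the related IDs of each hit to the search result."""
        out: List[str] = []
        for i in hit_ids:
            out.append(i)
            out.extend(self.related.get(i, []))
        return out
```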

[Modification 2]
So far, the search range of the graph search process has been defined by a prescribed distance centered on the registration data. It may instead be defined by the number of results to be collected while traversing from the search start point. The search start point is determined by the method described above, and links are followed from it, collecting a predetermined number of data items (for example, 10). The collected data are sorted by distance to the registration data, giving a list of visited data. The list holds that number of vector data items. The distance of the item farthest from the registration data is used as the search range and is kept updated.

That is, links continue to be followed as long as the search end condition is not satisfied. Whenever data closer to the registration data than the data in the list is found, it is added to the list in distance order. The search range is then updated to the distance of the farthest item remaining in the list. Because the list fills with data ever closer to the registration data, the search range narrows as the search proceeds. The range is thus determined dynamically from the specified number of results. Since it narrows each time closer data is found, search speed improves while search accuracy is maintained.
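Modification 2 closely resembles a best-first graph search with a bounded result list, sketched in Python below with two heaps. `best` holds the k nearest items seen so far, and its farthest entry defines the search range. `frontier` holds nodes waiting to be expanded. A node is not expanded once it lies beyond `alpha` times the current range. The heap-based design, the `alpha` parameter's placement, and all names are assumptions, not the patent's stated implementation.

```python
import heapq
import math
from typing import Dict, List, Tuple

Point = Tuple[float, ...]


def knn_graph_search(graph: Dict[int, List[int]], points: Dict[int, Point], start: int,
                     query: Point, k: int, alpha: float = 1.0) -> List[Tuple[float, int]]:
    """Best-first link traversal keeping the k nearest items seen so far.

    The search range is the distance of the farthest item in that list; it shrinks
    whenever a closer item displaces it. Nodes beyond alpha * range are not expanded.
    """
    d0 = math.dist(points[start], query)
    best: List[Tuple[float, int]] = [(-d0, start)]      # max-heap via negated distances
    frontier: List[Tuple[float, int]] = [(d0, start)]   # min-heap of nodes to expand
    visited = {start}

    def search_range() -> float:
        return -best[0][0] if len(best) >= k else math.inf

    while frontier:
        d, node = heapq.heappop(frontier)
        if d > alpha * search_range():
            break  # every remaining frontier node is at least this far away
        for nb in graph.get(node, []):
            if nb in visited:
                continue
            visited.add(nb)
            dn = math.dist(points[nb], query)
            if dn <= alpha * search_range():
                heapq.heappush(frontier, (dn, nb))
            if len(best) < k or dn < -best[0][0]:
                heapq.heappush(best, (-dn, nb))
                if len(best) > k:
                    heapq.heappop(best)  # drop the current farthest item
    return sorted((-nd, i) for nd, i in best)
```

For example, on a chain graph 0-1-...-9 with point i at coordinate i, searching for coordinate 0 from node 9 with k=3 walks down the chain and returns items 0, 1, and 2.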

[Modification 3]
The method of Modification 2 may also be used to bound the range over which links are traversed to determine the search start point. Selecting an arbitrary point from the graph index and finding the nearest point is treated as a graph search that collects a single result. During link traversal, the one item found so far that is closest to the registration data is kept and updated. Its distance from the registration data is used dynamically as the radius of the search range.

When traversing links, data are selected so as not to exceed this dynamically determined search range. The range therefore narrows while links are being followed toward the search start point, shortening the time needed to find it.

In addition, links may be traversed within the search limit range, that is, the search range multiplied by α (α > 1). This widens the traversed range, so a suitable search start point can be found even when data close to the registration data are not directly linked.
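Modification 3, the single-result case combined with the α-widened limit, can be sketched in Python as below. It returns the closest item found and its distance. In the usage data, setting `alpha=1.0` stops at the seed, because the only neighbour is farther away. With `alpha=1.5`, the search steps past that neighbour and reaches the true nearest point, illustrating the widening effect described above. The name and graph layout are assumptions.

```python
import heapq
import math
from typing import Dict, List, Tuple

Point = Tuple[float, ...]


def find_start_point(graph: Dict[int, List[int]], points: Dict[int, Point], seed: int,
                     query: Point, alpha: float = 1.5) -> Tuple[int, float]:
    """Follow links from `seed`, keeping only the closest item so far; its distance is the
    dynamic search radius, and nodes beyond alpha * radius are not expanded."""
    best_id, best_d = seed, math.dist(points[seed], query)
    frontier = [(best_d, seed)]
    visited = {seed}
    while frontier:
        d, node = heapq.heappop(frontier)
        if d > alpha * best_d:
            break  # nothing left inside the search limit range
        for nb in graph.get(node, []):
            if nb in visited:
                continue
            visited.add(nb)
            dn = math.dist(points[nb], query)
            if dn < best_d:
                best_id, best_d = nb, dn  # radius shrinks to the new closest distance
            if dn <= alpha * best_d:
                heapq.heappush(frontier, (dn, nb))
    return best_id, best_d


pts = {0: (5.0, 0.0), 1: (6.0, 0.0), 2: (1.0, 0.0)}
g = {0: [1], 1: [0, 2], 2: [1]}
```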

[Modification 4]
Data visited while determining the search start point may turn out to be part of the final search result. The distances calculated in step S434 of FIG. 3 can therefore be kept in memory as a traversal history. Referring to this history during the graph search allows those distance calculations to be skipped.
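The traversal history can be modelled in Python as a per-query distance cache shared by the start point search and the graph search. The class name and the `computed` counter, used only to show that repeated lookups are free, are illustrative assumptions.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, ...]


class DistanceCache:
    """Per-query memo of distances, reused across start point search and graph search."""

    def __init__(self, points: Dict[int, Point], query: Point) -> None:
        self.points = points
        self.query = query
        self.history: Dict[int, float] = {}
        self.computed = 0  # number of actual distance calculations performed

    def get(self, i: int) -> float:
        if i not in self.history:
            self.history[i] = math.dist(self.points[i], self.query)
            self.computed += 1
        return self.history[i]
```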

The operations of each embodiment described above can be implemented by installing suitable computer software on a computer. For example, each component described above need only exist as a functional block and need not be independent hardware. It may be implemented in hardware or in computer software. Furthermore, one functional element of the present invention may be realized by a set of several functional elements, and several functional elements may be realized by a single one.

The functional elements may also be located at physically separate positions and connected to one another over a network. Functions may also be realized, or functional elements configured, by grid computing.

FIG. 1: Block diagram showing an example of the functional configuration of the search device in the first embodiment.
FIG. 2: Flowchart illustrating the graph self-generation process in the first embodiment.
FIG. 3: Flowchart illustrating the search start point determination process in the first embodiment.
FIG. 4: Schematic diagram showing a specific example of search start point determination in the first embodiment.
FIG. 5: Schematic diagram showing a specific example of the similar data search in the first embodiment.
FIG. 6: Schematic diagram showing a specific example of the link optimization process in the first embodiment.
FIG. 7: Flowchart illustrating the link optimization process in the second embodiment.
FIG. 8: Schematic diagram showing a specific example of the link optimization process in the second embodiment.
FIG. 9: Diagram showing an example of the tree-structured index in the third embodiment.

Description of reference signs
1 Search device
10 Registration data input unit
20 Search data input unit
30 Search result output unit
40 Graph search unit
41 Vector data generation unit
43 Search start point determination unit
45 Graph traversal unit
47 Similar data identification unit
50 Data registration unit
51 Similar data acquisition unit
53 Link registration unit
55 Link optimization unit
60 Image DB
70 Graph index DB

Claims

1. A search data management device comprising:
a database that stores a plurality of data items, with links set between data items for tracing from one data item to another;
graph search means for selecting, for input search target data, any one of the data items stored in the database as a search start point, sequentially following the links from the search start point, and searching for and outputting, as similar data, the data items on the followed path that lie within a predetermined distance of the search target data;
similar data acquisition means for acquiring similar data for data to be registered in the database by inputting the data to be registered to the graph search means; and
link optimization means for setting links between the data to be registered and the similar data acquired by the similar data acquisition means, storing the links in the database, and reducing the links set for the similar data.

2. The search data management device according to claim 1, wherein,
when the number of links set for the similar data is equal to or greater than a predetermined number, the link optimization means keeps the predetermined number of links in ascending order of the distance between the linked data items, and releases the other links.

3. The search data management device according to claim 1, wherein the link optimization means:
identifies, among the other data items linked to a data item whose links are to be reduced, holding data whose links are kept and release data whose links are released; and
sets a link between the release data and the holding data based on the distance between them, and stores the link in the database.

4. The search data management device according to any one of claims 1 to 3, wherein
the graph search means selects arbitrary data from the database, sequentially follows the links from that data to search for data within the predetermined distance of the data to be registered, and selects the data found as the search start point.

5. The search data management device according to claim 4, wherein,
if, after following the links from the arbitrary data, no data on the followed path lies within the predetermined distance, the graph search means searches for the search start point by newly selecting data different from the previously selected arbitrary data and again sequentially following the links.

6. The search data management device according to any one of claims 1 to 3, further comprising a hierarchical database in which the data stored in the database are hierarchized by space partitioning and stored for each hierarchy level to which they belong, wherein
the graph search means identifies the hierarchy level to which the data to be registered belongs, and selects other data belonging to the identified hierarchy level as the search start point.

7. The search data management device according to claim 6, wherein,
if the distance between the search start point selected based on the hierarchical database and the data to be registered exceeds the predetermined distance, the graph search means selects arbitrary data from the database, sequentially follows the links from that data to search for data within the predetermined distance, and selects the data found as the search start point.

8. A search data management method in which a computer manages, as a group of data to be searched, a database that stores a plurality of data items with links set between them for tracing from one data item to another, the method comprising the following steps performed by the computer:
a graph search step of selecting, for input search target data, one of the data items stored in the database as a search start point, sequentially following the links from the search start point, and searching for and outputting, as similar data, the data items on the followed path that lie within a predetermined distance of the search target data;
a similar data acquisition step of having the data to be registered in the database searched for as the search target in the graph search step, and acquiring similar data for the data to be registered; and
a link optimization step of setting links between the data to be registered and the similar data acquired in the similar data acquisition step, storing the links in the database, and reducing the links set for the similar data.

9. A program for causing a computer to execute the steps of the search data management method according to claim 8.