JP5179564B2

JP5179564B2 - Query segment position determination device

Info

Publication number: JP5179564B2
Application number: JP2010292481A
Authority: JP
Inventors: 純平三宅; 浩司塚本
Original assignee: Yahoo Japan Corp
Current assignee: Yahoo Japan Corp
Priority date: 2010-12-28
Filing date: 2010-12-28
Publication date: 2013-04-10
Anticipated expiration: 2030-12-28
Also published as: JP2012141681A

Description

The present invention relates to techniques for operating on search queries in a search engine or the like.

On the Internet, sites called search engines are provided as tools for finding desired information among a vast amount of information.

A search engine is provided with a search DB (database) in which search keywords are associated, in advance and either mechanically or manually, with the pages that contain them. The search engine searches the search DB based on a search query entered by a user from a terminal device, and displays the titles, snippets, and the like of the hit pages on the terminal device as search results, together with URL (Uniform Resource Locator) links. If the user, having looked at a title or snippet, wants to see the actual page, selecting the title or the like moves the screen to the linked page.

As a search query, besides specifying a single word or the like, users often specify several strings delimited by a separator such as a space “□”, as in “まんが□立ち読み” (“manga □ browsing”). In this case, pages containing both “まんが” (manga) and “立ち読み” (browsing) are retrieved (AND search). The search engine performs an AND operation on the search results for “まんが” and the search results for “立ち読み”, and returns the outcome to the user as the final search result. Delimiting with separators is called segmentation, and the position of a separator in a search query is called a query segment position.

However, users do not always put separators at appropriate positions. Even when the character string is identical apart from the separators, a difference in query segment position changes the search results, and the desired results may not be obtained. For example, if “まんが立ち□読み” is entered instead of “まんが□立ち読み” from the example above, the search engine searches for “まんが立ち” and “読み” separately and performs an AND operation on the respective results, so the final search result differs from that of “まんが□立ち読み”. As a more realistic example, consider searching for the title of a song by a celebrity. When the title can be split into two or more words, the search results differ between a query that enters the title as one continuous string and a query that inserts a separator partway through. Because song titles are often registered in the search DB unsplit, the continuous query yields the desired results, whereas with the split query the desired results are often buried in noise from the general terms and cannot be found.
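The effect of the query segment position on an AND search can be illustrated with a minimal sketch. The function name and_search, the sample pages, and the use of simple substring matching in place of a real search DB are all assumptions made for illustration; they are not taken from the patent.

```python
def and_search(query, pages):
    """Return indices of pages that contain every segment of the query.

    Hypothetical stand-in for a search engine: each separator-delimited
    segment is matched by substring, and the per-segment results are ANDed.
    """
    segments = query.split()  # splits on ASCII and full-width spaces
    return [i for i, text in enumerate(pages)
            if all(segment in text for segment in segments)]


# Hypothetical page texts.
pages = [
    "まんがを立ち読みできるサイト",
    "まんが立ち読み無料",
    "読み物としてのまんが",
]

appropriate = and_search("まんが 立ち読み", pages)    # separator between まんが and 立ち読み
misplaced = and_search("まんが立ち 読み", pages)      # separator between まんが立ち and 読み
print(appropriate, misplaced)
```

With these sample pages, the misplaced separator loses the first page, because “まんが立ち” never appears there as a continuous string.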

Patent Document 1: JP 2009-301140 A

As described above, simply searching on the search query as entered by the user lowers search accuracy, and measures to improve search accuracy have been sought.

Patent Document 1 discloses a technique for performing text segmentation on text data stored in a text database, but it does not target search queries.

The present invention has been proposed in view of the conventional problems described above, and its object is to improve search accuracy by correcting the query segment positions of a search query entered by a user to appropriate positions.

To solve the above problem, the gist of the present invention, as set forth in claim 1, is a query segment position determination device comprising: search log acquisition means for acquiring a search log; separator deletion means for deleting separators from a search query in the acquired search log; query dividing means for dividing the search query from which the separators have been deleted into a plurality of character strings; first search means for searching a search database based on the divided character strings; analysis means for calculating, as a score and based on the search results, the frequency with which each divided character string co-occurs concatenated with its adjacent character string; and query segment position determination means for determining a query segment position based on the calculated score.

As set forth in claim 2, in the query segment position determination device according to claim 1, the query dividing means may divide the search query into a plurality of character strings by morphological analysis.

As set forth in claim 3, in the query segment position determination device according to claim 1, the query dividing means may divide the search query into a plurality of character strings by selecting, from the search log, the pattern with the largest number of divisions.

As set forth in claim 4, in the query segment position determination device according to claim 1, the query dividing means may divide the search query into a plurality of character strings by generating a plurality of character string combination models from the search log.

As set forth in claim 5, the invention may be configured as a search device comprising: search log acquisition means for acquiring a search log; separator deletion means for deleting separators from a search query in the acquired search log; query dividing means for dividing the search query from which the separators have been deleted into a plurality of character strings; first search means for searching a search database based on the divided character strings; analysis means for calculating, as a score and based on the search results, the frequency with which each divided character string co-occurs concatenated with its adjacent character string; query segment position determination means for determining a query segment position based on the calculated score; learning means for training a pattern recognition unit based on the determined query segment position; query correction means for using the pattern recognition unit to judge whether the query segment positions of a search query entered by a user are appropriate and for correcting them to appropriate query segment positions; and second search means for searching the search database based on the corrected search query.

As set forth in claim 6, the invention may be configured as a search device comprising: reception means for receiving a search query from a user; separator deletion means for deleting separators from the received search query; query dividing means for dividing the search query from which the separators have been deleted into a plurality of character strings; first search means for searching a search database based on the divided character strings; analysis means for calculating, as a score and based on the search results, the frequency with which each divided character string co-occurs concatenated with its adjacent character string; query segment position determination means for determining a query segment position based on the calculated score; query correction means for correcting the search query based on the determined query segment position; and second search means for searching the search database based on the corrected search query.

As set forth in claim 7, the invention may be configured as a search control method comprising: a search log acquisition step in which a control unit of a search device acquires a search log; a separator deletion step in which the control unit deletes separators from a search query in the acquired search log; a query dividing step in which the control unit divides the search query from which the separators have been deleted into a plurality of character strings; a first search step in which the control unit searches a search database based on the divided character strings; an analysis step in which the control unit calculates, as a score and based on the search results, the frequency with which each divided character string co-occurs concatenated with its adjacent character string; a query segment position determination step in which the control unit determines a query segment position based on the calculated score; a learning step in which the control unit trains a pattern recognition unit based on the determined query segment position; a query correction step in which the control unit uses the pattern recognition unit to judge whether the query segment positions of a search query entered by a user are appropriate and corrects them to appropriate query segment positions; and a second search step in which the control unit searches the search database based on the corrected search query.

As set forth in claim 8, the invention may be configured as a search control method comprising: a reception step in which a control unit of a search device receives a search query from a user; a separator deletion step in which the control unit deletes separators from the received search query; a query dividing step in which the control unit divides the search query from which the separators have been deleted into a plurality of character strings; a first search step in which the control unit searches a search database based on the divided character strings; an analysis step in which the control unit calculates, as a score and based on the search results, the frequency with which each divided character string co-occurs concatenated with its adjacent character string; a query segment position determination step in which the control unit determines a query segment position based on the calculated score; a query correction step in which the control unit corrects the search query based on the determined query segment position; and a second search step in which the control unit searches the search database based on the corrected search query.

As set forth in claim 9, the invention may be configured as a search control program that causes a computer constituting a search device to function as: search log acquisition means for acquiring a search log; separator deletion means for deleting separators from a search query in the acquired search log; query dividing means for dividing the search query from which the separators have been deleted into a plurality of character strings; first search means for searching a search database based on the divided character strings; analysis means for calculating, as a score and based on the search results, the frequency with which each divided character string co-occurs concatenated with its adjacent character string; query segment position determination means for determining a query segment position based on the calculated score; learning means for training a pattern recognition unit based on the determined query segment position; query correction means for using the pattern recognition unit to judge whether the query segment positions of a search query entered by a user are appropriate and for correcting them to appropriate query segment positions; and second search means for searching the search database based on the corrected search query.

As set forth in claim 10, the invention may be configured as a search control program that causes a computer constituting a search device to function as: reception means for receiving a search query from a user; separator deletion means for deleting separators from the received search query; query dividing means for dividing the search query from which the separators have been deleted into a plurality of character strings; first search means for searching a search database based on the divided character strings; analysis means for calculating, as a score and based on the search results, the frequency with which each divided character string co-occurs concatenated with its adjacent character string; query segment position determination means for determining a query segment position based on the calculated score; query correction means for correcting the search query based on the determined query segment position; and second search means for searching the search database based on the corrected search query.

According to the present invention, search accuracy can be improved by correcting the query segment positions of a search query entered by a user to appropriate positions.

A diagram showing a configuration example of a system according to the first embodiment of the present invention.
A diagram showing an example of the data structure of the search log.
A diagram showing an example of the data structure of the search DB.
A diagram showing a hardware configuration example of the search device.
A flowchart showing a processing example of the first embodiment.
A diagram (part 1) showing how the data transitions.
A diagram showing an example of learning data.
A diagram (part 2) showing how the data transitions.
A diagram showing a configuration example of a system according to the second embodiment of the present invention.
A flowchart showing a processing example of the second embodiment.

Preferred embodiments of the present invention are described below.

<First Embodiment>
FIG. 1 is a diagram showing a configuration example of a system according to the first embodiment of the present invention.

In FIG. 1, a plurality of user terminals 2, such as PCs (Personal Computers), mobile phones, and PDAs (Personal Digital Assistants) operated by users, are connected to a network 1 such as the Internet. Each user terminal 2 includes a general browser (Web browser) 21. The browser 21 has functions for requesting, acquiring, and displaying page data written in a language such as HTML (Hyper Text Markup Language), and for transmitting form data and the like, in accordance with HTTP (Hyper Text Transfer Protocol), a standard Internet protocol, or the like.

A search device 3 is also connected to the network 1. In response to access from the browser 21 of a user terminal 2 operated by a user, the search device 3 performs a Web search and returns the search results to the browser 21 of that user terminal 2.

The search device 3 includes, as functional units, a query segment position learning unit 301, a pattern recognition unit 309, a search query receiving unit 310, a search query correction unit 311, a search unit 312, and a search result response unit 313. The query segment position learning unit 301 includes a search log acquisition unit 302, a separator deletion unit 303, a query dividing unit 304, a search unit 305, a search result analysis unit 306, a query segment position determination unit 307, and a learning data generation / learning request unit 308.

These functional units are realized by computer programs executed on hardware resources, such as the CPU (Central Processing Unit), ROM (Read Only Memory), and RAM (Random Access Memory), of the computer constituting the search device 3. The functional units need not be located on a single computer and may be distributed as necessary.

A search log 314 and a search DB 315 are provided as the databases used by the search device 3. These databases systematically hold predetermined data on a storage medium, such as an HDD (Hard Disk Drive), in the computer constituting the search device 3. The search log 314 and the search DB 315 need not be located in the search device 3 and may be placed on another device.

FIG. 2 is a diagram showing an example of the data structure of the search log 314. The search log 314 includes raw search log data and aggregated search log data. The raw search log data includes items such as “search date and time” and “search query”. “Search date and time” is the date and time at which the search was executed. “Search query” is the search expression used for the search. The aggregated search log data includes items such as “search query” and “search count (frequency)”. “Search query” is the search expression used for the search. “Search count (frequency)” is the number of times, or the frequency with which, the same search query was searched.
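The relationship between the raw data and the aggregated data of the search log can be sketched as follows. The class name RawLogEntry, the field names, and the sample entries are hypothetical; the patent names only the items, not a concrete schema.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class RawLogEntry:
    searched_at: str  # "search date and time" (format is an assumption)
    query: str        # "search query" exactly as entered, separators included


def aggregate(entries):
    """Build the aggregated data: search query -> search count."""
    return Counter(entry.query for entry in entries)


# Hypothetical sample entries.
raw_log = [
    RawLogEntry("2010-12-01T10:00:00", "まんが 立ち読み"),
    RawLogEntry("2010-12-01T10:05:00", "まんが立ち 読み"),
    RawLogEntry("2010-12-02T09:30:00", "まんが 立ち読み"),
]
counts = aggregate(raw_log)
print(counts.most_common())
```

Queries that differ only in separator position are deliberately kept as distinct keys here, since the later steps compare exactly those variants.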

FIG. 3 is a diagram showing an example of the data structure of the search DB 315. The search DB 315 includes items such as “search keyword”, “page URL”, “title”, and “summary”. “Search keyword” is a character string used for searching. “Page URL” is the address of a page that contains the search keyword. “Title” is the title of the page. “Summary” is a character string consisting of part or all of the content of the page.

Returning to FIG. 1, the query segment position learning unit 301 of the search device 3 operates as a batch process. It searches the search DB 315 based on the search log 314, determines appropriate query segment positions, and trains the pattern recognition unit 309, which is an SVM (Support Vector Machine) or the like.

The search log acquisition unit 302 has a function of acquiring search queries from the raw search log data of the search log 314.

The separator deletion unit 303 has a function of deleting separators, such as spaces, from the search queries acquired by the search log acquisition unit 302.

The query dividing unit 304 has a function of dividing a search query from which the separator deletion unit 303 has deleted the separators into a plurality of character strings, each of which is a meaningful unit.

The search unit 305 has a function of searching the search DB 315 based on the plurality of character strings produced by the query dividing unit 304.

The search result analysis unit 306 has a function of analyzing the search results obtained by the search unit 305 and calculating a score representing the frequency with which each of the character strings produced by the query dividing unit 304 appears on a page joined to its adjacent character string (concatenated co-occurrence).

The query segment position determination unit 307 has a function of determining appropriate query segment positions based on the scores calculated by the search result analysis unit 306.

The learning data generation / learning request unit 308 has a function of generating learning data that represents the features of the query segment positions determined by the query segment position determination unit 307, and requesting the pattern recognition unit 309 to perform learning based on that learning data.

The pattern recognition unit 309 has a well-known mechanism such as an SVM and performs learning based on feature vectors and teacher signals. At pattern recognition time, it outputs a recognition result indicating which class an input feature vector belongs to (in this example, whether or not the query segment position is appropriate for the search query).

The search query receiving unit 310 has a function of receiving a search query when a search request is made from the browser 21 of a user terminal 2.

The search query correction unit 311 has a function of using the pattern recognition unit 309 to judge whether the query segment positions of the search query received by the search query receiving unit 310 are appropriate, shifting the query segment position in turn across all of the boundaries between character strings, and thereby correcting the search query.

The search unit 312 has a function of searching the search DB 315 based on the search query that has passed through the search query correction unit 311 (that is, the search query corrected as necessary).

The search result response unit 313 has a function of returning the search results of the search unit 312 to the browser 21 of the requesting user terminal 2.

FIG. 4 is a diagram showing a hardware configuration example of the search device 3.

In FIG. 4, the search device 3 includes a CPU 32, a ROM 33, a RAM 34, an NVRAM (Non-Volatile Random Access Memory) 35, and an I/F (Interface) 36, all connected to a system bus 31, as well as the following connected to the I/F 36: I/O (Input/Output) devices 37 such as a keyboard, mouse, monitor, and CD/DVD (Compact Disk/Digital Versatile Disk) drive; an HDD (Hard Disk Drive) 38; and an NIC (Network Interface Card) 39. M denotes a medium (recording medium), such as a CD or DVD, on which a program or data is stored.

FIG. 5 is a flowchart showing a processing example of the first embodiment.

In FIG. 5(a), when processing starts (step S101), the search log acquisition unit 302 of the query segment position learning unit 301 of the search device 3 acquires search queries from the raw search log data of the search log 314 (step S102). FIG. 6(a) shows an example of the acquired search queries.

Next, returning to FIG. 5(a), the separator deletion unit 303 of the query segment position learning unit 301 deletes separators, such as spaces, from the search queries acquired by the search log acquisition unit 302 (step S103). FIG. 6(b) shows the search queries of FIG. 6(a) with the separators deleted.
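Step S103 can be sketched in a few lines. The set of separator characters (ASCII space, full-width space, tab) is an assumption; the patent says only “separators such as spaces”, and the function name strip_separators is hypothetical.

```python
import re

# Assumed separator set: ASCII space, full-width space (U+3000), and tab.
SEPARATORS = re.compile(r"[ \u3000\t]+")


def strip_separators(query):
    """Delete every separator so that variants of the same string coincide."""
    return SEPARATORS.sub("", query)


print(strip_separators("まんが 立ち読み"))
print(strip_separators("まんが立ち\u3000読み"))
```

Both variants collapse to the same continuous string, which is what allows the later steps to treat them as candidates for one and the same query.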

Next, returning to FIG. 5(a), the query dividing unit 304 of the query segment position learning unit 301 divides each search query from which the separators have been deleted into a plurality of character strings, each of which is a meaningful unit (step S104). There are, for example, the following three methods for dividing a search query.

(1) Division by morphological analysis: this method has the advantage that division is possible with a certain level of accuracy, but the disadvantage that it is weak against unknown words not registered in the dictionary.

(2) Selecting the pattern with the largest number of divisions from the search log: all search queries that become identical once separators such as spaces are deleted are extracted from the search log 314, and the one with the largest number of divisions is selected. Search queries with a relatively low search count (frequency), and search queries whose probability under the character string concatenation probability model is markedly low, are rejected. The search log is likely to contain data even for specific proper nouns, such as video game titles and idol group names, that are rarely found in morphological analysis dictionaries, so this method has the advantage of dividing such names well. Its disadvantage is that the frequency to use as the rejection criterion is not clear.

(3) Division by a character string concatenation probability model calculated from the search log: this method has the advantage that it can handle unknown words, but the disadvantage that meaningless character strings are included.

As criteria for choosing among these methods, (1) is suitable when a quick implementation with reasonable accuracy is wanted, and (2) or (3) is suitable when unknown words must also be divided and search accuracy is to be taken into account. Method (2) is easy to implement because it only selects the best division pattern. Method (3) is costly to implement because it must analyze, for every boundary between character strings, whether to divide there. Methods (2) and (3) differ little in accuracy, but (3) divides character strings into finer units, so a greater improvement in accuracy can be expected.
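Method (2) can be sketched as follows, under stated assumptions: the search log is available as a mapping from query to search count, separators are whitespace, and rejection uses a simple minimum count (min_count). The patent notes that the rejection frequency is not clearly defined, so that threshold is purely illustrative, and the rejection based on the concatenation probability model is omitted.

```python
from collections import defaultdict


def most_split_patterns(query_counts, min_count=2):
    """For each separator-free string, pick the logged variant with the most segments."""
    groups = defaultdict(list)
    for query, count in query_counts.items():
        if count < min_count:
            continue  # reject rarely searched variants (assumed criterion)
        segments = query.split()
        groups["".join(segments)].append(segments)
    return {key: max(variants, key=len) for key, variants in groups.items()}


# Hypothetical aggregated search log.
query_counts = {
    "まんが 立ち読み": 50,
    "まんが立ち読み": 120,
    "まんが 立ち 読み": 8,
    "まんが立 ち読み": 1,   # rejected: below min_count
}
patterns = most_split_patterns(query_counts)
print(patterns)
```

Choosing the variant with the most segments yields the finest division that users have actually typed, which is then scored boundary by boundary in the following steps.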

FIG. 6(c) shows the result of performing the division on the state of FIG. 6(b).

Next, returning to FIG. 5(a), the search unit 305 of the query segment position learning unit 301 searches the search DB 315 based on the individual character strings of the search query divided by the query dividing unit 304 (step S105).

Next, the search result analysis unit 306 of the query segment position learning unit 301 analyzes the search results obtained by the search unit 305. In addition to the hit count of each character string, it counts, within the summaries (snippets) returned as search results, the number of concatenated co-occurrences, that is, the number of times each divided character string appears on a page directly joined to its adjacent character string, and calculates a score representing the concatenated co-occurrence frequency (step S106). The Simpson coefficient or mutual information is used as the score.

Score by Simpson coefficient:
$$\mathrm{score}_{\mathrm{Simpson}} = \frac{\text{hits}(\textit{word1}\ \text{and}\ \textit{word2}\ \text{concatenated})}{\min\bigl(\text{hits}(\textit{word1}),\ \text{hits}(\textit{word2})\bigr)}$$
Score by mutual information:
$$\mathrm{score}_{\mathrm{MI}} = \log\frac{P(\textit{word1},\textit{word2})}{P(\textit{word1})\,P(\textit{word2})}$$
Here, min denotes the smaller of the values in parentheses. P(word1, word2) is the joint distribution function of word1 and word2, P(word1) is the marginal probability distribution function of word1, and P(word2) is the marginal probability distribution function of word2.
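The two scores can be computed from hit counts roughly as follows. This is a sketch under assumptions: the function names are hypothetical, and estimating the probabilities by dividing hit counts by a total page count `total` is an assumption that the text does not specify.

```python
import math


def simpson_score(joint_hits, hits1, hits2):
    """Simpson coefficient: concatenated hits over the smaller single-word hit count."""
    denom = min(hits1, hits2)
    return joint_hits / denom if denom > 0 else 0.0


def mutual_information_score(joint_hits, hits1, hits2, total):
    """log(P(w1, w2) / (P(w1) * P(w2))), with probabilities estimated as hits / total."""
    if total <= 0 or min(joint_hits, hits1, hits2) <= 0:
        return float("-inf")  # undefined for zero counts; treated as "never co-occurs"
    p12 = joint_hits / total
    p1 = hits1 / total
    p2 = hits2 / total
    return math.log(p12 / (p1 * p2))
```

For example, 30 concatenated hits against single-word hit counts of 100 and 60 give a Simpson score of 30 / 60 = 0.5.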

Note that, instead of having the search result analysis unit 306 count the number of concatenated co-occurrences from the summaries (snippets), the search unit 305 may search the search DB 315 with adjacent character strings joined together.

Next, the query segment position determination unit 307 of the query segment position learning unit 301 determines appropriate query segment positions based on the scores calculated by the search result analysis unit 306 (step S107). In FIG. 6(d), if the score of "まんが□立ち" (manga□tachi) is lower than the set threshold score, the position between "まんが" and "立ち" is determined to be an appropriate query segment position, as shown in FIG. 6(e).
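The threshold decision of step S107 can be illustrated with the following sketch. The function name and the example scores are hypothetical. The rule itself follows the text: a gap whose score is below the threshold becomes a query segment position, and otherwise the adjacent character strings are joined.

```python
def decide_segment_positions(pieces, scores, threshold):
    """Merge adjacent pieces whose concatenation score reaches the threshold.

    pieces: divided character strings, in order.
    scores: scores[i] is the score for the gap between pieces[i] and pieces[i + 1].
    Returns the segments that remain after merging.
    """
    if not pieces:
        return []
    if len(scores) != len(pieces) - 1:
        raise ValueError("need exactly one score per gap between pieces")
    segments = [pieces[0]]
    for piece, score in zip(pieces[1:], scores):
        if score < threshold:
            segments.append(piece)   # low score: keep a query segment position here
        else:
            segments[-1] += piece    # high score: the strings belong together
    return segments
```

With pieces "まんが", "立ち", "読み", assumed scores 0.1 and 0.9, and a threshold of 0.5, the result is the two segments "まんが" and "立ち読み".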

Next, returning to FIG. 5(a), the learning data generation / learning request unit 308 of the query segment position learning unit 301 generates learning data representing the features of the query segment positions determined by the query segment position determination unit 307, and requests the pattern recognition unit 309 to perform learning based on that learning data (step S108).

FIG. 7 shows an example of learning data for "まんが□立ち読み" (manga□tachiyomi) with a window width of 3. Each line of the learning data takes the form "evaluation value: character string feature". "1-gram", "2-gram", and "3-gram" denote the N-gram type and carry an evaluation value of, for example, "1". "Qcount" is the number of times the search query was searched (frequency), and "Wcount" is the number of Web search hits. "L_All/" denotes the entire character string to the left of the query segment position, and "R_All/" denotes the entire character string to the right of it. "I*/" denotes a character string that straddles the query segment position of interest.
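A feature generator in the spirit of FIG. 7 might look like the sketch below. The exact layout of FIG. 7 is not reproduced here: the function name and the "L/", "R/", and "I/" labels are hypothetical, modeled loosely on the "L_All/", "R_All/", and "I*/" prefixes named above, and the "Qcount" and "Wcount" features are omitted because they require log and search data.

```python
def boundary_features(text, pos, window=3):
    """Character N-gram features for the gap between text[pos - 1] and text[pos].

    text: the query with separators removed.
    pos: index of the candidate query segment position (0 < pos < len(text)).
    window: maximum N-gram length (the window width).
    """
    if not 0 < pos < len(text):
        raise ValueError("pos must lie strictly inside the text")
    left = text[max(0, pos - window):pos]
    right = text[pos:pos + window]
    feats = []
    for n in range(1, window + 1):
        if n <= len(left):
            feats.append(f"{n}-gram:L/{left[-n:]}")   # ends at the gap
        if n <= len(right):
            feats.append(f"{n}-gram:R/{right[:n]}")   # starts at the gap
        # N-grams that straddle the gap (only possible for n >= 2).
        for start in range(max(0, pos - n + 1), pos):
            if start + n <= len(text):
                feats.append(f"{n}-gram:I/{text[start:start + n]}")
    feats.append("L_All/" + text[:pos])
    feats.append("R_All/" + text[pos:])
    return feats
```

For the text "まんが立ち読み" and the gap after "まんが" (pos = 3), the features include "L_All/まんが", "R_All/立ち読み", and the straddling 2-gram "が立".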

Next, returning to FIG. 5(a), the processing ends (step S109).

Next, the actual search processing using the pattern recognition unit 309 trained as described above will be explained.

In FIG. 5(b), when processing starts (step S111), the search query reception unit 310 of the search device 3 receives a search query upon a search request from the browser 21 of the user terminal 2 (step S112).

Next, the search query correction unit 311 of the search device 3 uses the pattern recognition unit 309 to judge whether the query segment positions of the search query received by the search query reception unit 310 are appropriate, shifting the candidate query segment position through every gap between character strings in turn, and corrects the search query accordingly (step S113). Alternatively, candidates with altered query segment positions may be created in advance, and pattern recognition may be used to judge which candidate is appropriate.

FIG. 8 shows an example of the processing that corrects the query segment positions of a received search query. The features of the received search query, "1-gram: ...", "2-gram: ...", and "3-gram: ...", are generated while shifting the query segment position through each gap between character strings, and are input to the pattern recognition unit 309. In this example the received query contains inappropriately placed query segment positions, so the pattern recognition result is "inappropriate" at those places. The query segment position is then set where the pattern recognition result indicates "appropriate", and the search query is corrected.
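The correction loop of step S113 can be sketched as below. The trained pattern recognition unit 309 is represented by a caller-supplied predicate `is_boundary(text, pos)`. That interface, the function name, and the toy predicate in the usage note are assumptions made for illustration only.

```python
def correct_query(query, is_boundary, separator=" "):
    """Re-place separators by asking a classifier about every gap in turn.

    query: the search query as entered (may contain misplaced separators).
    is_boundary: callable (text, pos) -> bool; stands in for the trained
        pattern recognition unit (hypothetical interface).
    """
    # Drop all whitespace separators, including the full-width space.
    text = "".join(query.split())
    if not text:
        return ""
    out = [text[0]]
    for pos in range(1, len(text)):
        if is_boundary(text, pos):
            out.append(separator)   # judged appropriate: put a segment position here
        out.append(text[pos])
    return "".join(out)
```

With a toy predicate that accepts only the gap after "まんが", the misdivided query "まんが立ち 読み" is corrected to "まんが 立ち読み".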

Next, returning to FIG. 5(b), the search unit 312 of the search device 3 searches the search DB 315 with the corrected search query (which may be unchanged if it was already appropriate) (step S114), and the search result response unit 313 of the search device 3 returns the search results of the search unit 312 to the browser 21 of the requesting user terminal 2 (step S115). The processing then ends (step S116).

<Second Embodiment>
FIG. 9 is a diagram showing a configuration example of a system according to the second embodiment of the present invention. The first embodiment described above learns query segment positions by batch processing and corrects received search queries based on the learning result. The second embodiment instead corrects each received search query on the fly.

In FIG. 9, the search device 3 includes a search query reception unit 310, a separator deletion unit 303, a query division unit 304, a search unit 305, a search result analysis unit 306, a query segment position determination unit 307, a search query correction unit 311, a search unit 312, a search result response unit 313, and a search DB 315. The arrangement of the units differs slightly from FIG. 1, but functional units with the same names and reference numerals have substantially the same functions.

FIG. 10 is a flowchart showing a processing example of the second embodiment.

In FIG. 10, when processing starts (step S201), the search query reception unit 310 of the search device 3 receives a search query upon a search request from the browser 21 of the user terminal 2 (step S202).

Next, the separator deletion unit 303 of the search device 3 deletes separators such as spaces from the search query received by the search query reception unit 310 (step S203).

Next, the query division unit 304 of the search device 3 divides the search query from which the separators were deleted into a plurality of meaningful character strings (step S204).

Next, the search unit 305 of the search device 3 searches the search DB 315 based on each of the character strings into which the query division unit 304 divided the search query (step S205).

Next, the search result analysis unit 306 of the search device 3 analyzes the search results obtained by the search unit 305. In addition to the hit count of each character string, it counts, within the summaries (snippets) returned as search results, the number of concatenated co-occurrences, that is, the number of times each divided character string appears on a page directly joined to its adjacent character string, and calculates a score representing the concatenated co-occurrence frequency (step S206). Instead of having the search result analysis unit 306 count the concatenated co-occurrences from the summaries (snippets), the search unit 305 may search the search DB 315 with adjacent character strings joined together.

Next, the query segment position determination unit 307 of the search device 3 determines appropriate query segment positions based on the scores calculated by the search result analysis unit 306 (step S207).

Next, the search query correction unit 311 of the search device 3 corrects the search query based on the query segment positions determined by the query segment position determination unit 307 (step S208).

Next, the search unit 312 of the search device 3 searches the search DB 315 with the corrected search query (which may be unchanged if it was already appropriate) (step S209), and the search result response unit 313 returns the search results of the search unit 312 to the browser 21 of the requesting user terminal 2 (step S210). The processing then ends (step S211).
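Steps S203 to S208 of the second embodiment can be strung together as in the following sketch. It is illustrative only: `split_fn` and `hit_count` are hypothetical callables standing in for the query division unit 304 and for searches against the search DB 315, the Simpson coefficient is used as the score, and the concatenated-string search variant is used instead of snippet counting.

```python
def correct_query_online(query, split_fn, hit_count, threshold):
    """Correct one received query on the fly (sketch of steps S203 to S208).

    split_fn: callable text -> list of character strings (hypothetical).
    hit_count: callable string -> number of hits in the search DB (hypothetical).
    threshold: Simpson score below which a gap becomes a query segment position.
    """
    text = "".join(query.split())            # S203: delete separators
    pieces = split_fn(text)                  # S204: divide into character strings
    if not pieces:
        return ""
    segments = [pieces[0]]
    for left, right in zip(pieces, pieces[1:]):
        joint = hit_count(left + right)      # S205/S206: search the joined string
        denom = min(hit_count(left), hit_count(right))
        score = joint / denom if denom else 0.0
        if score < threshold:                # S207: decide the segment position
            segments.append(right)
        else:
            segments[-1] += right
    return " ".join(segments)                # S208: corrected search query
```

With assumed hit counts of 100, 80, and 90 for "まんが", "立ち", and "読み", 2 for "まんが立ち", and 60 for "立ち読み", the scores are 2 / 80 = 0.025 and 60 / 80 = 0.75, so a threshold of 0.5 yields "まんが 立ち読み".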

<Summary>
As described above, according to the present embodiments, search accuracy can be improved by correcting the query segment positions of a search query entered by a user to appropriate positions.

The present invention has been described above by way of preferred embodiments. Although specific examples have been presented, it is evident that various modifications and changes can be made to these examples without departing from the broad spirit and scope of the invention as defined in the claims. In other words, the invention should not be construed as limited by the details of the specific examples or by the accompanying drawings.

DESCRIPTION OF SYMBOLS
1 Network
2 User terminal
21 Browser
3 Search device
301 Query segment position learning unit
302 Search log acquisition unit
303 Separator deletion unit
304 Query division unit
305 Search unit
306 Search result analysis unit
307 Query segment position determination unit
308 Learning data generation / learning request unit
309 Pattern recognition unit
310 Search query reception unit
311 Search query correction unit
312 Search unit
313 Search result response unit
314 Search log
315 Search DB

Claims

A search log acquisition means for acquiring a search log;
Separator deletion means for deleting the separator from the search query of the acquired search log,
Query splitting means for splitting the search query with the separator removed into multiple strings,
First search means for searching a search database based on the divided character strings;
An analysis means for calculating, as a score, the frequency at which one divided character string is connected and co-occurring with an adjacent character string, based on the search results;
A query segment position determining device comprising: query segment position determining means for determining a query segment position based on the calculated score.

The query segment position determining device according to claim 1, wherein
the query splitting means divides the search query into a plurality of character strings by morphological analysis.

The query segment position determining device according to claim 1, wherein
the query splitting means divides the search query into a plurality of character strings by selecting the pattern having the largest number of divisions from a search log.

The query segment position determination apparatus according to claim 1, wherein
The query segment position determining device divides a search query into a plurality of character strings by generating a plurality of character string combination models from a search log.

A search log acquisition means for acquiring a search log;
Separator deletion means for deleting the separator from the search query of the acquired search log,
Query splitting means for splitting the search query with the separator removed into multiple strings,
First search means for searching a search database based on the divided character strings;
An analysis means for calculating, as a score, the frequency at which one divided character string is connected and co-occurring with an adjacent character string, based on the search results;
Query segment position determining means for determining a query segment position based on the calculated score;
Learning means for learning the pattern recognition unit based on the determined query segment position;
Query correction means for determining whether or not a query segment position is appropriate by the pattern recognition unit for a search query input by a user, and correcting the query segment position to an appropriate query segment position;
A search device comprising: second search means for searching the search database based on a corrected search query.

A receiving means for receiving a search query from a user;
A separator deleting means for deleting the separator from the received search query;
Query splitting means for splitting the search query with the separator removed into multiple strings,
First search means for searching a search database based on the divided character strings;
An analysis means for calculating, as a score, the frequency at which one divided character string is connected and co-occurring with an adjacent character string, based on the search results;
Query segment position determining means for determining a query segment position based on the calculated score;
Query correction means for correcting the search query based on the determined query segment position;
A search device comprising: second search means for searching the search database based on a corrected search query.

A search log acquisition step in which the control unit of the search device acquires the search log;
The control unit deletes a separator from the search query of the acquired search log;
A query dividing step for dividing the search query from which the separator is deleted into a plurality of character strings;
A first search step in which the control unit searches a search database based on the divided character string;
An analysis step in which the control unit calculates, as a score, the frequency at which one divided character string is connected and co-occurs with an adjacent character string, based on a search result;
A query segment position determining step in which the control unit determines a query segment position based on the calculated score;
A learning step in which the control unit learns the pattern recognition unit based on the determined query segment position;
A query correcting step in which the control unit determines whether or not a query segment position is appropriate by the pattern recognition unit for a search query input by a user, and corrects the query segment position to an appropriate query segment position;
A search control method comprising: a second search step in which the control unit searches the search database based on a corrected search query.

The control unit of the search device accepts a search query from the user;
A separator deletion step in which the control unit deletes the separator from the accepted search query;
A query dividing step for dividing the search query from which the separator is deleted into a plurality of character strings;
A first search step in which the control unit searches a search database based on the divided character string;
An analysis step in which the control unit calculates, as a score, the frequency at which one divided character string is connected and co-occurs with an adjacent character string, based on a search result;
A query segment position determining step in which the control unit determines a query segment position based on the calculated score;
A query correcting step in which the control unit corrects the search query based on the determined query segment position;
A search control method comprising: a second search step in which the control unit searches the search database based on a corrected search query.

The computers that make up the search device
Search log acquisition means for acquiring search logs,
Separator deletion means for deleting the separator from the search query of the acquired search log,
Query splitting means that splits the search query with the separator removed into multiple strings,
First search means for searching a search database based on the divided character string;
An analysis means for calculating, as a score, a frequency at which one divided character string is connected and co-occurring with an adjacent character string, based on the search result;
Query segment position determining means for determining a query segment position based on the calculated score;
Learning means for learning the pattern recognition unit based on the determined query segment position;
Query correction means for determining whether or not a query segment position is appropriate by the pattern recognition unit for a search query input by a user, and correcting the query segment position to an appropriate query segment position;
A search control program that functions as second search means for searching the search database based on a corrected search query.

The computers that make up the search device
Accepting means for receiving search queries from users,
Separator deletion means for deleting separators from received search queries,
Query splitting means that splits the search query with the separator removed into multiple strings,
First search means for searching a search database based on the divided character string;
An analysis means for calculating, as a score, a frequency at which one divided character string is connected and co-occurring with an adjacent character string, based on the search result;
Query segment position determining means for determining a query segment position based on the calculated score;
Query correcting means for correcting the search query based on the determined query segment position;
A search control program that functions as second search means for searching the search database based on a corrected search query.