JP2002342345A

JP2002342345A - Keyword classifying method and terminal equipment

Info

Publication number: JP2002342345A
Application number: JP2001152020A
Authority: JP
Inventors: Takahiro Komoriya; 貴広小森谷; Masahiro Yamauchi; 昌浩山内
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 2001-05-22
Filing date: 2001-05-22
Publication date: 2002-11-29

Abstract

PROBLEM TO BE SOLVED: To provide terminal equipment that reduces the storage area needed for a classification dictionary and requires no maintenance of that dictionary. SOLUTION: The terminal equipment 30 sends a keyword to a search server 10 (search engine) and requests a search for Web sites corresponding to the keyword. An analysis part 33 receives the list of Web sites corresponding to the keyword from the search server 10, analyzes the required data, and extracts category candidates. On the basis of predetermined rules, a judging part 34 selects a category from the category candidates extracted by the analysis part 33.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a keyword classification method and a terminal device, and more particularly to a keyword classification method that obtains a category for a keyword by using a search site on the WWW (World Wide Web), and to a terminal device using the method.

[0002]

2. Description of the Related Art

Various data classification systems have existed. For example, the document classification system disclosed in Japanese Patent Laid-Open No. 6-348755 detects the words in the sample documents of each category and registers each word that appears in only one category in a classification dictionary as a keyword of that category. To classify a document, the system uses the classification dictionary created in this way to count the matches between the words in the target document and the keywords registered in the dictionary, and assigns the document to the category with the most matches. This eliminates the need to manually define the classification information used to classify documents.

[0003] In the merchandise sales information analysis system disclosed in Japanese Patent Laid-Open No. 2000-11044, registration of classification codes for the products to be analyzed is processed, and a trend analysis of product sales performance information is processed. Next, the classification codes of the products to be analyzed are registered from the analysis condition settings specified in the trend analysis processing, and data is built in an analysis product classification master database based on regularities detected in the product sales performance information.

[0004]

[Problems to be Solved by the Invention]

However, in the document classification system described in Japanese Patent Laid-Open No. 6-348755, before the classification dictionary for classifying data can be created automatically, the user must define the document classes in advance and store sample documents for each class in a database. Manual work is therefore still required. In addition, because the classification dictionary is held on the system, extra storage area is needed.

[0005] In the merchandise sales information analysis system described in Japanese Patent Laid-Open No. 2000-11044, on the other hand, the analysis product classification master database is built on a server, so the server's system configuration becomes large. In addition, the analysis product classification master database must be updated every time a workstation issues an analysis request, which increases the load on the server.

[0006] The present invention has been made in view of these circumstances, and an object thereof is to provide a keyword classification method and a terminal device that reduce the storage area used for a classification dictionary.

[0007] Another object of the present invention is to provide a keyword classification method and a terminal device that require no maintenance of a classification dictionary.

[0008]

[Means for Solving the Problems]

A keyword classification method according to one aspect of the present invention is used in a terminal device connected to a network. The method includes the steps of: specifying a search condition and referring to a search engine that provides, as a home page, the hierarchical categories to which the search condition belongs and links to home pages matching the search condition; acquiring, from the search engine, the text data constituting the home page; analyzing the acquired text data to extract category candidates; and determining, based on a predetermined rule, a category matching the search condition from the extracted category candidates.

[0009] The category is obtained by using a search engine. The system therefore does not need to hold its own database, which simplifies the system. Because no storage area is needed for a classification dictionary, the storage area is also reduced. Furthermore, the category classification in the search engine is maintained regularly. The user therefore no longer needs to maintain a classification dictionary, and the latest classification information can always be obtained.

[0010] Preferably, the step of determining a category includes a step of determining, among the category candidates, the category that appears most frequently in the text data.

[0011] A terminal device according to another aspect of the present invention is connected to a network. The terminal device includes: search means for specifying a search condition, referring to a search engine that provides, as a home page, the hierarchical categories to which the search condition belongs and links to home pages matching the search condition, and acquiring, from the search engine, the text data constituting the home page; analysis means, connected to the search means, for analyzing the text data to extract category candidates; and determining means, connected to the analysis means, for determining, based on a predetermined rule, a category matching the search condition from the extracted category candidates.

[0012] The category is obtained by using a search engine. The system therefore does not need to hold its own database, which simplifies the system. Because no storage area is needed for a classification dictionary, the storage area is also reduced. Furthermore, the category classification in the search engine is maintained regularly. The user therefore no longer needs to maintain a classification dictionary, and the latest classification information can always be obtained.

[0013] Preferably, the determining means includes means for determining, among the category candidates, the category that appears most frequently in the text data.

[0014]

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

Referring to FIG. 1, a search system according to an embodiment of the present invention includes a terminal device 30 used by a user, and a search server 10 that is connected to the terminal device 30 via the Internet 20 and searches for HTML (HyperText Markup Language) data matching a search condition.

[0015] The WWW is built on the Internet 20, and data is transferred on the WWW by a protocol called HTTP (HyperText Transfer Protocol). Describing Web pages in a description language such as HTML makes it easy to search for and display information.

[0016] The search server 10 includes: a WWW server 11; a database 12 that stores information published on the Internet; a Web site search unit 13 that is connected to the WWW server 11 and the database 12 and searches the database 12 based on a keyword received from the WWW server 11; and a Web page generation unit 14 that is connected to the Web site search unit 13 and the WWW server 11, generates a list of Web sites corresponding to the keyword, and sends the list to the terminal device 30 via the WWW server 11. Such a search server 10 is generally called a search engine.

[0017] Search engines are broadly divided into full-text search types and directory types. The present invention uses a directory-type search engine, which classifies sites into categories by directory. Well-known examples include YAHOO! (R) and LYCOS (R).

[0018] In these directory-type search engines, the database is built by Web site authors or search engine administrators registering Web sites. Because registration is done by specific people, the number of Web sites registered in the database is small, but their reliability is high. And because the database is organized into classes, unrelated Web sites are unlikely to be returned. For these reasons, the categories contained in the list of Web sites provided by these search engines are treated as a dictionary for classifying data.

[0019] The terminal device 30 is a personal computer, a portable terminal, or the like. It connects to the Internet 20, accesses various servers using a WWW browser, and receives information and services. Here, the terminal device 30 receives a search service from the search server 10.

[0020] The terminal device 30 includes: a communication control unit 31 that controls transmission and reception of information to and from the search server 10 via the Internet 20; an analysis unit 33 that is connected to the communication control unit 31, analyzes the list of Web sites received via the communication control unit 31, and extracts category candidates; a determination unit 34 that is connected to the analysis unit 33 and selects a category from the category candidates extracted by the analysis unit 33 based on a predetermined rule; a data storage unit 35 that stores the data to be classified, such as purchased-product data and document data; a search unit 32 that is connected to the communication control unit 31 and sends a search condition to the search server 10 via the communication control unit 31; an output unit 37 that outputs results; and a processing unit 36 that is connected to the search unit 32, the determination unit 34, the data storage unit 35, and the output unit 37 and classifies the data stored in the data storage unit 35 based on the user's instructions.

[0021] In the following description, it is assumed that the data storage unit 35 stores, together with their titles, a plurality of home pages collected from many Web servers (not shown) connected to the Internet 20. Using the title assigned to each home page as the basis, the home pages are classified by category with the help of the search engine.

[0022] On receiving an instruction from the user, the processing unit 36 reads the title assigned to each home page from the data storage unit 35 and outputs it to the search unit 32. The search unit 32 sends the title, as the search condition, to the search server 10 via the communication control unit 31 and requests a search for matching Web sites.

[0023] When the search result arrives from the search server 10, the analysis unit 33 receives the list of Web sites (a Web page) via the communication control unit 31 and analyzes it. The Web page is text data constituting a home page. The analysis unit 33 analyzes this text data, identifies the portions that describe categories, and extracts the character strings that serve as category candidates. The determination unit 34 selects a category from the category candidates extracted by the analysis unit 33 based on a predetermined rule and sends it to the processing unit 36. The processing unit 36 assigns the selected category to the home page having the title read from the data storage unit 35 and writes it to the data storage unit 35.
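The flow from title to stored category can be sketched as a small loop. This is only an illustrative sketch under stated assumptions: `classify_stored_pages`, `search`, and `decide` are hypothetical names that do not appear in the patent. `search` stands in for the search unit 32, communication control unit 31, and analysis unit 33 together (title in, category candidates out), and `decide` stands in for the determination unit 34.

```python
from typing import Callable, Dict, Iterable, List, Optional

# Hypothetical callables standing in for the units of terminal device 30.
SearchFn = Callable[[str], List[str]]            # title -> category candidates
DecideFn = Callable[[List[str]], Optional[str]]  # candidates -> chosen category


def classify_stored_pages(
    titles: Iterable[str], search: SearchFn, decide: DecideFn
) -> Dict[str, Optional[str]]:
    """Assign one category to each stored title, as processing unit 36 would."""
    assigned: Dict[str, Optional[str]] = {}
    for title in titles:
        candidates = search(title)           # query the search engine, extract candidates
        assigned[title] = decide(candidates)  # apply the predetermined rule
    return assigned
```

Passing the search and decision steps in as functions keeps the loop independent of any particular search engine or selection rule, which matches the description's point that the rule is "predetermined" but not fixed to one choice.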

[0024] Further, from the results classified in this way, the processing unit 36 analyzes, for example, trends in which home pages were collected, and outputs the result to the output unit 37. The output unit 37 renders the result received from the processing unit 36 as a graph or the like and displays it.

[0025] The keyword classification processing will be described with reference to FIG. 2, taking the classification of purchased products as an example.

[0026] A user of the terminal device 30 sets a product name to serve as the keyword, for example "MD", and requests the search engine to search for Web sites (step S11). On receiving the search request, the Web site search unit 13 of the search engine searches for Web sites matching the keyword "MD" (step S12). The terminal device 30 obtains from the search server 10 a Web page containing a list of titles and category information for the Web sites that matched the keyword "MD" (step S13).

[0027] FIG. 3 is a conceptual diagram of a Web page obtained from the search server 10, showing an example of the Web site search results for the keyword "MD". The data obtained from the search server 10 is actually text data describing the structure of the Web page in a description language such as HTML, but FIG. 3 shows it as it would be displayed in a WWW browser.

[0028] The analysis unit 33 analyzes the text data (step S14). First, to unify the Japanese character codes in the text data, it converts the Japanese character encoding, for example from Shift JIS (Japanese Industrial Standards) or JIS code to EUC (Extended UNIX Code), which is used on UNIX (R). Next, the analysis unit 33 extracts the category description portions 101 from the text data and divides them into elements. FIG. 4 shows the category description portions 101 divided into elements. Each divided category description portion 101 consists of the depth 201 in the category hierarchy and the extracted elements, that is, the category candidates 202.
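The encoding-unification step can be illustrated with Python's standard codecs. This is a minimal sketch, not the patent's implementation: the function name `to_euc_jp` and its default argument are assumptions, and treating "JIS code" as the `iso2022_jp` codec is also an assumption. The codec names `shift_jis`, `iso2022_jp`, and `euc_jp` are standard Python encoding names.

```python
def to_euc_jp(raw: bytes, source_encoding: str = "shift_jis") -> bytes:
    """Re-encode Japanese text bytes as EUC-JP.

    source_encoding is the encoding the page was served in, for example
    "shift_jis" or "iso2022_jp" (assumed here to correspond to "JIS code").
    """
    text = raw.decode(source_encoding)  # bytes -> Unicode string
    return text.encode("euc_jp")        # Unicode string -> EUC-JP bytes
```

A caller would need to know or detect the source encoding first, for example from the HTTP headers or the page's own declaration. That detection step is outside this sketch.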

[0029] The category description portion 101 is split at delimiters such as "/" and ">", and the category candidates 202 are extracted. Each time a category candidate 202 is extracted, the hierarchy depth 201 increases.
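The delimiter-based split can be sketched as follows. This is an illustrative sketch under stated assumptions: the function name `split_category_path` is hypothetical, depth is assumed to start at 1, and accepting the full-width delimiters "／" and "＞" alongside "/" and ">" is an addition not stated in the text.

```python
import re
from typing import List, Tuple

# "/" and ">" are the delimiters named in the description; the full-width
# variants are an assumption added for Japanese pages.
_DELIMITERS = re.compile(r"\s*[/>／＞]\s*")


def split_category_path(path: str) -> List[Tuple[int, str]]:
    """Split one category description into (depth, candidate) pairs."""
    elements = [e for e in _DELIMITERS.split(path.strip()) if e]
    # Depth grows by one for each candidate extracted, starting at 1.
    return list(enumerate(elements, start=1))
```

One limitation of this sketch is that a category name that itself contains a delimiter character would be split into two candidates.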

[0030] The determination unit 34 determines a category from the category candidates 202 based on a predetermined rule (step S15). Various rules are conceivable, such as taking the element one level above the element that matches the keyword as the category, or applying statistical processing and determining the category from the positional relationships. Here, excluding the category elements of the lowest level, the category that appears most often in the text data is determined to be the category corresponding to the keyword. For example, in FIG. 4, excluding the lowest-level elements "MD" and "recording medium", the frequently appearing "AV equipment" is judged to be the category for the keyword "MD". The processing unit 36 outputs the result of the determination, "AV equipment", to the output unit 37 and ends the processing (step S16).
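The frequency rule used in this embodiment can be sketched as follows. This is a sketch under stated assumptions, not the patent's implementation: the name `decide_category` is hypothetical, "lowest-level element" is interpreted as the last element of each category path, and ties are resolved in favor of the candidate seen first. The sample paths in the usage example are invented for illustration and are not the contents of FIG. 4.

```python
from collections import Counter
from typing import List, Optional


def decide_category(paths: List[List[str]]) -> Optional[str]:
    """Return the most frequent candidate, ignoring each path's last element."""
    counts: Counter = Counter()
    for path in paths:
        counts.update(path[:-1])  # drop the lowest-level element of this path
    if not counts:
        return None  # no candidates remain after the exclusion
    return counts.most_common(1)[0][0]


# Invented sample data for the keyword "MD" (not taken from the figures).
sample_paths = [
    ["Shopping", "Home electronics", "AV equipment", "MD"],
    ["Business", "Manufacturers", "AV equipment", "MD"],
    ["Computers", "Hardware", "Recording medium"],
]
chosen = decide_category(sample_paths)
```

With the invented sample above, "AV equipment" is counted twice and every other remaining candidate once, so it is the one selected.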

[0031] In this embodiment, a single category is automatically determined from the plurality of category candidates, but the system may instead be configured so that one keyword is included in a plurality of categories.

[0032] Alternatively, instead of determining the category automatically, the user may select a category from a plurality of categories.

[0033] As described above, according to the present embodiment, the terminal device 30 uses a search engine (the search server 10) in place of a conventional classification dictionary. The system therefore does not need to hold its own database, which simplifies the system.

[0034] In addition, the terminal device 30 needs no storage area for a classification dictionary. Furthermore, the category classification in the search engine is maintained regularly. The terminal device 30 therefore does not need to maintain a classification dictionary, and the latest classification information can always be obtained.

[0035] The embodiment disclosed here should be considered illustrative in all respects and not restrictive. The scope of the present invention is defined by the claims rather than by the above description, and is intended to include all modifications within the meaning and scope equivalent to the claims.

[0036]

[Effects of the Invention]

According to the present invention, the storage area used for a classification dictionary can be reduced. In addition, no maintenance of a classification dictionary is required, and the latest classification information can always be obtained.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the configuration of a search system according to an embodiment of the present invention.

FIG. 2 is a flowchart of the keyword classification processing.

FIG. 3 is a conceptual diagram of a Web page obtained from the search server 10.

FIG. 4 is a diagram showing an example of extracted category candidates.

[Explanation of symbols]

10 search server, 11 WWW server, 12 database, 13 Web site search unit, 14 Web page generation unit, 20 Internet, 30 terminal device, 31 communication control unit, 32 search unit, 33 analysis unit, 34 determination unit, 35 data storage unit, 36 processing unit, 37 output unit, 101 category description portion, 202 category candidate.

Claims

[Claims]

1. A keyword classification method used in a terminal device connected to a network, comprising the steps of: specifying a search condition and referring to a search engine that provides, as a home page, a hierarchical category to which the search condition belongs and links to home pages matching the search condition; acquiring, from the search engine, text data constituting the home page; analyzing the acquired text data to extract category candidates; and determining, based on a predetermined rule, a category matching the search condition from the extracted category candidates.

2. The keyword classification method according to claim 1, wherein said step of determining a category includes a step of determining a category having the highest appearance frequency in said text data among said category candidates.

3. A terminal device connected to a network, comprising: search means for specifying a search condition, referring to a search engine that provides, as a home page, a hierarchical category to which the search condition belongs and links to home pages matching the search condition, and acquiring, from the search engine, text data constituting the home page; analysis means, connected to the search means, for analyzing the text data to extract category candidates; and determining means, connected to the analysis means, for determining, based on a predetermined rule, a category matching the search condition from the extracted category candidates.

4. The terminal device according to claim 3, wherein said determining means includes means for determining a category having the highest frequency of appearance in said text data among said category candidates.