JP3465815B2

JP3465815B2 - Text filtering system

Info

Publication number: JP3465815B2
Application number: JP37741798A
Authority: JP
Inventors: 哲夫渡辺; 正彦藏田; 邦雄佐藤
Original assignee: QUICK Corp. (株式会社Ｑｕｉｃｋ)
Priority date: 1998-12-28
Filing date: 1998-12-28
Publication date: 2003-11-10
Anticipated expiration: 2018-12-28
Also published as: JP2000200278A

Description

Detailed Description of the Invention

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention
The present invention relates to a text filtering system that delivers electronic text data sent from news agencies, newspaper companies, and similar sources to a plurality of users.

[0002]

2. Description of the Related Art
Systems already exist that filter large volumes of electronic text data, such as electronic news, and deliver it to multiple users. In these systems, full-text data is monitored by batch searches run at fixed times, which is better described as clipping. Conversely, what has conventionally been called real-time monitoring covers only part of the text, for example newspaper headlines, rather than the full text. In short, conventional systems offered either full-text monitoring with a time lag, or real-time monitoring of only part of the text.

[0003]

[Problems to Be Solved by the Invention] However, users such as fund managers at institutional investors and financial institutions engaged in asset management need to obtain data even one second sooner so they can decide whether to sell or buy. For English text data, the required text can already be obtained in real time by setting predetermined conditions. By contrast, Japanese and similar languages do not separate words with delimiters. For text in such languages, it has been difficult and time-consuming to run a full-text search over the continuously arriving text data, select only the text that is needed, and deliver it in real time. Here, a full-text search means searching or filtering every character string in the text data from beginning to end.

[0004] It has also long been possible to match all search terms in a single scan of the text data by building one finite automaton from all the search terms contained in the filtering conditional expressions of many users. However, conventional automaton-based filtering produces so much noise that it is impractical. Noisy results are a heavy burden on users who must make instant judgments about the text data they receive.

Moreover, as the number of users and filtering keywords grows, the automaton becomes larger and processing takes longer. Even so-called real-time processing can take several tens of minutes with conventional methods. Because the automaton is expanded in memory, a larger automaton consumes more memory, and the resulting memory pressure slows overall processing.

In summary, automaton-based filtering processes text data quickly, but it slows down as the automaton grows, and it is noisy.
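The single-scan matching described in [0004] is commonly realized with an Aho-Corasick automaton. The following is a minimal Python sketch under that assumption. The patent does not say which automaton construction it uses, and the class and method names are hypothetical.

```python
from collections import deque


class AhoCorasick:
    """One automaton built from many search terms; one scan reports every match."""

    def __init__(self, patterns):
        self.goto = [{}]   # goto[state][char] -> next state (trie edges)
        self.fail = [0]    # failure link of each state
        self.out = [[]]    # search terms recognized on reaching each state
        for pattern in patterns:
            self._add(pattern)
        self._link()

    def _add(self, pattern):
        state = 0
        for ch in pattern:
            if ch not in self.goto[state]:
                self.goto.append({})
                self.fail.append(0)
                self.out.append([])
                self.goto[state][ch] = len(self.goto) - 1
            state = self.goto[state][ch]
        self.out[state].append(pattern)

    def _link(self):
        # Breadth-first, so every failure target is finished before it is used.
        queue = deque(self.goto[0].values())   # depth-1 states fail to the root
        while queue:
            state = queue.popleft()
            for ch, nxt in self.goto[state].items():
                queue.append(nxt)
                f = self.fail[state]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[nxt] = self.goto[f].get(ch, 0)
                # A state also reports the terms of its failure target (its suffixes).
                self.out[nxt] = self.out[nxt] + self.out[self.fail[nxt]]

    def search(self, text):
        """Yield (start_index, term) for every occurrence, in one left-to-right pass."""
        state = 0
        for i, ch in enumerate(text):
            while state and ch not in self.goto[state]:
                state = self.fail[state]
            state = self.goto[state].get(ch, 0)
            for term in self.out[state]:
                yield i - len(term) + 1, term
```

Consider the terms 日本, 日本製作所, and Nihon searched against the text 日本製作所の決算. The matcher yields (0, '日本') and then (0, '日本製作所'). Both the abbreviation and the full name are found in a single pass. Such overlapping hits are one source of the noise that the later stages must remove.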

[0005] The present invention has been made in view of the above circumstances. Its object is to provide a text data filtering system and a text filtering method that can filter text data in real time for languages, such as Japanese, in which words are not separated from one another.

[0006]

[Means for Solving the Problems] To achieve the above object, the text filtering system according to the present invention comprises identification code assigning means and noise removing means. The identification code assigning means expands into an automaton a dictionary that describes predetermined character strings and the identification code corresponding to each string. It then filters the input text data and, if a matching character string appears in the text data, assigns the corresponding identification code to the text data. The noise removing means examines the characters before and after the matched character string in text data to which an identification code has been assigned. If a predetermined character is attached, it deletes the assigned identification code, thereby finalizing the identification codes.

[0007] The system preferably also comprises filtering means. These means expand into an automaton the filtering conditions that each user has created and registered in advance using free keywords, identification codes, and logical operators. They then scan the text data to which identification codes have been assigned and output the filtering results.

To achieve the above object, the text filtering method according to the present invention comprises primary processing and secondary processing. Primary processing consists of a step of assigning identification codes by pre-filtering and a step of removing noise from the assigned identification codes. Secondary processing consists of a step of filtering according to the filtering conditions set by the user and a step of removing noise with respect to identification codes.

[0008] In the present invention, automaton filtering is used for the narrowing that can be shared across all users. The candidate profiles are narrowed down this way, and an individual search is then performed for each profile. This makes real-time processing of text data with little noise possible. Here, a profile is the filtering condition set by a user, and it also serves as that user's search condition.

[0009]

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

[Configuration of Embodiment] FIG. 1 is a block diagram of a text filtering system according to an embodiment of the present invention. The text filtering system shown in FIG. 1 comprises the following components:
- a data receiving server 12 that manages the input of electronic text data from various electronic information sources 10;
- a primary filtering server 16 that assigns stock codes to the text data sent from the data receiving server;
- a storage unit 18 for backing up the text data to which stock codes have been assigned;
- a search server 20 that registers the stock-coded text data in a database 22 and performs full-text searches on it;
- a secondary filtering server 24 that filters the text data according to filtering conditions set by users;
- a recording unit 26 that stores the results of filtering by the secondary filtering server, among other data;
- a web server 28 that controls the exchange of data with terminals.

Each server has a transmitting/receiving unit (not shown) for exchanging data with the other servers. The text data in this embodiment is assumed to be data used in the financial and securities business, such as corporate information and newspaper news. Once a day, for example at midnight, the entire system loads and updates the user-related data and other data set by the system administrator. The text data sent from each electronic information source is a few kilobytes in size, and the system of this embodiment delivers text data of this size in real time. The system of this embodiment uses Windows NT as the operating system of each server.

[0010] [Data Receiving Server] The data receiving server 12 first records the text data sent from the electronic information sources 10 in a news queue file storage unit 14 and then sends it to the primary filtering server. Text data arrives continuously, so it must be held temporarily in the queue file storage unit. This lets it be sent at the right timing and at a pace matched to the processing of the primary filtering server. Storing the received text data in the news queue file storage unit 14 also ensures that no text data from the electronic information sources 10 is lost if the primary filtering server's processing is delayed for any reason.
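The following is a minimal sketch of the buffering role of the news queue file storage unit 14. It assumes a plain append-only file of JSON lines plus a separate file that counts how many records have been forwarded. The file layout and the class and function names are hypothetical. Each record is persisted before it is forwarded, and the forwarded count advances only after a send succeeds. A slow or stalled downstream server therefore does not cause items to be lost.

```python
import json
import os
import tempfile


class NewsQueueFile:
    """Hypothetical append-only spool standing in for news queue file storage unit 14."""

    def __init__(self, path):
        self.path = path
        self.offset_path = path + ".offset"   # number of records already forwarded

    def enqueue(self, record):
        # Persist first; forwarding happens later at the downstream server's pace.
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")

    def _forwarded(self):
        if not os.path.exists(self.offset_path):
            return 0
        with open(self.offset_path, encoding="utf-8") as f:
            return int(f.read() or 0)

    def drain(self, send, limit=None):
        """Forward pending records in arrival order; return how many were sent."""
        if not os.path.exists(self.path):
            return 0
        done = self._forwarded()
        with open(self.path, encoding="utf-8") as f:
            pending = f.readlines()[done:]
        if limit is not None:
            pending = pending[:limit]
        for line in pending:
            send(json.loads(line))     # e.g. hand off to the primary filtering server
            done += 1
            with open(self.offset_path, "w", encoding="utf-8") as f:
                f.write(str(done))     # advance only after a successful send
        return len(pending)


def demo():
    """Forward two items while downstream is busy, then the rest; return ids in order."""
    received = []
    with tempfile.TemporaryDirectory() as d:
        q = NewsQueueFile(os.path.join(d, "news.queue"))
        for i in range(3):
            q.enqueue({"id": i, "title": f"見出し{i}"})
        q.drain(received.append, limit=2)
        q.enqueue({"id": 3, "title": "見出し3"})
        q.drain(received.append)
    return [r["id"] for r in received]
```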

[0011] [Primary Filtering Server] The primary filtering server performs three stages of processing. As preprocessing, it assigns classification codes using text pattern matching. It then assigns stock codes by automaton filtering. Finally, it removes noise by referring to a noise removal dictionary (table). To carry out this processing, the primary filtering server comprises the following units:
- a text editing unit 121 that formats the text data, for example by deleting extra blanks;
- a classification code assigning unit 122 that assigns classification codes to the text data;
- a stock code assigning unit 123 that assigns stock codes, which serve as identification codes, to the text data;
- a noise removing unit 124 that removes noise from the stock-coded text data;
- a text search engine 125 that can match many filtering conditions in a single scan of the text data.

The primary filtering server also records backup data in the backup recording unit 18. This backup consists of the text data sent to the search server or the secondary filtering server, stored one record per item. The backup data is used if the system crashes or if it is needed for other purposes.

The text editing unit 121 formats the received text data. News text often has spaces inserted on both sides of the text to make it easier to read on screen. Such ad hoc formatting with spaces can split words during searching or filtering, and it makes HTML display on client terminals inconvenient. The text data must therefore be reformatted. The text editing unit also removes line breaks other than those at sentence boundaries from the news body, and it converts half-width katakana to full-width katakana. Finally, the search engine handles data in CSV format, with items separated by commas, so the text editing unit converts the text data and codes to CSV and outputs them.
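The formatting steps of the text editing unit 121 might look roughly like the following sketch, which rests on four assumptions:
- Padding spaces (ASCII and ideographic) are stripped from each line.
- A line break is kept only after a sentence terminator from an assumed set; other breaks are joined without a space, which suits Japanese text.
- Runs of half-width katakana are converted with Unicode NFKC normalization.
- Python's csv module produces the comma-separated output.

The function names are hypothetical.

```python
import csv
import io
import re
import unicodedata

HALFWIDTH_KANA = re.compile(r"[\uFF61-\uFF9F]+")     # half-width katakana block
SENTENCE_END = ("。", "！", "？", ".", "!", "?")       # assumed sentence terminators


def to_fullwidth_kana(text):
    # NFKC maps half-width katakana to full width and merges ﾞ/ﾟ into the base kana.
    # Applying it only to half-width runs leaves all other characters untouched.
    return HALFWIDTH_KANA.sub(lambda m: unicodedata.normalize("NFKC", m.group()), text)


def clean_body(raw):
    """Strip layout padding, keep line breaks only at sentence ends, widen kana."""
    lines = [ln.strip(" \u3000\t") for ln in raw.splitlines()]
    out = ""
    for ln in (ln for ln in lines if ln):
        if not out:
            out = ln
        elif out.endswith(SENTENCE_END):
            out += "\n" + ln   # a genuine sentence boundary: keep the break
        else:
            out += ln          # a layout break inside a sentence: join (Japanese text)
    return to_fullwidth_kana(out)


def to_csv_record(fields):
    """Serialize one record as a CSV line, quoting fields that contain commas."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerow(fields)
    return buf.getvalue()
```

Limiting NFKC to the half-width katakana block is a deliberate choice. Applying it to the whole text would also rewrite full-width Latin letters and digits, which the patent does not mention.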

[0012] [Classification Code Assigning Unit] Next, the processing in the classification code assigning unit 122 is described. From an investor's standpoint, for example, classification codes are needed for the information source, the information category, personnel information, rating information, and so on. A classification conversion table drives the conversion. Following its instructions, the genre code and the news title (headline) are converted into a source (information source) number, a genre number, a (genre) detail number, and a Japanese/English type. The classification conversion table is written as CSV with items separated by commas (see FIG. 2(B)), according to the rules shown in FIG. 2(A).

The columns of the table are as follows:
- The genre code is a code already attached to incoming news.
- The target field specifies what is searched and takes one of three values: "not set", "title", or "body".
- The operator is one of exact match, prefix match, suffix match, infix (substring) match, and infix match within enclosing characters.
- Enclosing character 1 defines the opening character of the enclosure when the "infix match within enclosing characters" operator is used, and enclosing character 2 defines the closing character.
- The source number corresponds to the genre code and indicates where the information came from.
- The genre number is assigned within NAA (Nikkei Asahi All) according to the operator condition.
- The detail number is used when the genre number is subdivided further. From the genre number and detail number, one can determine whether the text data is, for example, personnel information or rating information.
- The Japanese/English number is "0" for Japanese news and "1" for English news.
- The number of valid days indicates how long the news remains searchable, counted in days from the issue date.

Next, a concrete CSV example is described with reference to FIG. 2(B). The first line of FIG. 2(B) applies to the genre NAA. If the title begins with ＜朝日＞ (Asahi), it assigns "1" as the source (information source) number, "2" as the genre number, "0" as the Japanese/English number, and "0" as the number of valid days. The detail number is omitted in this example. The last line of FIG. 2(B) also applies to the genre NAA. If the character string NQN appears within characters enclosed in black lenticular brackets 【】, it assigns "1" as the source number, "8" as the genre number, "5" as the detail number, "0" as the Japanese/English number, and "0" as the number of valid days.

[0013] Using this classification conversion table, text pattern matching is performed on incoming text data and classification codes are assigned. To keep processing time short and the procedure simple, matching proceeds in order from the top of the table. At the first hit, the corresponding classification numbers are assigned and no further rows are examined. Classification numbers that should take priority must therefore be placed near the top of the table. Most operators used here are prefix or suffix matches, so only the strings at the beginning or end of the title or body need to be compared. This processing is therefore extremely fast.
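Below is a minimal sketch of first-match classification over a table like the one in FIG. 2. The Rule fields mirror the columns described in [0012], but their names are assumptions for illustration. So are the treatment of "not set" as "search both title and body" and the two sample rows, which are loosely modeled on the first and last lines of FIG. 2(B).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    genre_code: str
    target: str            # "title", "body", or "" (not set: assumed to mean both)
    op: str                # "exact", "prefix", "suffix", "infix", "enclosed"
    pattern: str
    open_ch: str = ""      # enclosing character 1
    close_ch: str = ""     # enclosing character 2
    source_no: int = 0
    genre_no: int = 0
    detail_no: Optional[int] = None
    lang_no: int = 0       # 0 = Japanese, 1 = English
    valid_days: int = 0


def _enclosed_contains(text, pattern, open_ch, close_ch):
    """True if pattern occurs inside some open_ch ... close_ch pair."""
    start = text.find(open_ch)
    while start != -1:
        end = text.find(close_ch, start + len(open_ch))
        if end == -1:
            return False
        if pattern in text[start + len(open_ch):end]:
            return True
        start = text.find(open_ch, end + len(close_ch))
    return False


def _matches(rule, text):
    if rule.op == "exact":
        return text == rule.pattern
    if rule.op == "prefix":
        return text.startswith(rule.pattern)
    if rule.op == "suffix":
        return text.endswith(rule.pattern)
    if rule.op == "infix":
        return rule.pattern in text
    if rule.op == "enclosed":
        return _enclosed_contains(text, rule.pattern, rule.open_ch, rule.close_ch)
    raise ValueError(f"unknown operator {rule.op!r}")


def classify(rules, genre_code, title, body):
    """Scan the table top to bottom; the first matching row wins."""
    for rule in rules:
        if rule.genre_code != genre_code:
            continue
        targets = {"title": [title], "body": [body]}.get(rule.target, [title, body])
        if any(_matches(rule, t) for t in targets):
            return rule
    return None


SAMPLE_RULES = [   # illustrative rows only
    Rule("NAA", "title", "prefix", "＜朝日＞", source_no=1, genre_no=2),
    Rule("NAA", "title", "enclosed", "NQN", "【", "】", source_no=1, genre_no=8, detail_no=5),
]
```

Because classify returns at the first hit, row order alone encodes priority, as [0013] requires.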

[0014] [Stock Code Assigning Unit] Next, the assignment of stock codes is described. FIG. 3 shows part of the stock dictionary used to assign stock codes. As shown in FIG. 3, the stock dictionary 42a is a table-form dictionary. It lists a stock code, for example 6601, together with the corresponding stock names, for example 日本 (Nihon, "Japan"), 日本製作所 (Nihon Seisakusho), and Nihon. The stock dictionary is created and recorded in advance for each stock code. It is expanded into an automaton when the system starts up. To take in newly listed and delisted stocks, it is rebuilt once a day, for example at midnight, by restarting the system. The restart takes only an instant, so it does not affect the delivery service.

[0015] In addition, if the title contains the character string 人事 ("personnel"), the body text is not searched. This reduces noise when assigning stock codes. Personnel news often mentions surnames such as 本田 (Honda), 武田 (Takeda), and 松下 (Matsushita). These are the same as company names such as Honda Motor, Takeda Pharmaceutical, and Matsushita Electric Industrial, so stock codes are often wrongly assigned to personnel news. Because many personal names coincide with stock names, a code marking the item as personnel data is assigned at the classification code stage whenever the title contains 人事. The body of text data bearing this code is then not subjected to automaton filtering. This both speeds up stock code assignment and reduces noise.

[0016] The stock code assigning unit 123 expands the stock dictionary into an automaton. It feeds the received text data into the text search engine in real time to scan it. If a matching character string is found in the text data, the stock code corresponding to that string is output. The output stock code is written as a half-width number at a predetermined position in the text data. For example, for 日本製作所, code number 6601 is assigned in a field separate from the text itself. The text data is divided by tab codes into several items, such as the delivery source information, title, body, date, and time. A stock code item is added among them, and the corresponding stock codes are written there as numbers. The subsequent noise removal processing is then performed on the text data with stock codes assigned in this way.
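The sketch below illustrates stock code assignment with a tiny dictionary excerpt modeled on FIG. 3 and the personnel rule of [0015] and [0021]. For brevity it uses repeated str.find calls rather than a single automaton. The record layout is a hypothetical choice: tab-separated source, title, body, date, and time, followed by a space-separated stock code item. All names are illustrative.

```python
STOCK_DICTIONARY = {   # hypothetical excerpt modeled on FIG. 3: stock name -> stock code
    "日本": "6601",
    "日本製作所": "6601",
    "Nihon": "6601",
}


def find_stock_hits(text, dictionary):
    """Return sorted (position, name, code) for every dictionary name found in text.
    Naive repeated search for brevity; the described system scans once with an automaton."""
    hits = []
    for name, code in dictionary.items():
        start = text.find(name)
        while start != -1:
            hits.append((start, name, code))
            start = text.find(name, start + 1)
    return sorted(hits)


def assign_stock_codes(record, dictionary=STOCK_DICTIONARY):
    """Append a stock code item to a record and return (tab-separated line, hits).

    record: dict with 'source', 'title', 'body', 'date' and 'time'.
    Personnel news (title contains 人事) is searched in the title only."""
    if "人事" in record["title"]:
        text = record["title"]
    else:
        text = record["title"] + "\n" + record["body"]
    hits = find_stock_hits(text, dictionary)
    codes = sorted({code for _, _, code in hits})   # provisional stock codes
    fields = [record[k] for k in ("source", "title", "body", "date", "time")]
    fields.append(" ".join(codes))                    # the added stock code item
    return "\t".join(fields), hits
```

The hit positions are returned alongside the line because the noise removal stage needs to inspect the characters around each match.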

[0017] [Noise Removing Unit] Next, noise removal is described. Automaton filtering can run in real time but produces a lot of noise, so a way of reducing that noise is needed. Noise removal could in principle be done separately for each user. However, it is not easy for users to write conditional expressions that reliably remove noise, and the automaton would become too large, so this is not practical. In this embodiment, therefore, the noise removing unit removes noise after stock codes have been assigned by automaton filtering.

FIG. 4(A) shows how the noise removal dictionary is written, and FIG. 4(B) shows part of a noise removal dictionary written that way. The noise removal dictionary 42b is created as CSV (Comma Separated Value) text according to the rules shown in FIG. 4(A). Each entry has three columns:
- The first column holds the stock code as a number, for example 6601 for 日本製作所.
- The second column holds the stock name to be checked, for example 日本.
- The third column holds as many operation-code/string pairs as needed, for example @1化, @1電, @1情, @1市.

These three columns, namely the stock code, the stock name that needs checking, and the operation/string pairs, form one set. The noise removal dictionary is created by repeating such sets. Once created, the noise removal dictionary is registered as well.

[0018] With the stock dictionary of this embodiment, stock code 6601 is assigned to any text data that contains one of the target strings 日本, 日本製作所, or Nihon. Consequently, stock code 6601 is also assigned when the text contains other names beginning with 日本, such as 日本化成, 日本情報エンジニヤリング, 日本電線, or 日本市. Assigning stock codes with the stock dictionary shown in FIG. 3 thus introduces noise. Restricting the target string to 日本製作所 alone would, of course, allow stock codes to be assigned without noise. With such a dictionary, however, text that refers to 日本製作所 by the abbreviation 日本 could not be found, and the filtering results would be unreliable. In this embodiment, therefore, the process has two stages. In the first stage, automaton filtering searches for strings including such abbreviations and assigns stock codes. In the second stage, the noise removing unit removes the noise.

[0019] The noise removing unit 124 finalizes the stock codes provisionally assigned by the stock code assigning unit 123. For each provisionally assigned stock code, it consults the noise removal dictionary to decide whether to remove that code. For example, for stock code 6601, if the string 日本 in the text data is followed by 化, 電, or 市, the stock code 6601 assigned to the text data is removed. By consulting a dictionary created according to the rules shown in FIG. 4 in this way, noise is removed and the stock codes are finalized.
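The following sketch shows how a noise removal dictionary in the style of FIG. 4 could be parsed and applied. It rests on three assumptions:
- "@1" means "the matched name is immediately followed by this string".
- The "@-1" (immediately preceded by) operation is a hypothetical addition.
- A code survives if at least one of its hits is not noise. This rule is also a hypothetical addition, not taken from the patent.

```python
import csv
import io

# "@1": the matched name is immediately followed by the string (assumed reading of FIG. 4).
# "@-1": the matched name is immediately preceded by the string (hypothetical).
OPERATIONS = ("@-1", "@1")


def load_noise_dictionary(csv_text):
    """Parse rows such as '6601,日本,@1化,@1電' into {(code, name): [(op, string), ...]}."""
    rules = {}
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 3:
            continue                      # blank or incomplete line
        code, name, *cells = (c.strip() for c in row)
        ops = rules.setdefault((code, name), [])
        for cell in cells:
            op = next((o for o in OPERATIONS if cell.startswith(o)), None)
            if op is None:
                raise ValueError(f"unknown operation in {cell!r}")
            ops.append((op, cell[len(op):]))
    return rules


def is_noise(text, pos, name, ops):
    """True if the characters around this occurrence of name hit a noise rule."""
    before, after = text[:pos], text[pos + len(name):]
    return any((op == "@1" and after.startswith(s)) or
               (op == "@-1" and before.endswith(s)) for op, s in ops)


def finalize_codes(text, hits, noise_dictionary):
    """hits: (position, name, code) tuples from the stock code assigning stage.
    Only the provisionally assigned codes are looked up, so dictionary size
    does not change the amount of work per document."""
    kept = set()
    for pos, name, code in hits:
        ops = noise_dictionary.get((code, name), [])
        if not is_noise(text, pos, name, ops):
            kept.add(code)
    return sorted(kept)
```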

[0020] Growing this noise removal dictionary improves the accuracy of noise removal. Its size does not affect processing time, because only the entries for the stock codes actually assigned need to be consulted. Noise removal therefore never slows processing enough to prevent real-time operation.

[0021] Some text data carries a particular classification code, such as the code for personnel news. The title (headline) of such news usually takes the form "Personnel changes at XX Co.", so only the title is searched to assign stock codes. This reduces the noise produced when assigning stock codes and speeds up filtering.

[0022] [Secondary Filtering Server] The secondary filtering server 24 comprises the following units:
- a filtering unit 241 that performs automaton filtering with the filtering conditions set by users;
- a code confirming unit 242 that checks the codes for profiles hit by the automaton filtering;
- a text search engine 243 that can match many filtering conditions in a single scan of the text data.

The secondary filtering server stores the filtering conditions set by users in a profile information recording unit 263. When a user asks whether there is any hit data, it responds by referring to a filtering result recording unit 262. New users are added and users' options change, so once a day the server refers to a user table 41 set by the administrator and records the updated user information in a user information recording unit 261.

The profile information recording unit 263 records the filtering conditions for each user. The filtering result recording unit 262 records the headlines of the text data obtained by secondary filtering. On instruction from the client, these headlines are displayed on the client's display device via the filtering unit of the web server. The body text is recorded in the database 22 and displayed on the client's display device via the search unit of the web server. Thirty filtering results are recorded for each profile, and in this embodiment each user can register up to nine profiles in advance.

Recording the filtering results in this way allows them to be reused: they can be displayed not only in real time but whenever the user needs them. This makes the system of this embodiment more convenient than conventional systems, which do not record filtering results.
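The retention policy in [0022] (30 results per profile, nine profiles per user) maps naturally onto a bounded deque. Below is a minimal in-memory sketch with hypothetical class and method names. A real recording unit would persist the data.

```python
from collections import defaultdict, deque

MAX_PROFILES_PER_USER = 9   # from [0022]
RESULTS_PER_PROFILE = 30    # from [0022]


class FilteringResultStore:
    """Hypothetical in-memory stand-in for filtering result recording unit 262."""

    def __init__(self):
        self._profiles = defaultdict(dict)   # user -> {profile_id: deque of headlines}

    def register_profile(self, user, profile_id):
        profiles = self._profiles[user]
        if profile_id not in profiles:
            if len(profiles) >= MAX_PROFILES_PER_USER:
                raise ValueError(f"{user} already has {MAX_PROFILES_PER_USER} profiles")
            profiles[profile_id] = deque(maxlen=RESULTS_PER_PROFILE)

    def record_hit(self, user, profile_id, headline):
        # A deque with maxlen silently drops the oldest headline once 30 are held.
        self._profiles[user][profile_id].append(headline)

    def headlines(self, user, profile_id):
        """Newest first, for redisplay whenever the user asks."""
        return list(reversed(self._profiles[user][profile_id]))
```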

[0023] The text data bearing finalized stock codes is filtered by the secondary filtering server. The profile information recording unit 263 holds filtering conditions that each user has created in advance with free keywords, stock codes, and logical operators. The filtering unit 241 expands these filtering conditions into an automaton. It feeds the received stock-coded text data into the text search engine in real time to scan it. If any text matches a filtering condition, the corresponding user is notified via the web server. In this way, text data can be delivered to users in real time.
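The following sketch shows how a profile built from free keywords, stock codes, and logical operators might be evaluated. The nested-tuple syntax for profiles is hypothetical. Plain substring tests stand in for the single automaton scan. All terms from all profiles are collected first and matched once against the text. Each profile's boolean expression is then evaluated against the set of terms that were hit.

```python
from typing import Tuple, Union

# A profile is a leaf term (free keyword or stock code) or a nested tuple:
# ("AND", a, b, ...), ("OR", a, b, ...), ("NOT", a). This syntax is hypothetical.
Expr = Union[str, Tuple]


def terms_of(expr: Expr) -> set:
    """All leaf terms (free keywords or stock codes) in a profile expression."""
    if isinstance(expr, str):
        return {expr}
    return set().union(*(terms_of(arg) for arg in expr[1:]))


def evaluate(expr: Expr, hits: set) -> bool:
    if isinstance(expr, str):
        return expr in hits
    op, *args = expr
    if op == "AND":
        return all(evaluate(a, hits) for a in args)
    if op == "OR":
        return any(evaluate(a, hits) for a in args)
    if op == "NOT":
        return not evaluate(args[0], hits)
    raise ValueError(f"unknown operator {op!r}")


def match_profiles(text, profiles):
    """profiles: {profile_id: expr}. Return the ids whose condition holds for text."""
    all_terms = set().union(*(terms_of(e) for e in profiles.values()))
    hits = {t for t in all_terms if t in text}   # stand-in for one automaton scan
    return sorted(pid for pid, e in profiles.items() if evaluate(e, hits))
```

Separating "which terms occur" from "which profiles are satisfied" is what lets one scan serve every user. Only the cheap boolean evaluation is repeated per profile.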

【００２４】Suppose a user specifies 1998 as a stock code. Any text data that contains the year 1998 would then also be hit. For stock codes, therefore, after text data is hit, the system checks whether the stock code (1998 in this example) appears in the stock field of the text data. It examines the tab-delimited stock field, and if the code is not there, the hit is judged to be noise. Such noise would not arise if stock codes were searched by looking only at the stock field. That approach, however, is time-consuming because every stock code must be checked. It is faster to match stock codes together with the other free keywords in a single automaton scan and then individually check only the stock codes that were hit.
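The after-the-fact check for stock codes can be sketched as follows. The record layout is an assumption: a tab-separated record whose third field (`STOCK_FIELD_INDEX`, hypothetical) lists stock codes separated by whitespace; the actual field position and inner delimiter are not given in the text.

```python
STOCK_FIELD_INDEX = 2  # hypothetical: position of the stock-code field in a tab-separated record


def stock_field_codes(record: str) -> set:
    """Codes listed in the record's stock field (assumed whitespace-separated)."""
    fields = record.split("\t")
    if len(fields) <= STOCK_FIELD_INDEX:
        return set()
    return set(fields[STOCK_FIELD_INDEX].split())


def confirm_stock_hits(hit_codes, record: str) -> set:
    """Keep codes the automaton hit only if they appear in the stock field; the rest are noise."""
    listed = stock_field_codes(record)
    return {code for code in hit_codes if code in listed}
```

Only the handful of codes that actually hit are looked up, so the cost stays small regardless of how many of the roughly 3,000 stock codes users have registered.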

【００２５】Each time a user changes the filtering conditions, the secondary filtering server re-expands the changed conditions into its automaton. Because this automaton is smaller than a conventional one by the amount of the stock dictionary, re-expansion takes less processing time than in the conventional system.

【００２６】The user table also specifies, for each user, which news sources that user may search. For each hit produced by the automaton filtering, the system further consults the user table and, if the news source is a paid one, determines whether the user has purchased it. By narrowing the results down step by step in this way, noise-free data is delivered to the user.

【００２７】As in this embodiment, starting from the result of automaton filtering and narrowing it down step by step, first by stock code confirmation and then by table lookups, makes the search both efficient and accurate. In other words, real-time filtering with little noise becomes possible.

【００２８】The text search engine used in the primary and secondary filtering servers of this embodiment is commercially available. It is provided as a small set of functions, which are not described in detail here. [Search Server] The search server 20 includes a full-text search engine 201 and a thesaurus function (not shown). The full-text search engine can register and search text data in real time. Full-text search can be implemented in various ways, such as the n-gram index method and the bitmap index method; this embodiment uses an engine whose processing speed at text registration has been made as fast as possible, that is, at least as fast as the filtering process. This enables real-time registration and search in addition to filtering. Besides text search, the engine provides the basic functions a database requires; for example, date, numeric, and string fields can be defined. A commercially available full-text search engine is used in this embodiment, so it is not described in detail.

【００２９】[Web Server] The web server 28 has a search unit 281, a filtering unit 282, and a main unit 283, and controls the exchange of text data and other information with clients. The search unit 281 displays a search screen on the client's display device and controls the searches that users perform on past text data, including the results of real-time filtering. The filtering unit 282 displays a filtering screen on the client's display device, notifies the client of filtering results in real time, and sends filtering conditions set by the client to the secondary filtering server. The main unit 283 checks the name and ID number of each accessing user and the options the user has purchased, and decides whether to grant access. Clients use a web browser. Although in this embodiment information is exchanged between the system and clients through a web server and browser, the invention is not limited to this; another client/server arrangement or a mail server may be used instead.

【００３０】The contents of the user table 41 and the dictionary table 42 are set by the system administrator. [Operation of the Embodiment] FIG. 5 is a flowchart of stock code assignment in the primary filtering server. This flow runs when the system starts. In step 1, the stock dictionary is read from the dictionary table 42 and expanded into an automaton. In step 2, the noise removal dictionary is likewise read from the dictionary table 42 and loaded into memory. The server then waits for text data from the data receiving server (step 3). When text data arrives from the data receiving server, the process moves to step 4, where a classification code indicating, for example, the information source described above is assigned to the text data by text pattern matching. If there is no text data, the process moves to step 9 to decide whether to shut down the system. Next, filtering is performed with reference to the stock dictionary, and the stock codes matching the text data are assigned provisionally. More than one stock code may be assigned: if several stocks match, all of their codes are assigned.

【００３１】In step 6, each provisionally assigned stock code is checked against the noise dictionary, and if the match is noise as described in that dictionary, the stock code is deleted. In this way, the provisionally assigned stock codes are finalized. The text data with its finalized stock codes is sent to the search server (step 7) and to the secondary filtering server (step 8). In step 9, the server decides whether to shut down the system; if not, it returns to step 3 and waits for the next text data. Otherwise, processing in the primary filtering server ends.
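The provisional assignment and the noise removal of step 6 (which, as in claim 1, looks at the characters immediately before and after each match) are sketched below. All dictionary entries are hypothetical example data, a plain `str.find` scan stands in for the automaton for brevity, and the rule that a code survives if at least one of its occurrences is not noise is an assumption.

```python
# Hypothetical stock dictionary: name variant -> stock code
STOCK_DICT = {"トヨタ": "7203", "トヨタ自動車": "7203", "ソニー": "6758", "SONY": "6758"}

# Hypothetical noise dictionary: name -> strings that, if they directly precede or
# follow that name, mark the occurrence as noise (e.g. a different company's name)
NOISE_DICT = {"ソニー": {"before": (), "after": ("生命", "銀行")}}


def occurrences(text: str, name: str):
    """Yield every start index of name in text (overlapping matches included)."""
    pos = text.find(name)
    while pos != -1:
        yield pos
        pos = text.find(name, pos + 1)


def is_noise(text: str, name: str, pos: int) -> bool:
    """True if the characters around this occurrence match a noise-dictionary entry."""
    rule = NOISE_DICT.get(name)
    if rule is None:
        return False
    before, after = text[:pos], text[pos + len(name):]
    return (any(before.endswith(s) for s in rule["before"])
            or any(after.startswith(s) for s in rule["after"]))


def assign_stock_codes(text: str) -> set:
    # Provisional assignment: every code whose name variant occurs in the text
    provisional = {}
    for name, code in STOCK_DICT.items():
        for pos in occurrences(text, name):
            provisional.setdefault(code, []).append((name, pos))
    # Noise removal: drop a code only when every one of its occurrences is noise (assumption)
    return {code for code, hits in provisional.items()
            if not all(is_noise(text, name, pos) for name, pos in hits)}
```

Because this dictionary and noise list are shared by all users, adding a variant name or a noise rule here benefits every profile at once, which is the point made in paragraph [0040].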

【００３２】FIG. 6 is a flowchart of filtering under user-defined filtering conditions in the secondary filtering server. This flow runs when the system starts. In step 1, user information, including user names and user IDs, is loaded into the database and memory. In step 2, the profile information set by the users is expanded into an automaton, and the server waits for text data bearing classification codes and stock codes from the primary filtering server. If there is no text data, the process moves to step 10 to decide whether to terminate; if not, it returns to step 3 and waits for text data. When text data arrives from the primary filtering server, it is filtered by the automaton in step 4. For each profile hit by this filtering, and only for those profiles, the stock codes are checked for noise. For example, if a profile contains 1998 as a stock code, a year written as 1998 in the body of the text would be falsely detected as stock code 1998. The system therefore confirms whether each stock code in a hit profile is listed in the stock code field of the text data; if 1998 appears only in the body rather than in the stock code field, it is judged to be noise. In step 6, the server determines whether all hit profiles have been processed (for example, all ten if ten profiles were hit); if so, it waits for the next news item. If not, the process moves to step 7, which determines whether the remaining code conditions are satisfied, that is, whether the text matches the classification codes specified by the user, such as a specified field or a specified language (Japanese or English). Because automaton filtering cannot evaluate classification code information, classification codes must be checked separately from the filtering. In step 8, the server determines, by referring to the user table, whether the user has access rights to the information source of the hit, that is, whether the user has purchased that source. Steps 7 and 8 need only be performed for the hit profiles, so they can run in real time. If step 8 finds that the user has access rights, in step 9 the filtering result is recorded in the filtering result recording unit and the user is notified of the hit via the filtering unit of the web server. If the answer in step 7 or step 8 is NO, the text data is not delivered to the user who registered that profile.
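The per-hit checks of FIG. 6 can be sketched as a post-filter applied only to profiles the automaton has already hit. This is a simplified model under stated assumptions: the data classes and field names are hypothetical, a profile's stock-code hit is treated as confirmed if any of its codes appears in the item's stock-code field, and the user table is reduced to a mapping from user to purchased sources.

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    user: str
    stock_codes: set                                # stock codes named in the profile
    class_codes: set = field(default_factory=set)   # classification codes the user requires


@dataclass
class NewsItem:
    source: str
    stock_field: set    # codes listed in the item's stock-code field
    class_codes: set    # classification codes assigned by the primary server


def post_filter(hit_names, profiles, item, purchased):
    """Apply the per-hit checks of FIG. 6 to automaton hits only; return (user, profile) pairs."""
    notify = []
    for name in hit_names:
        p = profiles[name]
        # Stock-code noise check (simplified): a profile with codes needs one in the stock field
        if p.stock_codes and not (p.stock_codes & item.stock_field):
            continue
        # Step 7: every classification code the user required must be present
        if not p.class_codes <= item.class_codes:
            continue
        # Step 8: the user must have purchased the news source
        if item.source not in purchased.get(p.user, set()):
            continue
        notify.append((p.user, name))  # Step 9: record the result and notify the user
    return notify
```

Because the loop runs only over `hit_names`, its cost grows with the number of hits rather than with the number of registered profiles, which is the property the text relies on for real-time operation.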

【００３３】[Search Screen] Next, setting search conditions is described with reference to FIG. 7. After starting the client terminal and establishing communication with the web server, the user sets search conditions on the screen shown in FIG. 7. In the search term field in window 51 of the search screen, the user enters the characters to search for (free keywords) and logical operators. In the stock code field, stock codes are entered as numbers. Entering several stock codes separated by a predetermined symbol specifies an AND condition between the search terms and the stock codes. In the saved condition field, the user enters a name under which to save the search condition just created; entering a name here and clicking the save button registers the condition under that name in the pull-down menu of saved search conditions. In this embodiment, up to ten search conditions can be registered in total, and registered conditions are stored on the search server. In the period field, the user enters the period to search. In the display options field, the display order and the number of items displayed are set. In the source field, the user clicks to select the information sources to search. The information sources displayed here are determined by the user's contract.

【００３４】When search conditions are set on the screen of FIG. 7, for example by entering words such as "Japan" and "prime minister" and clicking the search button directly below, the search is executed: text data containing both "Japan" and "prime minister" is retrieved from the database, and a list of its headlines is shown in window 53. When the user clicks a headline, the corresponding body text is displayed in window 54. In this way, the user can retrieve the needed text data from past text data. Although described as past data, text data is registered in the database in real time in this embodiment, so new text data arriving in real time is also searchable. Search conditions can be reused by registering them with the save button, and registered conditions can be recalled and modified.

【００３５】[Filtering Screen] Next, setting filtering conditions is described with reference to FIG. 8. Clicking the filtering button in the top window 51 of the FIG. 7 screen switches the display to the screen of FIG. 8. Windows 52, 53, and 54 function basically as in FIG. 7. The saved profile field in window 54 is used to register the filtering conditions that have been set: entering a name here and clicking the save button registers the filtering conditions set on this screen with the secondary filtering server. The names of registered profiles are shown in the notification area at the bottom of the top window 52. In this embodiment, up to nine profiles can be registered in total.

【００３６】Filtering conditions are set on the screen of FIG. 8. For example, the user enters "tie-up" in the search term field, enters stock codes as on the search screen, specifies information sources in the source field, enters a profile name in the saved profile field, and clicks the save button. The profile name then appears after the numeral "1" in the notification area at the bottom of window 52, and the system of this embodiment begins filtering the text data it receives in real time. When text data received by the system matches the registered filtering conditions, the profile name in the notification area of window 52 is marked and flashes to tell the user that news matching that profile has been hit. When the user clicks the flashing profile name, the headlines of the hit text data are shown in window 53, and clicking a displayed headline shows the body of the corresponding text data in window 54. By registering filtering conditions in advance in this way, users can capture and read news in real time as it breaks. Registered filtering conditions can be recalled and modified. In addition, by clicking the confirmation search button before registering a filtering condition, the user can run a search with that condition to verify it.

【００３７】For the screens of FIGS. 7 and 8, only the headline is displayed when a hit occurs; alternatively, the body of the latest text data may be shown automatically in window 54. Also, FIGS. 7 and 8 omit the classification code field for simplicity, but classification codes can be set in the same way as stock codes.

【００３８】[Effects of the Embodiment] In the embodiment described above, the use of stock codes makes it possible to specify a large number of stocks and still filter at high speed. In the conventional automaton-based method, the filtering condition for a single stock must cover not only its official name but also multiple variant names, such as abbreviations, uppercase and lowercase alphabetic forms, katakana, and hiragana, so the filtering condition expression becomes long. Consequently, when each of many users registered filtering conditions specifying 100 to nearly 200 of about 3,000 stocks, the conventional automaton-based method took too long to process text data in real time. In this embodiment, by contrast, stock codes are assigned to the text data in advance, which speeds up the subsequent filtering and related processing; as a result, filtering as a whole is faster and real-time processing becomes possible.

【００３９】In addition, in the conventional automaton-based method, a large automaton must be rebuilt every time a user adds, deletes, or changes a stock code, which is another obstacle to real-time processing. In this embodiment, by contrast, the automaton that assigns stock codes does not need to be rebuilt when users change their stocks. The user-defined profiles, which may change frequently, are expanded into the small secondary automaton, so the system can respond to users' stock changes in real time.

【００４０】In the embodiment above, the roughly 3,000 stock codes are assigned on the primary side, so when a user changes stocks only the small secondary automaton needs to be rebuilt, making real-time processing possible. In the conventional system, noise removal had to be done separately for each user: each user set up their own noise-removal processing, and the system's automaton was forced to perform duplicated noise removal. This enlarged the automaton and made real-time processing difficult. In this embodiment, noise is removed after primary filtering, and this noise removal is shared by all users. As a result, noise can be removed reliably and processing speed is improved.

【００４１】According to the embodiment described above, in addition to full-text keyword search, searches by stock code (up to 225 stocks per profile) and by classification code are possible. Moreover, if the user sets automatic monitoring conditions using keywords in advance, the system monitors incoming news in real time and, when matching news arrives, automatically notifies the user, who can then view it.

【００４２】[Other Embodiments] The present invention is not limited to the embodiments described above, and various modifications are possible within the scope of its gist. For example, although the embodiment switches between the search screen and the filtering screen, both may be displayed on a single screen. Likewise, although a separate server is used for each function, the processing may be performed by a single server. Furthermore, although the text data in the embodiment concerns financial and securities information, it may instead be data about sports or entertainment, or a company's internal information.

【００４３】[0043]

[Effects of the Invention] As described above, the present invention provides a text filtering system and a text filtering method in which Japanese text data input in real time is scanned in real time by automaton-based filtering and stock codes are assigned to it automatically, so that Japanese text data input in real time can be searched and filtered by stock code.

[Brief description of drawings]

FIG. 1 is a block diagram of a text filtering system according to an embodiment of the present invention.

FIG. 2 is a diagram for explaining the classification conversion table.

FIG. 3 is a diagram showing the stock dictionary.

FIG. 4 is a diagram for explaining the noise removal dictionary.

FIG. 5 is a flowchart of stock code assignment in the primary filtering server.

FIG. 6 is a flowchart of filtering in the secondary filtering server under filtering conditions set by a user.

FIG. 7 is a diagram showing the search screen.

FIG. 8 is a diagram showing the filtering screen.

[Explanation of symbols]

10 Electronic information source
12 Data receiving server
14 News queue file storage unit
16 Primary filtering server
18 Recording unit
20 Search server
22 Database
24 Secondary filtering server
26 Recording unit
28 Web server

Front page continued. (72) Inventor: Kunio Sato, c/o Queues Creative Co., Ltd. (株式会社キューズ・クリエイティブ), Nanpeidai Imai Building 5F, 15-15 Nanpeidai-cho, Shibuya-ku, Tokyo. (56) References cited: JP-A-6-259481 (JP, A); L. Wall and L. Schwartz, Perl Programming (Japanese translation by Yoshiyuki Kondo), SoftBank Corp., Japan, February 28, 1993, pp. 45-48. (58) Fields searched (Int.Cl.⁷, DB name): G06F 17/30; JICST file (JOIS).

Claims

(57) [Claims]

1. A text filtering system comprising: identification code assigning means for expanding into an automaton a dictionary that describes predetermined character strings and the identification codes corresponding to them, filtering input text data, and, when a corresponding character string is present in the text data, assigning the corresponding identification code to the text data; and noise removing means for searching, in the text data to which the identification code has been assigned, the characters before and after the character string and, when a predetermined character is attached, deleting the assigned identification code, thereby fixing the identification code.

2. The text filtering system according to claim 1, further comprising filtering means for expanding into an automaton filtering conditions created and registered in advance for each user using free keywords, the identification codes, and logical operators, scanning the text data to which the identification codes have been assigned, and outputting the result of the filtering.

3. A text filtering system comprising noise removing means that, when an identification code assigned to the text data output in claim 2 is not described at a predetermined position in the text data, regards the identification code as noise and removes it.

4. The text filtering system according to claim 1, further comprising a full-text search engine having a function of registering and managing, in real time, the text data to which the identification code has been assigned in a database.

5. The text filtering system according to claim 1, wherein the identification code is a stock code.

6. The text filtering system according to claim 1, 2, 3, 4, or 5, further comprising classification code assigning means for assigning to input text data at least one classification code among a code indicating the information source, a code indicating the information category, and a code indicating Japanese or English, by searching at least one of the title and the body of the text data using at least one of the forward match, backward match, exact match, and intermediate match operators.
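Claim 6 names four match operators applied to the title or body. A minimal sketch of rule-based classification-code assignment with those operators follows; the `Rule` layout, the code strings, and the example patterns are hypothetical, and reading an intermediate match as "the pattern occurs anywhere in the field" is an assumption.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    field: str    # "title" or "body"
    op: str       # "forward", "backward", "exact" or "intermediate"
    pattern: str
    code: str     # classification code to assign when the rule matches


# One predicate per match operator named in claim 6
MATCHERS = {
    "forward": lambda s, p: s.startswith(p),
    "backward": lambda s, p: s.endswith(p),
    "exact": lambda s, p: s == p,
    "intermediate": lambda s, p: p in s,   # assumption: substring anywhere
}


def assign_classification_codes(title: str, body: str, rules) -> set:
    """Return every classification code whose rule matches the chosen field."""
    fields = {"title": title, "body": body}
    return {r.code for r in rules if MATCHERS[r.op](fields[r.field], r.pattern)}
```

Keeping the operators in a table means the administrator-maintained rules (cf. the classification conversion table of FIG. 2) stay pure data, and adding an operator touches only `MATCHERS`.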