JP6050503B2

JP6050503B2 - Mail indexing and retrieval using a hierarchical cache

Info

Publication number: JP6050503B2
Application number: JP2015533157A
Authority: JP
Inventors: シャ・ジヨン
Original assignee: Alibaba Group Holding Ltd
Current assignee: Alibaba Group Holding Ltd
Priority date: 2012-09-21
Filing date: 2013-09-18
Publication date: 2016-12-21
Anticipated expiration: 2033-09-18
Also published as: TW201413479A; TWI554897B; CN103678405B; CN103678405A; EP2898430A2; JP2015537283A; US20140089258A1; WO2014047193A2; EP2898430B1; US9507821B2; WO2014047193A3

Description

CROSS-REFERENCE TO OTHER APPLICATIONS

This application claims priority to Chinese Patent Application No. 201210357269.6, entitled "METHOD AND SYSTEM FOR ESTABLISHING MAIL INDICES AND METHOD AND SYSTEM FOR SEARCHING MAIL," filed September 21, 2012, which is incorporated herein by reference for all purposes.

The present application relates to the field of network data processing and, in particular, to a method and system for establishing a mail index and performing mail searches.

As Internet communication has become increasingly widespread and more and more users communicate by mail (in particular, electronic mail, or email), mailbox search has become an important technology within data search. Mailbox searches are usually based on a mailbox index; that is, all of a user's mail is typically searched using the mailbox index.

One existing method for establishing a mail index is as follows. In general, a mailbox index is established in the form of an inverted index. For example, suppose there are three mail files named doc_id1, doc_id2, and doc_id3, all of which contain the phrase "hello my world". The inverted index records that store the associations between keywords and mail files are then as follows:

hello -> doc_id1, doc_id2, doc_id3,

my -> doc_id1, doc_id2, doc_id3,

world -> doc_id1, doc_id2, doc_id3;

The inverted index records above are stored in an inverted index file. The offset position and length of each inverted index record within the inverted index file are recorded, and the offset position is written to a dictionary file as follows:

{"hello": {"file_path": "/xxx/inverted_index_file", "offset": 0}};

Suppose a user searches for mail containing "hello". The keyword is looked up in the dictionary file, which yields the address "/xxx/inverted_index_file". This inverted index file is then opened and the record at offset "0" is read, so the three mails {doc_id1, doc_id2, doc_id3} can be retrieved.
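The prior-art lookup above can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: an in-memory byte string stands in for "/xxx/inverted_index_file", and the comma-separated record encoding and the `length` field name are assumptions.

```python
import io


def build_inverted_file(records):
    """Lay records out back to back, as in the prior-art inverted index file.

    records: dict mapping keyword -> list of mail ids (assumed format).
    Returns (file_bytes, dictionary); dictionary maps each keyword to the
    offset and length of its record, mirroring the dictionary file.
    """
    buf = io.BytesIO()
    dictionary = {}
    for keyword, doc_ids in records.items():
        data = ",".join(doc_ids).encode("utf-8")
        dictionary[keyword] = {"offset": buf.tell(), "length": len(data)}
        buf.write(data)
    return buf.getvalue(), dictionary


def lookup(file_bytes, dictionary, keyword):
    """Find the keyword in the dictionary, then read its record at the offset."""
    entry = dictionary.get(keyword)
    if entry is None:
        return []
    start = entry["offset"]
    return file_bytes[start:start + entry["length"]].decode("utf-8").split(",")


ids = ["doc_id1", "doc_id2", "doc_id3"]
file_bytes, dictionary = build_inverted_file({"hello": ids, "my": ids, "world": ids})
print(lookup(file_bytes, dictionary, "hello"))  # ['doc_id1', 'doc_id2', 'doc_id3']
```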

However, when new mail is added, the inverted index file must be updated to keep search results complete. For example, suppose a new mail (doc_id4) is added that also contains the three keywords "hello my world". The inverted index records must then be updated as follows:

hello -> doc_id1, doc_id2, doc_id3, doc_id4,

my -> doc_id1, doc_id2, doc_id3, doc_id4,

world -> doc_id1, doc_id2, doc_id3, doc_id4;

When the updated inverted index records are saved to the inverted index file, the original storage locations of the two records "my -> doc_id1, doc_id2, doc_id3, doc_id4" and "world -> doc_id1, doc_id2, doc_id3, doc_id4" must change within the inverted index file, because the "hello" record stored before them has grown. At the same time, the corresponding offset values in the dictionary file must be modified.

Therefore, with the method described above, adding a new mail requires shifting other related data in the inverted index file.
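The offset shift can be reproduced with a small model. This sketch assumes the same back-to-back, comma-separated record layout (a hypothetical encoding) and computes each keyword's offset before and after doc_id4 is added.

```python
def offsets(records):
    """Return keyword -> byte offset when records are stored back to back."""
    result, pos = {}, 0
    for keyword, doc_ids in records.items():
        result[keyword] = pos
        pos += len(",".join(doc_ids).encode("utf-8"))
    return result


before = {k: ["doc_id1", "doc_id2", "doc_id3"] for k in ("hello", "my", "world")}
after = {k: ids + ["doc_id4"] for k, ids in before.items()}

old, new = offsets(before), offsets(after)
moved = [k for k in old if old[k] != new[k]]
print(old)    # {'hello': 0, 'my': 23, 'world': 46}
print(new)    # {'hello': 0, 'my': 31, 'world': 62}
print(moved)  # ['my', 'world']
```

Only "hello" keeps its offset because it sits at the start of the file; every record stored behind a grown record has to be rewritten, and its dictionary entry updated.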

As described above, existing mailbox search using a mail index typically requires a keyword search over the entire inverted index file. As mail data grows, a mailbox server may serve hundreds of millions of subscribers and billions of individual mail messages. Storing such a large amount of data requires substantial hard disk I/O resources, making it difficult or impossible to index mailboxes quickly. Furthermore, storing large amounts of mail is very costly for the mail server, and a large amount of storage resources may be tied up as a result.

Various embodiments of the invention are disclosed in the following detailed description and the accompanying drawings.

To describe the technical solutions of the embodiments of the present application or of the prior art more clearly, the drawings needed to describe the embodiments or the prior art are briefly introduced below. The drawings described below show only some of the embodiments of the present application; those skilled in the art can derive other drawings from them without creative effort.

A block diagram illustrating an embodiment of a mail indexing system that uses a multi-level cache.

A flowchart illustrating an embodiment of a process for establishing a mail index.

A flowchart illustrating an embodiment of a process for transferring level 2 inverted index records to a level 3 cache.

A flowchart illustrating another embodiment of a process for establishing a mail index.

A flowchart illustrating an embodiment of a mail search process.

A block diagram illustrating an embodiment of a system configured to establish a mail index.

A block diagram illustrating an embodiment of a transfer unit.

A block diagram of a system configured to establish a mail index.

A block diagram illustrating another embodiment of a system configured to establish a mail index.

A block diagram illustrating an embodiment of a mail search system.

The invention can be implemented in numerous ways, including as a process; an apparatus; a system; a composition of matter; a computer program product embodied on a computer-readable storage medium; and/or a processor, such as a processor configured to execute instructions stored on and/or provided by a memory coupled to the processor. In this specification, these implementations, or any other form that the invention may take, may be referred to as techniques. In general, the order of the steps of disclosed processes may be altered within the scope of the invention. Unless stated otherwise, a component such as a processor or a memory described as being configured to perform a task may be implemented as a general component that is temporarily configured to perform the task at a given time or as a specific component that is manufactured to perform the task. As used herein, the term "processor" refers to one or more devices, circuits, and/or processing cores configured to process data, such as computer program instructions.

A detailed description of one or more embodiments of the invention is provided below, along with accompanying figures that illustrate the principles of the invention. The invention is described in connection with such embodiments, but it is not limited to any embodiment. The scope of the invention is limited only by the claims, and the invention encompasses numerous alternatives, modifications, and equivalents. Numerous specific details are set forth in the following description in order to provide a thorough understanding of the invention. These details are provided for the purpose of example, and the invention may be practiced according to the claims without some or all of these specific details. For the purpose of clarity, technical material that is known in the technical fields related to the invention has not been described in detail, so that the invention is not unnecessarily obscured.

The techniques described herein can be used in numerous general-purpose or special-purpose computer system environments or configurations. Examples include personal computers, servers, handheld or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, and distributed computing environments that include any of the above systems or devices.

The present application may be described in the general context of computer-executable instructions, such as program modules, executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types. The present application may also be practiced in distributed computing environments, in which tasks are performed by remote processing devices linked through a communication network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage devices.

FIG. 1A is a block diagram illustrating an embodiment of a mail indexing system that uses a multi-level cache. In this example, system 150 includes one or more mailbox servers 152 and hierarchically organized caches 154 to 158. Mailbox server 152 indexes email messages and stores the index information in the hierarchical caches. Three cache levels are shown for purposes of illustration; a different number of cache levels may be used in other embodiments. The level 1 cache is implemented using low-latency memory, such as the random access memory (RAM) of server 152. The level 2 and level 3 caches are implemented using one or more components, such as hard disks or other storage devices, that have higher latency than the memory used for the level 1 cache. In some embodiments, the inverted index records established for mail are first saved to the level 1 cache. When the data stored in the level 1 cache reaches a first predetermined threshold, all level 1 inverted index records in the level 1 cache are transferred to a level 2 cache file to make room for future inverted index records. The level 2 cache stores cache files. When the amount of data in the level 2 cache reaches a second predetermined threshold, the level 2 data is transferred to the level 3 cache, which stores inverted index files. Because this process for establishing a mail index reduces the frequency of hard disk reads and writes, it can improve hard disk input/output (I/O) performance.

FIG. 1B is a flowchart illustrating an embodiment of a process for establishing a mail index. Process 100 may be performed on a system such as system 150.

In step 101, an email message is processed to obtain the keywords associated with it.

In some embodiments, word segmentation is performed on the email message being indexed to obtain its keywords. In some embodiments, the index is established only on the email body text, so word segmentation is applied only to the body text. In other embodiments, the message subject must also serve as a basis for the index, so word segmentation is applied to both the body text and the subject to obtain the mail keywords. Those skilled in the art can implement this step by selecting a word segmentation tool that meets the specific requirements of the system.
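As a stand-in for a real word segmentation tool, the following sketch splits text on runs of word characters and lowercases them. The function name and the `index_subject` flag are hypothetical and only illustrate the body-only versus body-plus-subject choice. A regular expression like this does not segment Chinese or Japanese text (a run of CJK characters comes back as one token), which is why the choice of segmentation tool is left to the system's requirements.

```python
import re

WORD = re.compile(r"\w+")


def extract_keywords(body, subject="", index_subject=False):
    """Return the distinct keywords of a mail in first-seen order.

    Naive tokenizer (assumption): real systems plug in a proper segmenter.
    """
    text = f"{subject} {body}" if index_subject else body
    return list(dict.fromkeys(WORD.findall(text.lower())))


print(extract_keywords("Hello my world, hello!"))                       # ['hello', 'my', 'world']
print(extract_keywords("hello", subject="Report", index_subject=True))  # ['report', 'hello']
```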

The mail index establishment method disclosed in the embodiments of the present application may thus be performed on each new mail whenever a user receives one.

In step 102, the obtained keywords are used as the basis for updating the level 1 inverted index records stored in the level 1 cache.

After the current mail has undergone word segmentation, inverted index records establishing the correspondence between keywords and the mail message are generated. For example, suppose the identifier of the current mail is "doc5" and the mail has two keywords, "keyword1" and "keyword5". In this example, the following inverted index records are generated:

keyword1: doc5

keyword5: doc5

In some embodiments, the level 1 cache is configured in low-latency memory to store initial inverted index records. That is, the inverted index records first generated for the current mail are initially stored only in the level 1 cache in memory. An inverted index record is a key/value pair: the key of an inverted index record in the level 1 cache is a keyword, and the value is the identifiers of the mail messages containing it. Suppose the initial inverted index records in the level 1 cache of this embodiment are as follows:

keyword1: doc1, doc2, doc3;

keyword2: doc1, doc4

The inverted index records newly generated from the current mail are therefore merged into the existing inverted index records in the level 1 cache, giving the following updated records:

keyword1: doc1, doc2, doc3, doc5;

keyword2: doc1, doc4;

keyword5: doc5
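Step 102 can be sketched with a plain dictionary playing the role of the in-memory level 1 cache. The function names are hypothetical; the example reproduces the doc5 update shown above.

```python
def records_for_mail(doc_id, keywords):
    """Build one inverted index record (keyword -> [doc_id]) per keyword."""
    return {keyword: [doc_id] for keyword in keywords}


def merge_into_level1(level1, new_records):
    """Merge new records into the level 1 cache, keeping each id once."""
    for keyword, doc_ids in new_records.items():
        postings = level1.setdefault(keyword, [])
        for doc_id in doc_ids:
            if doc_id not in postings:
                postings.append(doc_id)
    return level1


level1 = {"keyword1": ["doc1", "doc2", "doc3"], "keyword2": ["doc1", "doc4"]}
merge_into_level1(level1, records_for_mail("doc5", ["keyword1", "keyword5"]))
print(level1)
# {'keyword1': ['doc1', 'doc2', 'doc3', 'doc5'], 'keyword2': ['doc1', 'doc4'], 'keyword5': ['doc5']}
```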

In step 103, it is determined whether the size of the level 1 inverted index records in the level 1 cache exceeds a first predetermined threshold. If so, control proceeds to step 104; otherwise, the process ends.

In some embodiments, at least two cache levels are configured to carry out I/O operations. The level 1 cache is configured in low-latency memory, and data is recorded there as in-memory inverted index records. The level 2 cache is configured using higher-latency memory or storage components, such as a hard disk. As each new mail is processed to obtain inverted index records, the records are written to the level 1 cache. When the level 1 cache in memory reaches the first predetermined threshold (for example, 2 MB), all inverted index records in the level 1 cache are written to a buffer file in the level 2 cache.

If the size of the level 1 cache has not reached the first threshold, the inverted index records of the current mail message remain in the level 1 cache. If the current mail matches the search criteria the next time the user searches, it is found in the level 1 cache. This completes index establishment for the current mail.

In step 104, all level 1 inverted index records in the level 1 cache are transferred to the level 2 cache and stored in a level 2 cache file. In some embodiments, the transfer is implemented by copying the data to the new location and then deleting it from the old location. The transferred inverted index records are merged with the existing inverted index records as described for step 102.
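Steps 103 and 104 can be sketched as follows. The tab-separated line format of the level 2 cache file, measuring the level 1 cache by its serialized size, and the tiny demo thresholds are all assumptions; the text gives 2 MB only as an example value. As in step 104, records are copied to the level 2 file (merged with what is already there) before being deleted from memory.

```python
import os
import tempfile


def dump_records(records):
    """Serialize records as 'keyword<TAB>id1,id2' lines (hypothetical format)."""
    return "".join(f"{kw}\t{','.join(ids)}\n" for kw, ids in records.items())


def load_records(path):
    """Read a level 2 cache file back into a keyword -> ids dict."""
    records = {}
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            for line in f:
                keyword, ids = line.rstrip("\n").split("\t")
                records[keyword] = ids.split(",")
    return records


def maybe_flush(level1, level2_path, first_threshold):
    """Step 103: compare the level 1 size with the first threshold.
    Step 104: if exceeded, merge the records into the level 2 cache file,
    then clear them from memory. Returns True when a flush happened."""
    if len(dump_records(level1).encode("utf-8")) <= first_threshold:
        return False  # records stay in the level 1 cache
    level2 = load_records(level2_path)
    for keyword, ids in level1.items():
        postings = level2.setdefault(keyword, [])
        for doc_id in ids:
            if doc_id not in postings:
                postings.append(doc_id)
    with open(level2_path, "w", encoding="utf-8") as f:
        f.write(dump_records(level2))  # copy to the new location first...
    level1.clear()                     # ...then delete from the old one
    return True


level2_path = os.path.join(tempfile.mkdtemp(), "level2.cache")
level1 = {"keyword1": ["doc1", "doc5"]}
kept = maybe_flush(level1, level2_path, first_threshold=64)     # 19 bytes: no flush
flushed = maybe_flush(level1, level2_path, first_threshold=10)  # 19 bytes > 10: flush
```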

In step 105, it is determined whether the size of the current level 2 cache file exceeds a second predetermined threshold. If so, control proceeds to step 106; otherwise, the process ends.

In the embodiments of the present application, a threshold size (the second predetermined threshold) is configured for the level 2 cache file such that, when the current level 2 cache file into which the level 1 inverted index records were written reaches that threshold, all level 2 inverted index records in the level 2 cache file are written to the level 3 inverted index files.

In this step, if the size of the level 2 cache file has not reached the second predetermined threshold, the inverted index records of the current mail remain in the level 2 cache; that is, they are saved in the level 2 cache file. If the current mail matches the search criteria the next time the user searches, it is found in the level 2 cache file. This completes index establishment for the current mail.

The level 2 cache file serves as a buffer that avoids the following situation: if, after the level 1 buffer in memory fills up, its inverted index records were written directly to the level 3 inverted index files to which their various keywords belong, several level 3 inverted index files would have to be written at the same time. For example, suppose there are 10 inverted index records in the level 1 cache in memory, and the 10 keywords in these records belong to 5 different level 3 inverted index files. If the records in the level 1 cache were written directly to the level 3 inverted index files, write operations would have to be performed on all 5 level 3 inverted index files at once. When the level 2 cache file acts as a buffer file, these 10 inverted index records need to be written to only one level 2 cache file. As other inverted index records are added (some of which may ultimately be written to the same 5 level 3 inverted index files), the level 2 cache file grows; when it reaches a certain threshold (for example, 4 MB), a one-time classification is performed on the level 2 inverted index records in the level 2 cache file.
The 10 inverted index records, together with the many other inverted index records in the buffer file, are then written to the level 3 inverted index files to which their keywords belong. This greatly reduces the frequency of file writes to the hard disk.
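The write savings in the 10-record, 5-file example can be counted with a toy model. Here a "write" means opening one file and writing to it; the keyword-to-file mapping and the number of level 1 flushes that fill the level 2 file are hypothetical parameters, not values from the text.

```python
def count_file_writes(batches, file_of, flushes_per_level2):
    """Compare file writes with and without the level 2 buffer file.

    batches: list of level 1 flushes, each a list of keywords.
    file_of: maps a keyword to its level 3 inverted index file.
    flushes_per_level2: level 1 flushes that fill the level 2 file.
    Returns (direct_writes, buffered_writes).
    """
    # Without level 2: every flush writes each distinct level 3 file it touches.
    direct = sum(len({file_of(k) for k in batch}) for batch in batches)

    buffered, pending = 0, set()
    for i, batch in enumerate(batches, start=1):
        buffered += 1  # the whole batch goes to the single level 2 cache file
        pending |= {file_of(k) for k in batch}
        if i % flushes_per_level2 == 0:
            buffered += len(pending)  # one-time classification: one write per file
            pending.clear()
    buffered += len(pending)  # drain any remainder so both counts cover all records
    return direct, buffered


keywords = [f"k{j}" for j in range(10)]  # 10 records per level 1 flush
batches = [keywords] * 4                 # 4 level 1 flushes
print(count_file_writes(batches, lambda k: int(k[1:]) // 2, 4))  # (20, 9)
```

Without the buffer, each of the four flushes touches all five level 3 files; with it, the four flushes hit one file, and each of the five level 3 files is written once.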

In step 106, all level 2 inverted index records in the level 2 cache file are transferred to the level 3 inverted index files according to a path file. The path file stores the mapping that establishes the correspondence between keywords and level 3 inverted index files, that is, which keywords are stored in which level 3 inverted index file. In some embodiments, the path file may be generated specifically to record this mapping. An example of records in a path file is as follows:

keyword1 ~ keyword3: /xx/level_3_inverted_index_file_name_1

keyword2 ~ keywordx: /xx/level_3_inverted_index_file_name_2

...

keywordn ~ keywordm: /xx/level_3_inverted_index_file_name_n

Keywords may be stored in different files in alphabetical order. In one example, keyword1 is "abacus", keyword3 is "azure", keyword2 is "back", keywordx is "butter", and so on. Other configurations are possible. Once the path file has been configured, a later mailbox search can use the mapping recorded in it to find the level 3 inverted index file in which a keyword is located. After reading that level 3 inverted index file, the system can determine which mails contain the keyword. The path file thus defines the path from a keyword to its level 3 inverted index file.
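Because the ranges are sorted, finding a keyword's level 3 file is a binary search. The sketch below models the path file example with the abacus/azure and back/butter ranges given above; holding the path file as a Python list and returning None for uncovered keywords are assumptions.

```python
import bisect

# Hypothetical in-memory form of the path file: sorted, non-overlapping
# alphabetical keyword ranges, each mapped to one level 3 inverted index file.
PATH_FILE = [
    ("abacus", "azure", "/xx/level_3_inverted_index_file_name_1"),
    ("back", "butter", "/xx/level_3_inverted_index_file_name_2"),
]


def level3_file_for(keyword, path_file=PATH_FILE):
    """Return the level 3 file whose range contains keyword, or None."""
    firsts = [first for first, _, _ in path_file]
    i = bisect.bisect_right(firsts, keyword) - 1  # last range starting <= keyword
    if i >= 0 and keyword <= path_file[i][1]:
        return path_file[i][2]
    return None  # not covered; a real system needs a rule for new keywords


print(level3_file_for("apple"))  # /xx/level_3_inverted_index_file_name_1
print(level3_file_for("beach"))  # /xx/level_3_inverted_index_file_name_2
```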

FIG. 2 is a flowchart illustrating an embodiment of a process for transferring level 2 inverted index records to the level 3 cache. In some embodiments, process 200 implements step 106 of process 100.

In step 201, all level 2 inverted index records in the level 2 cache file are transferred to memory (for example, read from their disk storage location into memory).

In step 202, the level 3 inverted index files corresponding to the keywords in the level 2 inverted index records are determined based at least in part on the path file. As described above, the path file contains keywords and the paths of their respective level 3 inverted index files; these paths are used to determine the level 3 inverted index file corresponding to each keyword in the level 2 inverted index records.

In step 203, the level 2 inverted index records are written, by keyword, to the level 3 inverted index files so determined.

In some embodiments, write speed is increased by performing write operations on the level 3 inverted index files in append mode. Thus, in some embodiments, when the level 2 inverted index records are written to the level 3 inverted index files, all inverted index records in the level 2 cache file are written, by keyword, to the determined level 3 inverted index files in append mode. Append mode is a standard file-writing mode in which new data is added directly to the end of the file.

When the threshold is exceeded, the system reads the path file. It then scans the inverted records in the buffer file and extracts the keyword of each record (for example, keyword1). Using the path file information, it determines which inverted index file the keyword1 record belongs in and then appends the record to that file.
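Process 200 can be sketched as below. The level 2 record format, removing the level 2 file after the transfer, and the stand-in `demo_file_for` rule (a real system would consult the path file) are assumptions. Records are grouped by destination first so that each level 3 file is opened once and written in append mode.

```python
import os
import tempfile
from collections import defaultdict


def transfer_level2(level2_path, file_for):
    """Move every record in the level 2 cache file to its level 3 file."""
    with open(level2_path, encoding="utf-8") as f:   # step 201: read into memory
        lines = f.readlines()

    groups = defaultdict(list)
    for line in lines:                               # step 202: pick the target file
        keyword = line.split("\t", 1)[0]
        groups[file_for(keyword)].append(line)

    for target, group in groups.items():             # step 203: append, one open per file
        with open(target, "a", encoding="utf-8") as f:
            f.writelines(group)

    os.remove(level2_path)  # records now live in level 3 (assumed cleanup)
    return sorted(groups)


workdir = tempfile.mkdtemp()
level2_path = os.path.join(workdir, "level2.cache")
with open(level2_path, "w", encoding="utf-8") as f:
    f.write("abacus\tdoc1\nback\tdoc2\napple\tdoc3\n")


def demo_file_for(keyword):
    """Hypothetical rule standing in for the path file lookup."""
    name = "level_3_a" if keyword.startswith("a") else "level_3_b"
    return os.path.join(workdir, name)


targets = transfer_level2(level2_path, demo_file_for)
```

Because append mode never rewrites earlier data, one keyword can accumulate several lines in a level 3 file across transfers; a reader is assumed to merge them when the file is loaded.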

The embodiments of the present application use an approach comprising a level 1 cache, level 2 cache files, and level 3 inverted index files. Inverted index records established for mail messages are first saved in the level 1 cache. When the level 1 cache reaches the first predetermined threshold, all level 1 inverted index records in it are transferred to a level 2 cache file, and when the level 2 cache file reaches the second predetermined threshold, the level 2 inverted index records in it are transferred to the level 3 inverted index files. This approach lets the system avoid the excessive number of hard disk writes that a large number of mail inverted index records would otherwise cause. As a result, inverted index records are written faster while the mail index is being established. This not only speeds up mail index establishment but also reduces the impact of excessive disk writes and improves disk I/O performance.

Furthermore, in the embodiments of the present application, part of the mail index is kept in the level 1 cache, which is implemented using low-latency memory, so the hard disk does not store the entire mail index. When new mail is processed frequently, the in-memory buffer therefore prevents it from placing an excessive load on disk writes. Moreover, because the newest data can be accessed faster, real-time search can be achieved.
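The real-time effect follows from searching every tier: a mail whose records are still in the level 1 cache is found before they ever reach disk. This sketch assumes each tier has already been loaded as a keyword-to-ids mapping; the function name and the tier order are hypothetical.

```python
def search(keyword, *tiers):
    """Collect mail ids for keyword from the level 1, level 2, and level 3 tiers."""
    hits = []
    for tier in tiers:
        for doc_id in tier.get(keyword, []):
            if doc_id not in hits:
                hits.append(doc_id)
    return hits


level1 = {"keyword1": ["doc5"]}          # newest mail, still in memory
level2 = {"keyword1": ["doc3"]}          # level 2 cache file
level3 = {"keyword1": ["doc1", "doc2"]}  # level 3 inverted index file
print(search("keyword1", level1, level2, level3))  # ['doc5', 'doc3', 'doc1', 'doc2']
```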

FIG. 3 is a flowchart illustrating another embodiment of a process for establishing a mail index. In some embodiments, process 300 is performed after process 100 completes.

In this example, a third predetermined threshold is additionally established for the level 3 inverted index files. When the size of a level 3 inverted index file reaches the third predetermined threshold, the file is split into multiple (for example, two) inverted index subfiles, each of which must not exceed the third predetermined threshold. This ensures that no level 3 inverted index file grows excessively large, which in turn preserves the access speed of the mail index.

工程３０１で、レベル３転置インデックスファイルのサイズが第３の所定の閾値を超えるか否かが判定される。超える場合、制御は工程３０２に進む。レベル３転置インデックスファイルのための閾値は、レベル２キャッシュファイルのための第２の所定の閾値と同じであってもよいし、異なる値であってもよい。 In step 301, it is determined whether the size of the level 3 inverted index file exceeds a third predetermined threshold. If so, control proceeds to step 302. The threshold for the level 3 inverted index file may be the same as or different from the second predetermined threshold for the level 2 cache file.

この例において、レベル１キャッシュ、レベル２キャッシュファイル、および、レベル３転置インデックスファイルのための閾値の大きさの構成は、特定の期間内のディスクへの読み書き動作の回数、および、ユーザがメール検索を行った時にユーザ検索結果を返すためにどれだけの時間が割り当てられるのか、などの要素を考慮する。ファイルが小さいほど、読み書き速度は速くなる。ただし、キャッシュサイズが非常に小さいと、ファイルの数が過剰になり、読み書き速度が遅くなる。したがって、閾値の大きさは、実際の条件に合わせて経験的に調整される。 In this example, the threshold size configuration for the level 1 cache, level 2 cache file, and level 3 inverted index file is determined by the number of disk read / write operations within a specific time period and the user searching for mail. Considering factors such as how much time is allocated to return user search results when The smaller the file, the faster the read / write speed. However, if the cache size is very small, the number of files becomes excessive and the read / write speed becomes slow. Therefore, the magnitude of the threshold is adjusted empirically according to actual conditions.

現在の工程において、レベル３転置インデックスファイルのサイズが第３の所定の閾値（例えば、４ＭＢ）に達しない場合、後の工程は実行されない。 In the current process, if the size of the level 3 inverted index file does not reach the third predetermined threshold (eg, 4 MB), the subsequent process is not executed.

工程３０２で、レベル３転置インデックスファイルは、複数（例えば、２つ）の転置インデックスサブファイルに分割される。 At step 302, the level 3 inverted index file is divided into multiple (eg, two) inverted index subfiles.

If a level 3 inverted index file is too large, it can be split based on keywords, at keyword granularity. All inverted index records corresponding to one keyword go into exactly one inverted index subfile. Consequently, when a later query concerns mail associated with a single keyword, only one level 3 inverted index file needs to be fetched. Furthermore, because that file does not exceed the threshold size (for example, 4 MB), the mail index can be fetched quickly. An example is as follows:

keyword1: doc1, doc2, doc3, ...

keyword2: doc1, doc3, doc4, ...

keyword3: doc1, doc6, ...

...

keywordN: doc1, docK, ...

If this file is too large, it is divided into multiple (for example, two) inverted index subfiles at record-entry boundaries.

In some embodiments, two factors must be considered when a level 3 inverted index file is split into two inverted index subfiles. The first is keyword granularity: all inverted index records corresponding to one keyword must reside in a single inverted index subfile. The second is that the two subfiles should preferably be as close in size to each other as possible, which minimizes how often subfiles need to be split again later.
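The two splitting criteria can be illustrated with a short Python sketch, under several assumptions. The level 3 file's records are taken to be held in memory as an ordered list of (keyword, mail_ids) pairs. Each record's size is approximated by the length of its serialized line. The function name is hypothetical. The cut is only ever placed between records, so each keyword lands in exactly one subfile. Among those positions, the cut is chosen to make the two halves as close in size as possible.

```python
def split_at_keyword_boundary(records):
    """Split (keyword, mail_ids) records into two halves of near-equal size.

    The cut is placed only between records, so all postings of a keyword stay
    in one subfile. Record size is approximated by its serialized line length.
    """
    if len(records) < 2:
        raise ValueError("need at least two keyword records to split")
    sizes = [len(f"{kw}:{','.join(ids)}\n") for kw, ids in records]
    total = sum(sizes)
    best_cut, best_diff, running = 1, None, 0
    for cut in range(1, len(records)):
        running += sizes[cut - 1]          # size of records[:cut]
        diff = abs(total - 2 * running)    # |left size - right size|
        if best_diff is None or diff < best_diff:
            best_cut, best_diff = cut, diff
    return records[:best_cut], records[best_cut:]
```

For the keyword1 to keywordN example above, the cut falls between keyword2 and keyword3, because that split gives the smallest size difference.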

In step 303, the path file is updated according to the inverted index subfiles produced by the split.

A level 3 inverted index file that exceeds the third predetermined threshold is divided into two inverted index subfiles, each with its own identifier. As a result, the file that holds the inverted index records for each keyword of the original level 3 inverted index file changes. The path file must therefore be updated to reflect the new subfiles. After the path file has been updated, the original level 3 inverted index file may be deleted.
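Step 303 amounts to re-pointing each moved keyword in the path file. The minimal Python sketch below assumes the path file is loaded as a dict mapping each keyword to a level 3 file identifier. The function and file names are hypothetical. The return value signals whether the original file is no longer referenced and may therefore be deleted.

```python
def repoint_after_split(path_file, old_file, halves, new_files):
    """Update a keyword -> level 3 file mapping after a file has been split.

    halves: the lists of (keyword, mail_ids) records written to each subfile.
    new_files: the identifiers of those subfiles, in the same order.
    Returns True when no keyword still maps to old_file (safe to delete).
    """
    if len(halves) != len(new_files):
        raise ValueError("each half needs exactly one subfile identifier")
    for records, new_file in zip(halves, new_files):
        for keyword, _mail_ids in records:
            path_file[keyword] = new_file
    return all(name != old_file for name in path_file.values())
```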

FIG. 4 is a flowchart illustrating another embodiment of a process for establishing a mail index. In some embodiments, process 400 is performed after process 100 completes. In this example, each level 3 inverted index file is configured in two parts. The first is an initial static compressed file (for example, a zip file), which stores compressed inverted index records. The second is an incremental file, which stores uncompressed inverted index records. In this embodiment, when inverted index records are generated for new mail, the newly generated records can be written to the incremental file without first decompressing the static file. As a result, the mail index can be established more quickly and the impact of excessively frequent disk writes is reduced. Disk I/O performance improves, and storage space for the level 3 inverted index files is further saved.

As described above, at the end of process 100 the level 2 inverted index records are transferred to the incremental file of the level 3 inverted index file determined according to the path file. The path file specifies the mapping between keywords and their corresponding level 3 inverted index files. Following process 100, in step 409, it is determined whether the incremental file exceeds an incremental threshold. If so, control proceeds to step 410.

In this embodiment, a threshold is configured for the incremental file; for example, the incremental file is preferably kept at or below 4 MB. When the incremental file reaches this threshold, its contents are compressed to reduce the size of the level 3 inverted index file and the amount of hard disk space it occupies. If the incremental file does not exceed the incremental threshold, the subsequent steps are not performed.

In step 410, the initial static compressed file is decompressed to obtain a decompressed initial static file. In some embodiments, when the size of the incremental file exceeds the incremental threshold, the contents of the initial static compressed file are first fetched and then decompressed to obtain the uncompressed initial static file.

In step 411, the decompressed initial static file and the incremental file are merged to obtain a merged file.

In step 412, the merged file is compressed using any suitable technique to produce the current static compressed file (for example, a zip file). In some embodiments, once the current static compressed file has been produced, the original contents of the incremental file are released. This arrangement uses compression to lower the storage cost of the level 3 inverted index files. At the same time, it avoids the computation and disk-write costs of decompressing and recompressing on every one of the frequent data writes.
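Steps 409 to 412 can be sketched in Python as below. The sketch swaps in gzip from the standard library for the zip format mentioned above. It assumes each record is a text line of the form keyword:id1,id2, and it treats merging as concatenating the posting lists of identical keywords. The function names, file layout, and threshold are illustrative assumptions. The compressed file is rewritten through a temporary file and `os.replace`, so an interrupted compaction does not leave a half-written static file.

```python
import gzip
import os


def _parse_postings(text):
    """Parse 'keyword:id1,id2' lines into {keyword: [ids]}, preserving order."""
    postings = {}
    for line in text.splitlines():
        keyword, sep, ids = line.partition(":")
        if sep:
            postings.setdefault(keyword, []).extend(i for i in ids.split(",") if i)
    return postings


def append_to_incremental(inc_path, lines):
    """New level 2 records are appended, uncompressed, to the incremental file."""
    with open(inc_path, "a", encoding="utf-8") as f:
        f.writelines(line + "\n" for line in lines)


def compact_if_needed(static_path, inc_path, threshold_bytes):
    """Merge the incremental file into the static compressed file if it is too big."""
    if os.path.getsize(inc_path) <= threshold_bytes:           # step 409
        return False
    static_text = ""
    if os.path.exists(static_path):                             # step 410
        with gzip.open(static_path, "rt", encoding="utf-8") as f:
            static_text = f.read()
    with open(inc_path, encoding="utf-8") as f:
        inc_text = f.read()
    merged = _parse_postings(static_text + "\n" + inc_text)     # step 411
    body = "".join(f"{kw}:{','.join(ids)}\n" for kw, ids in sorted(merged.items()))
    tmp_path = static_path + ".tmp"
    with gzip.open(tmp_path, "wt", encoding="utf-8") as f:       # step 412
        f.write(body)
    os.replace(tmp_path, static_path)
    open(inc_path, "w", encoding="utf-8").close()                # release incremental contents
    return True
```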

FIG. 5 is a flowchart illustrating an embodiment of a mail search process. Process 500 is performed after a mail index has been established using a process such as process 100.

When a user searches for mail, the first step is to receive the keyword to be searched from the user. In step 501, the search keyword sent by the user is obtained.

In step 502, the level 3 inverted index file corresponding to the keyword is determined according to the path file. As described above, the path file stores the mapping that establishes the correspondence between keywords and level 3 inverted index files.

In step 503, a first mail set containing the keyword is determined based on the level 3 inverted index file.

The level 3 inverted index file that contains the search keyword is determined according to the path file. In some embodiments, that level 3 inverted index file is then loaded into a binary tree data structure in memory to facilitate searching. The binary tree is searched to find the mail messages that contain the keyword, and this set of mail messages is treated as the first mail set.
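Python has no built-in binary tree, so the sketch below swaps in a sorted keyword list searched with the standard `bisect` module. For a read-only index this gives the same O(log n) lookup. It assumes the level 3 file has already been read as lines of the form keyword:id1,id2. The class name is hypothetical.

```python
import bisect


class SortedKeywordIndex:
    """Read-only keyword lookup over one level 3 file (binary search, not a tree)."""

    def __init__(self, lines):
        postings = {}
        for line in lines:
            keyword, sep, ids = line.strip().partition(":")
            if sep:
                postings.setdefault(keyword, []).extend(i for i in ids.split(",") if i)
        self.keys = sorted(postings)
        self.values = [postings[k] for k in self.keys]

    def lookup(self, keyword):
        """Return the set of mail ids for keyword (the first mail set)."""
        i = bisect.bisect_left(self.keys, keyword)
        if i < len(self.keys) and self.keys[i] == keyword:
            return set(self.values[i])
        return set()
```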

In step 504, a second mail set containing the keyword is determined based on the level 1 cache.

In some cases, the level 1 cache may hold records for mail that contains the keyword. As long as the level 1 cache has not reached the first predetermined threshold, these records have not yet been transferred to the level 2 cache file. The level 1 cache must therefore also be searched to find this mail, which forms the second mail set.

In step 505, a third mail set containing the keyword is determined based on the level 2 cache file.

Similarly, the level 2 cache file may hold records for mail that contains the keyword. As long as the level 2 cache file has not reached the second predetermined threshold, these records have not yet been transferred to the level 3 inverted index files. The level 2 cache file must therefore also be searched to find this mail, which forms the third mail set.

In step 506, the first mail set, the second mail set, and the third mail set are merged to obtain the search result.

Finally, the union of the first, second, and third mail sets, each found separately and then merged, is the search result for the user's current keyword. Note that the steps may be arranged in other orders. For example, as an alternative, the first, second, and third mail sets may correspond to mail messages from the level 1 cache, the level 2 cache file, and the level 3 inverted index files, respectively.
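The sketch below illustrates steps 503 to 506 under several assumptions. The level 1 cache is a dict from keyword to mail ids, and the level 2 cache file has been read as a list of (keyword, mail_ids) records. The level 3 lookup is any callable that returns a set, such as a per-file index's lookup method. All names are hypothetical. A set union is used so that each mail appears only once in the result, even if it was found in more than one level.

```python
def search_mail(keyword, level3_lookup, level1_cache, level2_records):
    """Union of the mail sets found in the three levels for one keyword."""
    first = set(level3_lookup(keyword))               # step 503: level 3 files
    second = set(level1_cache.get(keyword, ()))       # step 504: level 1 cache
    third = {mail_id                                  # step 505: level 2 cache file
             for kw, ids in level2_records if kw == keyword
             for mail_id in ids}
    return first | second | third                     # step 506: merge
```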

During a mail search in this embodiment, mail is searched for the user's search keyword in the level 1 cache, the level 2 cache file, and the level 3 inverted index files. None of these exceeds its predetermined threshold, so each can be searched separately and quickly. This reduces how often the hard disk is read and written, increases mail search speed, and improves hard disk I/O performance.

FIG. 6 is a block diagram illustrating an embodiment of a system configured to establish a mail index. The system is configured to perform a process such as process 100 and comprises:

A word segmentation unit 601 configured to perform word segmentation on a mail message to obtain the keywords of that mail message.

A level 1 cache update unit 602 configured to update the level 1 inverted index records stored in the level 1 cache based on the keywords of the current mail message.

A first determination unit 603 configured to determine whether the size of the level 1 inverted index records in the level 1 cache exceeds a first predetermined threshold.

A first transfer unit 604 configured to transfer all level 1 inverted index records in the level 1 cache to the level 2 cache file if the result of the first determination unit is affirmative.

A second determination unit 605 configured to determine whether the size of the current level 2 cache file exceeds a second predetermined threshold.

A second transfer unit 606 configured to transfer, if the result of the second determination unit is affirmative, all level 2 inverted index records in the level 2 cache file to the level 3 inverted index files. The transfer follows the path file, which stores the mapping between keywords and level 3 inverted index files.

FIG. 7 is a block diagram illustrating an embodiment of a transfer unit. In this example, system 700 implements the second transfer unit 606 of FIG. 6 and comprises:

A first fetch module 701 configured to fetch (that is, transfer) all level 2 inverted index records in the level 2 cache file into memory.

A determination module 702 configured to determine, based on the path file, the level 3 inverted index file corresponding to each keyword in the level 2 inverted index records.

A second fetch module 703 configured to transfer the level 2 inverted index records, according to their keywords, to the determined level 3 inverted index files. In some embodiments, the second fetch module 703 is specifically configured to write all inverted index records in the level 2 cache file to the determined level 3 inverted index files in append mode, according to their keywords.

FIG. 8 is a block diagram of a system configured to establish a mail index. In this example, system 800 comprises, in addition to an index establishment unit similar to system 600 of FIG. 6, the following:

A third determination unit 801 configured to determine whether the size of a level 3 inverted index file exceeds a third predetermined threshold.

A splitting unit 802 configured to split the level 3 inverted index file into two inverted index subfiles if the result of the third determination unit is affirmative.

A path file update unit 803 configured to update the path file according to the two inverted index subfiles produced by the split.

FIG. 9 is a block diagram illustrating another embodiment of a system configured to establish a mail index. System 900 comprises:

A word segmentation unit 601 configured to perform word segmentation on a mail message to obtain the keywords of that mail message.

A level 1 cache update unit 602 configured to update the level 1 inverted index records stored in the level 1 cache based on the keywords of the mail message.

A first fetch module 701 configured to fetch all level 2 inverted index records in the level 2 cache file into memory.

A determination module 702 configured to use the path file to determine the level 3 inverted index file corresponding to each keyword in the level 2 inverted index records. In some embodiments, each level 3 inverted index file comprises an initial static compressed file and an incremental file.

A second fetch module 703 configured to transfer the level 2 inverted index records, according to their keywords, into the incremental files.

A fourth determination unit 901 configured to determine whether the incremental file exceeds an incremental threshold.

A decompression unit 902 configured to decompress the initial static compressed file to obtain a decompressed initial static file if the result of the fourth determination unit is affirmative.

A merge unit 903 configured to merge the decompressed initial static file and the incremental file to obtain a merged file.

A compression unit 904 configured to compress the merged file to generate the current static compressed file.

In this embodiment, each level 3 inverted index file is divided into a compressed initial static file and an incremental file. The mail index can therefore be established more quickly, and the impact of excessively frequent disk writes is reduced. Disk I/O performance improves, and storage space for the level 3 inverted index files is further saved.

FIG. 10 is a block diagram illustrating an embodiment of a mail search system. Mail search system 1000 works in conjunction with the mail index establishment system described above and comprises:

A keyword acquisition unit 1001 configured to obtain a search keyword sent by a user.

A determination unit 1002 configured to determine the level 3 inverted index file corresponding to the keyword according to the path file. The path file stores the mapping that establishes the correspondence between keywords and level 3 inverted index files.

A first mail set determination unit 1003 configured to determine, from the level 3 inverted index file, a first mail set containing the keyword.

A second mail set determination unit 1004 configured to determine, from the level 1 cache, a second mail set containing the keyword.

A third mail set determination unit 1005 configured to determine, from the level 2 cache file, a third mail set containing the keyword.

A search result acquisition unit 1006 configured to merge the first, second, and third mail sets to obtain the search result.

The modules/units described above can be implemented in three ways:

- as software components executing on one or more processors;
- as hardware, such as programmable logic devices and/or application-specific integrated circuits designed to perform particular functions;
- as a combination of these.

In some embodiments, the modules/units may be embodied as a software product stored on a non-volatile storage medium, such as an optical disc, flash storage device, or portable hard disk. The product includes a number of instructions that cause a computer device, such as a personal computer, server, or network device, to perform the methods described in the embodiments of the present application. The modules/units may be implemented on a single device or distributed across multiple devices. Their functions may be merged with one another or further divided into multiple submodules/subunits.

Although the foregoing embodiments have been described in some detail for clarity of understanding, the invention is not limited to the details provided. There are many alternative ways of implementing the invention. The disclosed embodiments are illustrative and not restrictive.

[Application Example 1]
  A method for processing email messages, comprising:
  obtaining a set of keywords associated with an email message;
  updating, using one or more computer processors, a set of inverted index records stored in a level 1 cache based at least in part on the set of keywords;
  determining whether the size of the set of inverted index records stored in the level 1 cache exceeds a first predetermined threshold;
  in the event that the first predetermined threshold is exceeded, transferring the set of inverted index records in the level 1 cache to a level 2 cache;
  determining whether the size of a level 2 cache file exceeds a second predetermined threshold; and
  in the event that the second predetermined threshold is exceeded, transferring the inverted index records in the level 2 cache file to a level 3 cache that stores a set of inverted index files, according to a path file that stores mapping information between the keywords and corresponding level 3 inverted index files.
[Application Example 2]
  The method according to Application Example 1, further comprising:
  determining whether the size of a level 3 inverted index file exceeds a third predetermined threshold;
  in the event that the size of the level 3 inverted index file exceeds the third predetermined threshold, dividing the level 3 inverted index file into two inverted index subfiles; and
  updating the path file according to the two inverted index subfiles.
[Application Example 3]
  The method according to Application Example 1, wherein transferring the inverted index records in the level 2 cache file to a level 3 inverted index file comprises:
  fetching the level 2 inverted index records in the level 2 cache file into memory;
  determining, based on the path file, the level 3 inverted index file corresponding to a keyword in the level 2 inverted index records; and
  transferring the level 2 inverted index records to the determined level 3 inverted index file according to the keyword.
[Application Example 4]
  The method according to Application Example 3, wherein transferring the inverted index records in the level 2 cache file to the determined level 3 inverted index file according to the keyword is performed in append mode.
[Application Example 5]
  The method according to Application Example 3, wherein a level 3 inverted index file comprises an initial static compressed file and an incremental file.
[Application Example 6]
  The method according to Application Example 5, wherein transferring the inverted index records in the level 2 cache file to the level 3 inverted index file according to the keyword comprises:
  transferring the level 2 inverted index records to the incremental file according to the keyword;
  determining whether the incremental file exceeds an incremental threshold;
  in the event that the incremental threshold is exceeded, decompressing the initial static compressed file to obtain a decompressed initial static file;
  merging the decompressed initial static file and the incremental file to obtain a merged file; and
  compressing the merged file to generate a current static compressed file.
[Application Example 7]
  The method according to Application Example 1, further comprising:
  obtaining a set of one or more search keywords sent by a user;
  determining, according to the path file, a level 3 inverted index file corresponding to the search keywords sent by the user;
  determining a first mail set containing the search keywords based on the level 3 inverted index file, a second mail set containing the search keywords based on the level 1 cache, and a third mail set containing the search keywords based on the level 2 cache file; and
  merging the first mail set, the second mail set, and the third mail set to obtain a search result.
[Application Example 8]
  A system for processing email messages, comprising:
  one or more processors configured to:
    obtain a set of keywords associated with an email message;
    update a set of inverted index records stored in a level 1 cache based at least in part on the set of keywords;
    determine whether the size of the set of inverted index records stored in the level 1 cache exceeds a first predetermined threshold;
    in the event that the first predetermined threshold is exceeded, transfer the set of inverted index records in the level 1 cache to a level 2 cache;
    determine whether the size of a level 2 cache file exceeds a second predetermined threshold; and
    in the event that the second predetermined threshold is exceeded, transfer the inverted index records in the level 2 cache file to a level 3 cache that stores a set of inverted index files, according to a path file that stores mapping information between the keywords and corresponding level 3 inverted index files; and
  one or more memories coupled to the one or more processors and configured to provide the one or more processors with instructions.
[Application Example 9]
  The system according to Application Example 8, wherein the one or more processors are further configured to:
  determine whether the size of a level 3 inverted index file exceeds a third predetermined threshold;
  in the event that the size of the level 3 inverted index file exceeds the third predetermined threshold, divide the level 3 inverted index file into two inverted index subfiles; and
  update the path file according to the two inverted index subfiles.
[Application Example 10]
  The system according to Application Example 8, wherein transferring the inverted index records in the level 2 cache file to a level 3 inverted index file comprises:
  fetching the level 2 inverted index records in the level 2 cache file into memory;
  determining, based on the path file, the level 3 inverted index file corresponding to a keyword in the level 2 inverted index records; and
  transferring the level 2 inverted index records to the determined level 3 inverted index file according to the keyword.
[Application Example 11]
The system according to Application Example 10, wherein transferring the inverted index records in the level 2 cache file to the determined level 3 inverted index file according to the keyword is performed in append mode.
[Application Example 12]
The system according to Application Example 10, wherein a level 3 inverted index file comprises an initial static compressed file and an incremental file.
[Application Example 13]
The system according to Application Example 12, wherein transferring the inverted index records in the level 2 cache file to the level 3 inverted index file according to the keyword comprises:
Transferring the level 2 inverted index records to the incremental file according to the keyword;
Determining whether the incremental file exceeds an incremental threshold;
If the incremental threshold is exceeded, decompressing the initial static compressed file to obtain a decompressed initial static file;
Merging the decompressed initial static file and the incremental file to obtain a merged file; and
Compressing the merged file to generate the current static compressed file.
[Application Example 14]
The system according to Application Example 8, wherein the one or more processors are further configured to:
Obtain a set of one or more search keywords submitted by a user;
Determine, according to the path file, the level 3 inverted index file corresponding to the search keywords submitted by the user;
Determine a first mail set containing the search keywords based on the level 3 inverted index file, a second mail set containing the search keywords based on the level 1 cache, and a third mail set containing the search keywords based on the level 2 cache file; and
Merge the first mail set, the second mail set, and the third mail set to obtain a search result.
[Application Example 15]
A computer program product for processing email messages, the computer program product being embodied in a tangible computer-readable storage medium and comprising:
Computer instructions for obtaining a set of keywords associated with an email message;
Computer instructions for updating a set of inverted index records stored in a level 1 cache based at least in part on the set of keywords;
Computer instructions for determining whether the size of the set of inverted index records stored in the level 1 cache exceeds a first predetermined threshold;
Computer instructions for transferring the set of inverted index records in the level 1 cache to a level 2 cache when the first predetermined threshold is exceeded;
Computer instructions for determining whether the size of the level 2 cache file exceeds a second predetermined threshold; and
Computer instructions for transferring, when the second predetermined threshold is exceeded, the inverted index records in the level 2 cache file to a level 3 cache that stores a set of inverted index files, according to a path file that stores mapping information between the keywords and the corresponding level 3 inverted index files.

Claims

1. A method for processing email messages, comprising:
Obtaining a set of keywords associated with an email message;
Using one or more computer processors to update a set of inverted index records stored in a level 1 cache based at least in part on the set of keywords;
Determining whether the size of the set of inverted index records stored in the level 1 cache exceeds a first predetermined threshold;
Transferring the set of inverted index records in the level 1 cache to a level 2 cache when the first predetermined threshold is exceeded;
Determining whether the size of the level 2 cache file exceeds a second predetermined threshold; and
When the second predetermined threshold is exceeded, transferring the inverted index records in the level 2 cache file to a level 3 cache that stores a set of inverted index files, according to a path file that stores mapping information between the keywords and the corresponding level 3 inverted index files.
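The following is a minimal Python sketch of the three-level flow in claim 1. It is not the patented implementation. It assumes the following:
- Sizes are counted in postings (mail ids) rather than bytes.
- The level 2 cache file and the level 3 files are modelled as in-memory structures.
- New keywords are assigned to a level 3 file by a hypothetical character-sum rule.

The class name `HierarchicalIndex` and its parameters are illustrative only.

```python
from collections import defaultdict


class HierarchicalIndex:
    """Hypothetical three-level inverted index; sizes are counted in postings."""

    def __init__(self, l1_threshold, l2_threshold, num_l3_files=4):
        self.l1_threshold = l1_threshold      # first predetermined threshold
        self.l2_threshold = l2_threshold      # second predetermined threshold
        self.level1 = defaultdict(list)       # level 1 cache: keyword -> [mail ids]
        self.level2 = []                      # stand-in for the level 2 cache file
        self.level3 = defaultdict(lambda: defaultdict(list))  # file -> keyword -> ids
        self.path_file = {}                   # keyword -> level 3 file name
        self.num_l3_files = num_l3_files

    def _l1_size(self):
        return sum(len(ids) for ids in self.level1.values())

    def _l2_size(self):
        return sum(len(ids) for _, ids in self.level2)

    def _l3_file_for(self, keyword):
        # Assumed placement rule for keywords not yet recorded in the path file.
        if keyword not in self.path_file:
            bucket = sum(map(ord, keyword)) % self.num_l3_files
            self.path_file[keyword] = f"l3_{bucket}"
        return self.path_file[keyword]

    def add_mail(self, mail_id, keywords):
        for kw in set(keywords):
            self.level1[kw].append(mail_id)   # update the level 1 records
        if self._l1_size() > self.l1_threshold:
            self.level2.extend(self.level1.items())  # flush level 1 into level 2
            self.level1 = defaultdict(list)
            if self._l2_size() > self.l2_threshold:
                self._flush_level2()

    def _flush_level2(self):
        # Route each level 2 record to its level 3 file via the path file.
        for kw, ids in self.level2:
            self.level3[self._l3_file_for(kw)][kw].extend(ids)
        self.level2 = []
```

With `l1_threshold=2` and `l2_threshold=3`, postings move as follows:
- The third posting spills level 1 into level 2.
- The next spill pushes level 2 past its threshold, so its records land in level 3 in arrival order.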

2. The method of claim 1, further comprising:
Determining whether the size of a level 3 inverted index file exceeds a third predetermined threshold;
Dividing the level 3 inverted index file into two inverted index subfiles when its size exceeds the third predetermined threshold; and
Updating the path file according to the two split inverted index subfiles.
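Claim 2 splits an oversized level 3 file and updates the path file. A hedged sketch of that step follows. The claim does not say how keywords are divided between the two subfiles. This sketch therefore assumes:
- Keywords are split at the midpoint of their sorted order.
- The subfiles are named with a hypothetical `a`/`b` suffix.
- Size is again counted in postings.

```python
def split_if_oversized(level3, path_file, file_name, l3_threshold):
    """Split one level 3 file into two subfiles and update the path file.

    level3:    dict mapping file name -> dict(keyword -> list of mail ids)
    path_file: dict mapping keyword -> level 3 file name
    Returns True if a split happened.
    """
    postings = level3[file_name]
    size = sum(len(ids) for ids in postings.values())
    if size <= l3_threshold:
        return False                          # third threshold not exceeded
    keywords = sorted(postings)               # assumed split: by keyword order
    if len(keywords) < 2:
        return False                          # a single keyword cannot be split this way
    half = len(keywords) // 2
    left, right = f"{file_name}a", f"{file_name}b"  # hypothetical naming scheme
    level3[left] = {kw: postings[kw] for kw in keywords[:half]}
    level3[right] = {kw: postings[kw] for kw in keywords[half:]}
    del level3[file_name]
    # Point every moved keyword at its new subfile.
    for kw in keywords[:half]:
        path_file[kw] = left
    for kw in keywords[half:]:
        path_file[kw] = right
    return True
```

Updating the path file in the same step keeps later lookups consistent. No caller needs to know that a split occurred.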

3. The method of claim 1, wherein transferring the inverted index records in the level 2 cache file to a level 3 inverted index file comprises:
Fetching the level 2 inverted index records in the level 2 cache file into memory;
Determining, based on the path file, the level 3 inverted index file corresponding to the keyword in a level 2 inverted index record; and
Transferring the level 2 inverted index record to the determined level 3 inverted index file according to the keyword.

4. The method of claim 3, wherein transferring the inverted index records in the level 2 cache file to the determined level 3 inverted index file according to the keyword is performed in append mode.
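Claims 3 and 4 describe a specific transfer step. The records are fetched into memory, their level 3 file is looked up in the path file, and they are written in append mode.

The sketch below is one possible reading, not the patent's file format. It assumes:
- Each level 3 file is a JSON-lines file in a directory.
- The path file (here a plain dict) already contains every keyword being transferred.

The function name `transfer_level2` is hypothetical.

```python
import json
import os


def transfer_level2(l2_records, path_file, l3_dir):
    """Append level 2 records to the level 3 files chosen through the path file.

    l2_records: list of (keyword, [mail ids]) already fetched into memory
    path_file:  dict keyword -> level 3 file name (assumed to cover every keyword)
    """
    by_file = {}
    for keyword, mail_ids in l2_records:
        target = path_file[keyword]           # look up the level 3 file
        by_file.setdefault(target, []).append((keyword, mail_ids))
    for target, records in by_file.items():
        # Mode "a" appends new postings without rewriting existing ones.
        with open(os.path.join(l3_dir, target), "a", encoding="utf-8") as f:
            for keyword, mail_ids in records:
                f.write(json.dumps({"kw": keyword, "ids": mail_ids}) + "\n")
```

Grouping records by target file first means each level 3 file is opened once per transfer. Appending keeps each write cheap because existing data is never rewritten.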

5. The method of claim 3, wherein a level 3 inverted index file comprises an initial static compressed file and an incremental file.

6. The method of claim 5, wherein transferring the inverted index records in the level 2 cache file to the level 3 inverted index file according to the keyword comprises:
Transferring the level 2 inverted index records to the incremental file according to the keyword;
Determining whether the incremental file exceeds an incremental threshold;
Decompressing the initial static compressed file to obtain a decompressed initial static file when the incremental threshold is exceeded;
Merging the decompressed initial static file and the incremental file to obtain a merged file; and
Compressing the merged file to generate the current static compressed file.
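Claims 5 and 6 split each level 3 file into a compressed static part and an uncompressed incremental part. The two are merged once the increment grows too large.

A minimal sketch follows. It assumes:
- `zlib` is used for compression and JSON for serialization.
- The incremental threshold counts postings.

The patent names neither codec nor format. The class `Level3File` is hypothetical.

```python
import json
import zlib


class Level3File:
    """Hypothetical level 3 file: compressed static part plus an increment."""

    def __init__(self, incremental_threshold):
        self.static_compressed = zlib.compress(json.dumps({}).encode())
        self.incremental = {}                 # keyword -> [mail ids], uncompressed
        self.incremental_threshold = incremental_threshold

    def _incremental_size(self):
        return sum(len(ids) for ids in self.incremental.values())

    def append(self, keyword, mail_ids):
        # New level 2 records only touch the cheap incremental part.
        self.incremental.setdefault(keyword, []).extend(mail_ids)
        if self._incremental_size() > self.incremental_threshold:
            self._merge()

    def _merge(self):
        static = json.loads(zlib.decompress(self.static_compressed))  # decompress
        for kw, ids in self.incremental.items():                       # merge
            static.setdefault(kw, []).extend(ids)
        # Recompress to produce the current static compressed file.
        self.static_compressed = zlib.compress(json.dumps(static).encode())
        self.incremental = {}

    def lookup(self, keyword):
        static = json.loads(zlib.decompress(self.static_compressed))
        return static.get(keyword, []) + self.incremental.get(keyword, [])
```

The design amortizes compression cost. The static part is rewritten only when the increment exceeds its threshold, and lookups read both parts.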

7. The method of claim 1, further comprising:
Obtaining a set of one or more search keywords submitted by a user;
Determining, according to the path file, the level 3 inverted index file corresponding to the search keywords submitted by the user;
Determining a first mail set containing the search keywords based on the level 3 inverted index file, a second mail set containing the search keywords based on the level 1 cache, and a third mail set containing the search keywords based on the level 2 cache file; and
Merging the first mail set, the second mail set, and the third mail set to obtain a search result.
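Claim 7 searches all three levels and merges the resulting mail sets. A hedged sketch follows, using the same in-memory shapes as the earlier sketches. It assumes AND semantics across multiple keywords: a mail matches only if it contains every keyword. The claim does not specify how multiple keywords combine. The function name `search` is illustrative.

```python
def search(keywords, level1, level2_records, level3, path_file):
    """Return mail ids containing every search keyword, across all three levels.

    level1:         dict keyword -> [mail ids]          (level 1 cache)
    level2_records: list of (keyword, [mail ids])       (level 2 cache file)
    level3:         dict file name -> dict keyword -> [mail ids]
    path_file:      dict keyword -> level 3 file name
    """
    def matching(postings_for):
        sets = [set(postings_for(kw)) for kw in keywords]
        return set.intersection(*sets) if sets else set()

    def from_level3(kw):
        name = path_file.get(kw)              # locate the file via the path file
        return level3.get(name, {}).get(kw, []) if name else []

    def from_level2(kw):
        return [mid for k, ids in level2_records if k == kw for mid in ids]

    first = matching(from_level3)                          # first mail set
    second = matching(lambda kw: level1.get(kw, []))       # second mail set
    third = matching(from_level2)                          # third mail set
    return sorted(first | second | third)                  # merged search result
```

Each flush moves all of a mail's postings together, so a mail's keywords always sit in the same level. That is why intersecting within each level and then taking the union gives the full result.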

8. A system for processing email messages, comprising:
One or more processors configured to:
Obtain a set of keywords associated with an email message;
Update a set of inverted index records stored in a level 1 cache based at least in part on the set of keywords;
Determine whether the size of the set of inverted index records stored in the level 1 cache exceeds a first predetermined threshold;
Transfer the set of inverted index records in the level 1 cache to a level 2 cache when the first predetermined threshold is exceeded;
Determine whether the size of the level 2 cache file exceeds a second predetermined threshold; and
When the second predetermined threshold is exceeded, transfer the inverted index records in the level 2 cache file to a level 3 cache that stores a set of inverted index files, according to a path file that stores mapping information between the keywords and the corresponding level 3 inverted index files; and
One or more memories coupled to the one or more processors and configured to provide the one or more processors with instructions.

9. The system of claim 8, wherein the one or more processors are further configured to:
Determine whether the size of a level 3 inverted index file exceeds a third predetermined threshold;
Divide the level 3 inverted index file into two inverted index subfiles when its size exceeds the third predetermined threshold; and
Update the path file according to the two split inverted index subfiles.

10. The system of claim 8, wherein transferring the inverted index records in the level 2 cache file to a level 3 inverted index file comprises:
Fetching the level 2 inverted index records in the level 2 cache file into memory;
Determining, based on the path file, the level 3 inverted index file corresponding to the keyword in a level 2 inverted index record; and
Transferring the level 2 inverted index record to the determined level 3 inverted index file according to the keyword.

11. The system of claim 10, wherein transferring the inverted index records in the level 2 cache file to the determined level 3 inverted index file according to the keyword is performed in append mode.

12. The system of claim 10, wherein a level 3 inverted index file comprises an initial static compressed file and an incremental file.

13. The system of claim 12, wherein transferring the inverted index records in the level 2 cache file to the level 3 inverted index file according to the keyword comprises:
Transferring the level 2 inverted index records to the incremental file according to the keyword;
Determining whether the incremental file exceeds an incremental threshold;
If the incremental threshold is exceeded, decompressing the initial static compressed file to obtain a decompressed initial static file;
Merging the decompressed initial static file and the incremental file to obtain a merged file; and
Compressing the merged file to generate the current static compressed file.

14. The system of claim 8, wherein the one or more processors are further configured to:
Obtain a set of one or more search keywords submitted by a user;
Determine, according to the path file, the level 3 inverted index file corresponding to the search keywords submitted by the user;
Determine a first mail set containing the search keywords based on the level 3 inverted index file, a second mail set containing the search keywords based on the level 1 cache, and a third mail set containing the search keywords based on the level 2 cache file; and
Merge the first mail set, the second mail set, and the third mail set to obtain a search result.

15. A computer program for processing email messages, the computer program causing a computer to realize:
A function for obtaining a set of keywords associated with an email message;
A function for updating a set of inverted index records stored in a level 1 cache based at least in part on the set of keywords;
A function for determining whether the size of the set of inverted index records stored in the level 1 cache exceeds a first predetermined threshold;
A function for transferring the set of inverted index records in the level 1 cache to a level 2 cache when the first predetermined threshold is exceeded;
A function for determining whether the size of the level 2 cache file exceeds a second predetermined threshold; and
A function for transferring, when the second predetermined threshold is exceeded, the inverted index records in the level 2 cache file to a level 3 cache that stores a set of inverted index files, according to a path file that stores mapping information between the keywords and the corresponding level 3 inverted index files.