JP2006012140A

JP2006012140A - Anomaly detection in data perspective

Info

Publication number: JP2006012140A
Application number: JP2005163148A
Authority: JP
Inventors: Allan Folting; フォルティングアラン; Bo Thiesson; ティエソンボ; David E Heckerman; イー．ヘッカーマンデビッド; M Chickering David; エム．チッカリングデビッド; Eric B Vigesaa; バーバービジェサエリック
Original assignee: Microsoft Corp
Current assignee: Microsoft Corp
Priority date: 2004-06-23
Filing date: 2005-06-02
Publication date: 2006-01-12
Also published as: RU2378694C2; KR20060045490A; RU2005114223A; EP1610264A3; CN1713182A; US20060106560A1; MXPA05005537A; CA2505983A1; US7065534B2; CN100568234C; EP1610264A2; US20050288883A1; US7162489B2; BRPI0501784A; KR101083519B1; AU2005201997B2; CA2505983C; AU2005201997A1

Abstract

PROBLEM TO BE SOLVED: To provide a system and a method for automatically detecting data anomalies in a data perspective.

SOLUTION: Curve fitting techniques are used to automatically detect data anomalies in a "data tube" taken from a data perspective. This makes it possible to detect on-screen, drill down, and drill across data anomalies in, for example, pivot tables and/or OLAP cubes. Detection works by determining whether data deviates substantially from a predicted value established by a curve fitting process such as, for example, a piece-wise linear function applied to the data tube.

COPYRIGHT: (C)2006, JPO&NCIPI

Description

The present invention relates generally to data mining, and more particularly to systems and methods for providing automatic detection of data anomalies in data perspectives.

Digitizing information allows vast amounts of data to be stored in a remarkably small amount of space. For example, the contents of an entire library can be captured on a single computer hard drive. This is possible because the data is converted into binary states that a digital encoding device can store on various digital storage media, such as hard drives, CD-ROM disks, and floppy (registered trademark) disks. As digital storage technology has advanced, storage density has increased so that considerably more data fits in a given amount of space than before, with density limited mainly by physical properties and manufacturing processes.

As storage capacity grows, so does the challenge of retrieving data effectively, and easy access to the data becomes paramount. For example, a book that is in a library but cannot be found is of no use to a patron who wants to read it. Likewise, merely digitizing data is not a step forward unless the data can be readily accessed. This led to the creation of data structures that facilitate efficient data retrieval, generally known as “databases”. A database holds data in a structured format to provide efficient access to it. Structuring the data storage makes retrieval more efficient than with unstructured storage. Indexing and other organizational techniques can also be applied, and relationships between data can be stored along with the data to increase its value.

In the early days of database development, users generally viewed “raw data”, that is, data displayed exactly as it was entered into the database. Eventually, techniques were developed that allowed data to be formatted, manipulated, and viewed more efficiently. Users could then, for example, apply mathematical operators to the data and even create reports. Business users could access information such as “total sales” from a database that contained only individual sales. User interfaces continued to be developed, making it still easier to retrieve and display data in a user-friendly format. Users eventually realized that different views of the data, such as total sales derived from individual sales, let them obtain additional information from the raw data in a database. This gathering of additional data is known as “data mining” and produces “metadata” (data about data). Data mining extracts useful additional information from raw data. It is especially useful in business, where it can uncover information that explains a company's sales and output figures, which goes beyond the results available from the raw input data alone.

Manipulating data can therefore extract extremely valuable information from raw data. This manipulation is possible because the stored data is digital. Enormous amounts of digitized data can be viewed from different perspectives far faster than could be attempted by hand. Each new perspective of the data can give the user additional insight into it. This is a very powerful concept that can determine whether a business succeeds by using it or fails by not using it. For example, trend analysis, cause-and-effect analysis, impact studies, and forecasts can be derived from the raw data entered into a database; their value and timeliness depend on having intuitive, user-friendly access to the digitized information.

Currently, data manipulation intended to improve data mining requires substantial user input and user knowledge to ensure that the various data perspectives do not contain erroneous data. The user must have deep knowledge of the data and insight into the kinds of errors that can occur in it. Without this prior knowledge, the user must resort to a “hit and miss” approach in the hope of finding data anomalies buried in a given data perspective. This approach is generally beyond the abilities of a casual user and/or too time-consuming for an advanced user. In general, the amount of stored data is too vast, and its relationships too complex, for a user to efficiently develop a usable scheme that reliably finds every data anomaly.

The following presents a simplified summary of the invention in order to provide a basic understanding of some aspects of the invention. This summary is not an extensive overview of the invention. It is not intended to identify key or critical elements of the invention or to define its scope. Its sole purpose is to present some concepts of the invention in a simplified form as a prelude to the more detailed description that is presented later.

The present invention relates generally to data mining, and more particularly to systems and methods for providing automatic detection of data anomalies in data perspectives. Data curve fitting techniques are leveraged to automatically detect data anomalies in a “data tube” taken from a data perspective. A data tube contains data with only one varying data dimension. This allows detection of data anomalies such as on-screen data anomalies, drill down data anomalies, and drill across data anomalies in, for example, spreadsheet pivot tables and/or OLAP (online analytical processing) cubes. By providing automatic data perspective analysis, the invention enables inexperienced users to easily locate erroneous data in a database. This is accomplished by determining whether data deviates substantially from a predicted value established by a curve fitting process, such as a piece-wise linear function applied to the data tube. A threshold can also be employed to help determine the degree of deviation required before a data value is considered anomalous. The threshold can be supplied dynamically and/or statically, for example by the system and/or by the user through a user interface. The invention can also readily indicate to the user the type and location of a detected anomaly from the top-level data perspective, eliminating the need for the user to hunt for data anomalies at lower levels.

To the accomplishment of the foregoing and related ends, certain illustrative aspects of the invention are described herein in connection with the following description and the annexed drawings. These aspects are indicative of only a few of the various ways in which the principles of the invention may be employed, and the invention is intended to include all such aspects and their equivalents. Other advantages and novel features of the invention will become apparent from the following detailed description of the invention when considered in conjunction with the drawings.

The present invention is now described with reference to the drawings, in which like reference numerals refer to like elements throughout. In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the present invention. It will be evident, however, that the invention may be practiced without these specific details. In other instances, well-known structures and devices are shown in block diagram form in order to facilitate describing the invention.

As used in this application, the term “component” is intended to refer to a computer-related entity, whether hardware, a combination of hardware and software, software, or software in execution. For example, a component may be, but is not limited to being, a process running on a processor, a processor, an object, an executable, a thread of execution, a program, and/or a computer. By way of illustration, both an application running on a server and the server itself can be a computer component. One or more components may reside within a process and/or thread of execution, and a component may be localized on one computer and/or distributed between two or more computers. A “thread” is the entity within a process that the operating system kernel schedules for execution. As is well known in the art, each thread has an associated “context”, which is the volatile data associated with the execution of the thread. A thread's context includes the contents of system registers and the virtual addresses belonging to the thread's process. Thus, the actual data comprising a thread's context varies as the thread executes.

The present invention facilitates the analysis of data perspectives by automatically detecting anomalous data. An indication is used to notify the user that erroneous data exists at some level of a particular data perspective. That level can be, for example, on-screen (that is, the top level) and/or a level that is not currently displayed and that requires drilling down and/or drilling across the data to reveal the erroneous data value. In this way, the user can readily determine that a data anomaly exists, as well as the amount of effort and/or the data view required to expose the erroneous data. The user and/or the system can also set thresholds statically and/or dynamically to aid automatic detection, and the user can select different thresholds for different types of data anomalies. A threshold determines how far a data value must deviate before the data is considered anomalous. The deviation is found by comparing the data value with a predicted data value obtained from a curve fitting process applied to a data tube, which has only one varying data dimension. The function used in the curve fitting process can also be made user-selectable. The invention thus allows a user to easily identify interesting characteristics of the data under review.

FIG. 1 shows a block diagram of an automatic data perspective anomaly detection system 100 in accordance with an aspect of the present invention. The automatic data perspective anomaly detection system 100 comprises an automatic data perspective anomaly detection component 102 that receives a data perspective 104 and automatically determines data anomalies 106. Data perspectives include, but are not limited to, spreadsheet pivot tables, OLAP cubes, and the like. An optional external threshold input 108 can be used by the automatic data perspective anomaly detection component 102 to help determine which data is anomalous. A threshold can also be determined within the automatic data perspective anomaly detection component 102 itself, for example as a system-determined value and/or a system-determined percentage of deviation. Multiple user-specified thresholds can also be employed so that different thresholds apply to different types of data anomalies. The automatic data perspective anomaly detection component 102 applies a curve fitting process to data tubes from the data perspective to determine which data is anomalous. The curve fitting process can also incorporate a user-specified function to aid the automatic detection of data anomalies.

Referring to FIG. 2, another block diagram of an automatic data perspective anomaly detection system 200 is shown in accordance with an aspect of the present invention. The automatic data perspective anomaly detection system 200 comprises an automatic data perspective anomaly detection component 202, which in turn comprises a data tube component 204 and an anomaly detection component 206. The data tube component 204 receives a data perspective 208 and processes it into data tubes. A data tube is a slice of data from the data perspective 208 that has only one varying data dimension. The anomaly detection component 206 receives a data tube and processes it with a curve fitting process to determine whether it contains data anomalies. The curve fitting process attempts to generate a function that can estimate the data in the data tube. The estimated data becomes the “predicted data” used to determine deviation scores for the data in the data tube. A threshold input 212 is used by the anomaly detection component 206 to determine how much deviation is acceptable. The threshold input 212 may be system-generated and/or user-generated. Data whose deviation the anomaly detection component 206 determines to exceed the threshold input 212 is output as anomalies 210.

Turning to FIG. 3, yet another block diagram of an automatic data perspective anomaly detection component 300 is shown in accordance with an aspect of the present invention. The automatic data perspective anomaly detection component 300 comprises a data tube component 310 and an anomaly detection component 302. The anomaly detection component 302 comprises a curve fitting function component 304, a data deviation score component 306, and an anomaly determination component 308. The curve fitting function component 304 receives a data tube from the data tube component 310 and determines a function suitable for representing the data in that tube, which makes it possible to generate predicted values for the tube's data. The curve fitting function component 304 can also receive an optional user-specified function 316 and use it as the fitted function, which lets the user tailor the detection process. The data deviation score component 306 receives the data from the data tube together with the curve fitting function from the curve fitting function component 304. It uses the curve fitting function to predict values for the data, compares those predictions with the actual data values, and derives a score based on the amount of deviation from the predicted value. The anomaly determination component 308 receives the deviation scores and uses a threshold input 314 to detect data that exceeds the threshold. Data determined to exceed the threshold is considered anomalous and is output as data anomalies 312.
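The flow through the components of FIG. 3 can be sketched roughly as follows. This is only an illustrative sketch, not the patented implementation: the function names are hypothetical, and the default fit (predicting the tube mean for every cell) is an assumed stand-in for whatever curve fitting function is actually chosen.

```python
from statistics import mean

def fit_constant(values):
    """Curve fitting function component (assumed stand-in): predict the tube mean."""
    m = mean(values)
    return lambda index: m

def deviation_scores(values, predict):
    """Data deviation score component: |actual - predicted| for each cell."""
    return [abs(v - predict(i)) for i, v in enumerate(values)]

def detect_anomalies(values, threshold, fit=fit_constant):
    """Anomaly determination component: indices whose score exceeds the threshold.

    `fit` plays the role of the optional user-specified function 316.
    """
    predict = fit(values)
    scores = deviation_scores(values, predict)
    return [i for i, s in enumerate(scores) if s > threshold]

tube = [10, 11, 9, 50, 10]            # made-up data tube
print(detect_anomalies(tube, threshold=20))
```

Passing a different `fit` callable corresponds to the user supplying their own function to tailor detection.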

To better understand the above systems, it helps to understand the context and meaning of the data. Data perspectives such as pivot tables and/or OLAP cubes are important tools for business. They allow users to navigate large data sets quickly and easily, and thereby aid business and other decisions. Fundamentally, a data perspective such as a pivot table or OLAP cube is an n-dimensional view of a data set. For example, Table 2 shows a pivot table corresponding to the data partially shown in Table 1.

This data perspective shows average sales as a function of date and product category, with sales averaged over region (sales territory). In this example, “sales” is the target, “date” and “product category” are the displayed dimensions, and “region” is the aggregated dimension. In Table 2 the aggregation is an average, but other aggregations such as sum, minimum, and maximum are possible. Other data perspectives of the same data set are also possible, for example sales as a function of date and region, averaged over product category. More than two dimensions may be displayed (see Table 4).

Each dimension can have a hierarchy. In this example, the date hierarchy is year, quarter, week; the product hierarchy is product category, product; and the location hierarchy is region, state. One important aspect of a data perspective such as a pivot table is the level of each hierarchy that is displayed. In Table 2, the displayed levels are year for the date dimension, product category for the product dimension, and region for the location dimension. The user can drill down into a displayed dimension, which corresponds to moving to the next lower level in that dimension's hierarchy (see Table 5). The user can also drill across a given pivot by expanding it along a dimension that is not currently in the pivot table. For example, Table 4 shows the result of drilling across the pivot table of Table 2 by region.

A pivot table also has page fields, which contain dimensions, at some level of their hierarchy, that select the data shown. In Table 2, the page field contains the location dimension at the region level, and sales for all regions are selected. Alternatively, the user could select sales for a particular region or state. In general, a pivot table for a data set corresponds to (1) a target, (2) displayed dimensions, each at some level of its hierarchy, (3) page field dimensions, each at some level of its hierarchy, and (4) an aggregation function.
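As a rough illustration of the four ingredients listed above (target, displayed dimensions, page field selection, aggregation function), the sketch below builds a pivot from flat records. The record layout, field names, and sales figures are invented for the example; they are not the contents of Tables 1 and 2.

```python
from collections import defaultdict
from statistics import mean

def pivot(records, target, displayed, aggregate=mean, page_filter=None):
    """Aggregate `target` over every dimension that is not in `displayed`.

    page_filter maps a dimension name to the set of values selected in its
    page field; None means everything is selected.
    """
    cells = defaultdict(list)
    for rec in records:
        if page_filter and any(rec[d] not in allowed
                               for d, allowed in page_filter.items()):
            continue
        key = tuple(rec[d] for d in displayed)
        cells[key].append(rec[target])
    return {key: aggregate(vals) for key, vals in cells.items()}

# Hypothetical records (made-up numbers).
records = [
    {"year": 1999, "category": "c1", "region": "r1", "sales": 10.0},
    {"year": 1999, "category": "c1", "region": "r2", "sales": 20.0},
    {"year": 2000, "category": "c1", "region": "r1", "sales": 30.0},
    {"year": 2000, "category": "c1", "region": "r2", "sales": 50.0},
]

# Average sales by year and category, aggregated over region.
print(pivot(records, "sales", ("year", "category")))
# Drilling across by region simply adds "region" to the displayed dimensions.
print(pivot(records, "sales", ("year", "category", "region")))
```

Swapping `aggregate` for `sum`, `min`, or `max` gives the other aggregations mentioned in the text.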

Typically, one or more cells in a data perspective such as a pivot table may be anomalous. The present invention automatically detects and displays at least three types of cell anomalies: (1) on-screen anomalies, (2) drill across anomalies, and (3) drill down anomalies. A cell is an on-screen anomaly if it is anomalous in relation to the other data displayed on the screen. A cell is a drill across anomaly if drilling across from that cell reveals an anomaly. A cell is a drill down anomaly if drilling down into that cell reveals an anomaly. These types of anomalies are illustrated in Table 3, which is identical to Table 2 except for formatting.

In Table 3, the category 2/1999 cell is an on-screen anomaly because its average sales are higher than those of any other cell in the same row or column. The category 2/2001 cell of Table 3 is a drill across anomaly. This anomaly is not apparent until the user drills across the data perspective by region, as shown in Table 4 below.

In Table 4, sales for r3 are significantly lower than sales for r1 and r2. The category 3/2002 cell of Table 3 is a drill down anomaly. Again, this anomaly is not apparent until the user drills down the product hierarchy, as shown in Table 5 below.

In Table 5, sales of product 3 are far lower than sales of product 1 and product 2. In these examples, on-screen anomalies are highlighted, while drill across and drill down anomalies are marked with cell borders. Those skilled in the art will recognize, however, that many other kinds of indication are possible.

Continuing with the example of automatic anomaly detection according to the invention, the term “tube” is used to mean a slice of a given data perspective in which only one dimension varies. In a two-dimensional data perspective, a tube simply corresponds to a row or a column. The three-dimensional pivot table of Table 4 contains several kinds of tube: (1) varying product category with date and region fixed, (2) varying region with product category and date fixed, and (3) varying date with product category and region fixed.
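To make the notion of a tube concrete, the following sketch enumerates every tube along one axis of a perspective. It assumes, purely for illustration, that the perspective is stored as a dict from coordinate tuples to cell values; the function name and the numbers are hypothetical.

```python
from collections import defaultdict

def tubes(cells, axis):
    """Group cells into tubes in which only coordinate `axis` varies.

    Returns {fixed_coordinates: [values ordered by the varying coordinate]}.
    """
    groups = defaultdict(list)
    for coords, value in cells.items():
        fixed = coords[:axis] + coords[axis + 1:]
        groups[fixed].append((coords[axis], value))
    return {
        fixed: [value for _, value in sorted(items, key=lambda pair: pair[0])]
        for fixed, items in groups.items()
    }

# A 2-D perspective keyed by (year, category); values are made up.
cells = {
    (1999, "c1"): 1.0, (2000, "c1"): 2.0,
    (1999, "c2"): 3.0, (2000, "c2"): 4.0,
}
print(tubes(cells, axis=0))  # one tube per category, varying year
print(tubes(cells, axis=1))  # one tube per year, varying category
```

For a two-dimensional perspective these are exactly the columns and rows; with three coordinates per cell the same function yields the three tube families described for Table 4.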

A cell is anomalous with respect to a tube if it deviates significantly from the value expected for that cell as computed by a curve fitting function. The values in the data perspective are not required to be continuous; rather, the data perspective is assumed to be one-dimensional and to have an ordered index. For example, it can be a data perspective indexed by time, distance, or monetary amount. The values of the data perspective may therefore be continuous and/or discrete. A curve fitting method, such as an “auto-regressive” curve fitting method, can then be applied to the perspective for anomaly detection. In one instance of the invention, anomaly detection is facilitated by assigning a deviation score to the amount of deviation from the predicted value. The deviation score can then be compared with a given threshold to determine whether an anomaly exists. For discrete data, for example, the probability of the value observed in the data perspective is determined, and if that probability is significantly low, the data is considered anomalous.

In another instance of the present invention, continuous data in a tube is fitted to a piece-wise linear function, for example by using a regression tree. A cell is then anomalous if

|value in cell − predicted value of cell| > threshold   (Formula 1)

where the left-hand side of the formula is the cell's deviation score.
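Formula 1 can be tried out with a deliberately simplified fit. The sketch below assumes a single least-squares line over the cell index, which is the one-piece special case of a piece-wise linear function; it does not implement the regression tree mentioned in the text, and the function names and data are hypothetical.

```python
def fit_line(values):
    """Ordinary least-squares line through (index, value); needs a non-empty tube."""
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    slope = sxy / sxx if sxx else 0.0
    return slope, y_mean - slope * x_mean

def anomalous_cells(values, threshold):
    """Indices satisfying Formula 1: |value - predicted value| > threshold."""
    slope, intercept = fit_line(values)
    return [i for i, v in enumerate(values)
            if abs(v - (slope * i + intercept)) > threshold]

tube = [10, 12, 14, 40, 18, 20, 22]   # made-up tube with one spike
print(anomalous_cells(tube, threshold=10))
```

Because the outlier itself takes part in the fit, it pulls the line toward it; a more robust fit or a per-segment fit would be a natural refinement, but that choice is outside what the source specifies.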

In yet another instance of the present invention, discrete data in a tube is fitted to an auto-regressive model. A cell is then anomalous if the probability of the value in the cell falls below some threshold.
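For the discrete case, one possible reading is a first-order auto-regressive model in which each value depends only on the previous one. The sketch below estimates those transition probabilities from the tube itself with additive smoothing; the model order, the smoothing constant `alpha`, and all names are assumptions made for illustration rather than details given in the source.

```python
from collections import Counter

def transition_probability(values, alpha=1.0):
    """Return prob(prev, cur): smoothed estimate of P(cur | prev) from the tube."""
    k = len(set(values))                        # number of distinct states
    pair_counts = Counter(zip(values, values[1:]))
    prev_counts = Counter(values[:-1])

    def prob(prev, cur):
        return (pair_counts[(prev, cur)] + alpha) / (prev_counts[prev] + alpha * k)

    return prob

def low_probability_cells(values, threshold, alpha=1.0):
    """Indices (from 1) whose value is improbable given the preceding value."""
    prob = transition_probability(values, alpha)
    return [t for t in range(1, len(values))
            if prob(values[t - 1], values[t]) < threshold]

tube = ["lo", "lo", "lo", "lo", "hi", "lo", "lo", "lo"]   # made-up discrete tube
print(low_probability_cells(tube, threshold=0.3))
```

The first cell has no predecessor and is therefore never flagged by this sketch.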

As is apparent from the above, the present invention applies different curve fitting functions to continuous data and discrete data. There are several ways to determine whether a dimension is discrete or continuous. For example, the user can specify the choice, such as by labeling the dimension as a “number” via a formatting command. As a further example, the choice can be made automatically by inspecting the data, using systems and methods such as those described in U.S. patent application Ser. No. 09/298,737, entitled “Determining Whether A Variable Is Numeric Or Non-Numeric”, filed April 23, 1999 by Heckerman et al.

Next, three example types of anomalies are defined for a given data perspective, such as a pivot table. A cell is an on-screen anomaly if it is anomalous with respect to any of the displayed tubes. Other definitions are possible, including but not limited to: (1) a cell is an on-screen anomaly if it is anomalous with respect to all of the displayed tubes; and (2) a cell is an on-screen anomaly if its degree of deviation, averaged along all of the on-screen tubes, exceeds a threshold. A cell is a drill-across anomaly if, with the displayed dimensions held fixed, there is an anomalous tube that varies over a dimension that is not displayed. A cell is a drill-down anomaly if, with all other displayed dimensions held fixed, there is an anomalous tube that varies over a currently displayed dimension expanded to a deeper level.
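The three on-screen definitions differ only in how the per-tube deviation scores of a cell are combined. A minimal sketch, assuming the scores for each displayed tube have already been computed (the function name and the `rule` labels are hypothetical):

```python
from statistics import mean

def is_on_screen_anomaly(scores, threshold, rule="any"):
    """Combine a cell's deviation scores, one per displayed tube.

    rule="any":  anomalous with respect to at least one displayed tube
    rule="all":  anomalous with respect to every displayed tube
    rule="mean": the deviation averaged over the tubes exceeds the threshold
    """
    if rule == "any":
        return any(s > threshold for s in scores)
    if rule == "all":
        return all(s > threshold for s in scores)
    if rule == "mean":
        return mean(scores) > threshold
    raise ValueError(f"unknown rule: {rule!r}")
```

For a cell in a two-dimensional pivot table with a row-tube score of 12.0 and a column-tube score of 3.0, a threshold of 5 makes the cell anomalous under "any" and "mean" but not under "all".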

Drill-across and drill-down anomalies are, by their nature, not visible to the user. A mechanism that reveals them tells the user which dimensions and/or hierarchies need to be expanded in order to see the anomaly. In some data perspective applications, this can be done by right-clicking an anomalous cell with a pointing device such as a mouse. In addition to indicating the dimensions and/or hierarchies that contain anomalies, the extent of the anomalies can also be indicated, for example by sorting the dimensions and hierarchies according to their corresponding deviation scores.
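One reading of this passage is that hidden anomalies are presented as a list of expansion targets ordered by deviation score. The sketch below assumes that reading; the tuple layout and the dimension names in the usage note are hypothetical.

```python
def rank_hidden_anomalies(candidates, threshold):
    """Order hidden anomalies by deviation score, largest first.

    `candidates` is a list of (kind, dimension, deviation_score) tuples,
    where kind is "drill-across" or "drill-down". Entries at or below
    the threshold are dropped.
    """
    flagged = [c for c in candidates if c[2] > threshold]
    return sorted(flagged, key=lambda c: c[2], reverse=True)
```

Given `[("drill-across", "Region", 7.5), ("drill-down", "Time", 15.0), ("drill-across", "Product", 2.0)]` and a threshold of 5, the result lists the `Time` drill-down first and the `Region` drill-across second, which is the order a context menu could present them in.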

For an on-screen anomaly, the user may want an explanation of why the cell is anomalous. The present invention provides this by indicating, for example by highlighting, the tubes whose deviation scores exceed the threshold. In some applications, this function can also be invoked by right-clicking with a pointing device such as a mouse.

Two cases should be considered for the threshold. In the first case, the cell in question can be drilled down into and/or drilled across, and/or the original data contains multiple entries for the same cell. Here the threshold cσ can be used, where c is a user-controlled constant and σ is the standard deviation of the data that results from expanding the cell one or more times. In the second case, where the cell cannot be expanded, or as an alternative to the threshold above, either c × <predicted value> or simply c can be used as the threshold, where c is again a user-controlled constant. Alternatively, the top k anomalies can be shown, where k is chosen by the user. As a further alternative, cells that cannot be expanded can be left unlabeled.
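The threshold options can be sketched as below. Several details are assumptions made for the example: the population standard deviation is used for σ, the absolute value of the prediction is taken so that the threshold stays non-negative, and the function and parameter names are hypothetical.

```python
from statistics import pstdev

def threshold_for_cell(c, expanded_values=None, predicted=None):
    """Pick a threshold for one cell.

    If the cell can be expanded, use c * sigma over the expanded data.
    Otherwise use c * |predicted value| when a prediction is supplied,
    or the constant c on its own.
    """
    if expanded_values is not None and len(expanded_values) > 1:
        return c * pstdev(expanded_values)
    if predicted is not None:
        return c * abs(predicted)
    return c

def top_k_anomalies(deviation_scores, k):
    """Return the k cells with the largest deviation scores.

    `deviation_scores` maps a cell identifier to its deviation score.
    """
    ranked = sorted(deviation_scores.items(), key=lambda kv: kv[1], reverse=True)
    return [cell for cell, _ in ranked[:k]]
```

For instance, a cell whose expansion yields `[2, 4, 4, 4, 5, 5, 7, 9]` has σ = 2, so `c = 2` gives a threshold of 4. The top-k variant sidesteps the threshold entirely and leaves only k for the user to choose.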

In view of the exemplary systems shown and described above, methodologies that can be implemented in accordance with the present invention will be better appreciated with reference to the flowcharts of FIGS. 4 and 5. For simplicity of explanation, the methodologies are shown and described as a series of blocks; however, the present invention is not limited by the order of the blocks, and some blocks may, in accordance with the present invention, occur in a different order than shown and described here and/or concurrently with other blocks. Moreover, not all illustrated blocks may be required to implement a methodology in accordance with the present invention.

The invention can be described in the general context of computer-executable instructions, such as program modules, executed by one or more components. Generally, program modules include routines, programs, objects, data structures, and the like that perform particular tasks or implement particular abstract data types. Typically, the functionality of the program modules can be combined or distributed as desired in various embodiments.

FIG. 4 shows a flow diagram of a method 400 that facilitates automatic detection of data perspective anomalies in accordance with an aspect of the present invention. The method 400 starts (402) by receiving (404) tube data representing a data slice from a data perspective in which only one dimension varies. Anomalies in the data are then automatically detected (406) using a curve fitting function applied to the data. The curve fitting function can be derived and/or specified by the user. Anomaly detection can be further facilitated by a threshold deviation value supplied by the system and/or the user. The threshold deviation value can differ depending on the type of data anomaly. The detected anomalies are then output as data anomalies (408), and the flow ends (410).

Referring to FIG. 5, another flow diagram of a method 500 that facilitates automatic detection of data perspective anomalies in accordance with an aspect of the present invention is shown. The method 500 starts by receiving (504) tube data representing a data slice from a data perspective in which only one dimension varies. A function that best represents the data of the data tube is then determined (506). For continuous and discrete data, this function can be obtained through auto-regressive processes such as piecewise linear processes and regression tree processes. The function can also be supplied by the user. A deviation score is then determined (508) based on the predicted data value obtained from the curve fitting function and the actual data value. A threshold that determines how much deviation is allowed before a data value is considered anomalous is then received (510). The threshold can be determined by the system and/or provided by the user, and it can be a static value and/or a dynamic value. The threshold can differ depending on the type of data anomaly. Data anomalies are then detected (512) by determining which data values have deviation scores that exceed the threshold, and the flow ends (514). Typically, data anomalies are conveyed to the user through on-screen indications such as highlighting, borders, and/or color coding, although icons and other graphical indicators can also be used. Such indications let the user determine at which level a data anomaly can be found. Indications can also be used to show the type of a data anomaly and/or its degree of deviation. Other instances of the present invention include the further act of automatically displaying the data anomalies to the user without requiring further user input to view the actual anomalous data. This greatly reduces the burden of conveying data to the user, because the user no longer needs to know and understand all of the data-level indications in order to reach and view a data anomaly.
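The steps of method 500 can be strung together as in the sketch below. It covers only the variant in which the representing function is supplied by the caller and the threshold is either a static number or a dynamic, per-cell callable; the names `detect_anomalies`, `predict`, and `threshold` are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple, Union

Threshold = Union[float, Callable[[int], float]]

def detect_anomalies(
    tube: Sequence[float],
    predict: Callable[[int], float],
    threshold: Threshold,
) -> List[Tuple[int, float]]:
    """Return (index, deviation score) for every anomalous cell in the tube."""
    anomalies = []
    for index, actual in enumerate(tube):        # 504: tube data
        score = abs(actual - predict(index))     # 506, 508: prediction and score
        # 510: static threshold, or a dynamic one computed per cell
        limit = threshold(index) if callable(threshold) else threshold
        if score > limit:                        # 512: compare and detect
            anomalies.append((index, score))
    return anomalies
```

With the tube `[100, 110, 120, 190, 140]`, the user-supplied function `lambda i: 100 + 10 * i`, and a static threshold of 25, the cell at index 3 is reported with a deviation score of 60. Switching to a dynamic threshold of half the predicted value raises the limit at that cell to 65, so nothing is reported.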

To provide additional context for implementing various aspects of the present invention, FIG. 6 and the following discussion provide a brief, general description of a suitable computing environment 600 in which the various aspects of the present invention can be implemented. While the invention has been described above in the general context of computer-executable instructions of a computer program that runs on a local computer and/or a remote computer, those skilled in the art will recognize that the invention can also be implemented in combination with other program modules. Generally, program modules include routines, programs, components, data structures, and the like that perform particular tasks and/or implement particular abstract data types. Moreover, those skilled in the art will appreciate that the inventive methods can be practiced with other computer system configurations, including single-processor or multi-processor computer systems, minicomputers, mainframe computers, personal computers, hand-held computing devices, and microprocessor-based and/or programmable consumer electronics, each of which can operatively communicate with one or more associated devices. The illustrated aspects of the invention can also be practiced in distributed computing environments where certain tasks are performed by remote processing devices that are linked through a communications network. However, some, if not all, aspects of the invention can be practiced on stand-alone computers. In a distributed computing environment, program modules can be located in local and/or remote memory storage devices.

As used in this application, the term “component” refers to a computer-related entity, whether hardware, a combination of hardware and software, software, or software in execution. For example, a component can be, but is not limited to, a process running on a processor, a processor, an object, an executable, a thread of execution, a program, or a computer. By way of illustration, an application running on a server and/or the server itself can be a component. In addition, a component can include one or more subcomponents.

With reference to FIG. 6, an exemplary system environment 600 for implementing the various aspects of the invention includes a conventional computer 602, which includes a processing unit 604, a system memory 606, and a system bus 608 that couples various system components, including the system memory, to the processing unit 604. The processing unit 604 can be a commercially available or a proprietary processor. In addition, the processing unit can be implemented as a multiprocessor formed of two or more processors, such as processors connected in parallel.

The system bus 608 can be any of several types of bus structure, including a memory bus or memory controller, a peripheral bus, and a local bus, using any of a variety of conventional bus architectures such as PCI, VESA, Microchannel, ISA, and EISA, to name a few. The system memory 606 includes read-only memory (ROM) 610 and random access memory (RAM) 612. A basic input/output system (BIOS) 614, containing the basic routines that help to transfer information between elements within the computer 602, such as during start-up, is stored in the ROM 610.

The computer 602 can also include, for example, a hard disk drive 616, a magnetic disk drive 618 that reads from or writes to, for example, a removable disk 620, and an optical disk drive 622 that reads from or writes to, for example, a CD-ROM disk 624 or other optical media. The hard disk drive 616, magnetic disk drive 618, and optical disk drive 622 are connected to the system bus 608 by a hard disk drive interface 626, a magnetic disk drive interface 628, and an optical drive interface 630, respectively. The drives 616-622 and their associated computer-readable media provide the computer 602 with non-volatile storage of data, data structures, computer-executable instructions, and the like. Although the description of computer-readable media above refers to a hard disk, a removable magnetic disk, and a CD, those skilled in the art will appreciate that other types of computer-readable media, such as magnetic cassettes, flash memory cards, digital video disks, and Bernoulli cartridges, can also be used in the exemplary operating environment 600, and that any such media can hold computer-executable instructions for performing the methods of the present invention.

A number of program modules can be stored in the drives 616-622 and the RAM 612, including an operating system 632, one or more application programs 634, other program modules 636, and program data 638. The operating system 632 can be any suitable operating system or combination of operating systems. By way of example, the application programs 634 can include a data perspective analysis scheme in accordance with an aspect of the present invention.

A user can enter commands and information into the computer 602 through one or more user input devices, such as a keyboard 640 and a pointing device (e.g., a mouse 642). Other input devices (not shown) can include a microphone, a joystick, a game pad, a satellite dish, a wireless remote control, a scanner, and the like. These and other input devices are often connected to the processing unit 604 through a serial port interface 644 that is coupled to the system bus 608, but they can also be connected by other interfaces, such as a parallel port, a game port, or a universal serial bus (USB). A monitor 646 or other type of display device is also connected to the system bus 608 via an interface, such as a video adapter 648. In addition to the monitor 646, the computer 602 can include other peripheral output devices (not shown), such as speakers and printers.

It should be appreciated that the computer 602 can operate in a networked environment using logical connections to one or more remote computers 660. The remote computer 660 can be a workstation, a server computer, a router, a peer device, or another common network node, and it typically includes many or all of the elements described above in relation to the computer 602, although for brevity only a memory storage device 662 is illustrated in FIG. 6. The logical connections depicted in FIG. 6 can include a local area network (LAN) 664 and a wide area network (WAN) 666. Such networking environments are commonplace in offices, enterprise-wide computer networks, intranets, and the Internet.

When used in a LAN networking environment, for example, the computer 602 is connected to the local network 664 through a network interface or adapter 668. When used in a WAN networking environment, the computer 602 typically includes a modem 670 (e.g., a telephone, DSL, or cable modem), is connected to a communications server on the LAN, or has other means for establishing communications over the WAN 666, such as the Internet. The modem 670, which can be internal or external to the computer 602, is connected to the system bus 608 via the serial port interface 644. In a networked environment, program modules (including the application programs 634) and/or program data 638 can be stored in the remote memory storage device 662. It will be appreciated that the network connections shown are exemplary, and that other means (wired or wireless) of establishing a communications link between the computers 602 and 660 can be used when carrying out an aspect of the present invention.

In accordance with the practices of persons skilled in the art of computer programming, the present invention has been described with reference to acts and symbolic representations of operations that are performed by a computer, such as the computer 602 or the remote computer 660, unless otherwise indicated. Such acts and operations are sometimes referred to as being computer-executed. It will be appreciated that the acts and symbolically represented operations include the manipulation by the processing unit 604 of electrical signals representing data bits, which causes a resulting transformation or reduction of the electrical signal representation, and the maintenance of data bits at memory locations in the memory system (including the system memory 606, hard drive 616, floppy disk 620, CD-ROM 624, and remote memory 662) to thereby reconfigure or otherwise alter the computer system's operation, as well as other processing of signals. The memory locations where such data bits are maintained are physical locations that have particular electrical, magnetic, or optical properties corresponding to the data bits.

FIG. 7 is another block diagram of a sample computing environment 700 with which the present invention can interact. The system 700 includes one or more clients 702. The clients 702 can be hardware and/or software (e.g., threads, processes, computing devices). The system 700 also includes one or more servers 704. The servers 704 can likewise be hardware and/or software (e.g., threads, processes, computing devices). The servers 704 can house threads that perform transformations by employing the present invention, for example. One possible communication between a client 702 and a server 704 can be in the form of a data packet adapted to be transmitted between two or more computer processes. The system 700 includes a communication framework 708 that can be employed to facilitate communications between the clients 702 and the servers 704. The clients 702 are operatively connected to one or more client data stores 710 that can be employed to store information local to the clients 702. Similarly, the servers 704 are operatively connected to one or more server data stores 706 that can be employed to store information local to the servers 704.

In one instance of the present invention, a data packet that facilitates data perspective analysis is transmitted between two or more computer components. The data packet comprises, at least in part, information relating to a data perspective analysis system that utilizes, at least in part, a curve fitting process applied to data from a data tube, where the data tube comprises a data slice that includes at least one data cell of a data perspective and in which only one data dimension varies.

It is to be appreciated that the systems and/or methods of the present invention can be utilized in data perspective analysis schemes that facilitate computer components and non-computer-related components alike. Further, those skilled in the art will recognize that the systems and/or methods of the present invention are employable in a vast array of electronics-related technologies, including, but not limited to, computers, servers, and/or handheld electronic devices.

What has been described above includes examples of the present invention. It is, of course, not possible to describe every conceivable combination of components or methodologies for purposes of describing the present invention, but one of ordinary skill in the art will recognize that many further combinations and permutations of the present invention are possible. Accordingly, the present invention is intended to embrace all such alterations, modifications, and variations that fall within the spirit and scope of the appended claims. Furthermore, to the extent that the term “includes” is used in either the detailed description or the claims, such term is intended to be inclusive in a manner similar to the term “comprising” as “comprising” is interpreted when employed as a transitional word in a claim.

FIG. 1 is a block diagram of an automatic data perspective anomaly detection system in accordance with an aspect of the present invention.
FIG. 2 is another block diagram of an automatic data perspective anomaly detection system in accordance with an aspect of the present invention.
FIG. 3 is yet another block diagram of an automatic data perspective anomaly detection system in accordance with an aspect of the present invention.
FIG. 4 is a flow diagram of a method that facilitates automatic detection of data perspective anomalies in accordance with an aspect of the present invention.
FIG. 5 is another flow diagram of a method that facilitates automatic detection of data perspective anomalies in accordance with an aspect of the present invention.
FIG. 6 illustrates an exemplary operating environment in which the present invention can function.
FIG. 7 illustrates another exemplary operating environment in which the present invention can function.

Explanation of symbols

104 Data perspective
102 Automatic data perspective anomaly detection component
106 Anomalies
108 Threshold input
208 Data perspective
202 Automatic data perspective anomaly detection component
204 Data tube component
206 Anomaly detection component
210 Anomalies
212 Threshold input
310 Data tube component
302 Anomaly detection component
304 Curve fitting function component
306 Data deviation score component
308 Anomaly determination component
312 Anomalies
316 User-specified function
314 Threshold input
606 System memory
632 Operating system
634 Application programs
636 Other program modules
638 Program data
604 Processing unit
608 Bus
626 Hard disk drive interface
628 Magnetic disk drive interface
630 Optical drive interface
644 Serial port interface
646 Monitor
648 Video adapter
668 Network interface
664 Local area network
666 Wide area network
670 Modem
660 Remote computer
702 Client
704 Server
710 Client data store
708 Communication framework
706 Server data store

Claims

A system that facilitates analysis of data perspectives, comprising:
a component that receives at least one data perspective; and
an anomaly detection component that automatically analyzes the data perspective to detect at least one data anomaly through a curve fitting process applied to continuous and/or discrete data from a data tube, the data tube comprising a data slice that includes at least one data cell of the data perspective and in which only one data dimension varies.

The system of claim 1, wherein the curve fitting process includes a process that uses a piecewise linear function at least in part.

The system of claim 2, wherein the piecewise linear function includes a function that at least partially uses a regression tree.

The system of claim 1, wherein the curve fitting process comprises a process that at least partially uses a probabilistic model that predicts a value in the data perspective, the probabilistic model depending in a non-trivial manner on the location of the value in the data perspective.

The system of claim 4, wherein the probabilistic model comprises an autoregressive model.

The system of claim 1, wherein the data anomalies include anomalies based on significant deviations in data values from other data values found in the data tube.

The system of claim 6, wherein the significant deviation is based on at least one deviation score that exceeds a given threshold.

The described system, wherein the deviation score is based at least in part on the value of the data cell compared to a predicted value of the data cell obtained from a piecewise linear function representing a data tube that includes the data cell.

The system of claim 7, wherein the deviation score is based at least in part on the value of the data cell compared to a predicted value of the data cell obtained from a probabilistic model that predicts discrete values in the data perspective, the probabilistic model depending in a non-trivial manner on the location of the value in the data perspective.

The system of claim 7, wherein the given threshold includes at least one selected from the group consisting of a dynamic threshold and a static threshold.

The system of claim 10, wherein the given threshold includes at least one selected from the group consisting of a user defined threshold and a system defined threshold.

The system of claim 11, further comprising a user interface component that provides a plurality of selectable user-defined thresholds for use with different types of data anomalies.

The system of claim 1, wherein the data perspective includes at least one selected from the group consisting of a pivot table and an online analytical processing (OLAP) cube.

The system of claim 1, further comprising a user interface component that indicates the data anomaly to at least one user.

The system of claim 14, wherein the user interface component indicates the data anomaly via at least one selected from the group consisting of visual indications and audio indications.

The system of claim 14, wherein the user interface component facilitates indicating the data anomaly via at least one selected from the group consisting of highlighting at least one top-level anomaly and enclosing at least one hidden anomaly with a border.

The system of claim 14, wherein the user interface component comprises a user interface having user input controls for adjusting the level of indication based on at least one degree of data anomaly.

The system of claim 14, wherein the user interface comprises a component that assists in indicating the data anomaly through an automatic screen display of at least one data anomaly.

A method that facilitates analysis of data perspectives, comprising:
Receiving at least one data perspective;
Establishing a data tube from the data perspective, the data tube comprising a data slice including at least one data cell of the data perspective, wherein only one data dimension varies;
Determining a curve fitting function representing continuous and/or discrete data of the data tube;
Calculating a deviation score based at least in part on a difference between an actual data value and a predicted value provided via the curve fitting function; and
Detecting data anomalies through evaluation of the deviation score and detection criteria.
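As a rough illustration of the steps recited above, and not the claimed implementation, the following Python sketch establishes a data tube from a small data perspective, fits a curve, computes deviation scores, and applies a threshold. The dictionary-based `cube`, the helper names, the ordinary least-squares straight line standing in for the curve fitting function, the residual-scaled score, and the threshold of 2.0 are all assumptions made for illustration.

```python
from statistics import mean, pstdev

def data_tube(cube, dims, fixed, vary):
    """Collect the cells in which only the `vary` dimension changes.

    `cube` maps coordinate tuples (one member per dimension in `dims`)
    to numeric values; `fixed` pins every other dimension to one member.
    """
    idx = dims.index(vary)
    tube = [
        (coords[idx], value)
        for coords, value in cube.items()
        if all(coords[dims.index(d)] == m for d, m in fixed.items())
    ]
    return sorted(tube)

def fit_line(ys):
    """Least-squares line through (0, ys[0]), (1, ys[1]), ...; returns predictions."""
    xs = range(len(ys))
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return [y_bar + slope * (x - x_bar) for x in xs]

def deviation_scores(ys, predicted):
    """Absolute residuals, scaled by the residual standard deviation (assumed scoring)."""
    residuals = [y - p for y, p in zip(ys, predicted)]
    scale = pstdev(residuals) or 1.0
    return [abs(r) / scale for r in residuals]

def detect_anomalies(cube, dims, fixed, vary, threshold=2.0):
    """Return (member, score) pairs whose deviation score exceeds the threshold."""
    tube = data_tube(cube, dims, fixed, vary)
    ys = [value for _, value in tube]
    scores = deviation_scores(ys, fit_line(ys))
    return [(member, s) for (member, _), s in zip(tube, scores) if s > threshold]

# Hypothetical two-dimensional perspective: region x month.
dims = ("region", "month")
west = [10, 11, 12, 13, 50, 15, 16, 17]
cube = {("West", m): v for m, v in enumerate(west, start=1)}
cube.update({("East", m): 20 + m for m in range(1, 9)})

flagged = detect_anomalies(cube, dims, {"region": "West"}, "month")
# Only month 5 of the "West" tube is flagged.
```

Scaling by the residual spread is only one possible choice of score; any measure of the difference between actual and predicted values would fit the same outline.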

The method of claim 19, wherein the curve fitting function comprises a user-selectable curve fitting function.

The method of claim 19, further comprising:
Classifying the data anomalies according to their accessibility; and
Displaying the data anomalies to a user utilizing a set of indications of accessibility to the anomalies.

The method according to claim 21, further comprising limiting the data anomalies displayed to the user to the top k anomalies based on deviation score, where k is a number of data anomalies selectable by the user.
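A minimal sketch of limiting the display to the top k anomalies by deviation score might look as follows. The `(cell, score)` pair representation and the function name `top_k_anomalies` are hypothetical and chosen only for illustration.

```python
import heapq

def top_k_anomalies(anomalies, k):
    """Return the k anomalies with the highest deviation scores, highest first.

    `anomalies` is a list of (cell_label, deviation_score) pairs.
    """
    return heapq.nlargest(k, anomalies, key=lambda pair: pair[1])

candidates = [("A1", 2.1), ("B7", 5.0), ("C3", 3.3), ("D4", 2.9)]
shown = top_k_anomalies(candidates, k=2)
```

Here `k` plays the role of the user-selectable number, so `shown` holds the two highest-scoring cells.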

The method of claim 21, further comprising: automatically displaying on the screen at least one data anomaly to the user.

The method described, wherein the indication of accessibility to the anomaly includes at least one selected from the group consisting of an on-screen indication, a drill-down indication, and a drill-across indication.

The method of claim 19, wherein the data perspective includes at least one selected from the group consisting of a pivot table and an online analytical processing (OLAP) cube.

The method of claim 19, wherein the detection criteria includes a threshold.

The method of claim 26, wherein the evaluation of the deviation score comprises determining whether the deviation score exceeds the threshold.

The method of claim 26, wherein the threshold includes at least one selected from the group consisting of a dynamic threshold and a static threshold.
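The claims do not define how a dynamic threshold is derived. Purely as an assumed illustration, the sketch below contrasts a static threshold (a fixed constant) with a dynamic one computed from the deviation scores themselves as their mean plus a multiple of their standard deviation. The function names and the multiplier are hypothetical.

```python
from statistics import mean, pstdev

def static_threshold(value):
    """Threshold that ignores the data and always returns `value`."""
    return lambda scores: value

def dynamic_threshold(multiplier=1.5):
    """Threshold recomputed from the scores: mean + multiplier * std dev (assumed rule)."""
    return lambda scores: mean(scores) + multiplier * pstdev(scores)

def flag(scores, threshold_fn):
    """Indices of the scores that exceed the threshold produced by `threshold_fn`."""
    limit = threshold_fn(scores)
    return [i for i, s in enumerate(scores) if s > limit]

scores = [0.2, 0.3, 0.1, 0.4, 3.0]
static_hits = flag(scores, static_threshold(1.0))
dynamic_hits = flag(scores, dynamic_threshold(1.5))
```

With these example scores both rules flag the last cell, but the dynamic limit would move if the overall spread of scores changed.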

The method of claim 28, wherein the threshold includes at least one selected from the group consisting of a user defined threshold and a system defined threshold.

The method of claim 29, further comprising adjusting the user defined threshold according to a type of data anomaly.

The method of claim 26, wherein the curve fitting process includes a process that at least partially uses a piecewise linear function.

The method of claim 31, wherein the piecewise linear function includes a function that at least partially utilizes a regression tree.
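One possible reading of a piecewise linear function built with a regression tree is a tree that splits the varying dimension into ranges and fits a separate least-squares line in each leaf. The sketch below follows that reading; the split search, the `min_leaf` and `depth` parameters, and the dictionary-based tree are assumptions for illustration and are not taken from the specification. The inputs `xs` are assumed to be sorted in ascending order.

```python
def fit_line(xs, ys):
    """Least-squares line; returns (slope, intercept)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return slope, y_bar - slope * x_bar

def sse(xs, ys, line):
    """Sum of squared errors of `line` on the given points."""
    slope, intercept = line
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))

def fit_tree(xs, ys, min_leaf=3, depth=2):
    """Grow a small regression tree with a linear model in each leaf."""
    line = fit_line(xs, ys)
    best = None
    if depth > 0:
        for i in range(min_leaf, len(xs) - min_leaf + 1):
            err = (sse(xs[:i], ys[:i], fit_line(xs[:i], ys[:i]))
                   + sse(xs[i:], ys[i:], fit_line(xs[i:], ys[i:])))
            if best is None or err < best[0]:
                best = (err, i)
    # Keep a single leaf when no split reduces the error.
    if best is None or best[0] >= sse(xs, ys, line) - 1e-12:
        return {"line": line}
    i = best[1]
    return {
        "split": xs[i],
        "left": fit_tree(xs[:i], ys[:i], min_leaf, depth - 1),
        "right": fit_tree(xs[i:], ys[i:], min_leaf, depth - 1),
    }

def predict(tree, x):
    """Walk to the leaf covering `x` and evaluate its line."""
    while "split" in tree:
        tree = tree["left"] if x < tree["split"] else tree["right"]
    slope, intercept = tree["line"]
    return slope * x + intercept

# A tube that rises (y = 2x) and then falls (y = 30 - x) after x = 5.
xs = list(range(10))
ys = [2 * x if x < 5 else 30 - x for x in xs]
tree = fit_tree(xs, ys)
```

The predicted value for a cell is `predict(tree, x)`, which can then be compared with the actual cell value to obtain a deviation score.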

The method according to claim 26, wherein the curve fitting process includes a process that at least partially uses a probabilistic model that predicts discrete values in the data perspective, wherein the probabilistic model depends in a non-trivial manner on the position of the values in the data perspective.

The method of claim 33, wherein the probabilistic model comprises a function that at least partially utilizes an autoregressive model.
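To make the idea of a probabilistic model over discrete values more concrete, the sketch below uses a first-order transition model, in which each value is predicted from the value immediately before it along the tube. This can be viewed as a simple discrete analogue of an order-1 autoregressive model and is a stand-in chosen for illustration, not the model described in the specification. The smoothing constant `alpha`, the function names, and the negative log-probability score are assumptions.

```python
from collections import Counter, defaultdict
from math import log

def fit_transitions(seq, alpha=1.0):
    """Estimate P(current | previous) from a sequence, with add-alpha smoothing."""
    states = sorted(set(seq))
    counts = defaultdict(Counter)
    for prev, cur in zip(seq, seq[1:]):
        counts[prev][cur] += 1

    def prob(prev, cur):
        total = sum(counts[prev].values())
        return (counts[prev][cur] + alpha) / (total + alpha * len(states))

    return prob

def surprise_scores(seq, prob):
    """Negative log-probability of each value given its predecessor."""
    return [-log(prob(prev, cur)) for prev, cur in zip(seq, seq[1:])]

# Alternating pattern with one repeated "L" (positions 6 and 7).
seq = list("LHLHLHLLHLHLH")
prob = fit_transitions(seq)
scores = surprise_scores(seq, prob)
most_surprising = scores.index(max(scores))
```

The transition with the lowest predicted probability receives the highest score, so it is the candidate anomaly under this assumed scoring.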

A system that facilitates analysis of data perspectives, comprising:
Means for receiving at least one data perspective; and
Means for automatically analyzing said data perspective to detect at least one data anomaly through a curve fitting process applied to continuous and/or discrete data from a data tube,
wherein the data tube comprises a data slice comprising at least one data cell of the data perspective, wherein only one data dimension changes.

A data packet, transmitted between two or more computer components, that facilitates analysis of data perspectives, the data packet comprising at least a portion of information relating to a data perspective analysis system that at least partially utilizes a curve fitting process applied to continuous and/or discrete data from a data tube, wherein the data tube comprises a data slice comprising at least one cell of the data perspective, wherein only one data dimension varies.

A computer readable medium having stored thereon computer executable components of the system of claim 1.

A device using the method of claim 19, comprising at least one selected from the group consisting of a computer, a server, and a handheld electronic device.

A device using the system of claim 1, comprising at least one selected from the group consisting of a computer, a server, and a handheld electronic device.