JP6552196B2 - Cloud-based distributed persistence and cache data model

Info

Publication number: JP6552196B2
Application number: JP2014524063A
Authority: JP
Inventors: Ajay Jadhav (アジャイジャドハブ)
Original assignee: Ajay Jadhav (アジャイジャドハブ)
Priority date: 2011-08-02
Filing date: 2012-08-02
Publication date: 2019-07-31
Anticipated expiration: 2032-08-02
Also published as: AU2017218964A1; US20130110961A1; JP2014528114A; AU2012290042A1; EP2740041A4; JP2018022514A; AU2017218964B2; EP2740041B1; JP6602355B2; CA2843886C; WO2013019913A1; US10853306B2; CA2843886A1; EP2740041A1

Description

The present invention provides a system and method for improving cloud computing scalability and reducing latency by using a cloud-based distributed persistence and cache data model with a Structured Query Language (SQL) application interface.

Cloud computing environments can provide infrastructure, applications, and software over a network on the web. Early web-based infrastructure was based on mainframe or server-based relational database models and n-tier, network-clustered application servers. As the web grew into an interactive platform, the infrastructure split into two fronts: a data layer and an application layer. Many solutions exist for the application layer, and linear scalability can be achieved fairly easily with off-the-shelf solutions; the application layer provides the view and the controller in a Model-View-Controller (MVC) architecture. The data layer essentially constitutes the model in the MVC architecture. The data layer provides unstructured or structured data to the application using one of several database management systems, including relational databases, object-oriented databases, and key/value pair databases.

Relational Databases
Relational databases are known in the art. A relational database is a structured data store that holds various data types, as bits and bytes, in structured tables that can be accessed using Structured Query Language (SQL). The origins of the relational database can be traced back to relational algebra. The basic premise of current relational database offerings is that data must be accessible through a standardized interface, independent of any particular hardware or software. In the early stages, data elements were independent, with minimal or no relational attributes. As database engines became more powerful, data structures and relational data graphs grew more complex, as did the relationships between tables and data elements. In essence, a relational database is a collection of tables made up of rows and columns, with an SQL interface for the applications that access it. Relational databases have so far provided the best mix of simplicity, robustness, flexibility, performance, scalability, and compatibility for managing general-purpose data. Regardless of the vendor, every relational database requires the data to be structured. The drawback of a relational database management system (RDBMS) is that it cannot be distributed seamlessly across physical machine boundaries over a network, dynamically or automatically, without partitioning and/or sharding, both of which require manual intervention and maintenance.
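The manual effort that sharding implies can be sketched as follows. This is only an illustration under stated assumptions: each in-memory SQLite database stands in for one physical machine, and the table name `users`, the shard count, and the CRC32 routing rule are hypothetical choices that do not come from the specification.

```python
import sqlite3
import zlib

# Assumption: each in-memory SQLite database stands in for one physical machine.
SHARD_COUNT = 3
shards = [sqlite3.connect(":memory:") for _ in range(SHARD_COUNT)]
for db in shards:
    db.execute("CREATE TABLE users (id TEXT PRIMARY KEY, name TEXT)")


def shard_for(user_id: str) -> sqlite3.Connection:
    """Pick a shard with a stable hash; the application owns this routing rule."""
    return shards[zlib.crc32(user_id.encode("utf-8")) % SHARD_COUNT]


def insert_user(user_id: str, name: str) -> None:
    db = shard_for(user_id)
    db.execute("INSERT INTO users (id, name) VALUES (?, ?)", (user_id, name))
    db.commit()


def find_user(user_id: str):
    """Single-key lookups go to exactly one shard."""
    row = shard_for(user_id).execute(
        "SELECT name FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    return row[0] if row else None


def count_all_users() -> int:
    """A cross-shard query has to be fanned out and merged by hand."""
    return sum(
        db.execute("SELECT COUNT(*) FROM users").fetchone()[0] for db in shards
    )


for uid, name in [("u1", "Ada"), ("u2", "Grace"), ("u3", "Edsger"), ("u4", "Barbara")]:
    insert_user(uid, name)
```

Changing `SHARD_COUNT` in this sketch would move existing keys to different shards, which is one example of the maintenance burden the paragraph above describes.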

This manual intervention is needed to overcome the physical limit on the amount of data that can be stored within the physical boundary of a single machine or in an external data array. Before the advent of Web 2.0 and its massive data scale, this arrangement kept working because the computing power of a single machine with multi-core processing units grew faster than the data of most organizations. The data scale of current applications, however, grows exponentially every day, and that premise no longer holds. This drawback, together with very high licensing and support costs, makes relational databases ill-suited to current and future cloud-based applications whose data scales reach exabytes and zettabytes.

Relational / Object-Relational Databases
Relational database design predates the object-oriented design paradigm by a generation or two, so it lacks true support for complex object graphs. Growing information complexity has exposed another drawback of relational databases. Relational databases are built specifically to organize data by common characteristics. Complex images, numbers, designs, and multimedia products defy simple categorization and lead to complex object graphs that ultimately turn into unstructured data, which prompted a new form of database called the object-relational database management system. Current systems are designed to handle more complex applications and need to be scalable and distributable in the cloud. Object-relational databases do not satisfy these requirements because they are neither scalable nor network-distributable, and they are therefore unsuitable.

Key / Value Databases
The new Web 2.0 paradigm handles data measured in terabytes and petabytes rather than gigabytes. Although relational databases have worked for 40 years, they are not suited to data that grows by terabytes every day. The main reason is that the scalability of a relational database is tied directly to the computing power of the underlying machine or partitioned machines. Before the advent of Web 2.0, which added a social networking aspect to every facet of computing, advances in server design allowed databases to scale to serve the needs of applications. To handle petabytes and larger volumes of data, however, a new form of database management system has taken hold, using key/value stores known as non-relational database management systems (non-RDBMS) or schema-less databases. These systems are commonly referred to as non-relational and/or NoSQL databases. In practice there is no standard name yet: they may be called document-oriented, Internet-facing, attribute-oriented, distributed databases (although these can also be relational), sharded sorted arrays, distributed hash tables, or key/value databases. Each of these names highlights a particular feature of the new approach, but all are variations on one theme, which the inventors call the key/value database. Several options built on this key/value approach are currently available on the market, including the following.

Cassandra is an open source distributed database management system. It is a top-level project of the Apache Software Foundation, designed to handle very large amounts of data spread across many commodity servers while providing a highly available service with no single point of failure. It is a NoSQL solution that was originally developed by Facebook to power its Inbox Search feature. Cassandra implements a BigTable data model running on an Amazon Dynamo-like infrastructure.

Cassandra provides a structured key-value store with eventual consistency. Keys map to multiple values, which are grouped into column families. The column families are fixed when a Cassandra database is created, but columns can be added to a family at any time. Furthermore, columns are added only to specified keys, so different keys can have different numbers of columns in any given family.
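The column-family layout described above can be pictured with a small in-memory model. This is a sketch of the data shape only, not Cassandra's API or storage engine; the class name `ColumnFamilyStore` and the family and column names are hypothetical.

```python
class ColumnFamilyStore:
    """Toy model: families are fixed at creation, columns vary per key."""

    def __init__(self, families):
        # The set of column families cannot change after the store is created.
        self._families = frozenset(families)
        self._rows = {}

    def put(self, key, family, column, value):
        if family not in self._families:
            raise KeyError(f"unknown column family: {family}")
        # Each row key gets one dict per family; columns are added on demand.
        row = self._rows.setdefault(key, {name: {} for name in self._families})
        row[family][column] = value

    def get(self, key, family):
        if family not in self._families:
            raise KeyError(f"unknown column family: {family}")
        # Return a copy so callers cannot mutate the stored columns.
        return dict(self._rows.get(key, {}).get(family, {}))


store = ColumnFamilyStore(["profile", "messages"])
store.put("alice", "profile", "email", "alice@example.com")
store.put("alice", "profile", "city", "Pune")
store.put("bob", "profile", "email", "bob@example.com")
```

Here the key `alice` has two columns in the `profile` family and `bob` has one, which mirrors the statement that different keys may carry different numbers of columns within the same family.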

The values from a column family for each key are stored together, which makes Cassandra a hybrid between a column-oriented DBMS and a row-oriented store.

Apache CouchDB, commonly referred to as CouchDB, is a free, open source, document-oriented database written in the Erlang programming language. It is a NoSQL product designed for local replication and for scaling vertically across a wide range of devices. CouchDB is supported by the commercial enterprises CouchOne and Cloudant.

Hypertable is an open source database inspired by publications on the design of Google's BigTable. The project is based on the experience of engineers who solved large-scale, data-intensive tasks. Hypertable runs on top of a distributed file system (DFS) such as the Apache Hadoop DFS, GlusterFS, or the Kosmos File System (KFS). It is written almost entirely in C++ for performance.

MongoDB is an open source, scalable, high-performance, schema-free, document-oriented database written in the C++ programming language. It is document-oriented in that it manages collections of JSON-like documents. Because data can be nested in complex hierarchies and still be queried and indexed, many applications can model their data in a more natural way. Development of MongoDB began at 10gen in October 2007. The first public release was in February 2009.

Tokyo Cabinet is a library of routines for managing a database. The database is a simple data file containing records, each of which is a key-value pair. Every key and value is a variable-length sequence of bytes. Both binary data and character strings can be used as keys and values. There is no concept of data tables or data types. Records are organized in a hash table, a B+ tree, or a fixed-length array. Tokyo Cabinet was developed as the successor to GDBM and QDBM.

Voldemort is not a relational database: it does not attempt to satisfy arbitrary relations while satisfying ACID properties. Nor is it an object database that attempts to map object reference graphs transparently, and it does not introduce a new abstraction such as document orientation. It is basically a big, distributed, persistent, fault-tolerant hash table. For applications that can use an O/R mapper such as ActiveRecord or Hibernate, Voldemort provides horizontal scalability and higher availability, but at a great loss of convenience. For large applications under Internet-scale scalability pressure, a system is likely to consist of a number of functionally partitioned services or application programming interfaces, which may manage storage resources across multiple data centers using storage systems that can themselves be partitioned horizontally. For applications in this space, arbitrary in-database joins are already impossible, since not all of the data is available in any single database. A typical pattern is to introduce a caching layer, which requires hash table semantics in any case.

Drizzle can be thought of as a counter-approach to the problems that key/value stores set out to solve. Drizzle began as a spin-off of the MySQL (6.0) relational database. Over the last few months, its developers have removed a host of non-core features (including views, triggers, prepared statements, stored procedures, the query cache, ACLs, and a number of data types) with the aim of creating a leaner, simpler, faster database system. Drizzle can store relational data; the goal is to build a semi-relational database platform tailored to web and cloud-based applications running on systems with 16 or more cores.

For applications with complex object graphs, the biggest drawbacks of distributed key/value databases such as those described above are latency in response time and the lack of features that are taken for granted in any off-the-shelf general-purpose relational database. Most, if not all, current social networking applications require extremely complex object graphs.

BigTable is a compressed, high-performance, proprietary database system built on the Google File System, the Chubby lock service, SSTable, and a few other Google programs. It is currently not distributed or used outside of Google, although Google offers access to it as part of Google App Engine.

HBase is an open source, non-relational, distributed database modeled after Google's BigTable and written in Java. It was developed as part of the Apache Software Foundation's Hadoop project, runs on top of the Hadoop Distributed File System, and provides BigTable-like capabilities for Hadoop. It offers a fault-tolerant way of storing large quantities of sparse data.

Database Memory Cache
With the advent of the dynamic web, latency in data access began to affect the performance of web pages. Latency in data access time, for both reads and writes, is linearly related to the access time of the hard drive that holds the persistent data. To remove the bottleneck of going to disk for information, computer developers came up with the concept of shared memory, or a cache: a portion of the server's memory (RAM) set aside for frequent reads of the same data. Caching data for reads eliminates the need for frequent disk access and thereby reduces data latency. Over time, caches have grown more exotic. The available options include Google Cache; CSQL Cache, which caches tables from MySQL, Postgres, and Oracle; Memcached, which caches query result sets; TimesTen, which caches Oracle tables; and SafePeak, which automatically caches result sets of queries and procedures from SQL Server, with automated cache eviction for full data correctness. Memcached is a free and open source, high-performance, distributed memory object caching system. It is generic in nature but intended for use in speeding up dynamic web applications by alleviating database load. Memcached is an in-memory key-value store for small chunks of arbitrary data (strings, objects) from the results of database calls, application programming interface calls, or page rendering.
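The idea of caching query result sets in memory can be sketched as follows. A plain Python dictionary stands in for a Memcached-style key-value store, and an in-memory SQLite database stands in for the disk-backed database; the table `pages` and the helper `cached_query` are hypothetical names used only for illustration.

```python
import sqlite3

# Assumption: SQLite plays the role of the slower, persistent database.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE pages (slug TEXT PRIMARY KEY, title TEXT)")
db.execute("INSERT INTO pages (slug, title) VALUES ('home', 'Welcome')")
db.commit()

result_cache = {}  # stands in for an in-memory key-value cache
stats = {"hits": 0, "misses": 0}


def cached_query(sql, params=()):
    """Return the result set for a query, reading the database only on a miss."""
    key = (sql, tuple(params))
    if key in result_cache:
        stats["hits"] += 1
        return result_cache[key]
    stats["misses"] += 1
    rows = db.execute(sql, params).fetchall()
    result_cache[key] = rows
    return rows


def invalidate_all():
    """Drop cached result sets, for example after a write to the database."""
    result_cache.clear()


first = cached_query("SELECT title FROM pages WHERE slug = ?", ("home",))
second = cached_query("SELECT title FROM pages WHERE slug = ?", ("home",))
```

The second call is answered from memory without touching the database. A real deployment would also need an eviction and invalidation policy, which this sketch reduces to a single `invalidate_all` helper.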

All three mainstream approaches to the interaction between structured data and applications (along with caching) have their pros and cons. The present invention provides a solution for the new cloud-based paradigm by presenting, in a single package, a data store that is fully functional and relational, and fully distributable in the cloud.

The present invention provides systems and methods for enhanced scalability and reduced latency in the database tier for cloud computing. These comprise a system that includes a cache adapter capable of storing data in either a relational or a non-relational (structured or unstructured) database format. The cache adapter communicates with a distributed file system at the back end of the data cache, for data persistence, and with a client cache on the front-end client database (cache), for synchronization of data from the distributed file system.

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate exemplary embodiments of the invention and, together with the description, serve to explain the principles of the invention.

A diagram showing the flow of data between the cache adapter and a client in an embodiment of the present invention.
A diagram showing the flow and synchronization of data between a distributed file system and a client using the cache adapter in an embodiment of the present invention.
A flow diagram showing the flow of data following a client request made using an embodiment of the present invention.
A flow diagram showing the flow of data following a client login to set up a user persona using an embodiment of the present invention.
A diagram showing a data cell in an embodiment of the present invention.
A diagram showing user subscription to one or more vertical applications in an embodiment of the present invention.
A diagram showing the architecture of a particular embodiment of the present invention in which a client communicates with a single instance of a database.
A diagram showing back-end synchronization of user data in a data cell with a client machine database according to an embodiment of the present invention.
A diagram showing the architecture of a particular embodiment of the present invention in which multiple instances of a database communicate with a client in parallel.
A diagram showing the architecture of a particular embodiment of the present invention in which multiple instances of a database communicate with clients in parallel and each client has its own database instance.
A diagram showing the architecture of a particular embodiment of the present invention in which the cache adapter can interact with a database for data persistence purposes, and the database in turn stores the data in the distributed file system.

The present invention is not limited to the particular methodologies, compounds, materials, manufacturing techniques, uses, and applications described herein. The terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the scope of the invention. The singular forms "a", "an", and "the" include plural reference unless the context clearly dictates otherwise. Thus, a reference to "an element" is a reference to one or more elements and includes equivalents thereof known to those skilled in the art. Similarly, as another example, a reference to "a step" or "a means" is a reference to one or more steps or means and may include sub-steps and subservient means. All conjunctions used are to be understood in the most inclusive sense possible. Thus, the word "or" should be understood as having the definition of a logical "or" rather than that of a logical "exclusive or" unless the context clearly requires otherwise. Structures described herein are to be understood also to refer to functional equivalents of such structures. Language that may be construed to express approximation should be so understood unless the context clearly dictates otherwise.

Unless defined otherwise, all technical and scientific terms used herein have the same meanings as commonly understood by one of ordinary skill in the art to which this invention belongs. Preferred methods, techniques, devices, and materials are described, although any methods, techniques, devices, or materials similar or equivalent to those described herein may be used in the practice or testing of the present invention. Structures described herein are to be understood also to refer to functional equivalents of such structures.

All patents and other publications identified are incorporated herein by reference for the purpose of describing and disclosing, for example, the methodologies described in such publications that might be used in connection with the present invention. These publications are provided solely for their disclosure prior to the filing date of the present application. Nothing in this regard should be construed as an admission that the inventors are not entitled to antedate such disclosure by virtue of prior invention or for any other reason.

The term "automatic" and variations thereof, as used herein, refer to any process or operation done without material human input when the process or operation is performed. However, a process or operation can be automatic, even though its performance uses material or immaterial human input, if the input is received before the process or operation is performed. Human input is deemed to be material if such input influences how the process or operation will be performed. Human input that consents to the performance of the process or operation is not deemed to be "material".

The term "computer-readable medium" as used herein refers to any tangible storage that participates in providing instructions to a processor for execution. Such a medium may take many forms, including but not limited to non-volatile media, volatile media, and transmission media. Non-volatile media include, for example, NVRAM, or magnetic or optical disks. Volatile media include dynamic memory, such as main memory. Common forms of computer-readable media include, for example, a floppy disk, a flexible disk, a hard disk, magnetic tape, or any other magnetic medium; a magneto-optical medium; a CD-ROM or any other optical medium; punch cards, paper tape, or any other physical medium with patterns of holes; a RAM, a PROM, an EPROM, a FLASH-EPROM, or a solid-state medium such as a memory card; any other memory chip or cartridge; or any other medium from which a computer can read. When the computer-readable medium is configured as a database, it is to be understood that the database may be any type of database, such as relational, hierarchical, object-oriented, and/or the like. Accordingly, the invention is considered to include a tangible storage medium, and prior art-recognized equivalents and successor media, in which the software implementations of the present invention are stored.

The terms "determine", "calculate", and "compute", and variations thereof, as used herein, are used interchangeably and include any type of methodology, process, mathematical operation, or technique.

The term "module" as used herein refers to any known or later developed hardware, software, firmware, artificial intelligence, fuzzy logic, or combination of hardware and software that is capable of performing the functionality associated with that element. Also, while the invention is described in terms of exemplary embodiments, it should be appreciated that individual aspects of the invention can be claimed separately.

The term "in communication with" as used herein refers to any coupling, connection, or interaction that uses electrical signals to exchange information or data, using any system, hardware, software, protocol, or format.

The term "virtual" or "virtualization" as used herein refers to a logical representation of some other component, such as a physical disk drive. In other words, the "virtual" component is not actually the same as the physical component it represents, but appears to be the same to other components, hardware, software, and the like of a computer system.

The term "disk" as used herein refers to a storage disk or other memory that can store data for a computer system.

The term "cloud" or "cloud computing" as used herein refers to Internet-based computing, whereby shared resources, software, and information are provided to computers and other devices on demand, like a public utility.

Cloud computing is, broadly, the virtualization of the standard components of a computer system. It takes both the data and the software that would ordinarily reside within a single computer and spreads them across other, remote components. For example, a client or user may access a server or database to obtain information through a program, where the information is stored in one location, the query is processed by software in another location, and the client is in yet another location. In practice, software can create a virtual server that behaves as a single server while maintaining data storage in several disparate physical locations. Regardless of the software mechanism used to access the data, query processing typically involves a cache that operates as a relational database instance, together with a distributed file system for data persistence. The cache is a data storage component that stores data to, or retrieves data from, the distributed file system in a manner that allows data to be stored and served faster in response to client requests. The data stored in the cache may be a copy of data stored elsewhere in the system, or data generated in response to an earlier client request or query. All data in the cache is eventually synchronized to the distributed file system and to the client cache. A cached data storage map directory, which holds the locations of the data cells, is typically used as the means of communicating with the data cells in the cache.

Various embodiments of the present invention are designed to increase the speed at which data is accessed and stored while providing enhanced scalability. In one embodiment, the cache adapter includes data cells capable of storing data in a relational database table format, and the cache adapter communicates with a distributed file system and a client cache. In another embodiment, the cache adapter includes data cells of multiple database instances capable of storing data in a relational database table format, and the cache adapter communicates with the distributed file system and the client cache. It is contemplated that the cache adapter can maintain a back-end connection to the distributed file system together with a front-end client database that incorporates a cache and resides on the client machine. In an alternative embodiment, the cache adapter can communicate between an existing cache and a relational database, or another type of database, acting as an intermediary to the distributed file system. The cache adapter can be composed of small data cells. The cache adapter can move data between the data cells in the cache and the distributed file system.

In embodiments of the present invention, the cache adapter sits in front of the cloud distributed file system. The cache adapter creates the CloudCache and provides the interface through which all external clients access the database. The CloudCache is a contiguous space spanning all of the available designated systems described in a configuration file. The cache adapter routes data access requests to the appropriate data cells stored in the cloud. These data cells can persist data permanently in the cache distributed across the cloud, in addition to the cloud distributed file system. Furthermore, where necessary, the cache adapter can allow direct access to data residing in the cloud distributed file system without going through the cache. This can accommodate large bulk loads or large data retrievals and reduce the need to access multiple data cells for a single request. In response to a request for data from a client, the cache adapter first checks whether the data is available in the cache; if the cache does hold the requested data, the system serves the data from the cache. If the data is not in the cache, however, the system automatically retrieves the data from the distributed file system and caches it before sending it to the client, thereby satisfying the data request. This process is described below in conjunction with the data flow diagram of FIG. 3.
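The cache-first lookup described above can be sketched as a read-through cache. This is a minimal illustration only: the class name `ReadThroughCache`, the `load_from_dfs` callback, and the use of a plain dictionary in place of the distributed file system are assumptions made for the example and do not appear in the specification.

```python
from typing import Callable, Dict, Optional


class ReadThroughCache:
    """Serves reads from an in-memory cache, falling back to a slower store."""

    def __init__(self, load_from_dfs: Callable[[str], Optional[bytes]]) -> None:
        self._cache: Dict[str, bytes] = {}
        # Hypothetical stand-in for a read from the distributed file system.
        self._load_from_dfs = load_from_dfs
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> Optional[bytes]:
        # Step 1: check whether the data is already in the cache.
        if key in self._cache:
            self.hits += 1
            return self._cache[key]
        # Step 2: on a miss, retrieve the data from the distributed file system.
        self.misses += 1
        value = self._load_from_dfs(key)
        if value is None:
            return None
        # Step 3: cache the data before returning it to the client.
        self._cache[key] = value
        return value


# Usage: a dict stands in for the distributed file system.
dfs = {"userid0/profile": b"alice"}
cache = ReadThroughCache(dfs.get)
first = cache.get("userid0/profile")   # miss, loaded from dfs
second = cache.get("userid0/profile")  # hit, served from the cache
```

The second call never touches the backing store, which is the latency benefit the paragraph attributes to serving data from the cache.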

In certain embodiments, the cloud distributed cache functions as a relational database with an SQL interface for applications that interface with the cache adapter. For example, as shown in FIG. 11, the cache can be supported at the back end by a fully distributable file system in the cloud for data persistence. With this setup, the database grows as a linear function of the hardware, offering potentially unlimited scaling. An eventual threshold may exist in the range beyond exabytes. The cache adapter of the present invention can be integrated with a distributed file system. The adapter can take in data from the cache entirely in a structured relational data format, convert it for the distributed file system or an intermediate database for dynamic real-time storage, and on the return path convert data from the distributed file system or intermediate database into a cache-compliant format.

Adapter framework and data flow
For example, as shown in FIG. 1, all external clients interface with the cache adapter through a web server, which makes it easy for the gateway for clients to communicate with the database using a standard secure https interface. The cache adapter 100 communicates bidirectionally with the web server 110, and the web server 110 communicates bidirectionally with the clients. A client can be a native mobile device application client 120 or an HTML5 application client 121. The presentation layers 130, 131 and the client databases 140, 141 are contained within the clients 120, 121.

The cache adapter can interface internally with the data cells and the file system using a standard Java Database Connectivity (JDBC) interface. Each individual data cell can persist its data permanently in the cloud distributed file system. Data residing in the cache is in a structured form similar to a relational database table format. Data residing in the cloud distributed file system, however, is in an unstructured format. The adapter processing framework is required to convert data from one format to the other depending on the function call and the read/write requirement.
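To make the two formats concrete, the sketch below converts rows from a relational table into a flat byte blob and back. It is an illustration under stated assumptions: the specification does not define the unstructured format, so newline-delimited JSON is used here purely as a placeholder, an in-memory SQLite table stands in for a data cell, and the names `rows_to_blob`, `blob_to_rows`, and the `profile` table are hypothetical.

```python
import json
import sqlite3
from typing import List, Tuple

Row = Tuple[str, str]


def rows_to_blob(rows: List[Row]) -> bytes:
    """Flatten structured rows into a byte blob (JSON lines, as a placeholder format)."""
    return "\n".join(json.dumps(list(r)) for r in rows).encode("utf-8")


def blob_to_rows(blob: bytes) -> List[Row]:
    """Rebuild structured rows from a blob written by rows_to_blob."""
    text = blob.decode("utf-8")
    return [tuple(json.loads(line)) for line in text.splitlines() if line]


# A data cell modelled as an in-memory relational table.
cell = sqlite3.connect(":memory:")
cell.execute("CREATE TABLE profile (user_id TEXT, display_name TEXT)")
cell.executemany(
    "INSERT INTO profile VALUES (?, ?)",
    [("userid0", "Alice"), ("userid1", "Bob")],
)

# Write direction: structured rows become a blob for the file system.
blob = rows_to_blob(
    cell.execute(
        "SELECT user_id, display_name FROM profile ORDER BY user_id"
    ).fetchall()
)

# Read direction: the blob is loaded back into a fresh table.
restored = sqlite3.connect(":memory:")
restored.execute("CREATE TABLE profile (user_id TEXT, display_name TEXT)")
restored.executemany("INSERT INTO profile VALUES (?, ?)", blob_to_rows(blob))
```

The round trip preserves the rows, which is the property the adapter processing framework needs in both directions.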

Typically, a cache (alternatively, the CloudCache when referring to the cache as one contiguous unit spanning multiple machines) is a reserved portion of memory. As shown in FIG. 2, the cache adapter 200 reserves, for its own use, a designated portion of the individual RAM of one or more nodes 260, 261, 262 in a networked cluster of machines (physical and/or virtual) as a contiguous space, the CloudCache. The process also sets up two connections: one from the CloudCache to the distributed file system below, and another from the CloudCache to the cloud via the lightweight application server 250 and the web server 210. The cache adapter uses this designated memory space to store data in data cells. Each data cell is flexible in its data size, but an upper limit specified in the configuration file exists to facilitate maintenance and consistent functioning of the overall infrastructure. External applications can interface with the cache adapter using an interface that is either a standard JDBC interface or an extension of a standard https interface. External applications cannot communicate directly with the data cells. The data cells are an integral part of the process.

The cache adapter stores data in the data cells in the cache in a relational database table format. This differs from a conventional relational database management system (RDBMS) in that the data resides in the cache and not in a standard operating system file system, including a distributed file system. Data residing in the cache is persistent rather than volatile, and the cloud distributed file system is used for persistence. A conventional RDBMS stores data in a disk file system, and some databases can cache part of the data for low latency, but neither of the two media is fully distributed in the cloud as a single entity. Sharding (partitioning) can provide similar functionality, but it is not seamless and therefore requires significant customization and has other limitations.

A data cell contains user data for a single user or for multiple users. Each individual data cell contains multiple data tables. These tables contain data for a single user or for multiple users. The cache adapter automatically determines a user's data space requirements and either retrieves data for the user or stores new data entered by the user.

The cache adapter communicates with the external distributed file system and, as shown in FIGS. 1 and 2, converts data from the client cache for the distributed file system and data from the distributed file system for the client database.

The following description provides embodiments only and is not intended to limit the scope, applicability, or configuration of the present invention. Rather, the following description will provide those skilled in the art with an enabling description for implementing the embodiments. It should be understood that various changes can be made in the function and arrangement of elements without departing from the spirit and scope of the claims. For example, the present invention is contemplated to find application in social networking, client/customer management and services, financial and business services, healthcare records management, transaction management, sales, marketing, analytics, security alerts, intelligence gathering, and collaboration.

Example: Architectural System and Components for a Cloud-Based Social Networking Infrastructure
In certain embodiments of the present invention, the cache adapter improves on existing social networking site architectures by improving both performance and scalability. Social networking sites such as Facebook®, Twitter®, and LinkedIn® are merely some forms of a subset of the contact relationship management (CRM) spectrum. These networking sites allow their authorized users (individuals/entities) to create personal profiles, build their social relationships, and extend and nurture them as they interact with one another. Many sites within this subset of the CRM spectrum use an Internet browser as the client that interacts with an n-tier web infrastructure. The present invention can be used to provide a next-generation social networking site supported by a robust architecture that is interactive and massively scalable. The system of the present invention is composed of clients supported by a back-end infrastructure.

All of the major offerings on the market today use an Internet browser with which the client interacts. The client retrieves or submits new information through this Internet browser. The browser forwards requests to an application server, and the application server processes these incoming requests. As part of the processing, the application layer interacts with a database at the back end. Many such sites are supported by a relational database at the back end, which creates a bottleneck in the flow of data. The relational database approach has the following drawbacks: (1) the architecture is based on an asynchronous http request and response system that requires a continuous Internet connection, and without an Internet connection the browser stops responding and halts; (2) response times are generally slow; (3) adding features is a difficult task when it requires simultaneous changes in both the application layer and the back-end database; (4) software upgrade release cycles are difficult and time-consuming; and (5) the infrastructure is expensive to maintain.

The present invention can provide an infrastructure composed of the standard browser-based clients that exist today together with a back end that offers features significantly different from current offerings in terms of workflow and data flow. The client of the present invention can be composed of application code for the presentation layer, either as an HTML5 client or as a dedicated application native to the underlying operating system, in particular Apple's iOS® and Google's Android®. Whether the client is an HTML5 client for a PC or a dedicated application for a proprietary mobile device OS, the client does not access the back-end database directly at all times. Most of the time the client accesses a small footprint of user data in a local database residing on the client machine. When required, the client seamlessly retrieves additional data from the back end, which resides entirely in the cloud as opposed to in a relational database. Furthermore, in addition to sending data to the client, the back end synchronizes the client database with the latest updated data from the back end without any client intervention, and continues to synchronize the data without additional or further client interaction.

Another point of differentiation is that current offerings deliver HTML pages compiled on the server side in the MVC (model-view-controller) object design pattern. In the MVC pattern, the controller is the module that receives and processes all requests from the client. The presentation layer, which is the view, is then compiled and delivers an HTML stream to the browser, which is displayed by the client browser.

At the back end, the system of the present invention begins with an Internet/cloud-facing, industry-standard web server such as Apache.

Application server
In the older n-tier web architecture model, the middle tier consists of a dedicated application server such as WebSphere or Oracle. In the architecture of the present invention, there is no need for a full-fledged J2EE application server component. A lightweight application server acts as the controller/dispatcher and is embedded within the cache adapter. The main reason for this is that, in embodiments of the present process, the application code is sent to where the data resides, as opposed to the data being transferred to where the application code resides. The client has a small-footprint database that continually synchronizes itself with the back-end database. The client accesses the local database to present data to the user. For mobile device clients, the presentation layer code resides on the client, whereas for HTML5 clients the presentation layer code is in HTML5.

Database server
As shown in FIG. 2, the database layer is composed of the cache adapter 200 sitting on top of the distributed file system 270. As noted above, this is very different from anything currently designed, in the sense that the lightweight application server 250 is incorporated as part of the cache adapter 200, in contrast to a true n-tier architecture with separate application and data tiers. There are two data cache components in the system of the present invention. One resides on the client as a client database (cache) 240, 241 native to the client machine; the other data cache is the CloudCache, composed of, for example, nodes 260, 261, 262 residing on clustered nodes in the cloud. The CloudCache is located on nodes above the distributed file system 270, that is, on the clustered nodes residing in the cloud as shown in FIG. 2.

The system of the present invention can have two main components: a lightweight web server cluster 210, and a cache adapter 200 (incorporating the lightweight application server 250) that acts as the database server according to the present invention. FIG. 3 shows the data flow of the system. A client can submit two types of request to the back end. A request can be either a retrieval of stored data or a submission (storage) of new data.

A data retrieval request according to the present invention is straightforward. The client 300 requests data 310 that may be available in the local database on the client machine; if it is, the request never leaves the client machine and is satisfied within the client machine's boundary. In instances where the data is not in the client machine cache, the request is submitted to the back end. The incoming request is then intercepted by the storage map 325 of the cache adapter, which supplies the location of the requested data. The data location is a pointer to the cached data; if the data is not in the cache, the data (archived data) is instead retrieved from the distributed file system in raw unstructured format 360. Data from the distributed file system is first retrieved, then converted to a cache-compatible format 355, and finally sent to the client cache. Upon receiving the requested data, the client displays it.

A data submission request involves several additional steps compared with a simple client data request. The data is first sent to the CacheAdapter (the native CloudCache) and then converted to the back-end data cache format by the WriteCacheAdapter 330. The back-end data cache 340 always stores the latest data for client access, but at this stage the data is temporary and not visible to clients. The data is further converted to the distributed file format 350 and saved. Once the data has been stored successfully in the file system 360, the cache synchronization adapter 370 retrieves the stored data and sends it to the back-end cache 380. A bit check is performed to confirm that the data matches the temporary data saved in the cache; if the check succeeds, the data lock is released and the data becomes available to clients.
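The submission steps above amount to a stage, persist, verify, and release sequence. The sketch below illustrates that sequence under simplifying assumptions: a mapping stands in for the distributed file system, the format conversions are reduced to UTF-8 encoding, the bit check is a direct byte comparison, and the names `WritePath`, `stage`, `commit`, and `read` are hypothetical.

```python
from typing import Dict, MutableMapping, Optional


class WritePath:
    """Stages a write, persists it, verifies it, and only then exposes it."""

    def __init__(self, file_system: Optional[MutableMapping[str, bytes]] = None) -> None:
        # A mapping stands in for the distributed file system.
        self.file_system: MutableMapping[str, bytes] = (
            {} if file_system is None else file_system
        )
        self._staged: Dict[str, bytes] = {}   # temporary, locked data
        self._visible: Dict[str, bytes] = {}  # data that clients may read

    def stage(self, key: str, value: str) -> None:
        # Convert to the cache format and hold it where clients cannot see it.
        self._staged[key] = value.encode("utf-8")

    def commit(self, key: str) -> bool:
        staged = self._staged[key]
        # Persist to the file system (format conversion omitted in this sketch).
        self.file_system[key] = staged
        # Read back what was actually stored and compare it byte for byte.
        stored = self.file_system[key]
        del self._staged[key]
        if stored != staged:
            return False  # verification failed, so nothing becomes visible
        self._visible[key] = stored  # releasing the lock makes the data readable
        return True

    def read(self, key: str) -> Optional[str]:
        value = self._visible.get(key)
        return None if value is None else value.decode("utf-8")


path = WritePath()
path.stage("userid0/status", "online")
before = path.read("userid0/status")       # staged data is not yet visible
committed = path.commit("userid0/status")
after = path.read("userid0/status")        # visible once verification passes
```

Keeping staged and visible data in separate maps means a failed verification can never expose data that does not match what was persisted.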

In addition to the above, the back-end system continually updates the client data cache, at regular intervals, with the latest back-end data cache. This ongoing bidirectional synchronization is indicated by the thick dotted line in FIG. 2.

In a relational database, data for multiple users is arranged in a single table, or in multiple partitioned tables, of a given instance of the database. In current n-tier architectures, a client request is sent to the web server and then to the application server. The application server analyzes the request, requests specific data from the database, processes the request, and then sends a response to the client. In the architecture of certain embodiments of the present invention, a client request is sent to the web server and then to the lightweight application server (LAS). The LAS analyzes the request, but instead of requesting data from the database, it sends the request to the cluster node where the data resides together with some application code.

The distributed file system is the primary medium for persistent data. All bulk data queries are sent specifically to the distributed file system and do not touch the cache. In essence, all user data queries to the cache are for the data of a single user and come from the client interface. The data in the data cells is the end product of all data processing that occurs within the distributed file system cluster, and data from the cache is used mostly for client display and interaction purposes.

An advantage of the architecture of certain embodiments of the present invention is that, instead of data being retrieved and returned to the application server, the request is sent to the cache adapter, which forwards it to the appropriate node of the cluster where the relevant data resides.
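The idea of forwarding a request to the node that holds the data, rather than pulling the data back to an application server, can be sketched as follows. This is an illustrative model only: `ClusterNode`, `Dispatcher`, and the in-process function call that stands in for a network hop are assumptions made for the example.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]


class ClusterNode:
    """A node that holds user data and runs handlers next to it."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.user_rows: Dict[str, List[Row]] = {}

    def run(self, user_id: str, handler: Callable[[List[Row]], object]) -> object:
        # The handler executes here, so the rows never leave the node.
        return handler(self.user_rows[user_id])


class Dispatcher:
    """Plays the role of the lightweight application server (LAS)."""

    def __init__(self) -> None:
        # Storage map: which node holds which user's data.
        self.storage_map: Dict[str, ClusterNode] = {}

    def place(self, user_id: str, node: ClusterNode, rows: List[Row]) -> None:
        node.user_rows[user_id] = rows
        self.storage_map[user_id] = node

    def handle(self, user_id: str, handler: Callable[[List[Row]], object]) -> object:
        # Look up the owning node and forward the request instead of fetching the data.
        return self.storage_map[user_id].run(user_id, handler)


node_a, node_b = ClusterNode("node-a"), ClusterNode("node-b")
las = Dispatcher()
las.place("userid0", node_a, [{"post": "hello"}, {"post": "world"}])
las.place("userid1", node_b, [{"post": "hi"}])
post_count = las.handle("userid0", len)  # only the small result travels back
```

Only the handler's result crosses back to the dispatcher, which is what keeps bulk data movement off the request path in this model.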

As shown in FIG. 7, data can be stored in and retrieved from a single database instance. A database can be partitioned across multiple machines, but the instance is always singular, and the flow of requests must pass through a single relational database management system instance. This is a potential single point of failure. As shown in FIG. 9, in certain embodiments of the present invention there can be a back-end infrastructure with multiple database instances that can communicate with clients in parallel. This design can be extended so that each client has its own database instance, which can enhance throughput and linear scalability.

As the number of clients increases and the size of the data begins to grow exponentially, the model becomes unsustainable at some point because of the input/output limits of the single machine that manages data flow and locking. Because the instance resides on a single machine, the model cannot withstand the extremely large data loads of applications that require very large data sets and, consequently, significant processing power, regardless of how powerful the machine hosting the database is. Several innovative solutions exist, such as sharding and data archiving, but almost all of them require manual intervention and still tend to be single points of failure.

In the event of a system failure, several sophisticated solutions are available for disaster recovery, but the process takes a long time to carry out, and prolonged outages can have a large-scale impact. Data replication can be used to shorten disaster recovery time, but it comes at additional cost and puts the data residing in the active tables at risk relative to the archive tables.

As shown in FIG. 4, every user who logs in sends a login request to the system back end. The system creates a unique persona object for the user in the CloudCache on the least busy node of the cluster. The system then creates an empty shell of a vertical object graph for each vertical to which the user subscribes. There are two types of user login request: the first login and subsequent logins.

The first login requires instantiating a user profile object and then populating it. At the first login, or initial user registration, creation of the user persona object is followed by creation of an empty shell of the data schema for each vertical to which the user can subscribe at the time of registration. At subsequent logins, the user can register for additional verticals or detach them if desired. A step of attaching a vertical to a given profile can be implemented. All of the data in the data cells persists in the distributed file system for permanent storage.

At each subsequent login, in response to receiving the login request, the back end creates the user persona object, retrieves the user-specific data, and populates the user persona object with the specific data associated with the user object. In addition, the back end creates empty shells for all of the verticals to which the user subscribes and then populates the vertical shells with data from the distributed file system.
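The login sequence (create the persona, create an empty shell per vertical, then populate from persistent storage) can be sketched as below. The sketch rests on assumptions that are not in the specification: a dictionary keyed by `"<user_id>/<name>"` stands in for the distributed file system, the `Persona` dataclass and `login` function are hypothetical names, and selection of the least busy node is not modelled.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Persona:
    """Per-user object held in the cache for the duration of a session."""

    user_id: str
    profile: Dict[str, str] = field(default_factory=dict)
    verticals: Dict[str, Dict[str, str]] = field(default_factory=dict)


def login(
    user_id: str,
    subscriptions: List[str],
    dfs: Dict[str, Dict[str, str]],
) -> Persona:
    # Create the persona object for this user.
    persona = Persona(user_id)
    # Populate it with any user-specific data already persisted.
    persona.profile = dict(dfs.get(f"{user_id}/profile", {}))
    # Create an empty shell for every subscribed vertical.
    for vertical in subscriptions:
        persona.verticals[vertical] = {}
    # Fill each shell from persistent storage; on a first login nothing is found.
    for vertical in subscriptions:
        persona.verticals[vertical].update(dfs.get(f"{user_id}/{vertical}", {}))
    return persona


dfs = {
    "userid0/profile": {"name": "Alice"},
    "userid0/healthcare": {"last_visit": "2012-08-02"},
}
returning = login("userid0", ["healthcare", "finance"], dfs)
first_time = login("userid9", ["finance"], dfs)
```

A first login and a subsequent login follow the same code path here; they differ only in whether persisted data exists to populate the shells.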

The back end can retrieve data from the distributed file system into the CloudCache. In addition, the back end can synchronize cached data from the data cells to the client database residing on the client machine.

Every user in the proposed system is identified by a unique key that creates a one-to-one relationship with the user login ID. The unique key is generated by first reversing the domain name (similar to key identifier creation in Google's BigTable) and then appending the unique user ID that the user created at registration and first login.

For a user with the user ID UserID0, the key looks like the following: com:cloudcomponents:dbinstance:userid0.
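The key construction can be sketched in a few lines. The function name `make_user_key` and the host name `dbinstance.cloudcomponents.com` are assumptions inferred from the example key; the specification states only that the domain name is reversed and the user ID appended.

```python
def make_user_key(domain: str, user_id: str) -> str:
    """Reverse the dot-separated domain, then append the lower-cased user ID."""
    reversed_domain = ":".join(reversed(domain.lower().split(".")))
    return f"{reversed_domain}:{user_id.lower()}"


key = make_user_key("dbinstance.cloudcomponents.com", "UserID0")
# key == "com:cloudcomponents:dbinstance:userid0"
```

Placing the most general component first means keys for the same domain and instance share a common prefix, so they sort next to one another.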

At the back end, the system creates a unique data cell for every user. A data cell is the portion of the cache that the cache adapter reserves for each user. Each data cell is identified by the unique user ID associated with the user.

FIG. 5 is a graphical representation of a user's data cell. Within the data cell is a group of embedded data cells, each holding the user's data for the corresponding vertical. Each of these embedded cells has a specific, unique structure tied to a particular industry vertical.

In a traditional relational database, data is arranged in tables within a single instance of the database. A given table contains data for multiple entities. A simple example of an entity here is a user, and each user is identified by a unique user ID. The data associated with a given entity can be spread across multiple tables throughout the database instance. There may be hundreds of tables, but they are always keyed off the original entity ID. As a result, a single table in a relational database stores data for multiple users. There are cases in which data in traditional relational databases is managed by simple partitioning and/or sharding of the data across multiple machines. This requires manual intervention, and the model cannot be sustained in a cloud-based environment because of machine input/output limits and because data held in one central location must serve geographically diverse users.

The cache adapter can create instances automatically. The cache adapter starts by creating CloudCache as a single contiguous cache (reserved portions of multiple machines). It then creates an instance whose size is capped by the "maximum size" parameter in the configuration file. After creating the instance, the cache adapter begins loading the data of active (logged-in) users into it. When the instance reaches its maximum size and cannot store more data, the cache adapter creates a new instance. There is no limit on the number of instances the cache adapter can create. The storage map maintains a directory listing of all instances and of the user IDs stored in each of them. The data stored in the cache is processed data that is ready for client consumption (client reports).
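The instance rollover behavior described above may be illustrated, purely as a sketch, by the following code. The class names, the abstract size units, and the policy of always filling the most recently created instance are assumptions of this example; the specification states only that a new instance is created when the current one reaches its configured maximum size and that the storage map records which user IDs live in which instance.

```python
class CacheInstance:
    """Hypothetical cache instance with a fixed capacity (abstract units)."""

    def __init__(self, instance_id, max_size):
        self.instance_id = instance_id
        self.max_size = max_size
        self.used = 0
        self.cells = {}  # user_id -> data cell contents

    def has_room(self, size):
        return self.used + size <= self.max_size


class CacheAdapter:
    """Sketch of an adapter that opens a new instance when one fills up."""

    def __init__(self, max_size):
        self.max_size = max_size  # the "maximum size" configuration value
        self.instances = []
        self.storage_map = {}     # user_id -> instance_id

    def load_user(self, user_id, data, size):
        """Load an active user's data and return the hosting instance id."""
        if size > self.max_size:
            raise ValueError("data cell larger than one instance")
        if not self.instances or not self.instances[-1].has_room(size):
            new_id = len(self.instances)
            self.instances.append(CacheInstance(new_id, self.max_size))
        instance = self.instances[-1]
        instance.cells[user_id] = data
        instance.used += size
        self.storage_map[user_id] = instance.instance_id
        return instance.instance_id
```

This sketch does not handle reloading a user who is already cached, nor eviction when a user logs out; both are left out to keep the rollover logic visible.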

By creating multiple instances of the database, as opposed to the traditional single instance, the system can spread the workload across multiple instances (nodes/machines). Once a client request has been routed to the appropriate database instance, communication is confined to the client and that instance. This is achieved by constructing a user ID of the form domain:database instance:user ID, rather than a simple unique user ID within an instance. Furthermore, within each instance there is, for every user, an object graph that holds the data for all verticals for which the user is registered.

In a relational database, one can attempt to create all tables in the same instance, but as the number of tables and the size of the data grow, the model becomes unsustainable because of performance degradation. The main causes of failure are the input/output limits of the physical system or, when the database is partitioned, the limits of configuring and coordinating multiple systems.

The architecture of certain embodiments of the present invention removes the input/output bottleneck by creating multiple instances of the database. Creating multiple instances increases the overall complexity of the system with respect to data consistency, but this is resolved by implementing a smart directory structure called StorageMap. StorageMap tracks every instance that is started or stopped. In addition to the list of all instances, StorageMap tracks the distribution of data by user ID within each instance. FIG. 7 shows StorageMap and data cell contents.

Furthermore, the data cell for a given user ID holds the only instance of that user profile across all database instances. All transactions related to that user profile are queued and processed serially within that single instance.
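The serial, per-user processing described above can be pictured with the following minimal sketch. The `DataCell` class, its `submit` and `process_all` methods, and the representation of a transaction as a callable that mutates the cell's state are all assumptions made for illustration.

```python
from collections import deque


class DataCell:
    """Hypothetical data cell that applies one user's transactions in order."""

    def __init__(self, user_id):
        self.user_id = user_id
        self.state = {}
        self._pending = deque()  # FIFO queue of transactions for this user

    def submit(self, transaction):
        """Queue a transaction (a callable taking the cell state)."""
        self._pending.append(transaction)

    def process_all(self):
        """Apply queued transactions one at a time, in arrival order."""
        applied = 0
        while self._pending:
            transaction = self._pending.popleft()
            transaction(self.state)
            applied += 1
        return applied
```

Because each user profile exists in exactly one cell, ordering the queue within that cell is enough to order all of that user's writes, with no cross-instance commit protocol in this sketch.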

As part of registration or post-registration processing, the user can subscribe to one or more industry vertical applications in the system. Each vertical has a data layout that uniquely captures the attributes associated with its industry vertical segment. When the user subscribes, this layout is attached to the user's unique profile. There is no limit on the number of verticals a user can attach to a persona.

FIG. 6 shows a cross-sectional view of an instance from a different perspective. The user persona is the central object that reflects the user's identity. Various industry vertical segments can be attached to a given user persona. A vertical can contain data for contexts such as social networking, medical records, contact relationship management, personal and corporate media storage, personal and corporate document archives, financial services, electronic games, and collaboration.

As shown in FIG. 8, the back end synchronizes the user data in the data cell with the client machine database. At each login, the user presents a user ID together with a password as credentials to the back end. Using the user ID, the back end retrieves data from the file system, creates a cache entry for that user's persona object in the form of a data cell, and establishes a data synchronization link with the client machine. A user can log in from multiple machines with the same user ID, provided all of those client machines were previously registered with the system. Once these machines are linked, their databases are synchronized with the latest user data from the data cell.

The data in the data cell persists permanently in the back-end distributed file system. On receiving a request from a client, the back-end StorageMap routes the data request to the appropriate database instance containing that client's data. Once the data cell has been located by user ID, a connection (communication channel) is established to the data cell instance corresponding to that user ID. The back end also creates a security token and sends it with the response. Every subsequent request carries this security token. If the security token is invalid (because of a timeout or for another reason) or missing, the back end asks the client to log in to the system again.
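The request routing and token handling described above might look roughly like the following sketch. The `Backend` class, the 900-second default lifetime, the response dictionaries, and the use of random hex strings as tokens are assumptions of this example; the specification does not define the token format, and password verification is omitted here.

```python
import secrets
import time


class Backend:
    """Sketch of StorageMap-based routing with an expiring security token."""

    def __init__(self, storage_map, token_ttl=900.0):
        self.storage_map = storage_map  # user_id -> database instance id
        self.token_ttl = token_ttl      # assumed token lifetime in seconds
        self._tokens = {}               # token -> (user_id, expiry time)

    def login(self, user_id, now=None):
        """Locate the user's instance and issue a token with the response."""
        now = time.time() if now is None else now
        instance = self.storage_map[user_id]
        token = secrets.token_hex(16)
        self._tokens[token] = (user_id, now + self.token_ttl)
        return {"instance": instance, "token": token}

    def handle_request(self, token, now=None):
        """Route a follow-up request, or demand a new login."""
        now = time.time() if now is None else now
        entry = self._tokens.get(token)
        if entry is None or entry[1] <= now:
            self._tokens.pop(token, None)  # drop expired or unknown token
            return {"status": "relogin_required"}
        user_id, _expiry = entry
        return {"status": "ok", "instance": self.storage_map[user_id]}
```

The optional `now` argument exists only so that expiry can be exercised deterministically; callers would normally omit it.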

The architecture of certain embodiments of the present invention can be applied to social networking applications such as Facebook. In 2008, the site was estimated to have 1,800 servers dedicated to MySQL and 805 servers dedicated to memcache. However, because multiple MySQL shards and memcache instances can run virtually on a single server, the number of physical servers running the infrastructure may be smaller. More recently, there have been rumors of more than 4,000 MySQL servers and a comparable number dedicated to memcache data. These machine counts represent a substantial increase since 2008, and if growth continues as projected, they are expected to reach an unsustainable number in the future.

MySQL, or for that matter any relational database, was not designed for the data sizes and throughput that the cloud requires. The main drawback of any RDBMS is that it is single-instance, with overhead associated with ACID transaction compliance, buffer pools, and locking of memory swap space. One can attempt to create multiple instances, but the application must then be designed around data distribution across the instances and multi-phase commits among them. This raises numerous hurdles, such as rollback on commit failure and disaster recovery. Embodiments of the present invention include multiple instances that can communicate directly with clients rather than through a single RDBMS coordinator.

In the architecture of certain embodiments of the present invention, there are multiple instances of the database that operate independently of one another. This is achieved by first moving the data distribution and location directory (maintained in StorageMap) outside the main database, and then creating data cells, each of which confines the data for a single user within a single data cell. Our solution allows the workload to be distributed across multiple instances running in parallel. The only requirement for this mode of operation is that each individual user's data be directed to that user's dedicated data cell for all data writes.

The following is a generic table of a traditional relational database containing user data. It is a snapshot of the users table at a given point in time. For example, this table may contain more than 500 million user IDs, represented by the same number of rows.

User ID      First Name   Last Name   Zip Code
1            John         Smith       10170
2            Frank        Clark       10017
...
600000000    Mark         Baker       22150

Assuming every user has at least five friends, a table that expresses this relationship between users, a "friendship table", is created by way of example.

User ID      Friend_User_ID
1            6
1            5
1            4
1            3
1            2
2            1
2            9
2            8
2            7
2            6
...
500232657    1

This table therefore requires a minimum of 500,000,000 × 5 rows (2.5 billion rows) to accommodate five friends for every user.

Suppose a user plans to travel to a particular zip code and wants to find out whether any of the user's friends in that zip code have free time on their calendars during the visit. A query of that kind requires a join across three tables. The query can be optimized with indexes, but that is only a starting point, and as the object graph grows more complex the system becomes harder to program.

In the architecture of certain embodiments of the present invention, there can be comparable tables 100 times the size of the friendship table example or larger, but these tables reside in the distributed file system and are used only for data processing. User queries never touch these tables, because all data associated with a single user always resides in a single data cell. In the user zip code query example, the application sends the request to the data cell for that user, using StorageMap to locate it. Once in the data cell, the application scans a friendship table whose number of rows equals the user's number of friends (a far more manageable scan than querying a table with 500,000,000 × 5 rows). In the traditional model, indexing and locking combined with data partitioning can relieve the load somewhat, but the physical limits of a given machine still create a bottleneck, and such a model is not economically sustainable even with the allocation of many resources (multiple nodes/machines). After filtering the friend rows down to the users in the requested zip code, the application can then interact with the calendar objects of those friends (the filtered users). This scenario is faster and more efficient than scanning billions of rows in a traditional relational database setup. The architecture of certain embodiments of the present invention reduces the size of the working data set for a client request. This is achieved by having an incoming request work against a single data cell, as opposed to tables with a very large number of rows or queries that join large tables across partitions.
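As an illustrative sketch only, the zip code query against a single data cell could take the following form. The dictionary layout of the data cell, the `calendars` lookup, and the function name `friends_free_in_zip` are assumptions of this example; the specification describes the steps (scan the user's own friendship rows, filter by zip code, then consult the friends' calendar objects) but not a data format.

```python
def friends_free_in_zip(data_cell, calendars, zip_code, day):
    """Return IDs of the user's friends in `zip_code` who are free on `day`.

    `data_cell["friends"]` holds only this user's friendship rows, so the
    scan is bounded by the user's friend count, not by the global table.
    `calendars` maps a friend's user ID to a set of free days (assumed).
    """
    # Step 1: filter the user's own friendship rows by zip code.
    in_zip = [row for row in data_cell["friends"] if row["zip"] == zip_code]
    # Step 2: consult the calendar of each remaining friend.
    return [
        row["user_id"]
        for row in in_zip
        if day in calendars.get(row["user_id"], set())
    ]
```

For a user with five friends, step 1 examines five rows, whereas the relational example above would have to search a table of 2.5 billion rows (with or without index support).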

Multiple instances provide workload distribution. Data cells provide further partitioning of the workload. A well-designed multithreaded application can exploit this feature to deliver a multifold increase in throughput. Data cells synchronize with clients without any help from a database controller, which enables simultaneous multi-channel secure communication between the back end and multiple clients. The distributed file system provides a robust back end for disaster recovery. The cache adapter's ability to create and destroy data cells in memory makes the system highly responsive, with low latency.

The embodiments of the invention described herein are merely exemplary. Those skilled in the art will recognize variations of the embodiments specifically described herein that are intended to fall within the scope of this disclosure. Accordingly, the invention is limited only by the claims. The invention covers such variations provided they fall within the scope of the claims and their equivalents.

100 Cache adapter
110 Web server
120 Mobile-device-specific application client
121 HTML5 application client
130 Presentation layer
131 Presentation layer
140 Client database
141 Client database

Claims

A distributed cache data system comprising a cache adapter configured to reserve, without page-based memory allocation, a designated portion of memory as a contiguous space in at least one node of a networked cluster of machines located on top of a distributed file system, wherein:
a lightweight application server controller is embedded in the cache adapter;
the designated portion of the memory comprises data cells configured to store, retrieve, and persist data from external clients, and forms a cache that resides, in the at least one node, at a physical location essentially different from that of a client database;
the cache adapter is configured to interface to the client database at the front end of the distributed cache data system and to the distributed file system at the back end of the distributed cache data system, and to synchronize bidirectionally with the client database;
the cache adapter is configured to provide, at the front end of the distributed cache data system, an interface for the external clients to access the data;
the cache adapter is configured to communicate bidirectionally with the external clients via a web server; and
the cache adapter is configured to send data access requests to the appropriate data cell among the data cells.

The distributed cache data system of claim 1, wherein the cache adapter is configured to store the data in a relational or non-relational database table format.

The distributed cache data system of claim 1, wherein the cache adapter is configured to interface to the client database and the distributed file system using a Java database connectivity interface.

The distributed cache data system according to claim 1, wherein the cache adapter is configured to directly access the data present in the distributed file system without passing through the cache.

The distributed cache data system of claim 1, wherein:
the cache adapter is configured to first check, in response to a data request from the client database, whether the data is available in the cache;
if the cache contains the requested data, the data is sent from the cache; and
if the data is not present in the cache, the cache adapter retrieves the data from the distributed file system and then either sends the requested data directly to the client or caches the data before sending it to the client database.

The distributed cache data system of claim 1, wherein:
the cache adapter is configured to retrieve the data from the cache in a structured relational data format and then convert the data for one of the distributed file system and an unstructured intermediate database; and
the cache adapter is further configured to convert, on a return path, the unstructured data from the distributed file system or the intermediate database to a cache-compliant format.

The distributed cache data system according to claim 1, wherein the client database comprises one of a mobile machine specific application or an HTML application client.

The distributed cache data system of claim 7, wherein the client database comprises a presentation layer, a client database, and a persistent socket connection to the cache adapter.

The distributed cache data system of claim 1, wherein the cache is located in at least one node on top of the distributed file system.

The distributed cache data system of claim 1, wherein the cache adapter creates a unique instance of a cache for each user that functions as a relational database with an SQL interface for applications that interface to the cache adapter.

The distributed cache data system according to claim 1, wherein the cache adapter is located in front of the distributed file system.

The distributed cache data system according to claim 1, wherein the cache is a contiguous space which spans all designated parts of the networked cluster of the machine described in a configuration file.

A method for distributing cache data in a data system, the method comprising:
reserving, using a cache adapter and without using page-based memory allocation, a designated portion of memory as a contiguous space in at least one node of a networked cluster of machines located on top of a distributed file system;
integrating a lightweight application server controller into the cache adapter;
forming, using the designated portion of the memory, which includes data cells configured to store, retrieve, and persist data from external clients, a cache that resides at a physical location essentially different from that of a client database;
interfacing to the client database at the front end of the distributed cache data system and to the distributed file system at the back end of the distributed cache data system, and synchronizing bidirectionally with the client database;
providing, using the cache adapter, an interface for the external clients to access the data via a web server;
communicating bidirectionally with the external clients via the web server using the cache adapter; and
sending, using the cache adapter, an access request to an appropriate one of the data cells.

The method of claim 13, further comprising storing the data in a relational or non-relational database table format.

The method of claim 13, further comprising interfacing the data and the distributed file system using Java database connectivity.

The method of claim 13, further comprising directly accessing the data residing in the distributed file system without passing through the cache.

A distributed cache data system comprising a cache adapter, the cache adapter comprising:
a lightweight application server controller embedded within the cache adapter;
means for reserving, without using page-based memory allocation, a designated portion of memory as a contiguous space for forming a cache in at least one node of a networked cluster of machines located on top of a distributed file system, wherein the designated portion of the memory comprises data cells configured to store, retrieve, and persist data from external clients and forms a cache residing at a physical location essentially different from that of a client database;
means for interfacing to the client database at the front end of the distributed cache data system and to the distributed file system at the back end of the distributed cache data system, and for synchronizing bidirectionally with the client database;
means for providing, at the front end of the distributed cache data system, an interface for the external clients to access the data;
means for communicating bidirectionally with the external clients via a web server; and
means for sending an access request to an appropriate one of the data cells.

The distributed cache data system of claim 17, wherein the cache adapter further comprises means for storing the data in a relational or non-relational database table format.

The distributed cache data system of claim 17, wherein the cache adapter further comprises means for interfacing the client database to the distributed file system using a Java database connectivity interface.

The distributed cache data system according to claim 17, wherein said cache adapter further comprises means for directly accessing said data present in said distributed file system without passing through said cache.