JP2020154381A

JP2020154381A - Information processing system, information processing device, information processing method, and program

Info

Publication number: JP2020154381A
Application number: JP2019049583A
Authority: JP
Inventors: 侃太蔵元; Kanta Kuramoto; 貴之池上; Takayuki Ikegami; 青児斉藤; Seiji Saito; 義夫長安; Yasuo Nagayasu; 大貴鎌田; Daiki Kamata; 智耶桑山; Tomoya Kuwayama; 大平櫻井; Taihei Sakurai
Original assignee: Yahoo Japan Corp
Current assignee: Yahoo Japan Corp
Priority date: 2019-03-18
Filing date: 2019-03-18
Publication date: 2020-09-24

Abstract

To provide an information processing system, an information processing device, an information processing method, and a program which can suitably collect logs from a plurality of service servers. SOLUTION: An information processing system comprises: a log collection unit which acquires unstructured log information regarding the usage status of a plurality of service servers, groups the unstructured log information based on at least service identification information and a time stamp, and stores it in a storage unit; a distributed processing unit which performs distributed processing on the unstructured log information stored in the storage unit; an access authority management unit which manages a user's access rights to the grouped unstructured log information stored in the storage unit; and an aggregation and analysis unit which receives designation of a reference condition when the user refers to the grouped and distributed-processed unstructured log information. SELECTED DRAWING: Figure 2

Description

The present invention relates to an information processing system, an information processing device, an information processing method, and a program.

Conventionally, a technique has been disclosed for an information processing device that groups log data, received sequentially over time, into sets of operations defined on the basis of log fields (see Patent Document 1).

Patent Document 1: Japanese Patent No. 6396615

However, the conventional technique described above may not sufficiently consider collecting logs from a plurality of service servers whose log output standards are not unified.

The present invention has been made in view of these circumstances, and one of its objects is to provide an information processing system, an information processing device, an information processing method, and a program capable of suitably collecting logs from a plurality of service servers.

One aspect of the present invention is an information processing system comprising: a log collection unit that acquires unstructured log information regarding the usage status of a plurality of service servers, groups the unstructured log information based on at least service identification information and a time stamp, and stores it in a storage unit; a distributed processing unit that performs distributed processing on the unstructured log information stored in the storage unit; an access authority management unit that manages users' access rights to the grouped unstructured log information stored in the storage unit; and an aggregation / analysis unit that accepts the specification of reference conditions used when a user refers to the grouped and distributed-processed unstructured log information.

According to one aspect of the present invention, logs can be suitably collected from a plurality of service servers.

FIG. 1 is a diagram for explaining the usage environment of the information processing system 1.
FIG. 2 is a schematic diagram of the information processing system 1.
FIG. 3 is a diagram showing an example of the unstructured log information L.
FIG. 4 is a diagram schematically showing the grouping process performed by the grouping unit 130.
FIG. 5 is a diagram for explaining the grouping process performed by the grouping unit 130.
FIG. 6 is a diagram for explaining a usage scene of the information processing system 1 by a service provider.
FIG. 7 is a timing chart showing an example of the processing flow until the unstructured log information L related to a service is collected by the information processing system 1.
FIG. 8 is a timing chart showing an example of the processing flow until the unstructured log information L collected by the information processing system 1 is referred to.
FIG. 9 is a flowchart showing an example of the flow of the log conversion process of the information processing system 1.

Hereinafter, embodiments of the information processing system, information processing device, information processing method, and program of the present invention will be described with reference to the drawings.

[Overview]
The information processing system collects usage histories (logs) of a plurality of service servers and supports making the collected usage histories available across services.

Each of the plurality of service servers is, for example, a web server that provides a web page in response to a request from a terminal device operated by a user, or an application server that communicates with a terminal device on which an application has been started, exchanges various information with it, and provides content information. The service servers provide, via a network, services such as news delivery, shopping, auctions, matching, financial payment, navigation, and webmail.

Because the services provided by the service servers differ in their characteristics and in the OS (Operating System) and middleware used to provide them, it may be difficult to unify the output standard for the usage history across all service servers.

Therefore, the information processing device has each service server output its usage history in a format that can be output without significantly changing the server's application configuration, collects the usage histories of end users of the web pages provided by the plurality of service servers, and groups them by service. This makes it possible for other services to refer to the usage history of a service provided by a service server in the same way as the user usage history of their own service, and to reuse it for other purposes such as collecting statistical data.

The information processing system includes, for example, an information processing device realized by one or more processors. The information processing device manages log data referred to by a plurality of referencing parties (users who provide services using service servers). Log data is a collected record of the history of events, recorded over time, that occur when a service application is executed on an end user's terminal.

The information processing device includes, for example, an acquisition unit that acquires unstructured log information, which is usage history relating to the service provision of a plurality of service servers and in particular to the usage status of end users, and a grouping unit that groups the unstructured log information based on at least the service identification information and the time stamp among the information it contains, and stores it in a data lake.

In addition to the information processing device described above, the information processing system is connected via a network to, for example, a plurality of service servers, terminals used by users who manage the services provided by the service servers (hereinafter, service providers), and terminals used by users who use those services (hereinafter, end users).

Unlike a storage device such as a data warehouse (DWH), a data lake does not require the structure of the data to be known beforehand or a storage design to be prepared in advance, and it can store unstructured data. A data lake may be interpreted as a repository capable of storing unstructured data. Data stored in the data lake is interpreted at read time by a dedicated distributed processing device and is then used for subsequent processing such as analysis.

[Overall configuration 1]
FIG. 1 is a diagram showing the usage environment of the information processing system 1. The information processing system 1 communicates with, for example, service servers SS-1 to SS-N (N is a natural number), end user terminals T-1 to T-M (M is a natural number), and a service provider terminal D via a network NW. The network NW includes, for example, some or all of a WAN (Wide Area Network), a LAN (Local Area Network), the Internet, a provider device, a wireless base station, a dedicated line, and the like. In the following description, the service servers SS-1 to SS-N are simply referred to as the service server SS when they need not be distinguished. Likewise, the end user terminals T-1 to T-M are simply referred to as the end user terminal T when they need not be distinguished.

The end user terminal T and the service provider terminal D are, for example, mobile phones such as smartphones, tablet terminals, or personal computers. In response to an operation, the service provider terminal D transmits a log information reference request (command) to the information processing system 1 and displays the returned processing result on its display unit.

FIG. 2 is a schematic diagram of the information processing system 1. The information processing system 1 includes, for example, an information processing device 100, a distributed processing device 200, an aggregation / analysis device 300, and an access authority management device 400.

The information processing device 100 includes, for example, an acquisition unit 110, a data lake 120, and a grouping unit 130. These components, except for the data lake 120, are realized by, for example, a hardware processor such as a CPU (Central Processing Unit) executing a program (software). Some or all of these components may instead be realized by hardware (including circuitry) such as an LSI (Large Scale Integration), an ASIC (Application Specific Integrated Circuit), an FPGA (Field-Programmable Gate Array), or a GPU (Graphics Processing Unit), or by software and hardware working together. The program may be stored in advance in a storage device of the information processing device 100 (a storage device with a non-transitory storage medium) such as an HDD or flash memory, or it may be stored on a removable non-transitory storage medium such as a DVD or CD-ROM and installed on the HDD or flash memory of the information processing device 100 when the medium is mounted in a drive device. The information processing device 100 is an example of a "log collection unit".

The acquisition unit 110 acquires the unstructured log information L from the service application SA or the service server SS and stores it in the data lake 120. The acquisition unit 110 may convert the unstructured log information L into a format suitable for storage in the data lake 120; for example, it may serialize the unstructured log information L (arrange its order) or convert it to binary before storing it in the data lake 120. The unstructured log information L is described later.
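The serialization and binarization mentioned above might look roughly like the following Python sketch. It assumes the log arrives as a JSON-like dictionary and reads "arranging the order" as sorting the keys; the function name `to_storage_bytes` and the field names are illustrative and do not come from the specification.

```python
import json

def to_storage_bytes(log_record: dict) -> bytes:
    """Serialize a log record with a stable key order, then encode it as bytes."""
    # sort_keys gives every record the same field order regardless of how the
    # service emitted it (an assumed reading of "arranging the order").
    text = json.dumps(log_record, sort_keys=True, ensure_ascii=False,
                      separators=(",", ":"))
    # Encoding to UTF-8 stands in for the "binarization" step.
    return text.encode("utf-8")

# Hypothetical record; the field names are assumptions for illustration.
record = {"timestamp": "2019-03-18T09:00:00", "project": "shopping", "dataset": "top_page"}
payload = to_storage_bytes(record)
```

A stable key order means two services that emit the same fields in different orders produce byte-identical output, which simplifies later aggregation.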

The acquisition unit 110 acquires the unstructured log information L from the service application SA by, for example, having the service application SA perform API communication that, at the time it sends a request to the service server SS, also sends the log information included in (or related to) that request to the acquisition unit 110. The acquisition unit 110 also acquires the unstructured log information L from the service server SS by having the service server SS perform API communication that sends part or all of the log information when, as a result of receiving a request from the service application SA, the service server SS stores log information in its own storage unit. This mechanism by which the acquisition unit 110 acquires the unstructured log information L may be referred to as a REST API.

The acquisition unit 110 may be realized by, for example, open source software with a message queue function, such as Apache Kafka.

A plurality of acquisition units 110 may be provided. A dedicated acquisition unit 110 may be provided for each service server SS, or a single acquisition unit 110 may acquire the unstructured log information L from a plurality of service servers SS.

The grouping unit 130 groups the unstructured log information L stored in the data lake 120 based on at least the service identification information among the information it contains, and outputs the grouped unstructured log information L to the distributed processing device 200. The grouping unit 130 may compress the grouped unstructured log information L. The grouping process performed by the grouping unit 130 is described later.

[Unstructured log information]
FIG. 3 is a diagram showing an example of the unstructured log information L. The unstructured log information L includes at least information for identifying a service (hereinafter, service identification information LE). If the unstructured log information L does not include the service identification information LE, the acquisition unit 110 may identify the service server SS that output the unstructured log information L and add the service identification information LE.
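As a minimal sketch of the fallback described above, the following Python function adds a service identifier when a record lacks one. The field name `service_id`, the host names, and the lookup table `SERVER_TO_SERVICE` are hypothetical; the specification does not say how the output-source server is mapped to a service.

```python
def ensure_service_id(log_record: dict, source_server: str, server_to_service: dict) -> dict:
    """Return a copy of the record that always carries a service identifier."""
    enriched = dict(log_record)  # never mutate the caller's record
    if "service_id" not in enriched:
        # Derive the identifier from the server that emitted the log (assumed mapping).
        enriched["service_id"] = server_to_service[source_server]
    return enriched

# Hypothetical mapping from output-source server to service.
SERVER_TO_SERVICE = {"ss-1.example": "shopping", "ss-2.example": "news"}

filled = ensure_service_id({"event": "view"}, "ss-2.example", SERVER_TO_SERVICE)
kept = ensure_service_id({"event": "view", "service_id": "auction"}, "ss-1.example", SERVER_TO_SERVICE)
```

Records that already carry an identifier pass through unchanged, so the function can be applied to every incoming log.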

In the following description, the service identification information LE is assumed to include a Project, which is a broad category of service type, and a Dataset, which is a finer-grained category than the Project. FIG. 4 is a diagram schematically showing the grouping process performed by the grouping unit 130. For example, when the Project included in the service identification information LE is Project1 "shopping", the Dataset contains information that distinguishes the characteristics of web pages and the content of the service, such as Dataset10 "top page", Dataset11 "feature page", Dataset12 "product introduction page", and Dataset13 "purchase procedure page".
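Grouping by Project, Dataset, and time stamp could be sketched in Python as follows. The field names (`project`, `dataset`, `timestamp`), the ISO 8601 time format, and the one-hour bucket are assumptions made for illustration only.

```python
from collections import defaultdict
from datetime import datetime

def group_logs(records):
    """Group records by (project, dataset, hour of the time stamp)."""
    groups = defaultdict(list)
    for rec in records:
        ts = datetime.fromisoformat(rec["timestamp"])
        # Truncate to the hour so that all logs of one hour share a key.
        hour = ts.replace(minute=0, second=0, microsecond=0).isoformat()
        groups[(rec["project"], rec["dataset"], hour)].append(rec)
    return dict(groups)

# Hypothetical records modeled on the Project1 / Dataset10 example above.
sample = [
    {"project": "Project1", "dataset": "Dataset10", "timestamp": "2019-03-18T09:05:00"},
    {"project": "Project1", "dataset": "Dataset10", "timestamp": "2019-03-18T09:40:00"},
    {"project": "Project1", "dataset": "Dataset13", "timestamp": "2019-03-18T10:01:00"},
]
grouped = group_logs(sample)
```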

The components of the unstructured log information L and their order can be set by the service provider, and components may be added, deleted, or reordered at any time. Unstructured log information L with a different structure may also be output for each web page, depending on the type and characteristics of the web pages provided by the service server SS.

The unstructured log information L is transmitted to the information processing system 1 from, for example, the service application SA or the browser of the end user terminal T using the HTTP POST method. The unstructured log information L may be identical to the log information transmitted from the service application SA to the service server SS, or it may be a part of that log information. The acquisition unit 110 acquires the unstructured log information L as, for example, a log file in JSON (JavaScript (registered trademark) Object Notation) format as shown in the figure, and stores it in the data lake 120.

[Overall configuration 2]
Returning to FIG. 2, the distributed processing device 200 is realized by, for example, distributed storage such as Hadoop (registered trademark). It divides unstructured metadata into a plurality of pieces of data and processes each piece in parallel. The distributed processing device 200 is an example of a "distributed processing unit".

The aggregation / analysis device 300 is a device for realizing a data warehouse construction environment such as Hive, and it aggregates, queries, and analyzes the unstructured log information L stored in the data lake 120. The aggregation / analysis device 300 accepts HQL (Hive Query Language, also called HiveQL) requests transmitted by the service provider terminal D. HQL is a query language that can be written with conventions similar to SQL (Structured Query Language), the query language used to manage and operate relational databases. The distributed processing device 200 and the aggregation / analysis device 300 are examples of a "collection / analysis unit".

The access authority management device 400 manages whether a service provider may access the log information of a service server SS that provides another service. Each service provider sets in advance, in the access authority management device 400, which of the log information of the service server SS it manages may be disclosed to other service providers and which may not. The access authority management device 400 is an example of an "access authority management unit".
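One way to picture the disclosure settings held by the access authority management device 400 is a table of grants keyed by service and dataset. The Python class below is only a sketch under that assumption; the class name, method names, and identifiers are hypothetical and the specification does not describe the data model.

```python
from dataclasses import dataclass, field

@dataclass
class AccessAuthorityTable:
    # Maps (owner service, dataset) to the set of provider IDs allowed to read it.
    grants: dict = field(default_factory=dict)

    def publish(self, service: str, dataset: str, provider_id: str) -> None:
        """Record that the owning provider discloses a dataset to another provider."""
        self.grants.setdefault((service, dataset), set()).add(provider_id)

    def can_read(self, provider_id: str, service: str, dataset: str) -> bool:
        """Anything not explicitly published is treated as not disclosed."""
        return provider_id in self.grants.get((service, dataset), set())

table = AccessAuthorityTable()
# The provider of service B discloses its top-page logs to provider P (hypothetical IDs).
table.publish("service_b", "top_page", "provider_p")
```

Defaulting to "not disclosed" keeps undeclared log information private, which matches the idea that disclosure is set in advance by the owning provider.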

[Grouping process]
FIG. 5 is a diagram for explaining the grouping process performed by the grouping unit 130. The grouping process consists of, for example, four staged conversion processes.

As the first conversion, the grouping unit 130 serializes the unstructured log information L and then converts it to binary. The grouping unit 130 writes the result of the first conversion to a file for each first predetermined time (for example, 1 [min]) ((1) in FIG. 5).

Next, when the number of first-conversion results reaches a first predetermined number (for example, 10), that is, when 10 [min] of log information has accumulated, the grouping unit 130 starts the second conversion. In the second conversion, the grouping unit 130, for example, classifies the results of the first conversion by service and converts them into a file that aggregates the classification results ((2) in FIG. 5).

Next, when the number of second-conversion results reaches a second predetermined number (for example, 6), that is, when 60 [min] of log information has accumulated, the grouping unit 130 starts the third conversion. In the third conversion, the grouping unit 130, for example, aggregates the results of the second conversion into one file ((3) in FIG. 5).
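The staged roll-up from one-minute files to ten-minute per-service files to a single hourly file can be sketched in Python as below. The thresholds 10 and 6 are the example values from the text; the class `RollUp`, the in-memory lists standing in for files, and the `service_id` field are assumptions for illustration.

```python
from collections import defaultdict

FIRST_COUNT = 10   # one-minute files per second conversion (example value in the text)
SECOND_COUNT = 6   # ten-minute files per third conversion (example value in the text)

class RollUp:
    def __init__(self):
        self.minute_files = []      # results of the first conversion
        self.ten_minute_files = []  # results of the second conversion
        self.hourly_files = []      # results of the third conversion

    def add_minute_file(self, records):
        """Accept one first-conversion result and trigger later stages when due."""
        self.minute_files.append(records)
        if len(self.minute_files) == FIRST_COUNT:
            self._second_conversion()
        if len(self.ten_minute_files) == SECOND_COUNT:
            self._third_conversion()

    def _second_conversion(self):
        # Classify the accumulated records by service and emit one aggregated file.
        by_service = defaultdict(list)
        for minute_file in self.minute_files:
            for rec in minute_file:
                by_service[rec["service_id"]].append(rec)
        self.ten_minute_files.append(dict(by_service))
        self.minute_files = []

    def _third_conversion(self):
        # Merge the ten-minute files into a single file covering 60 minutes.
        merged = defaultdict(list)
        for ten_minute_file in self.ten_minute_files:
            for service, recs in ten_minute_file.items():
                merged[service].extend(recs)
        self.hourly_files.append(dict(merged))
        self.ten_minute_files = []

rollup = RollUp()
for minute in range(60):
    service = "A" if minute % 2 == 0 else "B"
    rollup.add_minute_file([{"service_id": service, "minute": minute}])
```

With the example thresholds, 10 files of 1 [min] make one 10 [min] file, and 6 of those make one 60 [min] file, so 60 inputs yield exactly one hourly file.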

Next, as the fourth conversion, the grouping unit 130 compresses the result of the third conversion and sets access authority. Through the compression in the fourth conversion, the grouping unit 130 converts the result of the third conversion into, for example, an MDS (Multiple-Dimension-Spread) format file ((4) in FIG. 5). MDS is open source software that enables schemaless compression. The compression in the fourth conversion may instead use a method that requires a schema reference, such as the ORC format (a file format well suited to Hive) or the LZ4 format (a file compressed with the LZ4 compression algorithm). When converting to the ORC format or the LZ4 format, the grouping unit 130 refers to schema information SDB, which stores the name and structure of each service's schema.

[Usage scene]
FIG. 6 is a diagram for explaining a usage scene of the information processing system 1 by a service provider. In this example, a service provider P who manages service A uses the information processing system 1 because, for example, P wants to compare the Thursday browsing results of service B with its Friday browsing results.

First, the aggregation / analysis device 300 receives from the service provider terminal D an HQL request from the service provider P expressing "I want to compare the Thursday browsing results of service B with its Friday browsing results" (step S1). Next, the aggregation / analysis device 300 asks the access authority management device 400 whether the data referenced by the HQL request may be referred to by the service provider P, that is, whether the service provider P has access authority (step S2), and decides whether to perform the processing from step S4 onward according to the result of that inquiry (step S3). For simplicity, the following description assumes that the service provider P was determined to have access authority in steps S2 and S3.

Next, the aggregation / analysis device 300 interprets the received HQL request and transmits the interpretation result to the distributed processing device 200 (step S4). The distributed processing device 200 then accesses the data stored in the data lake 120 based on the interpretation result received in step S4 (step S5), and the data lake 120 transmits the data accessed in step S5 to the distributed processing device 200 (step S6). The distributed processing device 200 and the aggregation / analysis device 300 then transmit a response corresponding to the interpretation result received in step S4 to the service provider terminal D (steps S7 and S8).

As described above, by transmitting an HQL request to the information processing system 1 via the service provider terminal D, the service provider P can refer to the log information of service B, which is different from the service A that P manages. The processing corresponding to steps S5 and S6 may be omitted when, for example, the distributed processing device 200 has already finished accumulating the data stored in the data lake 120 under its own distributed file system.

[Processing flow: log collection]
The processing flow of the information processing system 1 is described below. FIG. 7 is a timing chart showing an example of the processing flow until the unstructured log information L related to a service is collected by the information processing system 1.

First, the service application SA or the service server SS outputs the end user's usage history as a log (step S10). Next, the acquisition unit 110 of the information processing device 100 acquires the log information output in step S10 in the form of unstructured log information L (step S12), converts it into a predetermined format (step S14), and stores it in the data lake 120 (step S16). Next, the distributed processing device 200 performs distributed processing on the unstructured log information L stored in the data lake 120 (step S18). The details of the processing corresponding to steps S14 and S18 are described later and are omitted here. This concludes the description of the processing of this timing chart.

[Processing flow: log reference]
FIG. 8 is a timing chart showing an example of the processing flow until the unstructured log information L collected by the information processing system 1 is referred to.

First, the service provider terminal D transmits to the information processing system 1 an HQL request reflecting what the service provider wants to refer to (step S20). Next, the aggregation / analysis device 300 of the information processing system 1 accepts the HQL request transmitted in step S20 (step S22) and determines whether the service provider has access authority (step S24).

If the service provider does not have access authority, the aggregation / analysis device 300 transmits error information indicating the lack of reference authority to the service provider terminal D and ends the processing (step S26). If the service provider has access authority, the distributed processing device 200 generates a response to the HQL request accepted in step S22 (step S28) and transmits the processing result to the service provider terminal D (step S30). This concludes the description of the processing of this timing chart.
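The branch between steps S26 and S28 to S30 can be summarized in a short Python sketch. The function name, the response dictionaries, and the callables `is_authorized` and `run_query` are hypothetical stand-ins for the access authority check and the distributed query execution; they are not interfaces defined in the specification.

```python
def handle_reference_request(provider_id, target_service, query, is_authorized, run_query):
    """Return an error when the provider lacks access, otherwise the query result."""
    if not is_authorized(provider_id, target_service):                    # step S24
        return {"status": "error", "reason": "no reference authority"}    # step S26
    return {"status": "ok", "result": run_query(query)}                   # steps S28, S30

# Hypothetical stand-ins: provider_p may read service_b only.
allowed = {("provider_p", "service_b")}
check = lambda provider, service: (provider, service) in allowed
execute = lambda query: f"rows for: {query}"

granted = handle_reference_request("provider_p", "service_b", "compare Thu and Fri", check, execute)
denied = handle_reference_request("provider_q", "service_b", "compare Thu and Fri", check, execute)
```

Performing the authority check before any query execution means an unauthorized request never touches the stored log data.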

[Processing flow: log conversion]
FIG. 9 is a timing chart showing an example of the processing flow in which the unstructured log information L collected by the information processing system 1 is converted. The flowchart shown in FIG. 9 corresponds to steps S14 and S18 of FIG. 7. The flowchart shown in FIG. 9 also corresponds to the example of the processing flow shown in FIG. 5.

First, the acquisition unit 110 acquires the unstructured log information L (step S100). Next, the acquisition unit 110 applies a first conversion (for example, serialization or binarization) to the acquired log information (step S102).

Next, the distributed processing device 200 determines whether or not the condition for the second conversion (for example, that the number of processing results of the first conversion has reached a first predetermined number) is satisfied (step S104). If the condition for the second conversion is not determined to be satisfied, the distributed processing device 200 returns the processing to step S100. If the condition for the second conversion is determined to be satisfied, the distributed processing device 200 performs the second conversion process (for example, a process of classifying the processing results of the first conversion by service and converting the classification results into an aggregated file) (step S106).

Next, the distributed processing device 200 determines whether or not the condition for the third conversion (for example, that the number of processing results of the second conversion has reached a second predetermined number) is satisfied (step S108). If the condition for the third conversion is not determined to be satisfied, the distributed processing device 200 returns the processing to step S100. If the condition for the third conversion is determined to be satisfied, the distributed processing device 200 performs the third conversion process (for example, a process of aggregating a predetermined number of processing results of the second conversion into one file) (step S110).

Next, the distributed processing device 200 determines whether or not the condition for the fourth conversion (for example, that a predetermined processing time has been reached) is satisfied (step S112). If the condition for the fourth conversion is not determined to be satisfied, the distributed processing device 200 returns the processing to step S100. If the condition for the fourth conversion is determined to be satisfied, the distributed processing device 200 performs the fourth conversion process (for example, a process of archiving the processing results of the third conversion) (step S114). This concludes the description of the processing of this flowchart.
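The staged conversion of steps S100 to S114 can be sketched as follows. This is an illustrative sketch under stated assumptions: the `StagedConverter` class, its thresholds, the `"service"` key, JSON serialization as the first conversion, and gzip compression as the archiving step are all hypothetical choices, and the "predetermined processing time" is modeled as a simple `archive_due` flag.

```python
import gzip
import json
from collections import defaultdict


class StagedConverter:
    """Hypothetical model of the four-stage conversion in FIG. 9."""

    def __init__(self, first_count: int = 3, second_count: int = 2):
        self.first_count = first_count    # first predetermined number
        self.second_count = second_count  # second predetermined number
        self.stage1 = []    # results of the first conversion
        self.stage2 = []    # results of the second conversion
        self.stage3 = []    # results of the third conversion
        self.archives = []  # results of the fourth conversion

    def ingest(self, log: dict, archive_due: bool = False) -> None:
        # Steps S100 and S102: acquire the log and serialize it.
        self.stage1.append(json.dumps(log, sort_keys=True).encode("utf-8"))
        # Step S104: wait until enough first-conversion results exist.
        if len(self.stage1) < self.first_count:
            return
        # Step S106: classify by service and aggregate into one "file".
        by_service = defaultdict(list)
        for blob in self.stage1:
            by_service[json.loads(blob)["service"]].append(blob)
        self.stage2.append(dict(by_service))
        self.stage1.clear()
        # Step S108: wait until enough second-conversion results exist.
        if len(self.stage2) < self.second_count:
            return
        # Step S110: merge the second-conversion results into one file.
        merged = defaultdict(list)
        for file_ in self.stage2:
            for service, blobs in file_.items():
                merged[service].extend(blobs)
        self.stage3.append(dict(merged))
        self.stage2.clear()
        # Step S112: archive only when the predetermined time has come.
        if not archive_due:
            return
        # Step S114: archive the third-conversion results.
        payload = b"\n".join(
            blob
            for file_ in self.stage3
            for blobs in file_.values()
            for blob in blobs
        )
        self.archives.append(gzip.compress(payload))
        self.stage3.clear()
```

Each early `return` corresponds to returning the processing to step S100, so a later stage runs only after every earlier condition has been met.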

As described above, the information processing system 1 of the present embodiment includes: the acquisition unit 110, which acquires unstructured log information L regarding the usage status of a plurality of service servers SS-1 to SS-N; the grouping unit 130, which groups the information included in the unstructured log information L based on at least the service identification information LE and the time stamp and stores it in the data lake 120; the distributed processing device 200, which performs distributed processing on the unstructured log information L stored in the data lake 120; the access authority management device 400, which manages the service provider's access authority to the grouped unstructured log information L stored in the data lake 120; and the aggregation / analysis device 300, which accepts the designation of reference conditions, in a format such as HQL, when the service provider refers to the grouped and distributed-processed unstructured log information L. With this configuration, the unstructured log information L can be suitably collected from the plurality of service servers SS, and a service provider can also be allowed to refer to the log information of other services.
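Grouping by service identification information and time stamp, as performed by the grouping unit 130, can be sketched as follows. The field names `service_id` and `timestamp`, the use of Unix time, and the one-hour bucket width are assumptions made for illustration; the embodiment does not fix them here.

```python
from collections import defaultdict
from datetime import datetime, timezone


def group_key(service_id: str, ts: float) -> tuple:
    """Build a grouping key from a service ID and a Unix time stamp.

    The hourly UTC bucket is an assumption of this sketch.
    """
    hour = datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%dT%H")
    return (service_id, hour)


def group_logs(logs) -> dict:
    """Group log records by (service ID, hour bucket)."""
    groups = defaultdict(list)
    for log in logs:
        groups[group_key(log["service_id"], log["timestamp"])].append(log)
    return dict(groups)
```

A key of this form could also serve as the unit to which access authority is attached, since every record in a group belongs to a single service.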

Although modes for carrying out the present invention have been described above using embodiments, the present invention is in no way limited to these embodiments, and various modifications and substitutions can be made without departing from the gist of the present invention.

1 Information processing system
100 Information processing device
110 Acquisition unit
130 Grouping unit
200 Distributed processing device
300 Aggregation / analysis device
400 Access authority management device
SS Service server

Claims

An information processing system comprising:
a log collection unit that acquires unstructured log information regarding the usage status of a plurality of service servers, groups the unstructured log information based on at least service identification information and a time stamp, and stores it in a storage unit;
a distributed processing unit that performs distributed processing on the unstructured log information stored in the storage unit;
an access authority management unit that manages a user's access rights to the grouped unstructured log information stored in the storage unit; and
an aggregation / analysis unit that accepts the designation of reference conditions when the user refers to the grouped and distributed-processed unstructured log information.

An information processing device comprising:
an acquisition unit that acquires unstructured log information regarding the service provision of a plurality of service servers; and
a grouping unit that groups the information included in the unstructured log information based on at least service identification information and stores it in a storage unit.

The information processing device according to claim 2, wherein the grouping unit groups the unstructured log information stepwise, by a predetermined number or a predetermined time, based on the time stamp included in the unstructured log information, and compresses the grouped unstructured log information and stores it in the storage unit.

An information processing method in which a computer:
acquires unstructured log information regarding the usage status of a plurality of service servers; and
groups the information included in the unstructured log information based on at least service identification information and a time stamp, and stores it in a storage unit.

A program that causes a computer to:
acquire unstructured log information regarding the usage status of a plurality of service servers; and
perform a process of grouping the information included in the unstructured log information based on at least service identification information and a time stamp, and storing it in a storage unit.