JP2024039064A - Information processing system, information processing method, and information processing program

Info

Publication number: JP2024039064A
Application number: JP2024005826A
Authority: JP
Inventors: 隼人久米村
Original assignee: Datafluct
Current assignee: Datafluct
Priority date: 2021-10-31
Filing date: 2024-01-18
Publication date: 2024-03-21
Also published as: JP2023067730A; JP7429374B2

Abstract

[Problem] To provide an information processing system, method, and program that reduce the data science expertise required of a user when converting unstructured data into structured data. [Solution] In the information processing system, an information processing device includes a processor capable of executing a program so that the following steps are performed. In an acquisition step A4, input data is acquired. The input data includes at least one of a plurality of types of unstructured data. In a type identification step A6, the type of at least one piece of unstructured data included in the input data is identified based on the format of the acquired input data. In a generation step A12, a conversion process corresponding to the identified type of unstructured data is performed on the acquired input data, thereby generating first structured data having a predetermined data structure. [Selected Figure] Figure 5

Description

The present invention relates to an information processing system, an information processing method, and an information processing program.

Patent Document 1 discloses a system architecture, components, and search techniques for an unstructured information management system (UIMS).

The UIMS can be provided as middleware for the effective management and exchange of unstructured information across a wide array of information sources. The architecture generally includes a search engine, data storage, an analysis engine including pipelined document annotators, and various adapters. The search technique is a two-level search technique. A search query includes a search operator consisting of multiple search subexpressions, each having an associated weight value. The search engine returns one or more documents whose total weight value exceeds a threshold total weight value. The search operator is implemented as a Boolean predicate that functions as a weighted AND (WAND).
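The weighted AND predicate summarized above can be illustrated with a small example. This is only a minimal sketch under simplifying assumptions: each search subexpression is reduced to a single term that either appears in a document or does not, and the names `wand`, `search`, `doc_terms`, and `weighted_terms` are hypothetical and are not taken from Patent Document 1.

```python
from typing import Dict, List, Set


def wand(doc_terms: Set[str], weighted_terms: Dict[str, float], threshold: float) -> bool:
    """Return True when the summed weights of the matching terms exceed the threshold."""
    total = sum(weight for term, weight in weighted_terms.items() if term in doc_terms)
    return total > threshold


def search(docs: Dict[str, Set[str]], weighted_terms: Dict[str, float], threshold: float) -> List[str]:
    """Return the ids of all documents that satisfy the weighted AND predicate."""
    return [doc_id for doc_id, terms in docs.items() if wand(terms, weighted_terms, threshold)]
```

With positive weights, a threshold just below the sum of all weights makes the predicate behave like a plain AND, while a threshold just below the smallest weight makes it behave like an OR.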

Japanese Patent Application Publication No. 2004-362563

Unstructured data is more difficult to handle than structured data. Moreover, unstructured data comes in many types besides text data, such as images, video, and audio. To process such diverse unstructured data as structured data, processing appropriate to the type of unstructured data is required. However, identifying the appropriate processing for unstructured data has required a high level of expertise in data science.

According to one aspect of the present invention, an information processing system is provided. The information processing system includes a processor capable of executing a program so that the following steps are performed. In an acquisition step, input data is acquired. The input data includes at least one of a plurality of types of unstructured data. In a type identification step, the type of at least one piece of unstructured data included in the input data is identified based on the format of the acquired input data. In a generation step, first structured data having a predetermined data structure is generated by performing, on the acquired input data, a conversion process corresponding to the identified type of unstructured data.

Such an information processing system can reduce the data science expertise required of a user when converting unstructured data into structured data.

FIG. 1 is a configuration diagram showing the information processing system 1.
FIG. 2 is a block diagram showing the hardware configuration of the information processing device 2.
FIG. 3 is a block diagram showing the hardware configuration of the user terminal 3.
FIG. 4 is a diagram showing an example of the functional units included in the processor 23.
FIG. 5 is an activity diagram showing an example of the flow of information processing executed in the information processing system 1.
FIG. 6 is a diagram showing an example of a first image IM1 displayed on the display unit 34.
FIG. 7 is a diagram showing an example of a second image IM2 displayed on the display unit 34.
FIG. 8 is a diagram showing an example of a third image IM3 displayed on the display unit 34.
FIG. 9 is a diagram showing an example of a fourth image IM4 displayed on the display unit 34.

Embodiments of the present invention will be described below with reference to the drawings. The various features shown in the embodiments described below can be combined with one another.

The program for implementing the software described in this embodiment may be provided as a non-transitory computer-readable medium, may be provided so as to be downloadable from an external server, or may be provided such that the program is launched on an external computer and its functions are realized on a client terminal (so-called cloud computing).

In this embodiment, a "unit" may include, for example, a combination of hardware resources implemented by circuits in a broad sense and the software information processing that can be concretely realized by those hardware resources. This embodiment also handles various kinds of information. Such information is represented by, for example, physical values of signal values representing voltage or current, high and low signal values as a set of binary bits consisting of 0s and 1s, or quantum superposition (so-called qubits), and communication and computation on it can be performed on circuits in a broad sense.

A circuit in a broad sense is a circuit realized by appropriately combining at least a circuit, circuitry, a processor, a memory, and the like. That is, it includes application-specific integrated circuits (ASICs), programmable logic devices (for example, simple programmable logic devices (SPLDs), complex programmable logic devices (CPLDs), and field-programmable gate arrays (FPGAs)), and the like.

1. Hardware configuration
This section describes the hardware configuration.

<Information processing system 1>
FIG. 1 is a configuration diagram showing the information processing system 1. The information processing system 1 includes an information processing device 2, a user terminal 3, a first database DB1, and a second database DB2. The information processing device 2, the user terminal 3, the first database DB1, and the second database DB2 are configured to be able to communicate with one another through a telecommunications line. In one embodiment, the information processing system 1 consists of one or more devices or components. For example, if the information processing system 1 consists only of the information processing device 2, the information processing system 1 can be the information processing device 2 itself. These components are described below.

<Information processing device 2>
FIG. 2 is a block diagram showing the hardware configuration of the information processing device 2. The information processing device 2 includes a communication bus 20, a communication unit 21, a storage unit 22, and a processor 23. The communication unit 21, the storage unit 22, and the processor 23 are electrically connected via the communication bus 20 inside the information processing device 2.

<Communication unit 21>
The communication unit 21 is preferably a wired communication means such as USB, IEEE 1394, Thunderbolt (registered trademark), or wired LAN network communication, but may also include wireless LAN network communication, mobile communication such as 3G/LTE/5G, BLUETOOTH (registered trademark) communication, and the like as necessary. That is, it is more preferable to implement the communication unit 21 as a set of these communication means. The information processing device 2 may thus exchange various information with external devices via the communication unit 21 and a network.

<Storage unit 22>
The storage unit 22 stores the various information defined in the foregoing description. It can be implemented, for example, as a storage device such as a solid state drive (SSD) that stores various programs related to the information processing device 2 and executed by the processor 23, or as a memory such as a random access memory (RAM) that stores information temporarily needed for program computation (arguments, arrays, and the like). The storage unit 22 stores various programs, variables, and the like related to the information processing device 2 and executed by the processor 23.

<Processor 23>
The processor 23 processes and controls the overall operation of the information processing device 2. The processor 23 is, for example, a central processing unit (CPU), not shown. The processor 23 realizes various functions of the information processing device 2 by reading out predetermined programs stored in the storage unit 22. That is, information processing by the software stored in the storage unit 22 is concretely realized by the processor 23, which is an example of hardware, and can thereby be executed as the functional units included in the processor 23. These are described in more detail in the next section. The processor 23 is not limited to a single processor; the device may be implemented with a plurality of processors 23, one for each function, or with a combination of these arrangements.

<User terminal 3>
FIG. 3 is a block diagram showing the hardware configuration of the user terminal 3. The user terminal 3 includes a communication bus 30, a communication unit 31, a storage unit 32, a processor 33, a display unit 34, and an input unit 35. The communication unit 31, the storage unit 32, the processor 33, the display unit 34, and the input unit 35 are electrically connected via the communication bus 30 inside the user terminal 3. Descriptions of the communication unit 31, the storage unit 32, and the processor 33 are omitted because they are the same as those of the corresponding units of the information processing device 2.

<Display unit 34>
The display unit 34 displays a graphical user interface (GUI) screen that the user can operate. The display unit 34 may be included in the housing of the user terminal 3 or may be externally attached. Specifically, the display unit 34 may be implemented as a display device such as a CRT display, a liquid crystal display, an organic EL display, or a plasma display. It is preferable to select among these display devices according to the type of the user terminal 3.

<Input unit 35>
The input unit 35 receives operation inputs made by the user. An operation input is transferred as a command signal to the processor 33 via the communication bus 30. The processor 33 can execute predetermined control and computation based on the transferred command signal as necessary. The input unit 35 may be included in the housing of the user terminal 3 or may be externally attached. For example, the input unit 35 may be integrated with the display unit 34 and implemented as a touch panel. When the input unit 35 is implemented as a touch panel, the user can input tap operations, swipe operations, and the like to the input unit 35. Instead of a touch panel, a switch button, a mouse, a QWERTY keyboard, or the like can be used as the input unit 35.

<First database DB1, second database DB2>
As shown in FIG. 1, the first database DB1 is configured to be able to store various data including reference data D0. The reference data D0 may include any data, such as so-called open data that is generally available free of charge and limited-provision data that is available when predetermined conditions, such as obtaining a license, are met. The reference data D0 may include, for example, people flow data, traffic data, weather data, data posted on social networking services (SNS), POS data, satellite observation data, and document data, image data, and audio data published on the Internet. The first database DB1 may also store information managed by the user or by the organization to which the user belongs. That information is configured to be accessible only to users authorized by the organization. The first database DB1 may be implemented in any specific form, for example, as a storage device such as a solid state drive (SSD). The number of first databases DB1 is arbitrary and may be one or more. In addition to the above data, the first database DB1 may store various programs executed by any device, such as the information processing device 2 or the user terminal 3.

The second database DB2 is configured to be able to store various data output by the information processing device 2. The second database DB2 may be implemented in any specific form, in the same manner as the first database DB1. In this embodiment, the second database DB2 includes a first storage area DB21 and a second storage area DB22. The first storage area DB21 stores the input data D1 that is input to the information processing device 2. The second storage area DB22 stores the data that is output from the information processing device 2.

2. Functional configuration of the information processing device 2
FIG. 4 is a diagram showing an example of the functional units included in the processor 23. As shown in FIG. 4, the processor 23 includes an acquisition unit 231, an identification unit 232, a generation unit 233, and a display processing unit 234.

<Acquisition unit 231>
The acquisition unit 231 is configured to be able to execute the acquisition step. The acquisition unit 231 is configured to be able to acquire information from the user terminal 3 or other devices. For example, the acquisition unit 231 is configured to be able to acquire input data D1 that is input from the user terminal 3 or the first database DB1. The acquisition unit 231 is configured to be able to acquire various data by reading the data stored in a storage area that is at least a part of the storage unit 22 and writing the read data to a work area that is at least a part of the storage unit 22. The storage area is, for example, the part of the storage unit 22 that is implemented as a storage device such as an SSD. The work area is, for example, the part that is implemented as a memory such as a RAM. The acquisition unit 231 is configured to be able to acquire various data stored in devices other than the information processing device 2, such as the storage unit 32, the first database DB1, and the second database DB2, in the same way as the information stored in the storage area of the storage unit 22.

<Identification unit 232>
The identification unit 232 is configured to be able to execute various identification steps, such as a type identification step, a candidate identification step, and a process identification step, based on the various acquired information. The identification unit 232 is configured to be able to identify candidates, information, conditions, and the like used in the information processing, based on various information about the data acquired by the acquisition unit 231.

<Generation unit 233>
The generation unit 233 is configured to be able to execute the generation step. The generation unit 233 is configured to be able to generate various data, in particular structured data D2. For example, the generation unit 233 generates various data by executing predetermined arithmetic processing based on the various acquired or identified information.

<Display processing unit 234>
The display processing unit 234 is configured to be able to execute the display processing step. The display processing unit 234 is configured to be able to display various information based on received or generated data and the like. The information can be presented to the user via the display unit 34 of the user terminal 3 or via another device. In that case, for example, the display processing unit 234 performs control so that visual information such as screens, images including still images and video, icons, and messages is displayed on the display unit 34 of the user terminal 3. The display processing unit 234 may generate only the rendering information for displaying the visual information on the user terminal 3. The display processing unit 234 may also present the output information to the user without going through the user terminal 3 or another device.

3. Information processing
This section describes the information processing executed in the information processing system 1.

3.1. Flow of information processing
FIG. 5 is an activity diagram showing an example of the flow of information processing executed in the information processing system 1. The information processing may include any exception handling that is not shown. Exception handling includes interrupting the information processing and skipping individual processes. Selections or inputs made during the information processing may be based on user operations or may be made automatically without relying on user operations.

[Activity A1]
First, in activity A1, the processor 33 of the user terminal 3 transmits an access request to the information processing device 2 based on an operation by the user. The access request may include information indicating whether the user is authorized to access the information processing device 2, such as the name and password of a user account owned by the user.

[Activity A2]
When the information processing device 2 receives the access request from the user terminal 3, it performs the processing of activity A2 and executes user authentication. The information processing device 2 determines whether the user can access the information processing device 2 by checking the information included in the access request, which indicates whether the user is authorized to access the information processing device 2, against pre-registered user information. The user information is stored, for example, in the storage unit 22. The user information may include any information about the user, such as the user's name, job title, and authority. If the user authentication determines that the user cannot access the information processing device 2, the information processing ends, and input of an access request is accepted again via the user terminal 3.
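The check in activity A2 can be sketched as a lookup against registered user information. This is a minimal sketch, not the method of the embodiment: the text only says that the access request is checked against pre-registered user information, so the salted PBKDF2 hashing, the `UserRecord` fields, and the function names below are all assumptions made for illustration.

```python
import hashlib
import hmac
import os
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class UserRecord:
    """Hypothetical pre-registered user information."""
    name: str
    salt: bytes
    password_hash: bytes


def hash_password(password: str, salt: bytes) -> bytes:
    # Assumption: passwords are stored as salted PBKDF2-HMAC-SHA256 hashes.
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


def register(users: Dict[str, UserRecord], account: str, name: str, password: str) -> None:
    """Store the user information that later access requests are checked against."""
    salt = os.urandom(16)
    users[account] = UserRecord(name, salt, hash_password(password, salt))


def authenticate(users: Dict[str, UserRecord], account: str, password: str) -> bool:
    """Return True only if the account exists and the password matches."""
    record = users.get(account)
    if record is None:
        return False
    candidate = hash_password(password, record.salt)
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(record.password_hash, candidate)
```

When `authenticate` returns `False`, the flow corresponds to ending the processing and accepting a new access request.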

[Activity A3]
On the other hand, if the user authentication determines that the user can access the information processing device 2, the process proceeds to activity A3. In activity A3, the user terminal 3 outputs a data transmission command to the information processing device 2 in response to an operation by the user. For example, when the user terminal 3 receives data input by the user, it outputs a transmission command for transmitting the data to the information processing device 2 and transmits the data to the information processing device 2. The transmission command may include a request to acquire data that is stored in the first database DB1 and is accessible to the user, for example, the reference data D0. Whether the data is accessible to the user is determined based on the user information.

[Activity A4]
Next, the process proceeds to activity A4, and the acquisition unit 231 acquires input data D1 from various information sources, such as the storage unit 22, the storage unit 32, and the first database DB1, based on the transmission command received in activity A3. For example, the acquisition unit 231 acquires the data transmitted from the user terminal 3. In addition, when the transmission command includes a request to acquire the reference data D0, the acquisition unit 231 acquires the reference data D0 from the first database DB1. Hereinafter, for convenience of explanation, the data acquired in activity A4 is collectively referred to as input data D1. The input data D1 may be handled in any unit, such as per file or per folder.

The input data D1 of this embodiment includes at least one of a plurality of types of unstructured data. Unstructured data is data in any format that does not have a standardized, predetermined data structure of the kind that structured data has. In other words, structured data is data that has a predetermined data structure, and unstructured data is any data other than structured data. The predetermined data structure is a rule defined in advance for managing data. The data structure may take any form, such as an array, a struct, or a tree structure. The data structure may conform to an existing standard or may be one constructed by the user, the provider of the information processing system 1, or the like. The information processing device 2 is configured to be able to refer to the format of a data structure constructed in this way. By referring to the format of the data structure, the information processing device 2 can convert unstructured data into structured data having a data structure that follows the format. The format may be stored in the storage unit 22 or in a storage medium outside the information processing device 2, such as the first database DB1.

The structured data in this embodiment may include semi-structured data. Semi-structured data consists of a combination of unstructured data and tags (in other words, annotations) by which the unstructured data can be identified. Semi-structured data can also be regarded as structured data that is structured by a data structure built from such identifying tags. Formats of semi-structured data include, for example, graph, key-value, document, and column formats. Formats of semi-structured data also include formats expressed in a predetermined computer language, such as a data description language (for example, JSON) or a markup language (for example, XML). The types of unstructured data include, for example, at least one of images, video, audio, three-dimensional spatial data, and time-series data. The types of unstructured data are not limited to these and may include document data without a data structure, two-dimensional drawing data, and the like.
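As an illustration of semi-structured data, an unstructured file can be paired with identifying tags and expressed in JSON. This is a minimal sketch; the field names `source` and `tags`, the function `annotate`, and the sample path are hypothetical and are not defined in the embodiment.

```python
import json


def annotate(file_path: str, tags: dict) -> str:
    """Pair a reference to unstructured data with its identifying tags as a JSON string."""
    record = {"source": file_path, "tags": tags}
    # sort_keys gives a stable key order; ensure_ascii=False keeps non-ASCII tag values readable.
    return json.dumps(record, ensure_ascii=False, sort_keys=True)


# Hypothetical example: an image file tagged with its type and a label.
record_json = annotate("images/shelf_001.jpg", {"type": "image", "label": "shelf"})
```

The image itself stays unstructured; only the tags give the record a structure that downstream processing can query.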

[Activity A5]
Next, the process proceeds to activity A5, and the acquisition unit 231 stores the acquired input data D1 in the first storage area DB21 of the second database DB2. The first storage area DB21 can function as a so-called data lake that stores the acquired raw data. The acquisition unit 231 also acquires information about the stored input data D1. This information may include, for example, where in the first storage area DB21 the input data D1 is stored (for example, a file path), when the input data D1 was stored in the first storage area DB21 (for example, a timestamp), and version information of the input data D1. The information processing device 2 can access the input data D1 stored in the first storage area DB21 based on this information.
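The storing step of activity A5 can be sketched with a local directory standing in for the first storage area DB21. This is a minimal sketch under that assumption; the function name `store_raw`, the `v<version>` directory layout, and the metadata keys are hypothetical.

```python
from datetime import datetime, timezone
from pathlib import Path


def store_raw(lake_root: Path, file_name: str, payload: bytes, version: int = 1) -> dict:
    """Write raw input data unchanged and return the metadata needed to find it again."""
    # Assumption: each version of the input data gets its own subdirectory.
    target = lake_root / f"v{version}" / file_name
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(payload)
    return {
        "path": str(target),                                   # where the data is stored
        "stored_at": datetime.now(timezone.utc).isoformat(),  # when it was stored
        "version": version,                                    # version information
    }
```

The returned dictionary corresponds to the file path, timestamp, and version information that later steps use to access the stored input data.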

［アクティビティＡ６］
次に、処理がアクティビティＡ６に進み、特定部２３２は、入力データＤ１の形式に基づき、入力データＤ１に含まれる、少なくとも１つの非構造化データの種類を特定する。入力データＤ１の形式は、入力データＤ１の拡張子を含む。例えば、特定部２３２は、予め定められた拡張子と非構造化データの種類との対応関係を用いて、入力データＤ１の拡張子に基づき、入力データＤ１に含まれる非構造化データの種類を特定する。これにより、情報処理システム１が非構造化データの種類を特定する際の処理負荷を軽減することができる。なお、非構造化データの種類の特定方法はこれに限られず任意である。例えば、特定部２３２は、特定種類の非構造化データを処理可能なソフトウェアで処理可能か否かに基づき、非構造化データの種類を特定してもよい。 [Activity A6]
Next, the process proceeds to activity A6, where the identifying unit 232 identifies the type of at least one piece of unstructured data included in the input data D1 based on the format of the input data D1. The format of the input data D1 includes the extension of the input data D1. For example, the identifying unit 232 uses a predetermined correspondence between extensions and types of unstructured data to identify, based on the extension of the input data D1, the type of unstructured data included in the input data D1. This reduces the processing load when the information processing system 1 identifies the type of unstructured data. Note that the method for identifying the type of unstructured data is not limited to this and is arbitrary. For example, the identifying unit 232 may identify the type of unstructured data based on whether the input data D1 can be processed by software capable of processing a specific type of unstructured data.
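
The extension-based identification of activity A6 can be sketched as a simple table lookup. The table contents and the name `identify_type` are assumptions made for this example, since the embodiment does not list concrete extensions.

```python
from pathlib import Path
from typing import Optional

# Assumed correspondence between extensions and types of unstructured data.
EXTENSION_TO_TYPE = {
    ".png": "image",
    ".jpg": "image",
    ".mp4": "video",
    ".wav": "audio",
    ".las": "3d_spatial",
}


def identify_type(filename: str) -> Optional[str]:
    """Return the type for the file's extension, or None if it is unknown."""
    return EXTENSION_TO_TYPE.get(Path(filename).suffix.lower())
```

Because only the file name is inspected, the contents of the input data never need to be read at this stage, which is where the reduction in processing load comes from.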

［アクティビティＡ７］
次に、処理がアクティビティＡ７に進み、特定部２３２は、特定された非構造化データの種類に基づき、入力データＤ１に対して行われる変換処理が属するカテゴリＣの候補を特定する。 [Activity A7]
Next, the process proceeds to activity A7, and the identifying unit 232 identifies candidates for category C to which the conversion process to be performed on the input data D1 belongs, based on the identified type of unstructured data.

<変換処理>
変換処理は、非構造化データを構造化データに変換するための一連の処理である。情報処理装置２は、変換処理を実行することにより、非構造化データを含む入力データＤ１を構造化データに変換することができる。変換処理は、画像認識処理、音声認識処理、時系列処理、自然言語処理など、非構造化データから特徴量を抽出可能な任意の処理を含む。変換処理としては、教師あり学習、教師なし学習、強化学習など任意のアルゴリズムのものを採用可能である。 <Conversion process>
The conversion process is a series of processes for converting unstructured data into structured data. The information processing device 2 can convert the input data D1 including unstructured data into structured data by executing the conversion process. The conversion processing includes any processing that can extract feature amounts from unstructured data, such as image recognition processing, speech recognition processing, time series processing, and natural language processing. As the conversion process, any algorithm such as supervised learning, unsupervised learning, and reinforcement learning can be used.

変換処理は、分析処理を含み得る。分析処理は、抽出された特徴量や特徴量の統計分布などを入力として所定の分析結果を出力する処理である。分析処理は、例えば、分類分析、回帰分析、時系列分析、レコメンド分析、異常検知、クラスタリング、画像解析、及びテキスト解析などを含み得る。情報処理装置２は、当該分析処理による分析結果を、入力データＤ１を特徴づける特徴量として取得してもよい。 The conversion process may include an analysis process. The analysis process is a process that outputs a predetermined analysis result by inputting extracted feature quantities, statistical distribution of the feature quantities, and the like. The analysis processing may include, for example, classification analysis, regression analysis, time series analysis, recommendation analysis, anomaly detection, clustering, image analysis, text analysis, and the like. The information processing device 2 may acquire the analysis result of the analysis process as a feature amount characterizing the input data D1.

<カテゴリＣ>
カテゴリＣは、指定カテゴリＣ１、用途カテゴリＣ２、収集カテゴリＣ３などを含む。 <Category C>
Category C includes a designated category C1, a usage category C2, a collection category C3, and the like.

<指定カテゴリＣ１>
指定カテゴリＣ１は、画像、動画、音声、文書、３次元データなど、非構造化データの種類によって規定される。ユーザが入力データＤ１に含まれる非構造化データの種類を予め把握している場合、ユーザは、指定カテゴリＣ１を指定することにより、入力データＤ１に含まれる非構造化データの種類を直接指定することができる。これにより、ユーザの便宜を図ることができる。 <Designated category C1>
The designated category C1 is defined by the type of unstructured data, such as images, moving images, audio, documents, and three-dimensional data. If the user knows in advance the type of unstructured data included in the input data D1, the user can directly specify that type by designating the designated category C1. This improves user convenience.

<用途カテゴリＣ２>
用途カテゴリＣ２は、入力データＤ１の用途によって規定されている。入力データＤ１の用途としては、例えば、従業員情報を含む非構造化データを含む入力データＤ１（例えば、履歴書や報告書など）を用いたＩＤプラットフォームの構築や、請求書等の会計資料を用いた財務分析などが挙げられる。これにより、ユーザによる入力データＤ１の利用態様に応じた変換処理の特定が容易となる。 <Usage category C2>
The usage category C2 is defined by the usage of the input data D1. Examples of such usage include constructing an ID platform using input data D1 that contains unstructured data including employee information (for example, resumes and reports), and financial analysis using accounting materials such as invoices. This makes it easy to identify the conversion process that suits the manner in which the user uses the input data D1.

<カテゴリＣ３>
収集カテゴリＣ３は、入力データＤ１の収集態様によって規定される。収集カテゴリＣ３は、例えば、自然会話を含むか否か、入力データＤ１の収集手段や情報源などにより分類される。これにより、ユーザが入力データＤ１の収集態様を把握している場合、ユーザが当該入力データＤ１に応じて適切な収集カテゴリを指定することによって、変換処理の特定精度を向上させることができる。入力データＤ１の収集手段とは、例えば、センサの種類、センサの規格、センサによる測定対象などにより特定される。情報源とは、例えば、情報の提供事業者、提供時期、収集に用いられたウェブサイト（例えば、ＵＲＬやＳＮＳの種類）などによって特定される。 <Category C3>
Collection category C3 is defined by the collection mode of input data D1. The collection category C3 is classified, for example, depending on whether natural conversation is included or not, and the collection means and information source of the input data D1. As a result, when the user knows the manner in which the input data D1 is collected, the user can specify an appropriate collection category according to the input data D1, thereby improving the identification accuracy of the conversion process. The means for collecting input data D1 is specified by, for example, the type of sensor, the standard of the sensor, the object to be measured by the sensor, and the like. The information source is specified by, for example, the information provider, the time of provision, the website used for collection (for example, URL or SNS type), and the like.

なお、上記複数のカテゴリＣ１，Ｃ２，Ｃ３と、当該複数のカテゴリＣ１，Ｃ２，Ｃ３のそれぞれに属する変換処理との対応関係は、例えば、記憶部２２に記憶されている。当該対応関係は、ユーザによって設定されても、情報処理システム１の提供主体によって設定されても、機械学習等を用いて導出される分類によって設定されてもよい。特定部２３２は、当該対応関係を用いて、変換処理が属するカテゴリＣの候補を特定する。 Note that the correspondence between the plurality of categories C1, C2, and C3 and the conversion processes belonging to each of the plurality of categories C1, C2, and C3 is stored in the storage unit 22, for example. The correspondence relationship may be set by the user, by the provider of the information processing system 1, or by classification derived using machine learning or the like. The identifying unit 232 uses this correspondence to identify candidates for category C to which the conversion process belongs.
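
The lookup of category candidates from the identified type can be sketched as follows. The table entries and the names `Category`, `CANDIDATES`, and `category_candidates` are hypothetical; the embodiment only states that a correspondence is stored, for example, in the storage unit 22.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Category:
    kind: str  # "designated" (C1), "usage" (C2), or "collection" (C3)
    name: str


# Assumed correspondence between data types and category candidates.
CANDIDATES: Dict[str, List[Category]] = {
    "image": [
        Category("designated", "image"),
        Category("usage", "ocr"),
        Category("usage", "id_platform"),
    ],
    "audio": [
        Category("designated", "audio"),
        Category("collection", "natural_conversation"),
    ],
}


def category_candidates(data_type: str) -> List[Category]:
    """Return a copy of the candidates for the type (empty if none are known)."""
    return list(CANDIDATES.get(data_type, []))
```

Keeping the correspondence as data rather than code is one way to let it be set by the user, by the provider, or by a learned classification without changing the lookup itself.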

［アクティビティＡ８］
次に、処理がアクティビティＡ８に進み、ユーザ端末３は、特定されたカテゴリＣの候補に関する情報を表示部３４に表示させる。 [Activity A8]
Next, the process proceeds to activity A8, and the user terminal 3 causes the display unit 34 to display information regarding the identified category C candidates.

［アクティビティＡ９］
次に、処理がアクティビティＡ９に進み、ユーザ端末３は、ユーザによるカテゴリＣの指定を受け付ける。次に、ユーザ端末３は、受け付けたカテゴリＣの指定を情報処理装置２に送信する。 [Activity A9]
Next, the process proceeds to activity A9, and the user terminal 3 accepts the designation of category C by the user. Next, the user terminal 3 transmits the received designation of category C to the information processing device 2.

［アクティビティＡ１０］
次に、処理がアクティビティＡ１０に進み、取得部２３１は、送信された、ユーザからのカテゴリＣの指定を取得する。 [Activity A10]
Next, the process proceeds to activity A10, and the acquisition unit 231 acquires the transmitted designation of category C from the user.

［アクティビティＡ１１］
次に、処理がアクティビティＡ１１に進み、特定部２３２は、取得された当該指定に基づき、指定されたカテゴリＣに属する変換処理を特定する。特定部２３２は、指定されたカテゴリＣに属する変換処理のうちの少なくとも一部を、入力データＤ１に対して行う変換処理として特定する。変換処理を特定することは、複数の変換処理のうちのどの変換処理を用いるかを特定することに限られず、入力データＤ１に対して行われる変換処理の順序なども含み得る。 [Activity A11]
Next, the process proceeds to activity A11, and the identifying unit 232 identifies the conversion processes that belong to the designated category C based on the acquired designation. The identifying unit 232 identifies at least some of the conversion processes belonging to the designated category C as the conversion processes to be performed on the input data D1. Identifying a conversion process is not limited to identifying which of a plurality of conversion processes to use, and may also include identifying the order in which the conversion processes are performed on the input data D1.
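
One way to picture activity A11 is an ordered list of conversion steps registered per category. In the sketch below, `decode_text` and `extract_fields` are simple stand-ins for real recognition and extraction processes, and the names `PIPELINES`, `identify_conversion`, and `run` are assumptions made for this example.

```python
from typing import Callable, Dict, List

Step = Callable[[dict], dict]


def decode_text(state: dict) -> dict:
    # Stand-in for a recognition step: here the raw bytes are already text.
    return {**state, "text": state["raw"].decode("utf-8")}


def extract_fields(state: dict) -> dict:
    # Split "key: value" lines into a dictionary of fields.
    fields = {}
    for line in state["text"].splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return {**state, "fields": fields}


# Assumed correspondence: category name -> conversion steps, in execution order.
PIPELINES: Dict[str, List[Step]] = {"ocr": [decode_text, extract_fields]}


def identify_conversion(category: str) -> List[Step]:
    return list(PIPELINES.get(category, []))


def run(steps: List[Step], state: dict) -> dict:
    for step in steps:  # the list order is the processing order
        state = step(state)
    return state
```

The order matters in this sketch: `extract_fields` relies on the `text` entry that `decode_text` produces.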

［アクティビティＡ１２］
次に、処理がアクティビティＡ１２に進み、生成部２３３は、取得された入力データＤ１に対して、特定された非構造化データの種類に応じた変換処理を行う。これにより、生成部２３３は、所定のデータ構造を有する第１の構造化データＤ２１を生成する。本実施形態では、第１の構造化データＤ２１は、１つの入力データＤ１（例えば、ファイルごとやフォルダごとの入力データＤ１）に対して少なくとも１つ生成される。生成部２３３は、生成された第１の構造化データＤ２１や、入力データＤ１を第２のデータベースＤＢ２の第２の記憶領域ＤＢ２２に格納する。第２の記憶領域ＤＢ２２は、生成された構造化データを格納する、いわゆるデータウェアハウス（ＤＷＨ）として機能し得る。また、取得部２３１は、格納された第１の構造化データＤ２１に関する情報を取得する。当該情報は、例えば、第１の構造化データＤ２１が第２の記憶領域ＤＢ２２のどこに記憶されているか（例えば、ファイルパス）、第１の構造化データＤ２１が第２の記憶領域ＤＢ２２に記憶されたタイミング（例えば、タイムスタンプ）、第１の構造化データＤ２１のバージョン情報などを含み得る。取得部２３１は、これらの情報に基づき第２の記憶領域ＤＢ２２に格納されている第１の構造化データＤ２１にアクセスすることができる。また、生成部２３３は、入力データＤ１と第１の構造化データＤ２１との対応関係を生成し、例えば、第２の記憶領域ＤＢ２２に格納する。これにより、情報処理装置２による第１の構造化データＤ２１の情報源の参照が高速化される。以下、説明の便宜上、生成部２３３によって生成される構造化データや第２の記憶領域ＤＢ２２に格納される構造化データを総称して、「構造化データＤ２」と表記する。 [Activity A12]
Next, the process proceeds to activity A12, where the generation unit 233 performs a conversion process on the acquired input data D1 according to the identified type of unstructured data. Thereby, the generation unit 233 generates first structured data D21 having a predetermined data structure. In this embodiment, at least one piece of first structured data D21 is generated for each piece of input data D1 (for example, input data D1 for each file or folder). The generation unit 233 stores the generated first structured data D21 and the input data D1 in the second storage area DB22 of the second database DB2. The second storage area DB22 can function as a so-called data warehouse (DWH) that stores generated structured data. The acquisition unit 231 also acquires information regarding the stored first structured data D21. This information may include, for example, where the first structured data D21 is stored in the second storage area DB22 (for example, a file path), the timing at which the first structured data D21 was stored in the second storage area DB22 (for example, a timestamp), and version information of the first structured data D21. The acquisition unit 231 can access the first structured data D21 stored in the second storage area DB22 based on this information. Further, the generation unit 233 generates a correspondence between the input data D1 and the first structured data D21, and stores it in, for example, the second storage area DB22. This speeds up the lookup of the source of the first structured data D21 by the information processing device 2. Hereinafter, for convenience of explanation, the structured data generated by the generation unit 233 and the structured data stored in the second storage area DB22 are collectively referred to as "structured data D2."
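
The correspondence between the input data D1 and the first structured data D21 can be pictured as a two-way lookup table. The class name `Correspondence` and the identifier strings used below are hypothetical; how the correspondence is actually represented is not specified in the embodiment.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List


class Correspondence:
    """Two-way lookup between input data and structured data identifiers."""

    def __init__(self) -> None:
        self._source_of: Dict[str, str] = {}
        self._outputs_of: DefaultDict[str, List[str]] = defaultdict(list)

    def link(self, input_id: str, structured_id: str) -> None:
        self._source_of[structured_id] = input_id
        self._outputs_of[input_id].append(structured_id)

    def source_of(self, structured_id: str) -> str:
        return self._source_of[structured_id]

    def outputs_of(self, input_id: str) -> List[str]:
        return list(self._outputs_of.get(input_id, []))
```

With such a table, finding the source of a given piece of structured data is a single dictionary lookup rather than a search over the stored data.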

［アクティビティＡ１３］
次に、複数の入力データＤ１が生成され、生成された第１の構造化データＤ２１が複数存在する場合、処理がアクティビティＡ１３に進み、生成部２３３は、複数の第１の構造化データＤ２１を結合する結合処理を行う。これにより、生成部２３３は、第２の構造化データＤ２２を生成する。なお、第１の構造化データＤ２１が複数でない場合、例えば、第１の構造化データＤ２１が単数である場合、生成部２３３は、当該結合処理を省略してもよい。また、生成部２３３は、生成された第１の構造化データＤ２１が複数存在する場合であっても、ユーザの操作に応じて結合処理を実行するか否かを決定してもよい。これにより、ユーザの意に反して第１の構造化データＤ２１が結合されることを抑制することができる。 [Activity A13]
Next, if a plurality of input data D1 are generated and a plurality of generated first structured data D21 exist, the process proceeds to activity A13, and the generation unit 233 performs a combining process that combines the plurality of first structured data D21. Thereby, the generation unit 233 generates second structured data D22. Note that if there are not a plurality of first structured data D21, for example, if there is only a single piece of first structured data D21, the generation unit 233 may omit the combining process. Furthermore, even if a plurality of generated first structured data D21 exist, the generation unit 233 may decide whether to perform the combining process according to the user's operation. This prevents the first structured data D21 from being combined against the user's will.

生成された第１の構造化データＤ２１が複数存在する場合とは、１度の変換処理によって生成された第１の構造化データＤ２１が複数存在する場合に限られない。例えば、過去の変換処理によって生成され、第２の記憶領域ＤＢ２２に格納されている第１の構造化データＤ２１が複数存在する場合やこれらを組み合わせた場合も、生成された第１の構造化データＤ２１が複数存在する場合に該当し得る。 The case where a plurality of generated first structured data D21 exist is not limited to the case where a plurality of first structured data D21 were generated by a single conversion process. For example, the case where a plurality of first structured data D21 generated by past conversion processes are stored in the second storage area DB22, and the case where these are combined, may also correspond to the case where a plurality of generated first structured data D21 exist.

第２の構造化データＤ２２の数は、結合処理が行われる第１の構造化データＤ２１の数より少ない。例えば、生成部２３３は、結合処理を行うことにより、全ての第１の構造化データＤ２１が結合された、単一の第２の構造化データＤ２２を生成する。なお、第１の構造化データＤ２１のデータ構造と第２の構造化データＤ２２のデータ構造とは互いに対応関係がある。好ましくは、第２の構造化データＤ２２のデータ構造は、結合される複数の第１の構造化データＤ２１が有するデータ構造を含む。 The number of second structured data D22 is smaller than the number of first structured data D21 on which the combination process is performed. For example, the generation unit 233 performs a combination process to generate a single second structured data D22 in which all the first structured data D21 are combined. Note that there is a correspondence between the data structure of the first structured data D21 and the data structure of the second structured data D22. Preferably, the data structure of the second structured data D22 includes the data structure of the plurality of first structured data D21 to be combined.

具体例として、結合処理の対象となるある第１の構造化データＤ２１がデータ構造として第１の成分と第２の成分とによって構成される配列（第１の成分、第２の成分）＝（Ａ１，Ｂ１）によって現され、結合処理の対象となる他の第１の構造化データＤ２１がデータ構造として第２の成分と第３の成分とによって構成される配列（第２の成分、第３の成分）＝（Ｂ２，Ｃ２）によって表される場合について説明する。この場合、生成部２３３は、これらの第１の構造化データＤ２１に対して結合処理を行うことにより、データ構造として第１の成分、第２の成分、及び第３の成分を含む構造体を有する第２の構造化データＤ２２を生成する。第２の構造化データＤ２２の表現形式は任意であるが、例えば、（第１の成分、第２の成分、第３の成分）として、（Ａ１，Ｂ１，０）という配列と（０、Ｂ２，Ｃ２）という配列のそれぞれを行又は列の成分として含む行列として表現可能である。 As a specific example, consider a case where one piece of first structured data D21 to be combined is represented, as its data structure, by an array composed of a first component and a second component, (first component, second component) = (A1, B1), and another piece of first structured data D21 to be combined is represented by an array composed of a second component and a third component, (second component, third component) = (B2, C2). In this case, the generation unit 233 performs the combination process on these first structured data D21 to generate second structured data D22 whose data structure includes the first component, the second component, and the third component. The expression format of the second structured data D22 is arbitrary; for example, with (first component, second component, third component) as the components, it can be expressed as a matrix that contains the array (A1, B1, 0) and the array (0, B2, C2) as its rows or columns.
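
The specific example above can be reproduced with a short sketch. Each piece of first structured data D21 is modeled as a dictionary keyed by component name, and components missing from a piece are filled with 0, as in the arrays (A1, B1, 0) and (0, B2, C2). The function name `combine` and the dictionary representation are assumptions made for this example.

```python
from typing import Dict, List, Sequence, Tuple


def combine(records: Sequence[Dict[str, object]]) -> Tuple[List[str], List[List[object]]]:
    """Combine records into one matrix whose columns are the union of components."""
    columns: List[str] = []
    for record in records:
        for key in record:
            if key not in columns:  # keep the order of first appearance
                columns.append(key)
    # One row per record; a component the record lacks is filled with 0.
    rows = [[record.get(col, 0) for col in columns] for record in records]
    return columns, rows


columns, rows = combine([
    {"first": "A1", "second": "B1"},
    {"second": "B2", "third": "C2"},
])
```

Here `columns` lists the first, second, and third components, and `rows` holds one row per combined piece, so the data structure of the result includes the data structures of both inputs.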

生成部２３３は、結合処理によって生成された第２の構造化データＤ２２を、入力データＤ１と第２の構造化データＤ２２との対応関係とともに第２の記憶領域ＤＢ２２に格納する。第２の構造化データＤ２２及び当該対応関係の具体的な格納態様は、例えば、第１の構造化データＤ２１等の格納態様と同様である。なお、生成部２３３は、第１の構造化データＤ２１とは別に第２の構造化データＤ２２を第２の記憶領域ＤＢ２２に格納しても、第１の構造化データＤ２１に代えて第２の構造化データＤ２２を第２の記憶領域ＤＢ２２に格納してもよい。第２の構造化データＤ２２は、構造化データＤ２の一態様である。 The generation unit 233 stores the second structured data D22 generated by the combination process in the second storage area DB22 along with the correspondence between the input data D1 and the second structured data D22. The specific manner of storing the second structured data D22 and this correspondence is, for example, the same as that for the first structured data D21 and the like. Note that the generation unit 233 may store the second structured data D22 in the second storage area DB22 separately from the first structured data D21, or may store it in place of the first structured data D21. The second structured data D22 is one aspect of the structured data D2.

［アクティビティＡ１４］
次に、処理がアクティビティＡ１４に進み、表示処理部２３４は、表示処理を実行する。これにより、表示処理部２３４は、構造化データの要素をユーザによって編集可能な編集領域Ｒ１を含む画像を表示部３４に表示させる。なお、編集領域Ｒ１によって編集可能な構造化データは、第１の構造化データＤ２１であっても、第２の構造化データＤ２２であってもよい。好ましくは、編集領域Ｒ１によって編集可能な構造化データＤ２は、第２の記憶領域ＤＢ２２に格納されている最新の構造化データＤ２であり、結合処理が行われた場合は第２の構造化データＤ２２である。以下、説明の便宜上、編集領域Ｒ１によって編集される構造化データは、第２の構造化データＤ２２であるものとして取り扱う。編集領域Ｒ１は、第２の構造化データＤ２２の各要素が視認可能な編集視覚情報ＩＦ１によって編集可能に構成されていても、文字列等によって規定されるコマンドの入力によって第２の構造化データＤ２２の要素を編集可能に構成されていてもよい。生成部２３３は、編集領域Ｒ１に対する操作に応じて、構造化データ（詳細には第２の構造化データＤ２２）の編集態様を指示する編集指令を生成する。 [Activity A14]
Next, the process proceeds to activity A14, and the display processing unit 234 executes display processing. Thereby, the display processing unit 234 causes the display unit 34 to display an image including an editing area R1 in which the user can edit the elements of the structured data. Note that the structured data that can be edited in the editing area R1 may be the first structured data D21 or the second structured data D22. Preferably, the structured data D2 that can be edited in the editing area R1 is the latest structured data D2 stored in the second storage area DB22, which is the second structured data D22 if the combination process has been performed. Hereinafter, for convenience of explanation, the structured data edited in the editing area R1 is treated as the second structured data D22. The editing area R1 may be configured so that the elements can be edited through editing visual information IF1 in which each element of the second structured data D22 is visible, or so that the elements of the second structured data D22 can be edited by inputting a command defined by a character string or the like. The generation unit 233 generates an editing command that specifies how the structured data (specifically, the second structured data D22) is to be edited, in response to an operation on the editing area R1.

［アクティビティＡ１５］
次に、処理がアクティビティＡ１５に進み、取得部２３１は、生成された編集指令に応じて、第２の構造化データＤ２２に対して編集処理を行う。これにより、生成部２３３は、第２の構造化データＤ２２の要素が編集された、第３の構造化データＤ２３を生成する。第３の構造化データＤ２３は、構造化データＤ２の一態様である。 [Activity A15]
Next, the process proceeds to activity A15, and the acquisition unit 231 performs editing processing on the second structured data D22 in accordance with the generated editing command. Thereby, the generation unit 233 generates third structured data D23 in which the elements of the second structured data D22 are edited. The third structured data D23 is one aspect of the structured data D2.
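
The relationship between an editing command, the second structured data D22, and the third structured data D23 can be sketched as follows. The sketch assumes that the structured data is a table (a list of rows) and that an editing command names one cell and its new value; `EditCommand` and `apply_edits` are hypothetical names.

```python
import copy
from dataclasses import dataclass
from typing import Any, List, Sequence


@dataclass(frozen=True)
class EditCommand:
    row: int
    column: int
    value: Any


def apply_edits(table: List[List[Any]], commands: Sequence[EditCommand]) -> List[List[Any]]:
    """Return an edited copy of the table; the original table is left unchanged."""
    edited = copy.deepcopy(table)
    for command in commands:
        edited[command.row][command.column] = command.value
    return edited
```

Returning a copy keeps the data before editing available, which fits a design in which the edited data is stored alongside, rather than over, the earlier data.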

生成部２３３は、編集処理によって生成された第３の構造化データＤ２３を、入力データＤ１と第３の構造化データＤ２３との対応関係とともに第２の記憶領域ＤＢ２２に格納する。第３の構造化データＤ２３及び当該対応関係の具体的な格納態様は、例えば、第１の構造化データＤ２１等の格納態様と同様である。なお、生成部２３３は、第１の構造化データＤ２１とは別に第２の構造化データＤ２２を第２の記憶領域ＤＢ２２に格納しても、第１の構造化データＤ２１に代えて第２の構造化データＤ２２を第２の記憶領域ＤＢ２２に格納してもよい。 The generation unit 233 stores the third structured data D23 generated by the editing process in the second storage area DB22 along with the correspondence between the input data D1 and the third structured data D23. The specific manner of storing the third structured data D23 and this correspondence is, for example, the same as that for the first structured data D21 and the like. Note that the generation unit 233 may store the second structured data D22 in the second storage area DB22 separately from the first structured data D21, or may store it in place of the first structured data D21.

編集処理の終了後、情報処理システム１は、情報処理を終了する。 After the editing process ends, the information processing system 1 ends the information processing.

３．２．情報処理の結果として表示される画像について
本節では、上記情報処理が行われた結果、表示部３４に表示される画像について説明する。当該画像は、第１の画像ＩＭ１と、第２の画像ＩＭ２と、第３の画像ＩＭ３と、第４の画像ＩＭ４と、を含む。 3.2. Images displayed as a result of information processing
This section describes the images displayed on the display unit 34 as a result of the above information processing. The images include a first image IM1, a second image IM2, a third image IM3, and a fourth image IM4.

<第１の画像ＩＭ１>
図６は、表示部３４に表示される第１の画像ＩＭ１の一例を示す図である。第１の画像ＩＭ１は、例えば、アクティビティＡ３の処理の際に表示部３４に表示される。第１の画像ＩＭ１は、第１のウィンドウ４と、第２のウィンドウ５と、を含む。 <First image IM1>
FIG. 6 is a diagram showing an example of the first image IM1 displayed on the display unit 34. The first image IM1 is displayed on the display unit 34, for example, during processing of activity A3. The first image IM1 includes a first window 4 and a second window 5.

<第１のウィンドウ４>
第１のウィンドウ４は、変換処理の対象となる入力データＤ１に関する情報を表示可能に構成されている。入力データＤ１に関する情報とは、例えば、入力データＤ１の名称（いわゆるファイル名）、第１のウィンドウ４は、入力データ検索領域４１と、入力データリスト表示領域４２と、第１の操作領域４３と、を含む。 <First window 4>
The first window 4 is configured to be able to display information regarding the input data D1 to be subjected to conversion processing. The information regarding the input data D1 is, for example, the name of the input data D1 (the so-called file name). The first window 4 includes an input data search area 41, an input data list display area 42, and a first operation area 43.

<入力データ検索領域４１>
入力データ検索領域４１は、検索条件を入力することにより入力データＤ１を検索可能に構成されている。また、入力データ検索領域４１は、入力データＤ１に関する情報を検索項目として指定可能に構成されている。これにより、複数の入力データＤ１が入力されている場合に、上記情報処理を行う入力データＤ１の指定が容易となる。なお、入力データ検索領域４１は、参照データＤ０を検索対象に含めるか否かを検索条件として検索可能に構成されていてもよい。これにより、目的に応じた参照データＤ０の利活用が容易となる。 <Input data search area 41>
The input data search area 41 is configured to be able to search the input data D1 by inputting search conditions. Furthermore, the input data search area 41 is configured such that information regarding the input data D1 can be specified as a search item. Thereby, when a plurality of input data D1 are input, it becomes easy to specify the input data D1 to be subjected to the above information processing. Note that the input data search area 41 may be configured to be searchable using whether or not the reference data D0 is included in the search target as a search condition. This makes it easy to utilize the reference data D0 according to the purpose.
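
The search behavior described for the input data search area 41 can be sketched as a filter over the registered entries. The names `Entry` and `search`, and the choice of a case-insensitive name match as the search condition, are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Entry:
    name: str
    is_reference: bool = False  # True for reference data D0


def search(entries: Sequence[Entry], keyword: str, include_reference: bool = False) -> List[Entry]:
    """Return entries whose name contains the keyword, optionally with reference data."""
    keyword = keyword.lower()
    return [
        entry for entry in entries
        if keyword in entry.name.lower()
        and (include_reference or not entry.is_reference)
    ]
```

The `include_reference` flag corresponds to the search condition of whether the reference data D0 is included in the search target.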

<入力データリスト表示領域４２>
入力データリスト表示領域４２には、変換処理の対象となる入力データＤ１の候補が一覧可能に表示される。入力データ検索領域４１を用いた検索が行われている場合、入力データリスト表示領域４２には、上記入力データＤ１の候補のうち、検索条件に適合する入力データＤ１が一覧可能に表示される。入力データリスト表示領域４２は、ユーザによる情報処理を行う入力データＤ１の指定を受付可能に構成されている。情報処理システム１は、入力データリスト表示領域４２にて指定された入力データＤ１に対して変換処理等を実行することで、第１の構造化データＤ２１等を生成する。 <Input data list display area 42>
In the input data list display area 42, candidates for the input data D1 to be subjected to the conversion process are displayed in a viewable manner. When a search is being performed using the input data search area 41, the input data list display area 42 displays input data D1 that match the search conditions among the candidates for the input data D1. The input data list display area 42 is configured to be able to accept a user's designation of input data D1 for information processing. The information processing system 1 generates first structured data D21 and the like by performing conversion processing and the like on the input data D1 specified in the input data list display area 42.

<第１の操作領域４３>
第１の操作領域４３は、変換処理の対象となる入力データＤ１を追加するためのＵＩであり、例えば、図示されるようなボタン型のオブジェクトである。第１の操作領域４３は、ボタン型のオブジェクトに代えて、テキストのハイパーリンクが採用されてもよい。第１の操作領域４３の操作によって、第２のウィンドウ５が表示される。 <First operation area 43>
The first operation area 43 is a UI for adding input data D1 to be converted, and is, for example, a button-shaped object as illustrated. For the first operation area 43, a text hyperlink may be used instead of a button-shaped object. The second window 5 is displayed by operating the first operation area 43.

<第２のウィンドウ５>
第２のウィンドウ５は、変換処理の対象となる入力データＤ１を追加可能に構成されている。第２のウィンドウ５は、形式指定領域５１と、少なくとも１つの第１の入力データ表示領域５２と、第２の操作領域５３と、を含む。 <Second window 5>
The second window 5 is configured to be able to add input data D1 to be subjected to conversion processing. The second window 5 includes a format specification area 51, at least one first input data display area 52, and a second operation area 53.

<形式指定領域５１>
形式指定領域５１は、追加される入力データＤ１の形式をユーザによって指定可能に構成されている。例えば、追加される入力データＤ１が、請求書を表す画像データである場合、当該入力データＤ１の形式を請求書形式に指定可能に構成されている。特定部２３２は、指定された入力データＤ１の形式に応じて、入力データＤ１に対して行う変換処理が属するカテゴリの候補を特定してもよい。また、例えば、特定部２３２は、変換処理としてアノテーションの付与を行うに際し、指定された請求書形式に応じた当該アノテーションの候補を、カテゴリの候補として特定してもよい。また、特定部２３２は、追加された入力データＤ１が請求書を表すものである、という情報をアノテーションとして付与してもよい。 <Format specification area 51>
The format designation area 51 is configured to allow the user to designate the format of the input data D1 to be added. For example, when the input data D1 to be added is image data representing an invoice, the format of the input data D1 can be designated as an invoice format. The identifying unit 232 may identify candidates for the category to which the conversion process performed on the input data D1 belongs, according to the designated format of the input data D1. Further, for example, when adding annotations as a conversion process, the identifying unit 232 may identify annotation candidates corresponding to the designated invoice format as category candidates. The identifying unit 232 may also add, as an annotation, information indicating that the added input data D1 represents an invoice.

<第１の入力データ表示領域５２>
第１の入力データ表示領域５２の１つは、追加される入力データＤ１を指定可能に構成されている。例えば、ユーザは、第１の入力データ表示領域５２を操作することにより追加される入力データＤ１を指定する。また、第１の入力データ表示領域５２の１つは、指定された入力データＤ１を視覚的に表現する画像を、プレビュー画像として表示可能に構成されている。また、第１の入力データ表示領域５２は、追加される入力データＤ１の第１の記憶領域ＤＢ２１内での保存先を指定可能に構成されている。さらに、第１の入力データ表示領域５２の１つは、第１の記憶領域ＤＢ２１内に格納される入力データＤ１の名称を指定可能に構成されている。 <First input data display area 52>
One of the first input data display areas 52 is configured to be able to specify input data D1 to be added. For example, the user specifies input data D1 to be added by operating the first input data display area 52. Further, one of the first input data display areas 52 is configured to be able to display an image that visually represents the specified input data D1 as a preview image. Further, the first input data display area 52 is configured to be able to specify a storage destination within the first storage area DB21 of the input data D1 to be added. Further, one of the first input data display areas 52 is configured such that the name of the input data D1 stored in the first storage area DB21 can be specified.

<第２の操作領域５３>
第２の操作領域５３は、指定された入力データＤ１の追加を実行するか否かを指定可能に構成されたＵＩであり、例えば、図示されるようなボタン型のオブジェクトである。第２の操作領域５３は、ボタン型のオブジェクトに代えて、テキストのハイパーリンクが採用されてもよい。 <Second operation area 53>
The second operation area 53 is a UI configured to be able to specify whether or not to add specified input data D1, and is, for example, a button-shaped object as illustrated. For the second operation area 53, a text hyperlink may be used instead of a button-shaped object.

<第２の画像ＩＭ２>
図７は、表示部３４に表示される第２の画像ＩＭ２の一例を示す図である。第２の画像ＩＭ２は、ユーザがカテゴリＣを指定するための画像である。第２の画像ＩＭ２は、アクティビティＡ８の処理の際に表示される。第２の画像ＩＭ２は、少なくとも１つのカテゴリ指定領域６を含む。 <Second image IM2>
FIG. 7 is a diagram showing an example of the second image IM2 displayed on the display unit 34. The second image IM2 is an image with which the user designates category C. The second image IM2 is displayed during the processing of activity A8. The second image IM2 includes at least one category designation area 6.

<カテゴリ指定領域６>
カテゴリ指定領域６は、ユーザがカテゴリＣを指定可能なＵＩである。詳細には、カテゴリ指定領域６は、指定カテゴリＣ１として、画像並びに動画、音声、及び三次元空間データのいずれかを指定可能に構成されている。また、カテゴリ指定領域６は、用途カテゴリＣ２として、ＩＤプラットフォーム、地理空間、及びＯＣＲ処理のいずれかを指定可能に構成されている。カテゴリ指定領域６は、収集カテゴリＣ３として、センサによる収集、ＳＮＳからの収集、及び自然会話を対象とする収集のいずれかを指定可能に構成されている。入力データＤ１に含まれる非構造化データの種類に応じて推奨されるカテゴリＣに対応するカテゴリ指定領域６の表示態様は、他のカテゴリＣに対応するカテゴリ指定領域６の表示態様と異なる。具体的には、推奨されるカテゴリＣに対応するカテゴリ指定領域６の輪郭は、強調表示Ｌ１によってユーザが他のカテゴリ指定領域６に比べて視認しやすく構成されている。なお、当該表示態様の差異は、輪郭に限られず、色、大きさなどによって実現されてもよい。カテゴリＣが指定された後、アクティビティＡ１０～アクティビティＡ１２の処理が行われ、入力データＤ１が生成される。その後、変換処理が第３の画像ＩＭ３が表示部３４に表示される。 <Category specification area 6>
Category designation area 6 is a UI that allows the user to designate category C. Specifically, the category designation area 6 is configured to be able to designate any one of images, moving images, audio, and three-dimensional spatial data as the designated category C1. Further, the category designation area 6 is configured to be able to designate any one of ID platform, geospatial, and OCR processing as the usage category C2. The category designation area 6 is configured to be able to specify any one of collection by sensors, collection from SNS, and collection targeting natural conversation as the collection category C3. The display mode of the category designation area 6 corresponding to the category C recommended according to the type of unstructured data included in the input data D1 is different from the display mode of the category designation area 6 corresponding to other categories C. Specifically, the outline of the category designation area 6 corresponding to the recommended category C is configured to be more visible to the user than other category designation areas 6 due to the highlighted display L1. Note that the difference in the display mode is not limited to the outline, but may be realized by color, size, etc. After category C is designated, activities A10 to A12 are processed, and input data D1 is generated. Thereafter, the third image IM3 is displayed on the display unit 34 after the conversion process.

<第３の画像ＩＭ３>
図８は、表示部３４に表示される第３の画像ＩＭ３の一例を示す図である。第３の画像ＩＭ３は、第２の構造化データＤ２２に関する情報を表示可能に構成されている。第３の画像ＩＭ３は、第２の入力データ表示領域７と、構造化データ表示領域８と、第１の保存操作領域９と、を含む。 <Third image IM3>
FIG. 8 is a diagram showing an example of the third image IM3 displayed on the display unit 34. The third image IM3 is configured to be able to display information regarding the second structured data D22. The third image IM3 includes a second input data display area 7, a structured data display area 8, and a first save operation area 9.

<第２の入力データ表示領域７>
第２の入力データ表示領域７には、生成された入力データＤ１に関する情報が表示される。本実施形態では、第２の入力データ表示領域７には入力データＤ１に含まれる請求書の画像が表示される。 <Second input data display area 7>
In the second input data display area 7, information regarding the generated input data D1 is displayed. In this embodiment, the image of the invoice included in the input data D1 is displayed in the second input data display area 7.

<構造化データ表示領域８>
構造化データ表示領域８には、生成された構造化データＤ２に関する情報が表示される。詳細には、構造化データ表示領域８には、生成された構造化データＤ２に含まれる要素が一覧可能に表示される。表示態様は、グラフ形式、木構造形式、表形式など、データ構造に応じて決定されればよい。本実施形態では、構造化データＤ２の要素が二次元の表形式で一覧可能に表示されている。 <Structured data display area 8>
In the structured data display area 8, information regarding the generated structured data D2 is displayed. Specifically, in the structured data display area 8, the elements included in the generated structured data D2 are displayed in a viewable manner. The display mode may be determined depending on the data structure, such as a graph format, a tree structure format, or a table format. In this embodiment, the elements of the structured data D2 are displayed in a two-dimensional table format so that they can be viewed at a glance.

アクティビティＡ１３の結合処理が行われた場合、構造化データ表示領域８は、第１の領域８１と、第２の領域８２と、を含む。第１の領域８１及び第２の領域８２のそれぞれには、結合処理によって生成された第２の構造化データＤ２２のうち、異なる入力データＤ１の要素が表示される。特に、第１の領域８１には、最新の変換処理によって生成された第１の構造化データＤ２１の要素が表示される。第１の領域８１の表示態様と第２の領域８２の表示態様は、互いに異なる。これにより、第１の領域８１と第２の領域８２とは視覚的に区別可能に構成されている。例えば、第１の領域８１の色は、第２の領域８２の色と異なる。なお、アクティビティＡ１３の結合処理が行われない場合、構造化データ表示領域８には、入力データＤ１に対応する第１の構造化データＤ２１が表示される。この場合、構造化データ表示領域８は、第１の領域８１及び第２の領域８２のいずれか一方のみを含んでもよい。 When the combining process of activity A13 is performed, the structured data display area 8 includes a first area 81 and a second area 82. The first area 81 and the second area 82 each display those elements of the second structured data D22 generated by the combining process that derive from different input data D1. In particular, the first area 81 displays the elements of the first structured data D21 generated by the latest conversion process. The display mode of the first area 81 and the display mode of the second area 82 differ from each other, so that the first area 81 and the second area 82 are visually distinguishable. For example, the color of the first area 81 is different from the color of the second area 82. Note that if the combining process of activity A13 is not performed, the first structured data D21 corresponding to the input data D1 is displayed in the structured data display area 8. In this case, the structured data display area 8 may include only one of the first area 81 and the second area 82.

本実施形態の構造化データ表示領域８は、編集領域Ｒ１としても機能し得る。例えば、構造化データ表示領域８に表示される第２の構造化データＤ２２の要素に対してユーザが編集操作を行うことにより、第３の構造化データＤ２３を生成することが可能である。この場合、構造化データ表示領域８は、第２の構造化データＤ２２の各要素が視認可能な編集視覚情報ＩＦ１として機能する。 The structured data display area 8 of this embodiment can also function as an editing area R1. For example, the third structured data D23 can be generated by the user performing an editing operation on the elements of the second structured data D22 displayed in the structured data display area 8. In this case, the structured data display area 8 functions as edited visual information IF1 in which each element of the second structured data D22 can be visually recognized.

<第１の保存操作領域９>
第１の保存操作領域９は、入力データＤ１に対応する構造化データＤ２の保存を指示するためのＵＩであり、例えば、図示されるようなボタン型のオブジェクトである。第２の操作領域５３は、ボタン型のオブジェクトに代えて、テキストのハイパーリンクが採用されてもよい。第１の保存操作領域９が操作されることにより、アクティビティＡ１３の処理が行われる。これにより、構造化データ表示領域８に表示されている構造化データＤ２（例えば、第２の構造化データＤ２２）が第２の記憶領域ＤＢ２２に格納される。その後、第４の画像ＩＭ４が表示部３４に表示される。 <First save operation area 9>
The first save operation area 9 is a UI for instructing that the structured data D2 corresponding to the input data D1 be saved, and is, for example, a button-shaped object as shown in the figure. For the second operation area 53, a text hyperlink may be used instead of a button-shaped object. Operating the first save operation area 9 causes the processing of activity A13 to be performed. As a result, the structured data D2 (e.g., the second structured data D22) displayed in the structured data display area 8 is stored in the second storage area DB22. The fourth image IM4 is then displayed on the display unit 34.

<第４の画像ＩＭ４>
図９は、表示部３４に表示される第４の画像ＩＭ４の一例を示す図である。第４の画像ＩＭ４は、生成された構造化データＤ２（例えば、第２の構造化データＤ２２）を編集するための画像である。第４の画像ＩＭ４は、構造化データ検索領域１０と、構造化データ編集領域１１と、編集結果表示領域１２と、第２の保存操作領域１３と、を含む。 <Fourth image IM4>
FIG. 9 is a diagram showing an example of the fourth image IM4 displayed on the display unit 34. The fourth image IM4 is an image for editing the generated structured data D2 (for example, the second structured data D22). The fourth image IM4 includes a structured data search area 10, a structured data editing area 11, an editing result display area 12, and a second save operation area 13.

<構造化データ検索領域１０>
構造化データ検索領域１０は、検索条件入力領域１０１と、検索結果表示領域１０２と、データセット追加領域１０３と、を含む。 <Structured data search area 10>
The structured data search area 10 includes a search condition input area 101, a search result display area 102, and a dataset addition area 103.

<検索条件入力領域１０１>
検索条件入力領域１０１は、第２の記憶領域ＤＢ２２内に格納されている構造化データＤ２を検索するための検索条件を入力可能に構成されている。検索条件は、文字列によって指定されても、構造化データＤ２の容量や生成日などによって指定されてもよい。 <Search condition input area 101>
The search condition input area 101 is configured such that search conditions for searching the structured data D2 stored in the second storage area DB22 can be input. The search condition may be specified by a character string, or by the capacity, generation date, etc. of the structured data D2.

<検索結果表示領域１０２>
検索結果表示領域１０２は、検索条件入力領域１０１に入力された検索条件に合致する構造化データＤ２の一覧を表示可能に構成されている。検索結果表示領域１０２は、検索結果に表示された構造化データＤ２の指定を受付可能に構成されている。 <Search result display area 102>
The search result display area 102 is configured to be able to display a list of structured data D2 that match the search conditions input in the search condition input area 101. The search result display area 102 is configured to be able to accept designation of the structured data D2 displayed in the search results.
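
As a non-limiting illustration, the search over stored structured data D2 by character string, size, or generation date could be sketched as below. This is a minimal sketch under stated assumptions: the `StoredDataset` record, its fields, and `search_datasets` are hypothetical names, and the second storage area DB22 is modeled as an in-memory list.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class StoredDataset:
    """Hypothetical catalogue entry for one piece of structured data D2."""
    name: str
    size_bytes: int
    created: date


def search_datasets(
    catalogue: list[StoredDataset],
    text: Optional[str] = None,
    max_size_bytes: Optional[int] = None,
    created_on_or_after: Optional[date] = None,
) -> list[StoredDataset]:
    """Return the entries that satisfy every condition that was supplied."""
    hits = []
    for entry in catalogue:
        if text is not None and text.lower() not in entry.name.lower():
            continue
        if max_size_bytes is not None and entry.size_bytes > max_size_bytes:
            continue
        if created_on_or_after is not None and entry.created < created_on_or_after:
            continue
        hits.append(entry)
    return hits


catalogue = [
    StoredDataset("store_sales_2023", 2_000, date(2023, 12, 1)),
    StoredDataset("camera_counts", 50_000, date(2024, 1, 10)),
    StoredDataset("store_sales_2024", 3_000, date(2024, 1, 15)),
]
results = search_datasets(
    catalogue, text="sales", created_on_or_after=date(2024, 1, 1)
)
```

The returned list corresponds to what the search result display area 102 would show; the user's designation of one entry is left out of the sketch.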

<データセット追加領域１０３>
データセット追加領域１０３は、検索結果表示領域１０２にて指定された構造化データＤ２の編集を開始するためのＵＩであり、例えば、図示されるようなボタン型のオブジェクトである。データセット追加領域１０３は、ボタン型のオブジェクトに代えて、テキストのハイパーリンクが採用されてもよい。 <Dataset addition area 103>
The dataset addition area 103 is a UI for starting editing of the structured data D2 specified in the search result display area 102, and is, for example, a button-shaped object as illustrated. The dataset addition area 103 may employ a text hyperlink instead of a button-shaped object.

<構造化データ編集領域１１>
構造化データ編集領域１１は、データセット追加領域１０３の操作によって開始された構造化データＤ２を編集可能に構成されている。構造化データ編集領域１１は、コマンド入力領域１１１と、コマンド記憶領域１１２と、コマンド実行領域１１３と、を含む。 <Structured data editing area 11>
The structured data editing area 11 is configured so that the structured data D2 whose editing was started by operating the dataset addition area 103 can be edited. The structured data editing area 11 includes a command input area 111, a command storage area 112, and a command execution area 113.

<コマンド入力領域１１１>
コマンド入力領域１１１は、構造化データＤ２に対する編集に関する入力を行うための領域であり、編集領域Ｒ１の一態様である。コマンド入力領域１１１は、ユーザからの編集に関する入力をコマンドとして受付可能に構成されている。コマンドは、例えば構造化データＤ２の選択、構造化データＤ２の要素の選択、構造化データＤ２の要素の変更、構造化データＤ２の要素の削除、複数の構造化データＤ２の結合など、任意の編集に関するものを含み得る。 <Command input area 111>
The command input area 111 is an area for entering input related to editing the structured data D2, and is one aspect of the editing area R1. The command input area 111 is configured to accept editing-related input from the user as commands. Commands may relate to any kind of editing, for example selecting structured data D2, selecting an element of structured data D2, changing an element of structured data D2, deleting an element of structured data D2, or combining multiple pieces of structured data D2.

<コマンド記憶領域１１２>
コマンド記憶領域１１２は、コマンド入力領域１１１に入力されたコマンドを記憶するためのＵＩである。コマンド記憶領域１１２がユーザによって操作されることにより、当該コマンドのログが生成される。 <Command storage area 112>
The command storage area 112 is a UI for storing commands input into the command input area 111. When the command storage area 112 is operated by the user, a log of the command is generated.

<コマンド実行領域１１３>
コマンド実行領域１１３は、コマンド入力領域１１１に入力されたコマンドを実行するためのＵＩである。コマンド実行領域１１３が操作されることにより、第３の構造化データＤ２３が生成される。 <Command execution area 113>
The command execution area 113 is a UI for executing commands input into the command input area 111. Third structured data D23 is generated by operating the command execution area 113.
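
As a non-limiting illustration, executing edit commands to obtain the third structured data D23 could be sketched as below. This is a minimal sketch under stated assumptions: the command dictionaries, the operation names `update`, `delete_row`, and `append_rows`, and the function `execute_commands` are hypothetical, and the log is recorded as each command runs, whereas in the text the log is generated when the command storage area 112 is operated.

```python
import copy


def execute_commands(rows: list[dict], commands: list[dict]) -> tuple[list[dict], list[str]]:
    """Apply edit commands to a copy of the rows; return (edited rows, command log)."""
    edited = copy.deepcopy(rows)  # leave the source data (e.g. D22) untouched
    log: list[str] = []
    for cmd in commands:
        op = cmd["op"]
        if op == "update":          # change one element
            edited[cmd["row"]][cmd["column"]] = cmd["value"]
        elif op == "delete_row":    # delete elements
            del edited[cmd["row"]]
        elif op == "append_rows":   # simple stand-in for combining datasets
            edited.extend(copy.deepcopy(cmd["rows"]))
        else:
            raise ValueError(f"unknown command: {op!r}")
        log.append(op)
    return edited, log


d22 = [{"item": "apple", "count": 3}, {"item": "aple", "count": 1}]
d23, log = execute_commands(d22, [
    {"op": "update", "row": 1, "column": "item", "value": "apple"},
    {"op": "append_rows", "rows": [{"item": "pear", "count": 2}]},
    {"op": "delete_row", "row": 0},
])
```

Working on a copy keeps the original rows available, so the editing result display area 12 can show the outcome alongside the commands that produced it.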

<編集結果表示領域１２>
編集結果表示領域１２は、コマンド実行領域１１３の操作によって編集された構造化データＤ２、すなわち第３の構造化データＤ２３に関する情報を表示可能に構成されている。本実施形態では、編集結果表示領域１２は、当該第３の構造化データＤ２３の要素を一覧可能に構成されている。特に、編集結果表示領域１２は、構造化データ編集領域１１と一覧可能な態様で表示されている。これにより、構造化データ編集領域１１による編集操作と当該編集操作の結果との対応関係が把握しやすくなる。そのため、ユーザにとっての利便性が向上する。 <Editing result display area 12>
The editing result display area 12 is configured to be able to display information regarding the structured data D2 edited by the operation of the command execution area 113, that is, the third structured data D23. In this embodiment, the editing result display area 12 is configured to be able to list the elements of the third structured data D23. In particular, the editing result display area 12 is displayed in such a manner that it can be viewed together with the structured data editing area 11. This makes it easier to understand the correspondence between the editing operations in the structured data editing area 11 and the results of the editing operations. Therefore, convenience for the user is improved.

<第２の保存操作領域１３>
第２の保存操作領域１３は、構造化データ編集領域１１に入力された編集操作によって生成された第３の構造化データＤ２３を保存するためのＵＩである。第２の保存操作領域１３が操作されることにより、生成された第３の構造化データＤ２３が第２の記憶領域ＤＢ２２に格納される。 <Second save operation area 13>
The second save operation area 13 is a UI for saving the third structured data D23 generated by the editing operation input to the structured data editing area 11. By operating the second save operation area 13, the generated third structured data D23 is stored in the second storage area DB22.

４．その他
上記情報処理の態様はあくまで一例であり、これに限られない。例えば、図５に示される表示処理（アクティビティＡ１４）は、結合処理（アクティビティＡ１３）に組み込まれてもよい。例えば、図８に示される構造化データ表示領域８に表される要素が、ユーザによって編集可能に構成されていてもよい。この場合、第２の構造化データＤ２２と第３の構造化データＤ２３との区別は不要である。また、編集処理は、ユーザの操作によって行われる結合処理を含んでもよい。 4. Others The above information processing mode is just an example, and is not limited to this. For example, the display process (activity A14) shown in FIG. 5 may be incorporated into the combination process (activity A13). For example, the elements represented in the structured data display area 8 shown in FIG. 8 may be configured to be editable by the user. In this case, there is no need to distinguish between the second structured data D22 and the third structured data D23. Further, the editing process may include a combining process performed by a user's operation.

情報処理システム１は、過去に実行された入力データＤ１に対する処理に基づいて、データフローを生成してもよい。データフローは、同一の情報源から取得される入力データＤ１に行われる変換処理、結合処理、及び編集処理の少なくとも１つを含む。これにより、同一の情報源から取得可能な入力データＤ１を構造化データＤ２に変換する際の手間が軽減される。当該データフローは、ユーザによって変更可能に構成されていてもよい。これにより、ユーザの入力データＤ１の取扱態様に応じて最適なデータフローを構築しやすくなる。 The information processing system 1 may generate a data flow based on a process performed on the input data D1 in the past. The data flow includes at least one of a conversion process, a combination process, and an editing process performed on the input data D1 obtained from the same information source. This reduces the effort required to convert input data D1 that can be obtained from the same information source into structured data D2. The data flow may be configured to be changeable by the user. This makes it easier to construct an optimal data flow according to the manner in which the user's input data D1 is handled.

情報処理システム１は、上記データフローに基づく処理を実行するタイミングを指定可能に構成されていてもよい。言い換えれば、情報処理システム１は、上記データフローを実行するためのスケジュールを指定可能に構成されていてもよい。これにより、当該処理を行う際にユーザがユーザ端末３を逐次操作する必要がなくなるため、データフローに基づく処理によって生成される構造化データＤ２の管理が容易となる。 The information processing system 1 may be configured to be able to specify the timing for executing processing based on the data flow. In other words, the information processing system 1 may be configured to be able to specify a schedule for executing the above data flow. This eliminates the need for the user to sequentially operate the user terminal 3 when performing the processing, making it easier to manage the structured data D2 generated by the processing based on the data flow.
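
As a non-limiting illustration, a data flow that replays previously applied steps on new input from the same information source, together with a simple schedule, could be sketched as below. This is a minimal sketch under stated assumptions: the `DataFlow` class, its fields, the fixed-interval schedule, and the two example steps are hypothetical, and each step is modeled as a function from rows to rows rather than as a real conversion of unstructured data.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Callable

Step = Callable[[list[dict]], list[dict]]


@dataclass
class DataFlow:
    """Hypothetical record of the steps applied to one information source."""
    source: str
    steps: list[Step] = field(default_factory=list)
    interval: timedelta = timedelta(days=1)  # assumed schedule: fixed interval

    def run(self, rows: list[dict]) -> list[dict]:
        """Replay every recorded step, in order, on new input."""
        for step in self.steps:
            rows = step(rows)
        return rows

    def next_run(self, last_run: datetime) -> datetime:
        """Time at which the flow should be executed next."""
        return last_run + self.interval


def drop_empty(rows: list[dict]) -> list[dict]:
    return [row for row in rows if row.get("value") is not None]


def add_source_tag(rows: list[dict]) -> list[dict]:
    return [{**row, "source": "sensor-a"} for row in rows]


flow = DataFlow("sensor-a", steps=[drop_empty, add_source_tag],
                interval=timedelta(hours=6))
output = flow.run([{"value": 1}, {"value": None}])
```

Because `steps` is an ordinary list, a user-editable data flow amounts to inserting, removing, or reordering entries in it.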

特定部２３２は、指定されたカテゴリＣに属する変換処理の中から、入力データＤ１に対して実行する変換処理を特定しなくてもよい。例えば、特定部２３２は、入力データＤ１に含まれる非構造化の種類に基づき、実行可能な全変換処理のなかから入力データＤ１に対して実行する変換処理を特定してもよい。 The specifying unit 232 does not need to specify the conversion process to be performed on the input data D1 from among the conversion processes belonging to the specified category C. For example, the specifying unit 232 may specify a conversion process to be performed on the input data D1 from among all executable conversion processes based on the type of unstructured data included in the input data D1.

指定カテゴリＣ１、用途カテゴリＣ２、及び収集カテゴリＣ３の区別は、便宜的なものであり、これに限られない。例えば、指定カテゴリＣ１及び用途カテゴリＣ２の両方に属する変換処理が存在してもよい。 The distinction between the designated category C1, the usage category C2, and the collection category C3 is for convenience and is not limited to this. For example, there may be conversion processes that belong to both the designated category C1 and the usage category C2.

上記情報処理は、アクティビティＡ１１にて変換処理を特定し、アクティビティＡ１２にて特定された変換処理を実行することにより、構造化データＤ２（詳細には第１の構造化データＤ２１）を生成すればよい。したがって、上記情報処理は、アクティビティＡ１３の結合処理、アクティビティＡ１４の表示処理、アクティビティＡ１５の編集処理などを含んでいなくてもよい。なお、生成部２３３が当該変換処理を行うことによって第１の構造化データＤ２１を生成することは、生成部２３３を含む情報処理装置２が自ら変換処理を行うことによって第１の構造化データＤ２１を生成することに限られない。例えば、生成部２３３が当該変換処理を行うことによって第１の構造化データＤ２１を生成することは、情報処理装置２が特定された変換処理に関する情報を他のデバイスに送信することで、他のデバイスに当該変換処理を実行させることで第１の構造化データＤ２１を生成することを含む。 It suffices for the above information processing to specify a conversion process in activity A11 and to generate structured data D2 (specifically, first structured data D21) by executing the specified conversion process in activity A12. Therefore, the above information processing need not include the combining process of activity A13, the display process of activity A14, the editing process of activity A15, and the like. Note that the generation unit 233 generating the first structured data D21 by performing the conversion process is not limited to the information processing device 2, which includes the generation unit 233, generating the first structured data D21 by performing the conversion process itself. For example, it also includes the information processing device 2 transmitting information on the specified conversion process to another device and causing that device to execute the conversion process, thereby generating the first structured data D21.
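
As a non-limiting illustration, specifying a conversion process from the identified type of unstructured data, with or without a category designated by the user, could be sketched as below. This is a minimal sketch under stated assumptions: the `CONVERSIONS` registry, the category and process names, and the rule of taking the first match in alphabetical category order when no category is given are all hypothetical.

```python
from typing import Optional

# Hypothetical registry: category -> {unstructured data type -> process name}
CONVERSIONS = {
    "people_counting": {"image": "detect_people", "video": "track_people"},
    "transcription": {"audio": "speech_to_text", "video": "speech_to_text"},
    "ocr": {"image": "read_text"},
}


def candidate_categories(data_type: str) -> list[str]:
    """Categories that offer a conversion process for this data type."""
    return sorted(c for c, by_type in CONVERSIONS.items() if data_type in by_type)


def specify_process(data_type: str, category: Optional[str] = None) -> str:
    """Pick a conversion process, restricted to `category` if one is given."""
    candidates = candidate_categories(data_type)
    if not candidates:
        raise ValueError(f"no conversion process for type {data_type!r}")
    if category is None:
        # Assumed tie-break: search all processes, first category alphabetically.
        category = candidates[0]
    elif category not in candidates:
        raise ValueError(f"{category!r} is not a candidate for {data_type!r}")
    return CONVERSIONS[category][data_type]
```

For example, `specify_process("video", "transcription")` selects within the designated category, while `specify_process("image")` searches every registered process. The returned name could equally be sent to another device that carries out the conversion.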

情報処理装置２は、オンプレミス形態であってもよく、クラウド形態であってもよい。クラウド形態の情報処理装置２としては、例えば、ＳａａＳ（ＳｏｆｔｗａｒｅａｓａＳｅｒｖｉｃｅ）、クラウドコンピューティングという形態で、上述の機能や処理を提供してもよい。 The information processing device 2 may be in an on-premises form or may be in a cloud form. The cloud-based information processing device 2 may provide the above-mentioned functions and processing, for example, in the form of SaaS (Software as a Service) or cloud computing.

上記実施形態では、情報処理装置２が種々の記憶・制御を行ったが、情報処理装置２に代えて、複数の外部装置が用いられてもよい。すなわち、種々の情報やプログラムは、ブロックチェーン技術等を用いて複数の外部装置に分散して記憶されてもよい。 In the embodiment described above, the information processing device 2 performs various storage and control operations, but instead of the information processing device 2, a plurality of external devices may be used. That is, various information and programs may be distributed and stored in a plurality of external devices using blockchain technology or the like.

本実施形態の態様は、情報処理システム１に限定されず、情報処理方法であっても、情報処理プログラムであってもよい。情報処理方法は、情報処理システム１の各ステップを含む。情報処理プログラムは、少なくとも１つのコンピュータに、情報処理システム１の各ステップを実行させる。 Aspects of this embodiment are not limited to the information processing system 1, and may be an information processing method or an information processing program. The information processing method includes each step of the information processing system 1. The information processing program causes at least one computer to execute each step of the information processing system 1.

上記情報処理システム１等は、次に記載の各態様で提供されてもよい。 The information processing system 1 and the like may be provided in each of the following aspects.

（１）情報処理システムであって、次の各ステップがなされるようにプログラムを実行可能なプロセッサを備え、取得ステップでは、入力データを取得し、ここで、前記入力データは、複数種類の非構造化データのうちの少なくとも１つを含み、種類特定ステップでは、取得された前記入力データの形式に基づき、前記入力データに含まれる、少なくとも１つの前記非構造化データの種類を特定し、生成ステップでは、取得された前記入力データに対して、特定された前記非構造化データの種類に応じた変換処理を行うことにより、所定のデータ構造を有する第１の構造化データを生成する、もの。 (1) An information processing system comprising a processor capable of executing a program so that the following steps are performed: in an acquisition step, input data is acquired, wherein the input data includes at least one of a plurality of types of unstructured data; in a type identification step, the type of the at least one unstructured data included in the input data is identified based on the format of the acquired input data; and in a generation step, first structured data having a predetermined data structure is generated by performing, on the acquired input data, a conversion process according to the identified type of the unstructured data.

このような構成によれば、情報処理システムが非構造化データから構造化データを生成する際に、入力データに含まれる非構造化データに応じて適切な変換処理が行われる。したがって、非構造化データの種類によって適切な変換処理が異なる場合であっても、非構造化データを構造化データに変換するに際し、ユーザに求められるデータサイエンスに関する専門性を緩和することができる。 According to such a configuration, when the information processing system generates structured data from unstructured data, appropriate conversion processing is performed according to the unstructured data included in the input data. Therefore, even if the appropriate conversion process differs depending on the type of unstructured data, it is possible to reduce the level of expertise required of the user in data science when converting unstructured data to structured data.

（２）上記（１）に記載の情報処理システムにおいて、さらに、候補特定ステップでは、特定された前記非構造化データの種類に基づき、前記入力データに対して行われる前記変換処理が属するカテゴリの候補を特定し、前記取得ステップでは、ユーザによる前記カテゴリの指定を取得し、さらに、処理特定ステップでは、前記指定に基づき、指定された前記カテゴリに属する前記変換処理を特定する、もの。 (2) The information processing system according to (1) above, wherein, further, in a candidate identification step, candidates for the category to which the conversion process to be performed on the input data belongs are identified based on the identified type of the unstructured data; in the acquisition step, the user's designation of the category is acquired; and, further, in a process identification step, the conversion process belonging to the designated category is identified based on the designation.

このような構成によれば、情報処理システムは、変換処理を特定する際にユーザによるカテゴリの指定を用いることで、ユーザの要求に即した適切な変換処理を特定しやすくなる。したがって、利便性の向上を図ることができる。 According to such a configuration, the information processing system uses the user's category designation when specifying the conversion process, thereby making it easier to specify the appropriate conversion process that meets the user's request. Therefore, convenience can be improved.

（３）上記（２）に記載の情報処理システムにおいて、前記カテゴリは、前記入力データの収集態様によって規定される収集カテゴリを含む、もの。 (3) In the information processing system according to (2) above, the category includes a collection category defined by a collection mode of the input data.

このような構成によれば、ユーザが入力データの収集態様を把握している場合、ユーザが当該入力データに応じて適切な収集カテゴリを指定することによって、変換処理の特定精度を向上させることができる。 According to such a configuration, when the user knows how the input data was collected, the user can improve the accuracy with which the conversion process is identified by designating a collection category appropriate to that input data.

（４）上記（１）～（３）の何れか１つに記載の情報処理システムにおいて、前記生成ステップでは、生成された前記第１の構造化データが複数存在する場合、複数の前記第１の構造化データの少なくとも一部を結合することで第２の構造化データを生成する、もの。 (4) The information processing system according to any one of (1) to (3) above, wherein, in the generation step, when a plurality of pieces of the generated first structured data exist, second structured data is generated by combining at least some of the plurality of pieces of first structured data.

このような構成によれば、複数の第１の構造化データがまとめられるため、構造化データの管理負担が軽減される。 According to such a configuration, since the plurality of first structured data are grouped together, the burden of managing the structured data is reduced.

（５）上記（１）～（４）の何れか１つに記載の情報処理システムにおいて、さらに、表示処理ステップでは、前記構造化データの要素をユーザによって編集可能な編集領域を表示させる、もの。 (5) The information processing system according to any one of (1) to (4) above, wherein, further, in a display processing step, an editing area in which the user can edit the elements of the structured data is displayed.

このような構成によれば、非構造化データの変換によって生じ得るノイズを、構造化データの編集によって修正することが可能となる。したがって、利便性の向上を図ることができる。 According to such a configuration, it is possible to correct noise that may occur due to conversion of unstructured data by editing structured data. Therefore, convenience can be improved.

（６）上記（１）～（５）の何れか１つに記載の情報処理システムにおいて、前記入力データの形式は、前記入力データの拡張子を含む、もの。 (6) In the information processing system according to any one of (1) to (5) above, the format of the input data includes an extension of the input data.

このような構成によれば、入力データの拡張子という比較的小さい情報量から、非構造化データの種類が特定される。したがって、情報処理システムが種類特定ステップを実行する際の処理負荷を軽減することができる。 According to such a configuration, the type of unstructured data is specified from a relatively small amount of information called the extension of input data. Therefore, the processing load when the information processing system executes the type identification step can be reduced.

（７）上記（１）～（６）の何れか１つに記載の情報処理システムにおいて、前記非構造化データの種類は、画像、動画、音声、三次元空間データ、及び時系列データのうちの少なくとも１つを含む、もの。 (7) The information processing system according to any one of (1) to (6) above, wherein the types of unstructured data include at least one of images, videos, audio, three-dimensional spatial data, and time series data.

このような構成によれば、非構造化データの大半を占める入力データを構造化データに変換することができるため、さらなる利便性の向上を図ることができる。 According to such a configuration, input data, which accounts for most of the unstructured data, can be converted into structured data, so that it is possible to further improve convenience.

（８）情報処理方法であって、上記（１）～（７）の何れか１つに記載の情報処理システムの各ステップを含む、方法。 (8) An information processing method, the method comprising each step of the information processing system described in any one of (1) to (7) above.

（９）情報処理プログラムであって、少なくとも１つのコンピュータに、上記（１）～（７）の何れか１つに記載の情報処理システムの各ステップを実行させる、もの。 (9) An information processing program that causes at least one computer to execute each step of the information processing system described in any one of (1) to (7) above.
もちろん、この限りではない。 Of course, the present invention is not limited to these aspects.

さらに、以下の観点にも留意されたい。 Furthermore, please note the following points.

コンピュータの発明と普及、及びインターネットと通信技術の発明と発達により、大量のデータが蓄積され続け、世界の総データ量はこの１０年で１０倍以上になった。そのうえ、多くのデータベースが運用されるようになり、必要とするデータがどのデータベースに蓄積されているのかを知ることも困難になってきた。 With the invention and spread of computers, and the invention and development of the Internet and communication technology, large amounts of data have continued to accumulate, and the total amount of data in the world has increased more than tenfold in the past ten years. Furthermore, as many databases are being used, it has become difficult to know which database has the data you need.

増大したデータ活用に立ちはだかる別の困難も存在する。すなわち、構造化されていない多くのデータの存在である。ここで、構造化されていない非構造化データとは、形式と意味が定義されていない、あるいは定義が不完全なテキストデータ、例えば、音声データ、画像データ、動画データ、センサデータなどである。企業内データの８０％が非構造化データであると言われる。非構造化データの典型例として音声データが挙げられるが、これを意味あるデータとしてコンピュータで処理できるようにするためには、近年の発展が著しい音声認識、自然言語処理だけでなく、記号化したデータに対する意味付与が必要である。 There are other challenges to leveraging increased data. In other words, there is a lot of unstructured data. Here, unstructured data refers to text data whose format and meaning are undefined or incompletely defined, such as audio data, image data, video data, and sensor data. It is said that 80% of corporate data is unstructured data. Voice data is a typical example of unstructured data, but in order to process it as meaningful data on a computer, it is necessary to use not only voice recognition and natural language processing, which have been rapidly developed in recent years, but also symbolization. It is necessary to give meaning to data.

現時点では、データの収集、非構造化データからのデータ抽出、正規化、意味づけ、相互関連付け、分類など、データの意味に関わる情報処理技術は未成熟であり、専門家による作業を必要とする。 At present, information processing technologies that deal with the meaning of data, such as data collection, data extraction from unstructured data, normalization, semantic annotation, cross-referencing, and classification, are immature and require work by experts.

このような現状に鑑み、本発明は、専門家に依存することなくデータの収集と非構造化データの構造化データへの変換を含む、データの収集・変換・加工・活用を行う情報処理装置、システム、方法及びプログラム、同プログラムが記憶された記録媒体を提供することを課題とする。 In view of this situation, an object of the present invention is to provide an information processing device, system, method, and program, and a recording medium storing the program, that collect, convert, process, and utilize data, including collecting data and converting unstructured data into structured data, without relying on experts.

本発明は、専門家の介在なしにインターネットでアクセス可能なあらゆるデータを収集・蓄積し、加工し、分析し、利用する技術を提供する。すなわち、本願発明の代表的な態様に係る情報処理方法は、外部データを含むデータへ接続されるステップと、データが収集されるステップと、データが変換されるステップと、データが前処理されるステップと、データが加工されるステップと、タスクが管理されるステップと、活用のためにデータが表示されるステップと、データの全体管理が含まれる統合化された技術が提供され、各ステップにおいて事業特性に応じたテンプレートが用いられることで専門家に頼らない実行が可能にされ、非構造化データの構造化データへの変換においては、ファイル拡張子に基づくデータ属性の推定、テキストマイニング、自然言語処理、画像解析、動画解析、アノテーションの付与、及びメタ情報の付与等が行われるステップとを備える。 The present invention provides technology for collecting, storing, processing, analyzing, and utilizing any data accessible over the Internet without the intervention of experts. That is, an information processing method according to a representative aspect of the present invention provides an integrated technology comprising a step of connecting to data including external data, a step of collecting data, a step of converting data, a step of preprocessing data, a step of processing data, a step of managing tasks, a step of displaying data for utilization, and overall management of data. In each step, a template suited to the business characteristics is used so that the step can be executed without relying on experts. The conversion of unstructured data into structured data includes steps such as estimating data attributes from file extensions, text mining, natural language processing, image analysis, video analysis, adding annotations, and adding meta information.

本発明により、外部データを含むデータへの接続、データ収集、データ変換、データ前処理、データ加工、タスク管理、データ活用、データの全体管理を含む、統合化された技術が提供される。 The present invention provides an integrated technology that includes connection to data, including external data, data collection, data conversion, data preprocessing, data processing, task management, data utilization, and overall data management.

データ接続・収集は、各企業や組織が必要とするデータがどこにあり、どのように収集したら良いかを知るには専門知識が必要である。本発明では、これを誰でも行えるようにするために事業特性に応じた基盤テンプレートをあらかじめ用意し、これを本技術の利用者が選択することで社内・組織内及び社外・組織外のデータへの接続と収集を行う。さらに自動的な接続だけでなく、利用者の意図を反映させて接続されてもよい。 For data connection and collection, specialized knowledge is needed to know where the data that each company or organization needs is located and how it should be collected. In the present invention, so that anyone can do this, base templates suited to business characteristics are prepared in advance, and the user of this technology selects one of them to connect to and collect data inside and outside the company or organization. Furthermore, connections need not be purely automatic; they may also reflect the user's intentions.

データの接続では、利用者が選択したテンプレート等に基づき、接続すべきデータに接続が行われる。このとき、明示的にデータベースやファイルの指定を行わなくてもよいが、提示されたリストから選択したりＵＲＬを入力するなど、利用者が明示的に指定することも本発明の範囲である。 When connecting data, data to be connected is connected based on a template selected by the user. At this time, it is not necessary to explicitly specify the database or file, but it is also within the scope of the present invention for the user to explicitly specify the database or file by selecting from a presented list or inputting a URL.

一般には、接続先には複数種類のデータが蓄積されている。この中からどのデータに接続するかも上記テンプレートに基づいて行う。さらに、リストから利用者が選択してもよい。 Generally, multiple types of data are stored at the connection destination. Which data to connect from among these is also determined based on the above template. Furthermore, the user may select from a list.

上記接続後に収集されるデータは、接続可能な社内・組織内データ、及び社外データのすべて又は一部を含む。さらに、構造化されているデータだけでなく、音声データや画像。動画データ、チャットデータに代表されるような非構造化データなどを含む。 The data collected after the above connection includes all or part of the connectable data inside the company or organization and of the external data. Furthermore, it includes not only structured data but also unstructured data, as typified by audio data, image and video data, and chat data.

データ変換では、テキストマイニング、自然言語処理、画像解析、動画解析、アノテーションの付与、メタ情報の付与等を行うがこれらに限られない。 Data conversion includes, but is not limited to, text mining, natural language processing, image analysis, video analysis, annotation, meta information, etc.

データ加工は、まずファイル拡張子を手掛かりにして、例えば、動画、音声、文書、センサなどのどれであるのかを推定し、分類する。次に、利用者が選択した前記テンプレートに基づいてデータ加工を行う。ここで、より確実正確な加工を実現するために、利用者が欲するデータが指定される方式も本発明の範囲である。 Data processing first uses the file extension as a clue to estimate and classify whether the file is a video, audio, document, sensor, etc. Next, data processing is performed based on the template selected by the user. Here, in order to realize more reliable and accurate processing, a system in which data desired by the user is specified is also within the scope of the present invention.
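
As a non-limiting illustration, estimating the kind of data from the file extension could be sketched as below. This is a minimal sketch under stated assumptions: the `EXTENSION_TO_KIND` table and the function `estimate_kind` are hypothetical, and the specific extensions are examples chosen here, since the text names the kinds of data but not the extensions. Sensor data is left out because no typical extension is given for it.

```python
from pathlib import Path

# Assumed mapping from extension to kind of data (illustrative only).
EXTENSION_TO_KIND = {
    ".mp4": "video",
    ".mov": "video",
    ".wav": "audio",
    ".mp3": "audio",
    ".pdf": "document",
    ".txt": "document",
    ".jpg": "image",
    ".png": "image",
}


def estimate_kind(filename: str) -> str:
    """Estimate the kind of data by looking only at the file extension."""
    extension = Path(filename).suffix.lower()
    try:
        return EXTENSION_TO_KIND[extension]
    except KeyError:
        raise ValueError(f"unsupported extension: {extension!r}") from None
```

Because only the extension is inspected, the file contents never need to be read at this stage, which keeps the classification step light.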

前記データ加工は、データの選択、結合、分割、集計、フィルタリングが含まれるが、これらに限らない。 The data processing includes, but is not limited to, data selection, combination, division, aggregation, and filtering.

データの選択では、収集と変換されたデータから移行の処理に必要なものを選択する。この選択はテンプレートで規定されて利用者が介在しない場合と、利用者が介在する形で行われる場合の両方及び／又はいずれかを含む。 In data selection, the data needed for the subsequent processing is selected from the collected and converted data. This selection may be defined by the template with no user involvement, may be made with user involvement, or both.

データの結合と分割では、選択されたデータにおいて、組み合わせるべきもの、あるいは分割すべきものの処理を行う。例えば、複数店舗の売り上げデータが選択された場合、それを地域ごとに結合（まとめ）たり、あるいは、一店舗のデータを時間や曜日に分割したりする。 In data combination and division, selected data is processed to be combined or divided. For example, if sales data for multiple stores is selected, it may be combined (organized) by region, or data for one store may be divided by time or day of the week.

データの結合は、複数店舗の売り上げを結合するような単純なものから、利用者の目的に合わせた複雑な論理式で結合されるものも含む。どのような結合方法が行われるかは、テンプレートで規定されて利用者が介在しない場合と介在する場合とを含む。 Data combinations range from simple data such as combining sales from multiple stores to data combinations using complex logical formulas tailored to the user's purpose. The type of connection method to be performed includes cases in which it is defined by the template and does not involve the user's intervention, and cases in which the user intervenes.
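
As a non-limiting illustration, combining the sales of several stores by region and dividing one store's data by day of the week could be sketched as below. This is a minimal sketch under stated assumptions: the row layout, the helper `total_by`, and the sample figures are hypothetical.

```python
from collections import defaultdict
from datetime import date


def total_by(rows, key):
    """Sum the `amount` field of the rows, grouped by key(row)."""
    totals = defaultdict(int)
    for row in rows:
        totals[key(row)] += row["amount"]
    return dict(totals)


sales = [
    {"store": "A", "region": "east", "day": date(2024, 1, 1), "amount": 100},
    {"store": "A", "region": "east", "day": date(2024, 1, 2), "amount": 40},
    {"store": "B", "region": "east", "day": date(2024, 1, 1), "amount": 50},
    {"store": "C", "region": "west", "day": date(2024, 1, 1), "amount": 70},
]

# Combining: sales of multiple stores gathered per region.
by_region = total_by(sales, lambda row: row["region"])

# Dividing: one store's data split by day of the week (Monday is 0).
store_a = [row for row in sales if row["store"] == "A"]
store_a_by_weekday = total_by(store_a, lambda row: row["day"].weekday())
```

Passing the grouping rule as a function is one way to let a template, or the user, decide how the data is combined or divided.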

データのフィルタリングでは、特定の条件に合致するものだけを抽出する。ここでは、ひとまとまりのデータから、特定の条件に基づいて抽出することを含む。 Data filtering extracts only those that meet specific conditions. This includes extracting data from a set of data based on specific conditions.

上記のフィルタリングでは同じデータベースの中のデータフィールドによって構成される条件式に限らず、他のデータベースと組み合わせてフィルタリングを行う方法も本発明に含まれる。 The above-mentioned filtering is not limited to the conditional expressions formed by data fields in the same database, but the present invention also includes methods of filtering in combination with other databases.
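
As a non-limiting illustration, filtering by a condition on fields of the same data and filtering in combination with another database could be sketched as below. This is a minimal sketch under stated assumptions: the `orders` rows, the `customer_regions` lookup standing in for a second database, and the helper `filter_rows` are hypothetical.

```python
def filter_rows(rows, predicate):
    """Keep only the rows for which predicate(row) is true."""
    return [row for row in rows if predicate(row)]


orders = [
    {"id": 1, "customer": "c1", "amount": 120},
    {"id": 2, "customer": "c2", "amount": 80},
    {"id": 3, "customer": "c2", "amount": 300},
]

# Stand-in for a second database that is consulted while filtering.
customer_regions = {"c1": "east", "c2": "west"}

# Condition built only from fields of the same data.
large_orders = filter_rows(orders, lambda row: row["amount"] >= 100)

# Condition that also looks up the other database.
west_orders = filter_rows(
    orders, lambda row: customer_regions.get(row["customer"]) == "west"
)
```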

一般に、社内外、組織内外を問わず、異なるデータベースから収集されたデータの形式は異なる。データの前処理では、以降の処理のためにこれらを整える。 Generally, data collected from different databases, whether internal or external to an organization, is in different formats. Data preprocessing prepares them for subsequent processing.

上記のような前処理は単純な例あるが、例えば自然言語では同じ意味に対して複数の表現が存在する。これらを意味的に同一であるとして処理するための複雑な前処理も含む。 The above preprocessing is a simple example, but in natural language, for example, there are multiple expressions for the same meaning. It also includes complex preprocessing to treat these as semantically the same.
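
As a non-limiting illustration, preprocessing that aligns differing formats and treats different expressions of the same meaning as identical could be sketched as below. This is a minimal sketch under stated assumptions: the list of date formats, the synonym table, and the function names are hypothetical, and a fixed lookup table stands in for the more complex semantic processing mentioned in the text.

```python
from datetime import datetime

# Assumed date formats used by the different source databases.
DATE_FORMATS = ("%Y-%m-%d", "%Y/%m/%d", "%d.%m.%Y")

# Assumed table of expressions that share one meaning.
SYNONYMS = {"tokyo": "Tokyo", "tokyo-to": "Tokyo", "東京": "Tokyo"}


def normalize_date(text: str) -> str:
    """Convert a date written in any known format to ISO 8601 (YYYY-MM-DD)."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue  # try the next known format
    raise ValueError(f"unrecognized date: {text!r}")


def normalize_place(text: str) -> str:
    """Map known variants of a place name to one canonical spelling."""
    cleaned = text.strip()
    return SYNONYMS.get(cleaned.lower(), cleaned)
```

After this step, values such as `2024/01/18` and `18.01.2024` compare equal, so later combining and filtering can treat rows from different databases uniformly.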

データ活用では、上記のようにデータ接続、データ収集、データ変換、データ前処理、データ加工が行われたデータの表示と、それを利用者が活用して検討するためのシステムが含まれる。 Data utilization includes displaying the data that has undergone data connection, data collection, data conversion, data preprocessing, and data processing as described above, and a system with which the user can utilize and examine that data.

本情報処理装置では上記に加えて、タスク管理機能を有する。たとえばデータベースによっては更新サイクルやタイミングが存在するので、それに合わせたデータ収集が必要になるし、一方、データ活用においては利用者による定期的、定時的アクセルが存在する。このような本情報処理システムにおいて必要になる時間的要素を考慮して、タスク管理が本情報処理システムの管理を行う。 In addition to the above, this information processing device has a task management function. For example, some databases have update cycles and timings, so data must be collected in step with them; on the other hand, in data utilization, users access the data periodically or at fixed times. Task management manages the information processing system with these time-related factors in mind.

データの全体管理では、本情報処理システムが扱うデータとソフトウェアのセキュリティの管理、データのバックアップ、データの保護、アクセス制御、及び基盤運用を行う。 Overall data management involves managing the security of data and software handled by this information processing system, data backup, data protection, access control, and infrastructure operations.

本情報処理装置において、データへの接続、データ収集、データ変換、データ前処理、データ加工、タスク管理、データ活用、データの全体管理の各サブシステムにおいて、ＡＩや深層学習を含む機械学習の手段を用いることも本発明の範囲である。 In this information processing device, it is also within the scope of the present invention to use AI and machine learning techniques, including deep learning, in each of the subsystems for data connection, data collection, data conversion, data preprocessing, data processing, task management, data utilization, and overall data management.

そこで、上記課題を解決するために、本発明の第１の態様に係る情報処理方法は、専門家の介在無しに、利用者の属性及び／又は目的に基づいて接続すべきデータベースが決定されるステップと、選択されたデータベースへの接続が行なわれるステップと、利用者の属性及び／又は目的に応じて接続されたデータベースからデータが収集されるステップと、収集されたデータが利用者の属性及び／又は目的に応じて変換されるステップと、変換されたデータが利用者の属性及び／又は目的に応じて前処理されるステップと、前処理されたデータが利用者の属性及び／又は目的に応じて加工されるステップと、加工されたデータが利用者の属性及び／又は目的に応じて表示されるステップとを備えることを特徴とする。 Therefore, in order to solve the above problems, an information processing method according to a first aspect of the present invention comprises, without the intervention of an expert: a step in which a database to be connected to is determined based on the user's attributes and/or purpose; a step in which a connection is made to the selected database; a step in which data is collected from the connected database according to the user's attributes and/or purpose; a step in which the collected data is converted according to the user's attributes and/or purpose; a step in which the converted data is preprocessed according to the user's attributes and/or purpose; a step in which the preprocessed data is processed according to the user's attributes and/or purpose; and a step in which the processed data is displayed according to the user's attributes and/or purpose.

本発明の第２の態様として、第１の態様において、前記データの全体管理が行われる機能と、前記ステップが管理されるタスク管理機能及び／又はスケジューリング機能とを含んでもよい構成をとることもできる。 As a second aspect of the present invention, the first aspect may be configured to further include a function for overall management of the data, and a task management function and/or a scheduling function for managing the steps.

As a third aspect of the present invention, in the first aspect, the steps of determining the database to connect to, connecting to the selected database, collecting data from the connected database, converting the collected data, preprocessing the converted data, processing the preprocessed data, and displaying the processed data may handle not only structured data whose format and meaning are defined, but also unstructured data, including raw data such as text data, audio data, image data, video data, and sensor data, whose format and meaning are undefined or incompletely defined.

As a fourth aspect of the present invention, in the first aspect, a template prepared in advance may be used, and/or the user may modify the template, in order to execute the plurality of steps based on the user's attributes and/or purpose without the intervention of the expert.

As a fifth aspect of the present invention, in the first aspect, connection to a database and collection of the necessary data may be performed without the user explicitly specifying the database, by means of a template corresponding to the user's attributes and/or purpose, and/or by the user modifying the template.

As a sixth aspect of the present invention, in the first aspect, data conversion, including text mining, natural language processing, image analysis, video analysis, annotation, and the addition of meta-information, may be performed without explicit specification by the user, by means of a template corresponding to the user's attributes and/or purpose, and/or by the user modifying the template.

As a seventh aspect of the present invention, in the first aspect, the type of the data (for example, video, audio, document, or sensor data) may be estimated using the file extension as a clue, and/or data preprocessing may be performed based on the template selected by the user and/or on the template as modified by the user.
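As a minimal sketch of estimating the data type from a file extension, the lookup below maps a few common extensions to the four example types named in the seventh aspect. The table `EXTENSION_TO_TYPE` and the function `infer_data_type` are hypothetical; in particular, treating `.csv` as sensor data is an assumption made only for this illustration.

```python
from pathlib import PurePath

# Illustrative mapping only; the specification does not list concrete extensions.
EXTENSION_TO_TYPE = {
    ".mp4": "video",
    ".mov": "video",
    ".wav": "audio",
    ".mp3": "audio",
    ".pdf": "document",
    ".txt": "document",
    ".csv": "sensor",  # assumption: sensor logs delivered as CSV files
}


def infer_data_type(filename):
    """Guess the data type from the last file extension, ignoring case."""
    suffix = PurePath(filename).suffix.lower()
    return EXTENSION_TO_TYPE.get(suffix, "unknown")
```

Because an extension is only a hint, unrecognized or missing extensions fall back to `"unknown"` here rather than to a guessed type.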

As an eighth aspect of the present invention, in the first aspect, data processing, including selection, joining, splitting, aggregation, and filtering of data, may be performed without explicit specification by the user, by means of a template corresponding to the user's attributes and/or purpose, and/or by the user modifying the template.
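Template-driven processing with optional user modifications can be sketched as follows. The template layout (`select`, `filter`, `aggregate`), the name `DEFAULT_TEMPLATE`, and the function `apply_template` are assumptions introduced for illustration; the specification does not define a template format. Only selection, filtering, and aggregation are shown.

```python
from collections import defaultdict

# Hypothetical template prepared in advance for some user attribute and purpose.
DEFAULT_TEMPLATE = {
    "select": ["store", "amount"],
    "filter": {"field": "amount", "min": 0},
    "aggregate": {"group_by": "store", "sum": "amount"},
}


def apply_template(rows, template, overrides=None):
    """Process rows according to a template, with optional user modifications."""
    # User modifications replace whole top-level entries of the template.
    cfg = {**template, **(overrides or {})}

    # Selection: keep only the listed fields.
    rows = [{key: row[key] for key in cfg["select"]} for row in rows]

    # Filtering: keep rows whose field value is at least the minimum.
    flt = cfg["filter"]
    rows = [row for row in rows if row[flt["field"]] >= flt["min"]]

    # Aggregation: sum one field per group.
    agg = cfg["aggregate"]
    totals = defaultdict(float)
    for row in rows:
        totals[row[agg["group_by"]]] += row[agg["sum"]]
    return dict(totals)
```

A user who accepts the template as-is passes no overrides; a user who wants a stricter threshold might pass `{"filter": {"field": "amount", "min": 8}}` without touching the rest of the template.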

As a ninth aspect of the present invention, in the first aspect, the collected, converted, preprocessed, and processed data, whether structured or unstructured, may be displayed without explicit specification by the user, by means of a template corresponding to the user's attributes and/or purpose.

To solve the above problem, a program according to a tenth aspect of the present invention causes a computer to function as: a connection database determination unit that determines, without the intervention of an expert, a database to connect to based on a user's attributes and/or purpose; a connection unit that connects to the selected database; a data collection unit that collects data from the connected database according to the user's attributes and/or purpose; a data conversion unit that converts the collected data according to the user's attributes and/or purpose; a data preprocessing unit that preprocesses the converted data according to the user's attributes and/or purpose; a data processing unit that processes the preprocessed data according to the user's attributes and/or purpose; a display unit that displays the processed data according to the user's attributes and/or purpose; an overall management unit that performs overall management of the data; and a task management unit and/or a scheduling unit.

As an eleventh aspect of the present invention, in the tenth aspect, machine learning techniques, including AI and deep learning, may be used in at least one of the connection database determination unit, the connection unit, the data collection unit, the data conversion unit, the data preprocessing unit, the data processing unit, and the display unit.

A twelfth aspect of the present invention may be realized as a recording medium storing the program according to the tenth or eleventh aspect.

To solve the above problem, an information processing system according to a thirteenth aspect of the present invention comprises: a connection database determination unit that determines, without the intervention of an expert, a database to connect to based on a user's attributes and/or purpose; a connection unit that connects to the selected database; a data collection unit that collects data from the connected database according to the user's attributes and/or purpose; a data conversion unit that converts the collected data according to the user's attributes and/or purpose; a data preprocessing unit that preprocesses the converted data according to the user's attributes and/or purpose; a data processing unit that processes the preprocessed data according to the user's attributes and/or purpose; and a display unit that displays the processed data according to the user's attributes and/or purpose.

According to each aspect of the present invention, an information processing system is realized in which data connection, data collection, data conversion, data preprocessing, data processing, task management, data utilization, and overall data management are performed without relying on experts, and in which a wide variety of data, including unstructured data, can be used.

Finally, although various embodiments according to the present disclosure have been described, they are presented as examples and are not intended to limit the scope of the invention. These novel embodiments can be implemented in various other forms, and various omissions, substitutions, and changes can be made without departing from the gist of the invention. The embodiments and their modifications are included in the scope and gist of the invention, and in the invention described in the claims and its equivalents.

1: Information processing system
2: Information processing device
3: User terminal
4: First window
5: Second window
6: Category specification area
7: Second input data display area
8: Structured data display area
9: First save operation area
10: Structured data search area
11: Structured data editing area
12: Editing result display area
13: Second save operation area
20: Communication bus
21: Communication unit
22: Storage unit
23: Processor
30: Communication bus
31: Communication unit
32: Storage unit
33: Processor
34: Display unit
35: Input unit
41: Input data search area
42: Input data list display area
43: First operation area
51: Format specification area
52: First input data display area
53: Second operation area
81: First area
82: Second area
101: Search condition input area
102: Search result display area
103: Dataset addition area
111: Command input area
112: Command storage area
113: Command execution area
231: Acquisition unit
232: Identification unit
233: Generation unit
234: Display processing unit
A1: Activity
A2: Activity
A3: Activity
A4: Activity
A5: Activity
A6: Activity
A7: Activity
A8: Activity
A9: Activity
A10: Activity
A11: Activity
A12: Activity
A13: Activity
A14: Activity
A15: Activity
C: Category
C1: Specified category
C2: Usage category
C3: Collection category
D1: Input data
D2: Structured data
D21: First structured data
D22: Second structured data
D23: Third structured data
DB1: First database
DB2: Second database
DB21: First storage area
DB22: Second storage area
IM1: First image
IM2: Second image
IM3: Third image
IM4: Fourth image
L1: Highlight
R1: Editing area

Claims

An information processing system,
comprising a processor capable of executing a program so as to perform each of the following steps, wherein:
in an acquisition step, input data is acquired, the input data including at least one of a plurality of types of unstructured data;
in a type identification step, the type of the at least one unstructured data included in the input data is identified based on the format of the acquired input data; and
in a generation step, first structured data having a predetermined data structure is generated by performing, on the acquired input data, a conversion process corresponding to the identified type of the unstructured data.

The information processing system according to claim 1,
wherein, further, in a candidate identification step, candidates for a category to which the conversion process to be performed on the input data belongs are identified based on the identified type of the unstructured data;
in the acquisition step, the user's specification of the category is acquired; and
further, in a process identification step, the conversion process belonging to the specified category is identified based on the specification.

The information processing system according to claim 2,
The category includes a collection category defined by a collection mode of the input data.

The information processing system according to claim 1,
wherein, in the generation step, when a plurality of pieces of first structured data have been generated, second structured data is generated by combining at least a part of the plurality of pieces of first structured data.

The information processing system according to claim 1,
wherein, further, in a display processing step, an editing area is displayed in which elements of the structured data can be edited by the user.

The information processing system according to claim 1,
The format of the input data includes an extension of the input data.

The information processing system according to claim 1,
The type of unstructured data includes at least one of images, moving images, audio, three-dimensional spatial data, and time series data.

An information processing method,
A method comprising each step of the information processing system according to any one of claims 1 to 7.

An information processing program,
A program that causes at least one computer to execute each step of the information processing system according to any one of claims 1 to 7.