TW201738776A

TW201738776A - Real-time streaming record data analysis system and method using a computed result obtained by a computing process as the system and method for increasing the future computing efficiency

Info

Publication number: TW201738776A
Application number: TW105112081A
Authority: TW
Inventors: Chi-Hua Chen; Hsin-Han Shie; Jia-Hong Lin; Ta-Sheng Kuan; Ya-Ting Yang; Chia-Min Hsieh
Original assignee: Chunghwa Telecom Co Ltd
Priority date: 2016-04-19
Filing date: 2016-04-19
Publication date: 2017-11-01
Also published as: CN107305583B; TWI636369B; CN107305583A

Abstract

This invention relates to a real-time streaming record data analysis system and method. The system is composed of a plurality of user devices, a plurality of online web servers, a plurality of online database servers, a plurality of manager devices, a record data collection device, a plurality of distributed databases, a record data analysis module, a data mining master module, a distributed computing device, a cache (rapid access) database, and a combined node device, of which the record data collection device, the distributed databases, the record data analysis module, the data mining master module, the distributed computing device, the cache database, and the combined node device are arranged inside the system. With this system, streaming record data, whether encrypted or not, can be computed on in parallel by different mining modules in a distributed manner, and the decrypted analysis result is delivered to a manager. In addition, results obtained during computation can be reused to improve the efficiency of future computation.

Description

Real-time streaming record data analysis system and method

The present invention relates to a real-time streaming record data analysis system and method.

Several conventional techniques exist for recording or analyzing streaming data, but each falls short in some respect.

First, there is a cross-layer log tracking system and method that can draw on different log data sources, record large volumes of log data and access trails, and support criminal investigation. However, although this method can record log data, it cannot analyze the records or produce analysis results for an administrator's reference.

There is also a log data recording method that compresses and stores log data for a specified image in real time. Although it can record log data, it likewise cannot analyze the records or produce analysis results.

Finally, there is a method for metering communication network traffic that records packets by passive capture and adapts its recording to different packet types. Again, however, it cannot analyze the records or produce analysis results for an administrator's reference.

It follows that a real-time streaming data system that can analyze records and produce results, rather than merely record them, is urgently needed in this field.

The present invention provides a real-time streaming record data analysis system. Outside the system are a plurality of user devices, a plurality of online web servers, a plurality of online database servers, and a plurality of manager devices. Inside the system are a record data collection device, a plurality of distributed databases, a record data analysis module, a data mining master module, a distributed computing device, a cache database, and a combined node device.

The plurality of external user devices: a user operates a user device to connect to an online web server and request network services and related information and applications from it.

The plurality of external online web servers: on receiving a network service request from a user device, an online web server retrieves the required data from an online database server and then provides the network service and related information to the external user device. It also logs every network service request and sends the network service usage records to the record data processing device for parsing and storage.

The plurality of external online database servers: an online database server receives database operation requests from the online web servers and returns the requested information. It also logs every database operation request and can send the database operation records to the record data collection device for parsing and storage.

A record data collection device: it parses the records from the online web servers and online database servers according to their record formats and stores them in the distributed databases. The record data collection device may optionally provide encryption. In that case it holds at least one private key, at least one public key, and at least one arbitrary integer value, and after parsing it uses these three to encrypt the data before storing it in the distributed databases.

The plurality of distributed databases: they store the parsed network service usage records and database operation records and supply the record data to the distributed computing device when it performs distributed computation and record analysis.

The plurality of external manager devices: a manager operates a manager device to connect to the record data analysis module and, through it, to the data mining master module in order to select a suitable mining sub-module, which is then assigned to the distributed computing device for computation.

A record data analysis module: the manager connects it, manually or automatically, to the data mining master module and selects a suitable mining sub-module, which is then assigned to the distributed computing device for computation. The record data analysis module can also obtain the computation results from the combined node device.

A data mining master module: it may contain a plurality of mining sub-modules for the distributed computing device to use in computation and analysis.

A plurality of distributed computing devices: they obtain record data from the distributed databases and, according to the selected data mining module, assign tasks to a plurality of node devices and distributed computing modules, which carry out the computation and analysis separately. A distributed computing device can temporarily store its results in the cache database and may optionally support ciphertext computation, that is, computing directly on encrypted data.

A plurality of cache databases: they store the analysis results or related parameters that the distributed computing device caches for each record data request, so that later computations can be accelerated.

A combined node device: it collects the individual results from the distributed computing devices, integrates and analyzes them, and returns the analysis result to the record data analysis module. The combined node device may likewise be equipped with a decryption function. It must then hold the private key, public key, and arbitrary integer value corresponding to those of the record data collection device in order to decrypt results that the distributed computing devices computed on ciphertext, after which it returns the plaintext to the record data analysis module.

The real-time streaming record data analysis method of the present invention comprises the following six main steps.

Recording online data: a record data collection device collects and stores, from the external online web servers and external online database servers, the network service requests issued by the plurality of external users and the corresponding response records.

Storing to the distributed databases: the record data collection device stores these network service requests and response records in the plurality of distributed databases.

Selecting a data mining module: an external manager connects to a record data analysis module and, through it, asks a data mining master module to select one of a plurality of mining sub-modules for use.

Assigning work to the distributed computing devices: the data mining master module, following the mining sub-module selected by the external manager, assigns the plurality of distributed computing devices to compute on the users' network service requests and response records.

Caching to the cache database: the results produced by each distributed computing device are temporarily stored in a cache database for use in future analysis.

Returning and displaying results: a combined node device connected to the distributed computing devices integrates the results into an analysis result and transmits it through the record data analysis module to the external manager device, which presents the analysis result to the external user.

The real-time streaming record data analysis method of the present invention also supports encryption and decryption. The record data collection device encrypts the network service requests and response records using at least one private key, a public key, and an arbitrary integer value. Each distributed computing device computes directly on the network service requests and response records in ciphertext form and produces its results. The combined node device, holding the corresponding private key, public key, and arbitrary integer value, decrypts the analysis result and provides it to the external manager.
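As a rough illustration of what computing on ciphertext can look like, the sketch below uses a toy Paillier cryptosystem, which lets two encrypted integers be added without decrypting them. The text does not name its encryption scheme, so Paillier is an assumption made purely for illustration and may differ from the scheme the invention uses (in Paillier, for instance, the random integer `r` is needed only for encryption). The function names, the tiny primes, and the sample values are all hypothetical, and key sizes this small offer no security.

```python
import math

def keygen(p, q):
    """Build a toy Paillier key pair from two small primes (illustration only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    g = n + 1
    # mu is the modular inverse of L(g^lam mod n^2), where L(x) = (x - 1) // n
    mu = pow((pow(g, lam, n * n) - 1) // n, -1, n)
    return (n, g), (lam, mu)

def encrypt(public_key, m, r):
    """Encrypt integer m (0 <= m < n) with a random integer r coprime to n."""
    n, g = public_key
    return (pow(g, m, n * n) * pow(r, n, n * n)) % (n * n)

def decrypt(public_key, private_key, c):
    """Recover the plaintext integer from ciphertext c."""
    n, _ = public_key
    lam, mu = private_key
    return ((pow(c, lam, n * n) - 1) // n * mu) % n

def add_encrypted(public_key, c1, c2):
    """Multiplying ciphertexts yields an encryption of the plaintext sum."""
    n, _ = public_key
    return (c1 * c2) % (n * n)

# Collection side: encrypt two hypothetical request counts.
public_key, private_key = keygen(11, 13)
c1 = encrypt(public_key, 17, r=7)
c2 = encrypt(public_key, 25, r=23)

# Computing side: combine the ciphertexts without seeing 17 or 25.
c_sum = add_encrypted(public_key, c1, c2)

# Combining side: decrypt only the final result.
total = decrypt(public_key, private_key, c_sum)
```

In this sketch the party that adds the ciphertexts needs only the public key, which mirrors the division of roles described above: encryption at collection, computation on ciphertext, decryption at the combined node.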

In summary, the real-time streaming record data analysis system and method of the present invention can record and parse online network service request records and database operation records in real time, and can analyze the record data using whichever data mining sub-module is selected.

The invention further combines the distributed computing devices with the distributed databases to perform distributed computation in real time, while the additional cache database temporarily stores the distributed computing devices' results and related parameters to improve the efficiency of later analysis.

The invention can also optionally use encryption and decryption: data is encrypted before being stored in the distributed databases, which protects data security, and computation can be performed while the data remains in ciphertext form, further improving computational efficiency.

101‧‧‧User device

102‧‧‧Online web server

103‧‧‧Online database server

104‧‧‧Manager device

105‧‧‧Record data collection device

106‧‧‧Distributed database

107‧‧‧Record data analysis module

108‧‧‧Data mining master module

109‧‧‧Distributed computing device

110‧‧‧Cache database

111‧‧‧Combined node device

1081‧‧‧Nearest neighbor mining sub-module

1082‧‧‧Multiple weighted linear regression mining sub-module

S201~S208‧‧‧Method steps

FIG. 1 is an architecture diagram of the real-time streaming record data analysis system of the present invention.

FIG. 2 is a flowchart of the steps of the real-time streaming record data analysis method with encryption and decryption according to the present invention.

The invention is further described below through embodiments with reference to the drawings. In view of the growing demand for real-time computation and analysis of large volumes of data, the invention provides a real-time streaming record data analysis system. The overall architecture required for its operation is shown in FIG. 1. Outside the system are a plurality of user devices 101, a plurality of online web servers 102, a plurality of online database servers 103, and a plurality of manager devices 104. Inside the system are a record data collection device 105, a plurality of distributed databases 106, a record data analysis module 107, a data mining master module 108, a plurality of distributed computing devices 109, a cache database 110, and a combined node device 111. The data mining master module 108 may further include a nearest neighbor mining sub-module 1081 and a multiple weighted linear regression mining sub-module 1082.

The record data collection device can be implemented with tools such as Splunk or Logstash and collects the record data sent by the online web servers and online database servers.

The distributed databases can be built on NoSQL stores such as HBase or MongoDB. Their function is to provide longer-term storage of the record data sent by the record data collection device.

The distributed computing devices use the MapReduce programming model available in Hadoop or MongoDB to split and merge data efficiently, which can greatly speed up computation and analysis.
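The split-and-merge idea behind MapReduce can be sketched in plain Python. This is not the Hadoop or MongoDB API. It is a minimal single-process sketch that assumes parsed records are dictionaries with a hypothetical `uri` field and that the job is to count requests per URI; the function names and sample data are likewise illustrative.

```python
from collections import defaultdict

def map_phase(record):
    """Emit one (key, value) pair per record: the requested URI and a count of 1."""
    yield (record["uri"], 1)

def shuffle(pairs):
    """Group all emitted values by key, as the framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Merge the values for one key into a single total."""
    return key, sum(values)

def run_job(shards):
    """Run map over every shard, then shuffle and reduce the emitted pairs."""
    pairs = [pair
             for shard in shards
             for record in shard
             for pair in map_phase(record)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())

# Hypothetical parsed records, already split into two shards (one per node).
shards = [
    [{"uri": "/index.html"}, {"uri": "/login"}],
    [{"uri": "/index.html"}],
]
counts = run_job(shards)
```

In a real cluster each shard would be mapped on a different node and the shuffle would move data over the network. The sketch keeps only the data flow.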

In addition, the invention provides a cache database to handle the load of real-time computation on large volumes of data. The system can collect heterogeneous data sources at the same time, analyze and compute on them in real time, avoid repeated computation, and feed the results back to the external manager.

In more detail, the parts of the real-time streaming record data analysis system described above can be explained as follows.

The plurality of external user devices: a user device may be an electronic device such as a personal computer, tablet, smartphone, personal digital assistant, or in-vehicle device. Such a device uses a browser installed on it (such as Internet Explorer, Chrome, Firefox, or Safari) or another network-capable application to connect to an online web server and submit network service requests and information application requests through it.

The plurality of external online web servers: the online web servers can be built with web server software such as Microsoft Internet Information Services (IIS) or Apache, hosting web pages that offer a variety of network service functions for external users to operate. On receiving network service requests and information application requests from an external user device, an online web server obtains the requested data from an online database server and then provides the corresponding network service and information to the external user device. The online web server also logs every network service request. Depending on the server software, it stores the network service usage records (for example, an IIS log or Apache log) and sends them to the record data collection device for parsing and storage.

The plurality of online database servers: an online database server can be implemented with database server software such as Microsoft SQL Server, MySQL, Oracle DB, IBM DB2, or PostgreSQL and provides database operations (at least insert, update, delete, and query) for the online web servers to use. It receives database operation requests from the online web servers and returns the requested information. It also logs every database operation request, generates database operation records according to the database software in use, and sends them to the record data processing device for parsing and storage.

A record data collection device: the record data collection device can be implemented with logging and parsing tools such as Splunk or Logstash and provides various record data parsing modules (at least a network service usage record parsing module and a database operation record parsing module). The network service usage record parsing module offers at least IIS log or Apache log parsing, to parse records from the online web servers. The database operation record parsing module offers parsing for formats such as the Microsoft SQL Server log, to parse records from the online database servers. After parsing each record according to its format, the record data collection device stores the result in the distributed databases. As an example, Table 1 shows an IIS log record. The record data collection device parses it to obtain the log date 2015-08-18, the log time 09:12:15, the client IP 10.144.198.130, the server IP 10.144.192.1, the port 80, the requested network service "/index.html", the response status code 200, and the client browser Mozilla/4.0+(compatible;MSIE+5.5;+Windows+2000+Server). The record data collection device may optionally provide encryption. In that case it holds at least one private key, at least one public key, and at least one arbitrary integer value, and after parsing it uses these three to encrypt the data before storing it in the distributed databases.
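The parsing step for the IIS example can be sketched as follows. Table 1 itself is not reproduced in this text, so the sample line below is assembled from the values listed above, and the field order in the `#Fields:` directive is an assumption. The field names follow the W3C extended log file format that IIS writes; `parse_w3c_line` is a hypothetical helper, not part of Splunk or Logstash.

```python
# Assumed field order; a real IIS log declares its own order in a "#Fields:" line.
FIELDS_DIRECTIVE = (
    "#Fields: date time c-ip s-ip s-port cs-uri-stem sc-status cs(User-Agent)"
)

# Sample record assembled from the values given in the description.
SAMPLE_LINE = (
    "2015-08-18 09:12:15 10.144.198.130 10.144.192.1 80 "
    "/index.html 200 Mozilla/4.0+(compatible;MSIE+5.5;+Windows+2000+Server)"
)

def parse_w3c_line(fields_directive, line):
    """Map each space-separated value to the field name declared in the directive."""
    names = fields_directive.split()[1:]  # drop the leading "#Fields:" token
    values = line.split()
    if len(names) != len(values):
        raise ValueError("log line does not match the #Fields directive")
    return dict(zip(names, values))

record = parse_w3c_line(FIELDS_DIRECTIVE, SAMPLE_LINE)
```

Splitting on whitespace works here because the W3C format replaces spaces inside a value with `+`, as the user agent string shows. The resulting dictionary is the kind of structured record that would then be stored in the distributed databases.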

The plurality of distributed databases: the distributed databases can be implemented with distributed database software such as HBase or MongoDB to store and operate on massive data, and can be clustered for mutual redundancy to support record data analysis and processing. They mainly store the network service usage records and database operation records parsed by the record data collection device, and they continuously supply record data while the distributed computing devices perform computation and analysis, keeping the computation running.

The plurality of manager devices: a manager device may be a personal computer, tablet, smartphone, personal digital assistant, or similar device. It uses a browser (such as Internet Explorer, Chrome, Firefox, or Safari) or another network-capable application to connect to the record data collection device and, through it, to the data mining master module, in order to select a suitable mining sub-module to assign to the distributed computing devices for computation. Finally, the combined node device integrates the results and returns them to the data analysis module, which replies to the external manager device.

A record data analysis module: the record data analysis module may be a server offering network services. Through its network service interface it connects to the external manager devices, the data mining master module, and the combined node device to send and receive data. A manager can connect it, manually or automatically, to the data mining master module to select a suitable mining sub-module and assign it to the distributed computing devices for computation, and to obtain the results from the combined node device.

A data mining master module: the data mining master module is also a server offering network services. Through its network service interface it connects to the mining sub-modules and the distributed computing devices to send and receive data. It may contain a plurality of mining sub-modules for the distributed computing devices to use in computation and analysis. These include at least a nearest neighbor mining sub-module, which is a distributed computing module based on the k-nearest neighbors method (k-Nearest Neighbors Method), and a multiple linear regression mining sub-module, which is a distributed computing module based on multiple linear regression (Multi Factor Line Regression Method). The data mining master module assigns the selected mining sub-module to the distributed computing devices for computation and analysis.

At least one distributed computing device: the distributed computing device can be implemented with distributed computing software such as Hadoop or MongoDB and contains at least a plurality of node devices and a plurality of distributed computing modules for analyzing massive data. A node device generates a plurality of distributed computing modules according to the data mining module selected through the record data analysis device, obtains record data from the distributed databases, and assigns it to the distributed computing modules for analysis. Each distributed computing module computes on and analyzes the record data according to the selected mining sub-module. For example, the MapReduce distributed computing modules provided by Hadoop or MongoDB each carry out their part of the distributed computation according to the assigned mining sub-module, and the results are then gathered and sent to the combined node device. Each distributed computing device may also optionally support ciphertext computation, that is, computing on data in ciphertext form.

At least one cache database: the cache database is implemented with relational or non-relational database software and stores the analysis results and related parameters that the distributed computing devices cache for each set of record data, in order to speed up computation. For example, after a distributed computing device runs the distributed computation of the nearest neighbor mining sub-module, it obtains the record entries with the highest similarity and sends them to the cache database for storage. Later computations can first fetch these most similar entries from the cache database for comparison. As another example, after a distributed computing device runs the distributed computation of the multiple linear regression mining sub-module, it can store the resulting linear regression model parameters (including slope or intercept) in the cache database, and later real-time computations can reuse them. New record data can be written to the cache database and old record data deleted from it, which avoids the time cost of repeated computation and can greatly improve overall efficiency.
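The regression-parameter caching described above can be sketched with an in-memory dictionary standing in for the cache database. For brevity the sketch fits a single-variable least-squares line rather than a multiple regression. The class name `ParameterCache`, the cache key, and the sample data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

class ParameterCache:
    """In-memory stand-in for the cache database (an assumption of this sketch)."""

    def __init__(self):
        self._store = {}
        self.fits = 0  # counts how many times the model was actually fitted

    def get_or_fit(self, key, xs, ys):
        """Return cached (slope, intercept) for key, fitting only on a cache miss."""
        if key not in self._store:
            self._store[key] = fit_line(xs, ys)
            self.fits += 1
        return self._store[key]

    def invalidate(self, key):
        """Drop stale parameters, for example after new record data arrives."""
        self._store.pop(key, None)

cache = ParameterCache()
xs, ys = [1, 2, 3, 4], [3, 5, 7, 9]
first = cache.get_or_fit("requests_per_hour", xs, ys)   # fits the model
second = cache.get_or_fit("requests_per_hour", xs, ys)  # served from the cache
```

The second call returns the stored parameters without refitting, which is the repeated computation the cache database is meant to avoid. How and when the real system invalidates entries is not specified in the text.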

A combined node device: the combined node device is also a server offering network services. Through its network service interface it connects to the record data collection device and the distributed computing devices to send and receive data. It collects the results produced by the distributed computing devices, integrates and analyzes them, and returns the analysis result to the record data analysis module. The combined node device may also provide decryption: it stores the private key, public key, and arbitrary integer value corresponding to those of the record data collection device, which lets it decrypt results computed on ciphertext by the distributed computing devices and return the plaintext to the record data analysis module.

Another embodiment disclosed below also concerns the real-time streaming record data analysis system of the present invention. The system architecture includes at least a record data collection device, a plurality of distributed databases, a data mining master module, a plurality of distributed computing devices, a combined node device, and a plurality of cache databases. The data mining master module contains a nearest neighbor mining sub-module, which applies the k-nearest neighbors method to network record data to produce positioning information. The system operates as follows.

系統包含一紀錄資料蒐集裝置，用以收集智慧型手機回報之經緯度座標資料(即訓練位置，在實施例中有m個位置)和基地台訊號強度集合資料，紀錄資料蒐集裝置並紀錄和解析上述資料，其紀錄每個訓練位置(L={l ₁,l ₂,...,l _m})以及訓練位置對應的基地台訊號強度集合資料(c _i={c ₁ ⁱ,c ₂ ⁱ,...,c _n ⁱ})於分散式資料庫中；其中，c _j ⁱ代表集合中第j個基地台之訊號強度，j=1,…,n(在實施例中設有n個基地台)；接著，往後當智慧型手機移動時，智慧型手機可測量及回報其附近的基地訊號強度集合(r={r ₁,r ₂,...,r _n})，並將由系統中資料主探勘模組、分散式運算裝置、快取資料庫以最近鄰居探勘子模組來計算基地訊號強度集合r與分散式資料庫中所有位置及其訊號強度集合交叉比對以估算出智慧型手機當時可能的位置。 The system includes a record data collecting device for collecting the latitude and longitude coordinate data of the smart phone (ie, the training position, in the embodiment, m positions) and the base station signal strength set data, recording the data collecting device, and recording and analyzing the above. Data, which records each training position ( L = { l ₁ , l ₂ , ..., l _m }) and the base station signal strength set data corresponding to the training position ( c _i = { c ₁ ⁱ , c ₂ ⁱ , ..., c _n ⁱ }) in a decentralized database; where c _j ⁱ represents the signal strength of the jth base station in the set, j = 1 ,..., n (in the embodiment there are n bases) Taiwan); then, when the smart phone moves, the smart phone can measure and report the base signal strength set ( r ={ r ₁ , r ₂ ,..., r _n }) nearby, and will be The data master exploration module, the distributed computing device, and the cache database use the nearest neighbor exploration sub-module to calculate the base signal intensity set r and all the positions in the distributed database and their signal intensity sets to cross-match to estimate the wisdom. The possible location of the phone at the time.

The system includes a plurality of distributed database devices that store each training position (L={l₁,l₂,...,l_m}) and its corresponding base station signal strength set (c_i={c₁ⁱ,c₂ⁱ,...,c_nⁱ}). The distributed databases supply this record data when the distributed computing devices perform computation and record analysis.

The system further includes a data mining master module with at least a nearest neighbor mining sub-module, which estimates the position loc(r) of each signal strength set r. This embodiment uses the Euclidean distance: formula (1) computes the distance between the reported signal strength set (r={r₁,r₂,...,r_n}) and the signal strength set (c_i={c₁ⁱ,c₂ⁱ,...,c_nⁱ}) of each position l_i in the database. Once the Euclidean distance has been computed for every training position, formula (2) identifies the position h₁ whose signal strengths are closest, along with the next closest positions, for k positions in total ({h₁,h₂,...,h_k}). The data mining master module assigns the nearest neighbor mining sub-module to the distributed computing devices for execution.
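The nearest neighbor search described above can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the function names, the dictionary layout, and the sample coordinates and signal strengths are all hypothetical, and the distance is the standard Euclidean distance that the text names.

```python
import math

def euclidean_distance(r, c):
    """Euclidean distance between a reported signal set r and a stored set c."""
    return math.sqrt(sum((rj - cj) ** 2 for rj, cj in zip(r, c)))

def k_nearest(r, fingerprints, k):
    """Return the k training positions whose stored signal sets are closest to r.

    fingerprints maps a training position (lat, lon) to its list of
    base station signal strengths (hypothetical layout).
    """
    ranked = sorted(fingerprints.items(),
                    key=lambda item: euclidean_distance(r, item[1]))
    return [position for position, _ in ranked[:k]]

# Hypothetical data: m = 4 training positions, n = 3 base stations (dBm).
fingerprints = {
    (25.03, 121.56): [-60, -75, -90],
    (25.04, 121.57): [-70, -65, -85],
    (25.05, 121.58): [-90, -60, -70],
    (25.06, 121.59): [-95, -80, -55],
}
nearest = k_nearest([-62, -74, -88], fingerprints, k=2)
print(nearest)
```

Sorting all m distances is the simplest form; the distributed variant in the text splits the m records across nodes so that each node ranks only its own share.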

The plurality of distributed computing devices may include a plurality of node devices, each of which corresponds to at least one distributed computing module. A distributed computing module performs computation using the mining sub-module selected through the data mining master module. In this embodiment the distributed databases hold m positions (that is, m records to compare), so the m records can be distributed evenly across the node devices. The distributed computing module in each node device then runs the nearest neighbor mining sub-module in parallel to obtain its k closest positions ({h₁,h₂,...,h_k}), and these are sent to the combined node device, which computes the final position information.

As described above, the combined node device receives the results computed by the distributed computing devices and integrates them into an analysis result. In this embodiment, the combined node device receives the k positions computed by each distributed computing module in the node devices, compares them to obtain the k overall closest positions, and applies formula (3) to {h₁,h₂,...,h_k} to produce the position information l(r) corresponding to the signal strength set r.
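The split between node devices and the combined node device can be sketched as follows. The partitioning scheme, function names, and sample data are hypothetical. Formula (3) is not reproduced in this text, so the final step below assumes a plain average of the k nearest positions; that averaging is an assumption for illustration only and may differ from the patent's formula.

```python
import heapq
import math

def distance(r, c):
    """Euclidean distance between two signal strength sets."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(r, c)))

def node_top_k(r, partition, k):
    """One node device: the k closest (distance, position) pairs in its share."""
    return heapq.nsmallest(k, ((distance(r, c), pos) for pos, c in partition))

def combine(candidates_per_node, k):
    """Combined node: keep the k overall closest, then average their positions.

    The averaging step is an ASSUMPTION standing in for formula (3).
    """
    merged = heapq.nsmallest(
        k, (cand for node in candidates_per_node for cand in node))
    lat = sum(pos[0] for _, pos in merged) / len(merged)
    lon = sum(pos[1] for _, pos in merged) / len(merged)
    return merged, (lat, lon)

# Hypothetical records: (position, signal strength set).
records = [
    ((0.0, 0.0), [-60, -70]),
    ((0.0, 2.0), [-62, -71]),
    ((4.0, 4.0), [-90, -50]),
    ((2.0, 0.0), [-61, -72]),
]
nodes = [records[0::2], records[1::2]]   # spread m records over 2 nodes
r = [-61, -71]
per_node = [node_top_k(r, part, 2) for part in nodes]
merged, estimate = combine(per_node, 2)
print(merged, estimate)
```

Because each node returns its own k best candidates, the combined node only has to rank at most (number of nodes × k) entries instead of all m records.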

The plurality of cache databases store the results computed by the distributed computing devices, along with related parameters, so that later analyses can retrieve them quickly and run more efficiently. In this embodiment, the cache databases store the q×k closest positions obtained from the node devices (where q×k is less than m and q is a positive integer) together with their corresponding base station signal sets. When a signal strength set reported by the same smartphone must be analyzed later, only these q×k cached positions and their signal sets need to be analyzed, and the m original records need not be compared again. In addition, this data can be used to analyze the smartphone's speed of movement, and q can be set accordingly: when the smartphone is moving slowly or is stationary, q can be set to a very small value (for example, 1); when it is moving quickly, q can be set to a larger value.
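The caching idea can be illustrated with a small sketch. The class name, method name, and sample records are hypothetical; the only behavior taken from the text is that the q×k closest records from a full scan are kept so that a later report from the same phone is compared against q×k records instead of all m.

```python
import heapq
import math

def distance(r, c):
    """Euclidean distance between two signal strength sets."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(r, c)))

class CandidateCache:
    """Keeps, per phone, the q*k closest training records from the last scan."""

    def __init__(self):
        self._store = {}

    def locate(self, phone_id, r, records, k, q):
        # Use the cached candidates if present; otherwise scan all m records.
        pool = self._store.get(phone_id, records)
        scanned = len(pool)
        ranked = heapq.nsmallest(q * k, pool,
                                 key=lambda rec: distance(r, rec[1]))
        self._store[phone_id] = ranked
        return [pos for pos, _ in ranked[:k]], scanned

# Hypothetical data: m = 6 records along a line, 2 base stations.
records = [((i, 0), [-50 - 5 * i, -80 + 5 * i]) for i in range(6)]
cache = CandidateCache()
first = cache.locate("phone-1", [-60, -70], records, k=1, q=2)
second = cache.locate("phone-1", [-61, -69], records, k=1, q=2)
print(first, second)
```

A larger q keeps more candidates, which matters when the phone has moved far enough that its true nearest neighbors may no longer be among the few closest records from the previous scan.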

Another embodiment disclosed below is also a real-time streaming record data analysis system of the present invention. The system architecture includes at least a record data collection device, a plurality of distributed databases, a data mining master module, a plurality of distributed computing devices, a combined node device, and a plurality of cache databases. The data mining master module includes a multiple linear regression mining sub-module, which applies multiple linear regression to traffic record data to produce traffic prediction information. The system operates as follows.

The system includes a record data collection device that collects the arrival time information sent back by on-board units installed on refuse collection vehicles and parses it to compute the station-to-station travel times along each vehicle's route. For example, the travel time in the r-th record from collection station i-n-j to collection station i-n is denoted t^r_{i-n-j,i-n}. The record data collection device stores each set of travel times computed in this way in the distributed databases for later analysis.

The system includes a plurality of distributed database devices. In this embodiment, the distributed databases are built with distributed database components such as HBase and MongoDB and store the travel time between each pair of stations.

The system further includes a data mining master module with at least a multiple linear regression mining sub-module, which computes the relationships (such as slope and intercept) among the station-to-station travel times of the refuse collection vehicles. In this embodiment, m historical records are analyzed to produce k weighted linear regression models over the travel times t^r_{i-n-j,i-n}. The predicted travel time from collection point i-n to collection point i is obtained with the multiple weighted linear regression model of formula (4). At run time, the travel times from the k collection points preceding point i-n to point i-n ({t_{i-n-1,i-n}, t_{i-n-2,i-n}, ..., t_{i-n-k,i-n}}) are combined with the trained multiple weighted linear regression model to predict the travel time from collection point i-n to collection point i, as shown in formula (5).

The plurality of distributed computing devices may include a plurality of node devices, each of which corresponds to at least one distributed computing module. A distributed computing module performs computation using the mining sub-module selected through the data mining master module. In this embodiment, the multiple weighted linear regression mining sub-module relies mostly on addition and multiplication, which are associative, so the work can be divided in either of two ways. The m historical records can be divided evenly among the node devices, with the distributed computing module in each node device performing the multiple weighted linear regression on its share. Alternatively, the k weighted linear regression models to be produced can be divided evenly among the node devices, with the distributed computing module in each node device running the sub-module for its own models. When the computation is complete, the distributed computing devices store the slope, intercept, and weight of each multiple weighted linear regression model in the cache databases for later analysis.

As described above, the combined node device receives the results computed by the distributed computing devices and integrates them into an analysis result. In this embodiment, the combined node device receives the k weighted linear regression models computed by the node devices, together with their parameters (slope, intercept, and weight), and applies formula (5) to produce the predicted travel time from collection point i-n to collection point i.

In this embodiment, the plurality of cache databases store the slope, intercept, and weight of each multiple weighted linear regression model computed by the node devices as data for analysis. When the input data later changes, the model can be adjusted quickly: because multiple weighted linear regression consists mainly of additions and multiplications and has mathematical properties such as associativity, the historical values held in the cache databases only need to have the new records added or the deleted records subtracted in order to update the slope, intercept, or weight. The m original records do not have to be recomputed, which improves efficiency.
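The incremental update described above can be illustrated with cached running sums. The sketch below is a simplification under stated assumptions: it fits an ordinary (unweighted) one-variable least-squares line rather than the patent's multiple weighted model, whose formulas (4) and (5) are not reproduced in this text, and the class and method names are hypothetical. The three sample records are the travel-time pairs that follow from the arrival times given later in step S201.

```python
from dataclasses import dataclass

@dataclass
class RegressionSums:
    """Running sums sufficient to fit y = a + b*x by least squares."""
    m: int = 0
    sx: float = 0.0
    sy: float = 0.0
    sxy: float = 0.0
    sxx: float = 0.0

    def add(self, x, y):
        """Fold one new record into the cached sums."""
        self.m += 1
        self.sx += x
        self.sy += y
        self.sxy += x * y
        self.sxx += x * x

    def remove(self, x, y):
        """Subtract one deleted record from the cached sums."""
        self.m -= 1
        self.sx -= x
        self.sy -= y
        self.sxy -= x * y
        self.sxx -= x * x

    def merge(self, other):
        """Combine partial sums from two nodes (addition is associative)."""
        return RegressionSums(self.m + other.m, self.sx + other.sx,
                              self.sy + other.sy, self.sxy + other.sxy,
                              self.sxx + other.sxx)

    def fit(self):
        """Return (intercept a, slope b) from the sums alone."""
        slope = ((self.m * self.sxy - self.sx * self.sy)
                 / (self.m * self.sxx - self.sx ** 2))
        intercept = (self.sy - slope * self.sx) / self.m
        return intercept, slope

# Two nodes each summarize their share of the records, then merge.
node_a = RegressionSums()
node_a.add(200, 220)
node_a.add(240, 250)
node_b = RegressionSums()
node_b.add(210, 230)
cached = node_a.merge(node_b)
intercept, slope = cached.fit()
print(intercept, slope)
```

Adding or removing a record touches five numbers regardless of m, which is the efficiency gain the paragraph describes.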

Another embodiment disclosed below is a further embodiment of the real-time streaming record data analysis method of the present invention, which adds ciphertext computation to the real-time streaming record data analysis system described earlier. As shown in FIG. 2, the method comprises eight steps: step S201, recording online data; step S202, encrypting data; step S203, storing into the distributed databases; step S204, selecting a data mining sub-module; step S205, assigning work to the distributed computing devices and computing on ciphertext; step S206, temporarily storing computation results in the cache database; step S207, returning and decrypting; and step S208, displaying results. Each step is detailed in the embodiment below.

Step S201, recording online data: the record data collection device collects the service requests and response records of the external online web servers and external online database servers and stores them in the distributed databases. For example, on-board unit 1 (an external user device) arrives at station 1, station 2, and station 3 at 09:00:00, 09:03:20, and 09:07:00 respectively; on-board unit 2 (an external user device) arrives at station 1, station 2, and station 3 at 10:00:00, 10:04:00, and 10:08:10; on-board unit 3 (an external user device) arrives at the same three stations at 11:00:00, 11:03:30, and 11:07:20; and on-board unit 4 (an external user device) arrives at station 1 and station 2 at 12:00:00 and 12:03:40, as shown in Table 2.

Table 2 lists the arrival times of the on-board units:

When each of the four on-board units (external user devices) arrives at a station, it reports its position and time through middleware (for example, a RESTful API) to the external online web servers and external online database servers. The record data collection device stores and analyzes these position and time records to compute the station-to-station travel times. For example, the travel time of on-board unit 1 from station 1 to station 2 (t_{1,2}) is 200 seconds, and from station 2 to station 3 (t_{2,3}) it is 220 seconds, as shown in Table 3.
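The travel-time calculation can be reproduced from the arrival times listed in step S201. The following is a minimal sketch; the dictionary layout and function name are hypothetical, while the timestamps are the ones given in the text.

```python
from datetime import datetime

# Arrival times at stations 1, 2, 3 as listed in step S201.
arrivals = {
    "on-board unit 1": ["09:00:00", "09:03:20", "09:07:00"],
    "on-board unit 2": ["10:00:00", "10:04:00", "10:08:10"],
    "on-board unit 3": ["11:00:00", "11:03:30", "11:07:20"],
    "on-board unit 4": ["12:00:00", "12:03:40"],
}

def travel_times(times):
    """Seconds between consecutive station arrivals."""
    parsed = [datetime.strptime(t, "%H:%M:%S") for t in times]
    return [int((b - a).total_seconds()) for a, b in zip(parsed, parsed[1:])]

table = {unit: travel_times(times) for unit, times in arrivals.items()}
for unit, seconds in table.items():
    print(unit, seconds)
```

For on-board unit 1 this gives 200 and 220 seconds, matching the values stated in the text; on-board unit 4 has only one interval because it has not yet reached station 3.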

Table 3 lists the station-to-station travel times (unit: seconds):

Step S202, encrypting data: after the record data collection device has collected the position and time records from the external online web servers and external online database servers, it encrypts the data with an encryption algorithm. The device first computes the station-to-station travel times, then computes the product of t_{1,2} and t_{2,3} and the square of t_{1,2}, producing the parameter values derived from the arrival times, as shown in Table 4.

Table 4 lists the parameter values derived from the arrival times:

Next in the data encryption step, the record data collection device uses a preset private key p, a public key q, and an arbitrary integer z to encrypt the parameter values with formula (6). In this embodiment p is 39,916,801, q is 112,909, and z is 7. For example, the plaintext value 44,000 encrypts to the ciphertext 279,461,607; further example results are listed in Table 5.

Formula (6): f(x) = (x + p×z) mod (p×q), where x is the original parameter value and mod is the modulo operation, which returns the remainder when the preceding value is divided by the following value.
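Formula (6) can be checked directly against the worked example in the text. The sketch below uses the key values given for this embodiment; the function and constant names are hypothetical.

```python
P = 39_916_801   # private key p from the embodiment
Q = 112_909      # public key q from the embodiment
Z = 7            # arbitrary integer z from the embodiment

def encrypt(x, p=P, q=Q, z=Z):
    """Formula (6): f(x) = (x + p*z) mod (p*q)."""
    return (x + p * z) % (p * q)

ciphertext = encrypt(44_000)
print(ciphertext)
```

With these parameters the plaintext 44,000 maps to 279,461,607, the ciphertext stated in the text. Since x + p×z is far smaller than p×q here, the modulo leaves the sum unchanged.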

Table 5 lists the encrypted parameter values:

Step S203, storing into the distributed databases: the record data collection device of the present invention can store data in the distributed databases either as plaintext or as ciphertext. In this embodiment, it stores the encrypted parameter values of Table 5 in the distributed databases. Storing ciphertext in the databases effectively guards against immediate disclosure of the data if a database is breached or its data is stolen.

Step S204, selecting a data mining sub-module: an external manager connects through an external manager device to the record data analysis module and uses it to access the data mining master module and select the mining sub-module to be used. In this embodiment the external manager selects the multiple linear regression sub-module, so the rest of the example uses that sub-module as the main tool for analysis and computation.

Step S205, assigning work to the distributed computing devices and computing on ciphertext: according to the mining sub-module selected by the external manager, the data mining master module assigns the plurality of distributed computing devices to perform the analysis. The distributed computing modules under each distributed computing device compute on the record data allocated to them, and the distributed computing devices process the record data directly in its ciphertext state.

Following the multiple linear regression sub-module selected by the external manager and the computations required by formulas (4) and (5), each distributed computing device has its distributed computing modules sum the required parameter values; the summed results are shown in Table 6. This embodiment computes a single set of regression parameters a and b as an illustration, but the invention is not limited to this example: the distributed computing devices can use their distributed computing modules in parallel to compute a large number of regression parameters.

Table 6 lists the sums of the encrypted parameter values:

Step S206, temporarily storing computation results in the cache database: the results computed by each distributed computing device are temporarily stored in the cache database, of which there may be more than one, mainly to make later data analysis more efficient. In this embodiment, the data of on-board units 1, 2, and 3 have already been summed, and the sums are stored in the cache database. When they are needed later, the cached sums can be used directly without summing the data of on-board units 1, 2, and 3 again, which saves considerable time.

Step S207, returning and decrypting: each distributed computing device returns its computed results to the combined node device, which decrypts the ciphertext with the same parameter values used for encryption and integrates the results into an analysis result. After receiving the results from the distributed computing devices, the combined node device applies formula (7) with the same private key p, public key q, and arbitrary integer z as the record data collection device; in this embodiment p is 39,916,801, q is 112,909, and z is 7. For example, the summed ciphertext 838,405,121 decrypts to the plaintext 152,300, as listed in Table 7.

Formula (7): g(x) = x mod p, where mod is again the modulo operation.
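The reason the sums can be computed on ciphertext is that formula (6) only adds a multiple of p to each value, and formula (7) removes any multiple of p. A minimal sketch follows, with hypothetical function names. The first plaintext (44,000) and the two totals are stated in the text; the other two plaintexts (60,000 and 48,300) are the products t_{1,2}×t_{2,3} that follow from the arrival times in step S201.

```python
P = 39_916_801   # private key p from the embodiment
Q = 112_909      # public key q from the embodiment
Z = 7            # arbitrary integer z from the embodiment

def encrypt(x, p=P, q=Q, z=Z):
    """Formula (6): f(x) = (x + p*z) mod (p*q)."""
    return (x + p * z) % (p * q)

def decrypt(c, p=P):
    """Formula (7): g(x) = x mod p."""
    return c % p

# t_{1,2} * t_{2,3} for on-board units 1, 2, and 3.
products = [200 * 220, 240 * 250, 210 * 230]
ciphertexts = [encrypt(x) for x in products]

# The distributed nodes add ciphertexts without ever seeing plaintext.
cipher_sum = sum(ciphertexts)
plain_sum = decrypt(cipher_sum)
print(cipher_sum, plain_sum)
```

The ciphertext sum is 838,405,121 and decrypts to 152,300, as in the text. Decryption recovers the true total only while that total stays below p, because anything at or above p would be reduced by the modulo.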

Table 7 lists the sums of the parameter values after decryption:

Using the decrypted sums in Table 7 and the known number of records (m=3), parameters a and b are computed with formula (4), as shown in calculation (8). Calculation (9) then predicts the travel time that on-board unit 4 needs from station 2 to station 3. The result is approximately 236 seconds, so the predicted arrival time of on-board unit 4 at station 3 is 12:07:36.
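The prediction can be reproduced numerically. Calculations (8) and (9) are not reproduced in this text, so the sketch below assumes an ordinary least-squares line t_{2,3} = a + b×t_{1,2}; that assumption does yield the 236 seconds and 12:07:36 stated above. Of the sums used, 152,300 is stated in the text, and the other three follow from the arrival times in step S201. Variable names are hypothetical.

```python
from datetime import datetime, timedelta

m = 3                  # number of complete records (on-board units 1 to 3)
sum_x = 650            # sum of t_{1,2}: 200 + 240 + 210
sum_y = 700            # sum of t_{2,3}: 220 + 250 + 230
sum_xy = 152_300       # sum of t_{1,2} * t_{2,3} (decrypted total)
sum_xx = 141_700       # sum of t_{1,2} squared

# Least-squares slope b and intercept a (assumed form of formula (4)).
b = (m * sum_xy - sum_x * sum_y) / (m * sum_xx - sum_x ** 2)
a = (sum_y - b * sum_x) / m

# On-board unit 4 took 220 s from station 1 to station 2.
predicted = a + b * 220
arrival = (datetime.strptime("12:03:40", "%H:%M:%S")
           + timedelta(seconds=round(predicted)))
predicted_arrival = arrival.strftime("%H:%M:%S")
print(round(predicted), predicted_arrival)
```

The fitted line has an intercept of about 75 seconds and a slope of about 0.73, so a 220-second first leg gives roughly 236 seconds for the second leg.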

Step S208, displaying results: the combined node device sends the computation result to the record data analysis module, which notifies the external manager device so that the analysis result is presented there. In this embodiment, once the combined node device has computed the prediction, the record data analysis module notifies the external manager device, which shows the external manager that the predicted arrival time of on-board unit 4 at station 3 is 12:07:36.

In summary, the present invention is innovative in its technical concept and provides several effects that the prior art does not achieve. It fully satisfies the statutory requirements of novelty and inventive step for an invention patent, and this application is therefore filed in accordance with the law. Approval of this invention patent application is respectfully requested.

101‧‧‧User device

102‧‧‧Online web server

103‧‧‧Online database server

104‧‧‧Manager device

105‧‧‧Record data collection device

106‧‧‧Distributed database

107‧‧‧Record data analysis module

108‧‧‧Data mining master module

109‧‧‧Distributed computing device

110‧‧‧Cache database

111‧‧‧Combined node device

Claims

A real-time streaming record data analysis system, comprising: a record data collection device, connected to an external online web server or an external online database server, which parses by format the network service requests and response records that external network users send through user devices and that are stored in the external online web server or the external online database server; a plurality of distributed databases, which store the user network service requests and response records parsed by the record data collection device; a cache database, which provides fast access to temporarily stored data to accelerate computation; a data mining master module, which includes a plurality of mining sub-modules, each providing a different modular algorithm and computation logic for computation and analysis; a record data analysis module, connected to an external manager device, which under the manager's control or autonomously connects to the data mining master module, selects an appropriate mining sub-module, and assigns the devices that perform the computing tasks; at least one distributed computing device, each of which obtains the user network service requests and response records from the distributed databases, distributes the work to its internal distributed computing modules according to the mining sub-module selected by the record data analysis module and the assigned computing tasks, and temporarily stores the resulting computation and analysis results in the cache database; and a combined node device, which retrieves and integrates the results from each node device and each distributed computing module in the distributed computing device to generate an analysis result, and returns the analysis result to the record data analysis module.

The real-time streaming record data analysis system of claim 1, wherein the record data collection device encrypts the network service requests and response record data using at least a private key, a public key, and an arbitrary integer value; each distributed computing device computes directly on the network service requests and response record data in their ciphertext state and generates a computation result; and the combined node device holds the corresponding private key, public key, and arbitrary integer value to decrypt the analysis result.

The real-time streaming record data analysis system of claim 1, wherein the data mining master module further comprises: a nearest neighbor mining sub-module, which is a distributed computing module that computes with the logic of the k-nearest neighbors method; and a multiple linear regression mining sub-module, which is a distributed computing module that computes with the logic of the multiple linear regression method.

The real-time streaming record data analysis system of claim 1, wherein each distributed computing device further comprises: a plurality of node devices, to whose distributed computing modules the mining sub-modules selected through the record data analysis module are assigned, the node devices retrieving the record data from the distributed databases, wherein each distributed computing module computes on and analyzes the record data according to the operation flow of the mining sub-module.

A real-time streaming record data analysis method, comprising the steps of: recording online data, in which a record data collection device collects, from an external online web server and an external online database server, the network service requests and response records sent by a plurality of external users; storing into distributed databases, in which the record data collection device stores the network service requests and response records in a plurality of distributed databases; selecting a data mining sub-module, in which an external manager connects to a record data analysis module and, through the record data analysis module, selects one of a plurality of mining sub-modules of a data mining master module for use; assigning work to distributed computing devices, in which the data mining master module, according to the mining sub-module selected by the external manager, assigns a plurality of distributed computing devices to compute on the user network service requests and response records; temporarily storing into a cache database, in which the results output by each distributed computing device are temporarily stored in a cache database for use in later analysis; and returning and displaying results, in which a combined node device connected to the distributed computing devices integrates the computation results into an analysis result and sends it through the record data analysis module to an external manager device, which presents the analysis result to the external manager.

The real-time streaming record data analysis method of claim 5, further comprising the steps of: the record data collection device encrypting the network service requests and response record data using at least a private key, a public key, and an arbitrary integer value; each distributed computing device computing directly on the network service requests and response record data in their ciphertext state and generating a computation result; and the combined node device, which holds the corresponding private key, public key, and arbitrary integer value, decrypting the analysis result.

The real-time streaming record data analysis method of claim 5, wherein the data mining master module further comprises: a nearest neighbor mining sub-module, which is a distributed computing module that computes with the logic of the k-nearest neighbors method; and a multiple linear regression mining sub-module, which is a distributed computing module that computes with the logic of the multiple linear regression method.

The real-time streaming record data analysis method of claim 5, wherein each distributed computing device further comprises: a plurality of node devices, to whose distributed computing modules the mining sub-modules selected through the record data analysis module are assigned, the node devices retrieving the record data from the distributed databases, wherein each distributed computing module computes on and analyzes the record data according to the operation flow of the mining sub-module.