TWI798170B - Data analysis method and device - Google Patents

Data analysis method and device

Info

Publication number
TWI798170B
Authority
TW
Taiwan
Prior art keywords
processing
data
processing model
model
relevant information
Prior art date
Application number
TW106105358A
Other languages
Chinese (zh)
Other versions
TW201734843A (en)
Inventor
雷宗雄
Original Assignee
香港商阿里巴巴集團服務有限公司
Priority date
Filing date
Publication date
Application filed by 香港商阿里巴巴集團服務有限公司
Publication of TW201734843A
Application granted
Publication of TWI798170B

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16Z: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS, NOT OTHERWISE PROVIDED FOR
    • G16Z99/00: Subject matter not provided for in other main groups of this subclass

Abstract

The present invention provides a data analysis method and device. Processing models corresponding to the processing nodes of a data analysis flow are constructed, connected in sequence according to their dependency relationships, and then used to process the raw data to be processed so as to obtain target data. Because the processing models are connected directly according to their dependencies, the output of one processing model can be fed straight into the next; intermediate data no longer needs to be persisted, which saves resources, and the step of loading intermediate data is eliminated, improving the efficiency of data analysis.

Description

Data analysis method and device

The present invention belongs to the field of data processing, and in particular relates to a data analysis method and device.

A data analysis flow typically contains a large number of processing nodes such as data cleaning, feature smoothing, feature normalization, feature extraction, and feature selection. In traditional data analysis, every processing node produces corresponding intermediate data, and that intermediate data must be persisted to storage. Because the intermediate data serves as the upstream data dependency of the next processing node, the next node has to load the persisted intermediate data of the previous node as its input.

In traditional data analysis, adjacent processing nodes are therefore linked through persisted intermediate data, so each time the data passes through a processing node it must be loaded and then written back out. For massive data sets or complex analysis flows, the volume of intermediate data is enormous; this wastes computing and input/output (IO) resources, and the repeated loading and persisting severely degrades the efficiency of data analysis.

The present invention provides a data analysis method and device to solve the problem that, in existing data analysis, data must be loaded and persisted once for every processing node it passes through, which both wastes resources and reduces the efficiency of data analysis.

To achieve the above object, the present invention provides a data analysis method, comprising: constructing a processing model corresponding to each processing node in a data analysis flow; connecting the processing models in sequence according to their dependency relationships; and using the connected processing models to process the raw data to be processed so as to obtain target data.

To achieve the above object, the present invention provides a data analysis device, comprising: a construction module for constructing the processing model corresponding to each processing node in a data analysis flow; a connection module for connecting the processing models according to their dependency relationships; and a processing module for using the connected processing models to process the raw data to be processed so as to obtain target data.

With the data analysis method and device provided by the present invention, processing models corresponding to the processing nodes of a data analysis flow are constructed, connected in sequence according to their dependency relationships, and then used to process the raw data to be processed so as to obtain target data. Because the processing models are connected directly according to their dependencies, the output of one processing model can be fed straight into the next; intermediate data no longer needs to be persisted, which saves resources, and the step of loading intermediate data is eliminated, improving the efficiency of data analysis.

101~103: Steps
201~206: Steps
301~307: Steps
11: Construction module
12: Connection module
13: Processing module
14: Acquisition module
15: Parsing module
16: Effect verification module
17: Persistence module
131: Data verification unit
132: Acquisition unit
133: Format conversion unit
134: Processing unit

FIG. 1 is a schematic flowchart of the data analysis method according to Embodiment 1 of the present invention;
FIG. 2 is a schematic flowchart of the data analysis method according to Embodiment 2 of the present invention;
FIG. 3 is the DAG of Embodiment 2 of the present invention;
FIG. 4 is a schematic diagram of an application example of the data analysis method according to Embodiment 2 of the present invention;
FIG. 5 is a schematic flowchart of the data analysis method according to Embodiment 3 of the present invention;
FIG. 6 is a schematic structural diagram of the data analysis device according to Embodiment 4 of the present invention;
FIG. 7 is a schematic structural diagram of the data analysis device according to Embodiment 5 of the present invention.

The data analysis method and device provided by the embodiments of the present invention are described in detail below with reference to the accompanying drawings.

Embodiment 1

As shown in FIG. 1, which is a schematic flowchart of the data analysis method of Embodiment 1 of the present invention, the data analysis method includes the following steps:

S101. Construct a processing model corresponding to each processing node in the data analysis flow.

First, the configured data analysis flow is analyzed to obtain the processing nodes it contains. Once the processing nodes have been obtained, a processing model corresponding to each node can be constructed according to that node's function.

For example, a data analysis flow may include processing nodes such as feature smoothing, feature normalization, feature extraction, and feature selection. Each of these nodes has a specific processing function and produces a corresponding result from its input; within the data analysis flow, that result is intermediate data. Feature normalization, for instance, normalizes the raw data using the mean and standard deviation of each feature, and the normalized data is the intermediate data of that node. In this embodiment, to avoid producing intermediate data, the feature normalization node is modeled: the feature normalization processing model has a data transformation function for its input, records the mean and standard deviation of each feature, and can transform the raw data directly, as sketched below.
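
The following is a minimal, hypothetical Python sketch of such a node-level model (the patent does not prescribe a language or API; the class and method names are assumptions for illustration). The model records per-feature statistics when fitted and transforms input data in memory, returning the result directly instead of persisting intermediate data.

```python
import numpy as np

class FeatureNormalizationModel:
    """Processing model for a feature normalization node.

    Records the per-feature mean and standard deviation when fitted and
    applies the transformation in memory; no intermediate data is persisted.
    """

    def fit(self, data: np.ndarray) -> "FeatureNormalizationModel":
        # Record the statistics of each feature (column).
        self.mean_ = data.mean(axis=0)
        self.std_ = data.std(axis=0)
        self.std_ = np.where(self.std_ == 0, 1.0, self.std_)  # avoid division by zero
        return self

    def transform(self, data: np.ndarray) -> np.ndarray:
        # Return the normalized data directly to the next processing model.
        return (data - self.mean_) / self.std_
```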

S102. Connect the processing models according to their dependency relationships.

When the data analysis flow is analyzed, the dependency relationships between the processing models must also be obtained. After the processing model corresponding to each processing node has been constructed, the processing models are chained in order according to the dependencies between the nodes. For the processing models to be connected directly, a data interface must be provided. In this embodiment, the data interface of every processing model is unified, so once the models have been connected in sequence through this interface according to their dependencies, the data analysis flow can be transformed into an ordered piece of execution logic.
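
A minimal sketch of how a unified data interface might let processing models be chained directly is shown below. The `ProcessingModel` base class and the `connect` helper are illustrative names introduced here rather than part of the patent; the point is only that every model exposes the same `transform` signature, so the output of one model is passed straight to the next without being written out.

```python
from typing import List
import numpy as np

class ProcessingModel:
    """Unified data interface: every processing model consumes and returns an array."""

    def transform(self, data: np.ndarray) -> np.ndarray:
        raise NotImplementedError

def connect(models: List[ProcessingModel]):
    """Chain models, already ordered by their dependencies, into one piece of execution logic."""

    def run(raw_data: np.ndarray) -> np.ndarray:
        data = raw_data
        for model in models:                 # upstream model first
            data = model.transform(data)     # output feeds the next model in memory
        return data                          # the final output is the target data

    return run
```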

S103. Use the connected processing models to process the raw data to be processed so as to obtain the target data.

Once the processing models are connected, the execution logic can run in order. The raw data to be processed is fed into the connected models: it first enters the processing model at the head of the execution logic, and the output of each model then flows into the next, until the data reaches the model at the tail of the execution logic. The final output of that last model is the target data.

With the data analysis method provided by this embodiment, the processing models are connected according to their dependencies, so the output of one processing model can be passed directly through the data interface into the next. Intermediate data is no longer persisted, which saves resources, and because the intermediate data produced by one model flows straight into the next, the loading step for intermediate data is avoided and the efficiency of data analysis is improved.

Embodiment 2

As shown in FIG. 2, which is a schematic flowchart of the data analysis method of Embodiment 2 of the present invention, the data analysis method includes the following steps:

S201. Obtain the directed acyclic graph (DAG) of the data analysis flow.

A data analysis flow is composed of a series of processing nodes. By analyzing the characteristics of the flow, its directed acyclic graph (Directed Acyclic Graph, DAG) can be obtained; the DAG links the processing nodes into an ordered series.

S202. Parse the DAG to obtain relevant information about the data analysis flow.

The relevant information includes the logical function of each processing node, the dependency relationships between processing nodes, and the storage address of each processing model.

Parsing the DAG yields the relevant information of the data analysis flow, including the logical functions of the processing nodes in the flow, the dependencies between the nodes, and the storage address of each processing model. The relevant information may also include input data information, output data information, and user configuration parameters. From this information, an Extensible Markup Language (XML) file describing the node dependencies can be generated; the XML file is saved to a database as a backup and submitted to the back end.
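
The sketch below illustrates one way the parsed DAG information could be serialized into such an XML file. The element and attribute names (`pipeline`, `node`, `depends_on`, `model_address`) are assumptions for illustration; the patent only requires that the XML capture the nodes' logical functions, their dependencies, and the model storage addresses.

```python
import xml.etree.ElementTree as ET

def dag_info_to_xml(nodes, path="pipeline.xml"):
    """Serialize parsed DAG information into an XML file with node dependencies.

    `nodes` is a list of dicts such as:
    {"name": "data_scaling", "function": "scale", "depends_on": "",
     "model_address": "/models/scaling.bin"}
    """
    root = ET.Element("pipeline")
    for node in nodes:
        elem = ET.SubElement(root, "node", name=node["name"])
        ET.SubElement(elem, "function").text = node["function"]
        ET.SubElement(elem, "depends_on").text = node.get("depends_on", "")
        ET.SubElement(elem, "model_address").text = node["model_address"]
    ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)
    return path
```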

S203. Construct the corresponding processing model according to the logical function of each processing node in the relevant information.

After the logical function of each processing node has been obtained, the corresponding processing model is constructed according to that function. For example, a data scaling node shrinks values that exceed a configured range and enlarges values that fall below it; from this logical function, a corresponding data scaling model can be constructed.
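
A hypothetical data scaling model built from that logical function might look like the sketch below; the range bounds and scaling factor are arbitrary illustration values, and the class follows the unified `transform` interface sketched earlier.

```python
import numpy as np

class DataScalingModel:
    """Data scaling model: shrink values above the configured range and
    enlarge values below it, following the unified transform interface."""

    def __init__(self, low: float = 0.0, high: float = 100.0, factor: float = 0.5):
        self.low, self.high, self.factor = low, high, factor

    def transform(self, data: np.ndarray) -> np.ndarray:
        data = data.copy()
        data[data > self.high] *= self.factor      # shrink values above the range
        data[data < self.low] /= self.factor       # enlarge values below the range
        return data
```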

S204. Connect the processing models through the data interface according to the dependency relationships between the processing nodes in the relevant information.

After the processing model for each node has been generated, the models are connected directly through the data interface according to the node dependencies in the relevant information. Specifically, once the back end receives the XML file produced by parsing the DAG, it can obtain the dependencies between the processing nodes of the DAG. The back end then assembles the code of the processing models automatically according to these dependencies; that is, it arranges the model code into a DAG, saves the assembled code, and compiles it into an executable. Based on the pre-designed data interface, the code of the processing modules is assembled in order, and once assembly is complete, each processing model is initialized. To allow the processing models to be connected directly, the data interface is unified, so a series of code units can be combined conveniently; after they are chained according to the dependencies, the data analysis flow is transformed into an ordered piece of execution logic at the program level.
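
A minimal sketch of this assembly step is shown below: given the dependency information, the models are ordered topologically and chained into one callable. The function and variable names are illustrative assumptions, and Python's standard `graphlib` module stands in for whatever assembly mechanism an implementation would actually use.

```python
from graphlib import TopologicalSorter  # Python 3.9+

def assemble(models: dict, depends_on: dict):
    """Order the processing models by their node dependencies and chain them.

    `models` maps node name -> processing model instance;
    `depends_on` maps node name -> set of upstream node names.
    """
    order = list(TopologicalSorter(depends_on).static_order())  # upstream nodes first

    def run(raw_data):
        data = raw_data
        for name in order:
            data = models[name].transform(data)  # each output feeds the next model directly
        return data

    return run
```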

Ordinarily, recombining code introduces new program defects (bugs), which makes deployment riskier and requires renewed testing, duplicating effort. In this embodiment, the data analysis flow is turned into a DAG at the code level, which reduces the number of program defects, and the executable code of all processing models can be packaged as a whole and deployed directly to the online environment, greatly reducing the risk of online deployment.

S205. Persist each processing model according to its storage address in the relevant information.

After the processing models have been generated, they can be persisted to storage according to the storage addresses in the relevant information, so that they do not have to be recomputed and their reuse rate is improved. In practice, a processing model is far smaller than the intermediate data it replaces, so persisting models not only saves resources but also benefits the efficiency of data analysis.

S206. Input the raw data to be processed into the connected processing models for processing so as to obtain the target data.

After the processing models have been connected, the raw data to be processed is fed in, and the connected models process it to produce the final target data. In this embodiment, the processing of the raw data is completed in memory, so the intermediate data of each processing model never needs to be persisted.

To better understand the data analysis method provided by this embodiment, an example follows:

The data analysis flow for the raw data includes the following processing nodes: data normalization, data scaling, and data smoothing. Analyzing this flow yields its DAG, shown in FIG. 3. In the DAG, the output of each processing node is intermediate data; for example, the intermediate data output by the data scaling node is the scaled data, and the intermediate data output by the data smoothing node is the smoothed data.

Parsing the DAG yields the relevant information about the processing nodes of the flow, including the logical function of each node, the dependencies between nodes, and the storage address of each processing model. In this example, the dependencies between the processing nodes are: data normalization depends on data smoothing, and data smoothing depends on data scaling.

To avoid generating intermediate data during analysis, and thus avoid persisting and loading it, a corresponding processing model with the appropriate data transformation function is constructed for each processing node according to its logical function in the flow: a data scaling model, a data smoothing model, and a data normalization model. The processing models are then connected through the data interface according to the node dependencies, as shown in FIG. 4. Once connected, they form an ordered piece of execution logic; the raw data is fed into the connected models, the execution logic processes it, and the final target data is obtained. Each model's processing of the data can be completed in memory, so no intermediate data is produced. The constructed processing models can further be persisted, stored at a user-configured storage address, so that they can be reused.
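
The worked example could be wired up roughly as in the sketch below. The class names and the toy implementations of scaling, smoothing, and normalization are assumptions for illustration only; what matters is that the three models are chained in dependency order and the data stays in memory from raw input to target output.

```python
import numpy as np

class DataScaling:
    def transform(self, data):
        return np.clip(data, 0.0, 100.0)               # toy scaling into a fixed range

class DataSmoothing:
    def transform(self, data):
        kernel = np.ones(3) / 3.0                      # simple moving-average smoothing
        return np.apply_along_axis(
            lambda col: np.convolve(col, kernel, mode="same"), 0, data)

class DataNormalization:
    def transform(self, data):
        mean, std = data.mean(axis=0), data.std(axis=0)
        std = np.where(std == 0, 1.0, std)
        return (data - mean) / std                     # per-feature normalization

raw = np.random.rand(1000, 8) * 200                    # raw data to be processed

# Dependency order: normalization depends on smoothing, smoothing on scaling.
pipeline = [DataScaling(), DataSmoothing(), DataNormalization()]

data = raw
for model in pipeline:                                 # intermediate results stay in memory;
    data = model.transform(data)                       # nothing is persisted between nodes
target = data                                          # final target data
```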

With the data analysis method provided by this embodiment, the DAG of the data analysis flow is obtained and parsed, the processing models are constructed from the parsing result and connected according to their dependencies, and the connected models are used to process the raw data to be processed so as to obtain the target data. Because the processing models are connected directly, the output of one model can be fed straight into the next; intermediate data is no longer persisted, which saves resources, and the loading step for intermediate data is eliminated, improving the efficiency of data analysis.

Embodiment 3

As shown in FIG. 5, which is a schematic flowchart of the data analysis method of Embodiment 3 of the present invention, on the basis of the above embodiments, using the connected processing models to process the raw data to be processed so as to obtain the target data includes the following steps:

S301. Perform data verification on the raw data.

After the raw data is obtained, it must be verified. First it is checked whether a corresponding processing model is already stored for the user. If one is stored, it is determined whether the raw data has been changed, specifically on the basis of the input data information and output data information in the relevant information. If the raw data has not been changed, the stored processing models do not need to be updated and only need to be retrieved directly, and step S302 is executed; otherwise step S303 is executed.
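
A sketch of this verification branch is shown below, under the assumption of a simple checksum comparison standing in for the input/output data information; the helper names, file layout, and use of a hash are illustrative, not specified by the patent.

```python
import hashlib
import os
import pickle

def verify_and_load(raw_bytes: bytes, model_path: str, checksum_path: str):
    """Return the stored processing models if the raw data is unchanged, else None.

    A stored checksum of the raw data stands in here for the input/output
    data information mentioned in the relevant information.
    """
    if not (os.path.exists(model_path) and os.path.exists(checksum_path)):
        return None                                    # no stored models: rebuild (S303)
    digest = hashlib.sha256(raw_bytes).hexdigest()
    with open(checksum_path) as f:
        if f.read().strip() != digest:
            return None                                # raw data changed: rebuild (S303)
    with open(model_path, "rb") as f:
        return pickle.load(f)                          # unchanged: load stored models (S302)
```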

S302. When the raw data passes data verification, retrieve each processing model according to its storage address in the relevant information.

When the raw data passes data verification, the stored processing models do not need to be updated, so each processing model can be retrieved from its storage address in the relevant information.

S303. When the raw data does not pass data verification, reconstruct the processing models.

If, during data verification, it is determined that the raw data has been changed, the stored processing models must be updated; if no stored processing model is found, the data analysis flow for the raw data is analyzed and the corresponding processing models are constructed.

S304. Connect the processing models according to their dependency relationships.

The processing models are connected according to the dependencies between the processing nodes, forming an ordered piece of execution logic.

S305. Convert the format of the raw data to obtain the input data.

To guarantee the uniformity of the data interface, the raw data must be converted into input data with a unified format. In this embodiment, the raw data is uniformly converted into input data in vector (Vector) or matrix (Matrix) format.
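
A minimal sketch of such a format conversion, assuming tabular numeric records and a NumPy array as the unified matrix format (both of which are illustrative choices):

```python
import numpy as np

def to_matrix(records):
    """Convert raw records (sequences of numbers) into the unified matrix format."""
    matrix = np.asarray(records, dtype=float)
    if matrix.ndim == 1:                  # a single record becomes a 1 x n matrix
        matrix = matrix.reshape(1, -1)
    return matrix
```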

S306. Input the input data into the connected processing models to obtain the target data.

The processing models are connected according to their dependencies, forming an ordered piece of execution logic; the input data is fed into the connected models, the execution logic processes it, and the final target data is obtained.

S307. Perform effect verification on the target data according to preset effect verification conditions.

The user can preset effect verification conditions according to their own needs and use them to verify the processing effect on the target data. For example, a comparison of the data before and after processing can be provided, so that the user can see the effect of the processing very intuitively.
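
One simple form such an effect verification could take is a before/after comparison of summary statistics, as in the hypothetical sketch below; the chosen statistics and threshold are assumptions rather than requirements of the patent.

```python
import numpy as np

def verify_effect(raw: np.ndarray, target: np.ndarray, max_abs_mean: float = 0.1):
    """Compare the data before and after processing against a preset condition."""
    report = {
        "mean_before": raw.mean(axis=0),
        "mean_after": target.mean(axis=0),
        "std_before": raw.std(axis=0),
        "std_after": target.std(axis=0),
    }
    passed = bool(np.all(np.abs(report["mean_after"]) <= max_abs_mean))
    return passed, report
```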

Embodiment 4

As shown in FIG. 6, which is a schematic structural diagram of the data analysis device of Embodiment 4 of the present invention, the data analysis device includes a construction module 11, a connection module 12, and a processing module 13.

The construction module 11 is configured to construct the processing model corresponding to each processing node in the data analysis flow.

The connection module 12 is configured to connect the processing models according to their dependency relationships.

The processing module 13 is configured to use the connected processing models to process the raw data to be processed so as to obtain the target data.

With the data analysis device provided by this embodiment, the processing models are connected according to their dependencies, so the output of one processing model can be passed directly through the data interface into the next. Intermediate data is no longer persisted, which saves resources, and because the intermediate data produced by one model flows straight into the next, the loading step for intermediate data is avoided and the efficiency of data analysis is improved.

Embodiment 5

As shown in FIG. 7, which is a schematic structural diagram of the data analysis device of Embodiment 5 of the present invention, the data analysis device includes, in addition to the construction module 11, the connection module 12, and the processing module 13 of Embodiment 4, an acquisition module 14, a parsing module 15, an effect verification module 16, and a persistence module 17.

The acquisition module 14 is configured to obtain the directed acyclic graph (DAG) of the data analysis flow; the parsing module 15 is configured to parse the DAG to obtain the relevant information of the data analysis flow.

The relevant information includes the logical function of each processing node, the dependency relationships between processing nodes, and the storage address of each processing model.

In this embodiment, one optional structure of the processing module includes a data verification unit 131, an acquisition unit 132, a format conversion unit 133, and a processing unit 134.

The data verification unit 131 is configured to perform data verification on the raw data.

The acquisition unit 132 is configured to retrieve each processing model according to its storage address in the relevant information when the raw data passes verification.

The format conversion unit 133 is configured to convert the format of the raw data to obtain the input data.

The processing unit 134 is configured to input the input data in sequence into the connected processing models for processing so as to obtain the target data.

Further, the data analysis device also includes an effect verification module 16.

The effect verification module 16 is configured to perform effect verification on the target data according to preset effect verification conditions.

The construction module 11 is specifically configured to construct the corresponding processing model according to the logical function of each processing node in the relevant information.

The connection module 12 is specifically configured to connect the processing models through the data interface according to the dependencies between the processing nodes in the relevant information.

Further, the data analysis device also includes a persistence module 17.

The persistence module 17 is configured to persist each processing model to storage according to its storage address in the relevant information.

In this embodiment, the DAG of the data analysis flow is obtained and parsed, the processing models are constructed from the parsing result and connected according to their dependencies, and the connected models are used to process the raw data to be processed so as to obtain the target data. Because the processing models are connected directly, the output of one model can be fed straight into the next; intermediate data is no longer persisted, which saves resources, and the loading step for intermediate data is eliminated, improving the efficiency of data analysis.

Further, the constructed processing models can be persisted and stored at a user-configured storage address, so that they can be reused.

Those of ordinary skill in the art will understand that all or some of the steps of the above method embodiments may be carried out by hardware instructed by a program. The program may be stored in a computer-readable storage medium; when executed, it performs the steps of the above method embodiments. The storage medium includes various media capable of storing program code, such as ROM, RAM, a magnetic disk, or an optical disc.

Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in those embodiments may still be modified, or some or all of their technical features may be replaced by equivalents, without such modifications or replacements causing the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (14)

1. A data analysis method, comprising: constructing a processing model corresponding to each processing node in a data analysis flow; connecting the processing models according to the dependency relationships between the processing nodes; and using the connected processing models to process raw data to be processed so as to obtain target data, wherein using the connected processing models to process the raw data to be processed so as to obtain the target data comprises: performing data verification on the raw data.

2. The method according to claim 1, wherein before constructing the processing model for each processing node in the data analysis flow, the method comprises: obtaining the directed acyclic graph (DAG) of the data analysis flow; and parsing the DAG to obtain relevant information about the data analysis flow, the relevant information including the logical function of each processing node, the dependency relationships between the processing nodes, and the storage address of each processing model.

3. The method according to claim 2, wherein using the connected processing models to process the raw data to be processed so as to obtain the target data further comprises: when the raw data passes data verification, retrieving each processing model according to its storage address in the relevant information; converting the format of the raw data to obtain input data; and inputting the input data into the connected processing models for processing so as to obtain the target data.

4. The method according to claim 3, wherein after inputting the input data into the connected processing models to obtain the target data, the method further comprises: performing effect verification on the target data according to preset effect verification conditions.

5. The method according to any one of claims 1-4, wherein constructing the processing model corresponding to each processing node in the data analysis flow comprises: constructing the corresponding processing model according to the logical function of each processing node in the relevant information.

6. The method according to any one of claims 1-4, wherein connecting the processing models according to the dependency relationships between the processing nodes comprises: connecting the processing models through a data interface according to the dependency relationships between the processing nodes in the relevant information.

7. The method according to any one of claims 1-4, wherein after constructing the processing model corresponding to each processing node in the data analysis flow, the method comprises: persisting each processing model to storage according to its storage address in the relevant information.

8. A data analysis device, comprising: a construction module for constructing a processing model corresponding to each processing node in a data analysis flow; a connection module for connecting the processing models according to dependency relationships; and a processing module for using the connected processing models to process raw data to be processed so as to obtain target data, wherein the processing module comprises: a data verification unit for performing data verification on the raw data.

9. The device according to claim 8, further comprising: an acquisition module for obtaining the directed acyclic graph (DAG) of the data analysis flow; and a parsing module for parsing the DAG to obtain relevant information about the data analysis flow, the relevant information including the logical function of each processing node, the dependency relationships between the processing nodes, and the storage address of each processing model.

10. The device according to claim 9, wherein the processing module further comprises: an acquisition unit for retrieving each processing model according to its storage address in the relevant information when the raw data passes verification; a format conversion unit for converting the format of the raw data to obtain input data; and a processing unit for inputting the input data in sequence into the connected processing models for processing so as to obtain the target data.

11. The device according to claim 10, further comprising: an effect verification module for performing effect verification on the target data according to preset effect verification conditions.

12. The device according to any one of claims 8-11, wherein the construction module is specifically configured to construct the corresponding processing model according to the logical function of each processing node in the relevant information.

13. The device according to any one of claims 8-11, wherein the connection module is specifically configured to connect the processing models through a data interface according to the dependency relationships between the processing nodes in the relevant information.

14. The device according to any one of claims 8-11, further comprising: a persistence module for persisting each processing model to storage according to its storage address in the relevant information.
TW106105358A 2016-03-25 2017-02-17 Data analysis method and device TWI798170B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610180084.0A CN107229815A (en) 2016-03-25 2016-03-25 Data analysing method and device
CN201610180084.0 2016-03-25

Publications (2)

Publication Number Publication Date
TW201734843A TW201734843A (en) 2017-10-01
TWI798170B true TWI798170B (en) 2023-04-11

Family

ID=59899226

Family Applications (1)

Application Number Title Priority Date Filing Date
TW106105358A TWI798170B (en) 2016-03-25 2017-02-17 Data analysis method and device

Country Status (3)

Country Link
CN (1) CN107229815A (en)
TW (1) TWI798170B (en)
WO (1) WO2017162085A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111176980B (en) * 2019-12-10 2023-04-25 哈尔滨工业大学(深圳) Data analysis method, device and system for separating debugging environment and running environment
CN111290948B (en) * 2020-01-19 2022-02-22 腾讯科技(深圳)有限公司 Test data acquisition method and device, computer equipment and readable storage medium
CN112231378A (en) * 2020-10-13 2021-01-15 中移(杭州)信息技术有限公司 Data processing method, system, server and storage medium
CN117519733B (en) * 2023-11-16 2024-05-28 之江实验室 Project deployment method and device, storage medium and electronic equipment

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101702215A (en) * 2009-11-06 2010-05-05 山东浪潮电子政务软件有限公司 Design method for setting dependence item in operating room in working flow
CN102033748A (en) * 2010-12-03 2011-04-27 中国科学院软件研究所 Method for generating data processing flow codes
CN102831571A (en) * 2011-07-08 2012-12-19 图芯芯片技术(上海)有限公司 Design method of five-order filter for realizing graphic image resizing and rotation in one step in flow-line manner
TWI424327B (en) * 2008-10-27 2014-01-21 Synopsys Inc Method and apparatus for processing a computer implemented representation of a circuit design, data processing system and article of manufacture providing software adapted to the same
TW201413444A (en) * 2012-07-06 2014-04-01 Nvidia Corp System, method, and computer program product for testing device parameters
CN104267939A (en) * 2014-09-17 2015-01-07 华为技术有限公司 Business processing method, device and system

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104239137B (en) * 2014-08-21 2017-12-08 东软集团股份有限公司 Multi-model Method of Scheduling Parallel and device based on DAG node optimal paths
CN104317556B (en) * 2014-10-22 2018-03-16 华为技术有限公司 A kind of streaming application upgrade method, main controlled node and stream calculation system
CN105354242A (en) * 2015-10-15 2016-02-24 北京航空航天大学 Distributed data processing method and device

Also Published As

Publication number Publication date
WO2017162085A1 (en) 2017-09-28
TW201734843A (en) 2017-10-01
CN107229815A (en) 2017-10-03
