WO2017181786A1 - Method, apparatus, computer device and storage medium for data analysis processing - Google Patents
- Publication number: WO2017181786A1 (PCT/CN2017/076293)
- Authority: WO (WIPO PCT)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval of structured data, e.g. relational data
- G06F16/25—Integrating or interfacing systems involving database management systems
- G06F16/254—Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/283—Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP
Definitions
- the present invention relates to the field of data processing technologies, and in particular, to a data analysis processing method, apparatus, computer device, and storage medium.
- ETL Extract-Transform-Load
- Common ETL tools include Datastage, Kettle, and OWB (Oracle Warehouse Builder).
- Traditional ETL tools do not have the ability to execute scripts: they cannot run existing data analysis functions or third-party extension libraries, and cannot analyze and process complex data that requires scientific computing.
- Traditional ETL tools such as Kettle can only process streaming data: data is loaded through one node, then converted and cleaned by the next node, and finally flows to an end node. Because the data must pass through a series of nodes, the processing procedure is complicated and cumbersome, and processing efficiency is low.
- a method, apparatus, computer device, and storage medium for data analysis processing are provided.
- a method of data analysis processing comprising:
- the data calculation processing script is called in the function node, and the data is analyzed and processed.
- a device for data analysis processing comprising:
- an entry module for entering a pre-established data analysis processing new project
- An access module configured to access a function node in the data analysis processing new project
- a reading module for reading a target file and importing data
- a calling module configured to invoke the data calculation processing script in the function node, and perform analysis processing on the data.
- a computer device comprising a memory and a processor, the memory storing computer executable instructions, the computer executable instructions being executed by the processor, such that the processor performs the following steps:
- the data calculation processing script is called in the function node, and the data is analyzed and processed.
- One or more non-volatile readable storage media storing computer-executable instructions, when executed by one or more processors, cause the one or more processors to perform the following steps:
- the data calculation processing script is called in the function node, and the data is analyzed and processed.
- Figure 1 is a block diagram of a computer device in one embodiment
- FIG. 2 is a flow chart of a method of data analysis processing in an embodiment
- FIG. 3 is a flow chart of establishing a new project of data analysis processing in one embodiment
- FIG. 4 is a flow chart of generating a data chart in one embodiment
- Figure 5 is a functional block diagram of an apparatus for data analysis processing in an embodiment
- FIG. 6 is a functional block diagram of an apparatus for data analysis processing in another embodiment
- Figure 7 is a functional block diagram of a module in an embodiment
- Figure 8 is a functional block diagram of an apparatus for data analysis processing in another embodiment.
- the computer device includes a processor, a non-volatile storage medium, an internal memory, a network interface, a display screen, and an input device coupled through a system bus.
- the non-volatile storage medium of the computer device stores an operating system and computer-executable instructions for implementing a data analysis processing method applicable to the computer device provided in the embodiment of the present application.
- the processor is used to provide computing and control capabilities to support the operation of the entire computer device.
- the internal memory in the computer device provides an environment for running the operating system and the computer-executable instructions in the non-volatile storage medium; the network interface is used for network communication with other computer devices, such as sending processed data to a server for storage.
- the computer device may include a user interaction device including an input device and an output device.
- the output device may be a display screen of the computer device for displaying data information, where the display screen may be a liquid crystal display, an electronic ink display, or the like;
- the input device is used for data input, where the input device may be a touch layer covering the display screen, a button, trackball, or touchpad provided on the computer device casing, or an external keyboard, trackpad, or mouse.
- the computer device can be a terminal such as a mobile phone, a tablet, or a PC (personal computer), or may be a server or the like. It will be understood by those skilled in the art that the structure shown in FIG. 1 is only a block diagram of a part of the structure related to the solution of the present application, and does not constitute a limitation on the computer device to which the solution is applied. A particular computer device may include more or fewer components than shown in the figure, combine some components, or have a different component arrangement.
- a method of data analysis processing is provided, which is applicable to a computer device as shown in FIG. 1, and the method includes the following steps:
- Step S210 entering a pre-established data analysis processing new project.
- the data analysis processing new project refers to a new project created by incorporating scientific computing functions into an ETL (Extract-Transform-Load) tool.
- the ETL tool is responsible for extracting data from distributed, heterogeneous data sources, such as relational data and flat data files, into a temporary intermediate layer for cleaning, conversion, and integration, and finally loading it into a data warehouse or data mart, where it becomes the basis for online analytical processing and data mining.
- Common ETL tools include Datastage, Kettle, OWB (Oracle Warehouse Builder), etc.
- Datastage is a data integration software platform with the functionality, flexibility, and scalability required to meet the most demanding data integration requirements.
- Kettle is an open-source ETL tool written in pure Java.
- OWB is a comprehensive tool from Oracle that provides ETL, fully integrated relational and dimensional modeling, data quality, data auditing, and management of the entire lifecycle of data and metadata.
- Python's scientific computing functions can be integrated into the ETL tool Kettle. Python is an object-oriented, interpreted programming language with a rich set of extension libraries that can perform scientific calculations on data and help complete advanced data analysis tasks. Scientific computing refers to the use of numerical computation on a computer to solve mathematical problems in science and engineering; it mainly includes three stages: establishing a mathematical model, devising a calculation method for solving it, and implementing it on a computer. Commonly used scientific computing languages and software include FORTRAN, ALGOL, and MATLAB. It can be understood that other languages with scientific computing capabilities can also be combined with ETL tools; the invention is not limited thereto.
- Step S220 accessing the function node in the data analysis processing new project.
- Python's scientific computing functions are integrated into Kettle, and a function node is developed that provides various scientific computing capabilities, such as executing Python code and calling Python's scientific computing extension libraries for data analysis operations.
- Python's scientific computing extension libraries can include NumPy, SciPy, and matplotlib, which provide fast array processing, numerical operations, and drawing functions, respectively. After accessing the function node in the data analysis processing new project, the various scientific computing functions of the function node can be used.
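As a brief sketch of the "fast array processing and numerical operations" these extension libraries provide (assuming NumPy is installed; the array values are purely illustrative):

```python
import numpy as np

# Vectorized numerical operations on an imported data column,
# of the kind a function node might perform through NumPy.
values = np.array([10.0, 12.5, 9.0, 14.0, 11.5])

mean = values.mean()                          # arithmetic mean of the column
normalized = (values - mean) / values.std()   # z-score normalization

print(round(float(mean), 2))   # 11.4
print(normalized.round(2).tolist())
```

The whole column is processed in one vectorized call, with no explicit loop over rows.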
- step S230 the target file is read and the data is imported.
- the target files can be stored on a server cluster of a local or distributed storage system. After accessing the function node, the desired target file can be selected from the server cluster, read, and the data that needs to be processed imported.
- Step S240 generating a data calculation processing script according to the demand information.
- the demand information refers to the analysis and processing requirements for the data, such as calling a vector processing function in the NumPy extension library to process arrays in the data, or performing batch processing on the imported data, etc.
- a corresponding Python data calculation processing script can be generated according to different demand information, and the generated script can be stored so that it can be called directly the next time data processing is performed, without being regenerated.
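A minimal sketch of generating and caching such a script is shown below. The template texts, demand keys, and `scripts` directory are hypothetical illustrations, not details from the patent:

```python
import os

# Hypothetical mapping from demand information to a script template.
SCRIPT_TEMPLATES = {
    "vector_sum": "import numpy as np\nresult = float(np.asarray(data).sum())\n",
    "dedupe": "result = list(dict.fromkeys(data))\n",
}

def generate_script(demand, directory="scripts"):
    """Generate a data calculation processing script for the demand and
    store it, so a later run can call it directly without regeneration."""
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, demand + ".py")
    if not os.path.exists(path):        # reuse a previously generated script
        with open(path, "w") as f:
            f.write(SCRIPT_TEMPLATES[demand])
    return path

path = generate_script("dedupe")
print(path)
```

A second call with the same demand returns the stored script unchanged, which is the reuse behaviour the step describes.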
- step S250 the data calculation processing script is called in the function node, and the data is analyzed and processed.
- the Python data calculation processing script generated according to the demand information may be executed directly in the function node, and the data is analyzed and processed according to the script, for example by data extraction, data cleaning, data conversion, and numerical calculation.
- data cleaning refers to the process of re-examining and verifying data;
- its purpose is to delete duplicate information, correct existing errors, and ensure data consistency.
- Data conversion refers to the process of changing data from one representation to another.
- the functions in the scientific computing extension libraries can also be called by the Python data calculation processing script to perform scientific calculation on the data.
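The cleaning, conversion, and numerical-calculation steps named above can be sketched together (assuming NumPy is installed; the raw column values are illustrative):

```python
import numpy as np

raw = [" 3.5", "4.0", "4.0", None, "bad", "5.5"]   # illustrative imported column

# Data cleaning and conversion: convert each record from its string
# representation to a float, skip erroneous records, drop duplicates.
cleaned = []
for item in raw:
    try:
        value = float(item)        # data conversion: string -> float
    except (TypeError, ValueError):
        continue                   # correct errors by discarding bad records
    if value not in cleaned:
        cleaned.append(value)      # delete duplicate information

# Numerical calculation via the scientific computing extension library.
mean = float(np.mean(cleaned))
print(cleaned)   # [3.5, 4.0, 5.5]
print(mean)
```

All three stages run inside one script, matching the point that the whole process completes inside the function node.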
- a script file with a target suffix can be directly read in the function node, such as a script file with a .py suffix.
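Reading and executing a stored `.py` script file against imported data can be sketched with the standard library; the averaging script and the `data` variable name are illustrative assumptions:

```python
import os
import runpy
import tempfile

# Write an illustrative data calculation processing script with the
# target suffix (.py), as if it had been generated and stored earlier.
script = "result = sum(data) / len(data)\n"
path = os.path.join(tempfile.mkdtemp(), "calc.py")
with open(path, "w") as f:
    f.write(script)

# The function node reads the script file and executes it, supplying
# the imported data through the script's global namespace.
namespace = runpy.run_path(path, init_globals={"data": [2, 4, 6]})
print(namespace["result"])   # 4.0
```

`runpy.run_path` returns the script's resulting globals, so the node can collect `result` after execution.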
- the above data analysis processing method accesses the function node in the data analysis processing new project; after the target file is read and the data imported, the data calculation processing script generated according to the demand information can be called to process the data. The script can analyze and process complex data, and the entire data processing procedure is completed inside the function node. The data does not need to flow between multiple nodes, so the processing procedure is simple and data processing efficiency is improved.
- before entering the pre-established data analysis processing new project in step S210, the method further includes: establishing the data analysis processing new project.
- the steps of establishing the data analysis processing new project include:
- Step S302 acquiring a data analysis source engineering code.
- the data analysis source engineering code refers to the project source code of the ETL tool, such as Kettle's project source code. After the data analysis source engineering code is obtained, it can be decompressed to obtain the corresponding project files.
- Step S304 creating a new data analysis processing project, and importing the data analysis source engineering code in the data analysis processing new project.
- the data analysis source engineering code can be imported in the form of a new project in a development environment such as Eclipse; that is, a new project is created in the development environment as the data analysis processing new project, and the decompressed project source code of the ETL tool Kettle is imported into the data analysis processing new project.
- Step S306 creating a function node in the data analysis processing new project.
- the function node can be created in the data analysis processing new project and developed against the multiple interfaces provided by the Kettle tool; for example, the interface of the function node is implemented using the TemplateStepDialog dialog class.
- Creating a function node in the data analysis processing new project is equivalent to customizing a new process node alongside the original process nodes of the Kettle tool; the function node can also be regarded as a newly developed Kettle plug-in, and this custom-developed function node is primarily used to process data that requires scientific computing or complex analysis.
- Step S308 invoking a data calculation tool data packet, and integrating the data in the data calculation tool data packet into the data analysis processing new project according to the preset node development template.
- the data calculation tool data package may include Python code and the rich extension library packages that come with Python, such as the packages of scientific computing extension libraries like NumPy, SciPy, and matplotlib.
- the data calculation tool data package is integrated into the data analysis processing new project according to the original node development template in Kettle, and the function node is developed using four classes in the Kettle tool so that it can edit
- and execute Python data calculation processing scripts; the four classes are the TemplateStep step class, the TemplateStepData data class, the TemplateStepMeta metadata class, and the TemplateStepDialog dialog class.
- Different classes provide different interfaces, and through these interfaces the data in the data calculation tool package can be called, enabling the function node to edit, execute, and save Python data calculation processing scripts.
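The four step classes themselves are Kettle Java internals, but the "execute" behaviour they enable can be pictured with a language-neutral sketch: the function node hands a Python data calculation processing script and its input data to a Python interpreter and collects the result. The JSON-over-stdin protocol here is an illustrative assumption, not the patent's actual mechanism:

```python
import json
import os
import subprocess
import sys
import tempfile

# An illustrative stored data calculation processing script that reads
# its input data from stdin and writes its result to stdout as JSON.
script = (
    "import json, sys\n"
    "data = json.load(sys.stdin)\n"
    "print(json.dumps({'total': sum(data)}))\n"
)
path = os.path.join(tempfile.mkdtemp(), "node_script.py")
with open(path, "w") as f:
    f.write(script)

# The function node's "execute": run the script in a Python
# interpreter, feeding it the imported data.
proc = subprocess.run(
    [sys.executable, path],
    input=json.dumps([1, 2, 3, 4]),
    capture_output=True, text=True, check=True,
)
result = json.loads(proc.stdout)
print(result)   # {'total': 10}
```

In the patent's design the host side of this exchange would be implemented by the TemplateStep classes rather than by Python.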
- Step S310 acquiring a scientific calculation extension library in the data calculation tool data package.
- the data calculation tool data package may include scientific computing extension libraries such as NumPy, SciPy, and matplotlib: NumPy can be used to store and process large matrices, SciPy provides numerical routines for scientific and engineering computation, and matplotlib can be used for chart generation and more.
- Python's scientific computing support has a richer set of libraries than other scientific computing software or languages, and the extension libraries are open source. Python provides various calling interfaces for analyzing and processing data, and the language is simple to read, easy to maintain, and can easily complete advanced data processing tasks.
- Step S312 establishing a relationship between the scientific computing extension library and the data analysis processing new project in the function node.
- the association between scientific computing extension libraries such as NumPy, SciPy, and matplotlib and the data analysis processing new project is established in the function node. By executing the Python data calculation processing script in the function node and invoking the corresponding interfaces provided by Python, the scientific computing functions in the extension libraries can be used to analyze and process the data.
- Step S314 modifying the basic configuration of the data analysis processing new project, and packaging the function node.
- the basic configuration of the data analysis processing new project may be modified in a configuration file such as plugin.xml, for example, adding a name and a description corresponding to the function node, but is not limited thereto.
- the function nodes can be packaged and stored in the plugin folder of Kettle.
- Step S316 storing data analysis processing new project.
- the new data analysis processing project can be stored in the server cluster of the local or distributed storage system.
- the data analysis processing new project can be used concurrently to process multiple data sets, improving data processing efficiency.
- by creating and developing a function node in the data analysis processing new project, the function node provides functions such as editing, executing, and saving a data calculation processing script, and the scientific computing extension libraries can be called in the function node to analyze and process complex data.
- Integrating scientific computing functions into the ETL data analysis tool enables it to perform more complex data processing in a simpler way and improves data processing efficiency.
- after the data calculation processing script is invoked in the function node in step S250 and the data is analyzed, the method further includes:
- Step S402 receiving an operation request for generating a data chart.
- a button for generating a data chart may be set, and when the user clicks the button, an operation request for generating a data chart may be received.
- Step S404 calling the relevant functions in the graphics processing extension library among the scientific computing extension libraries according to the operation request, analyzing the processed data, and generating a corresponding data chart file.
- the corresponding interface may be invoked in the Python data calculation processing script, and the relevant functions in a graphics processing extension library such as matplotlib among the scientific computing extension libraries may be used to analyze the processed data and generate corresponding graphics, tables, and the like, displayed in visual form, allowing users to view the analysis results of the data more directly.
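A minimal sketch of generating such a data chart file with matplotlib (assuming matplotlib is installed; the quarterly figures, labels, and output filename are illustrative):

```python
import matplotlib
matplotlib.use("Agg")            # headless backend: render without a display
import matplotlib.pyplot as plt

# Illustrative processed data to be displayed in visual form.
processed = {"Q1": 120, "Q2": 150, "Q3": 90, "Q4": 180}

fig, ax = plt.subplots()
ax.bar(list(processed.keys()), list(processed.values()))
ax.set_title("Processed data by quarter")    # illustrative chart title
fig.savefig("data_chart.png")                # the generated data chart file
plt.close(fig)
```

The resulting `data_chart.png` is the chart file that would then be saved to the local or distributed storage cluster.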
- the generated data chart files can be saved in a server cluster of a local or distributed storage system, and stored in a server cluster of a distributed storage system can alleviate the pressure on the server.
- the relevant functions in the graphics processing extension library among the scientific computing extension libraries can be called to analyze the processed data, so that the processed data is displayed in the form of graphs, tables, and the like, reflecting the data analysis results more intuitively.
- the method for data analysis processing further includes: acquiring the closest Hadoop cluster, and storing the processed data in that Hadoop cluster.
- HDFS (Hadoop Distributed File System) is the distributed storage system provided by Hadoop.
- the Hadoop cluster server closest to the computer device that currently analyzes and processes the data can be obtained, and the processed data and the generated chart files are stored in that nearest Hadoop cluster server, which reduces network transmission consumption and saves network resources.
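One plausible way to pick the closest cluster is by measured network latency. In this sketch the cluster addresses and latency figures are hypothetical; a real system would measure latency (or use network topology) at runtime:

```python
def closest_cluster(latencies_ms):
    """Return the Hadoop cluster address with the lowest measured latency,
    as a stand-in for 'the cluster with the closest distance'."""
    return min(latencies_ms, key=latencies_ms.get)

# Hypothetical latency measurements from the current computer device.
measured = {
    "hdfs://clusterA": 42.0,
    "hdfs://clusterB": 8.5,
    "hdfs://clusterC": 19.3,
}
target = closest_cluster(measured)
print(target)   # hdfs://clusterB
```

The processed data and chart files would then be written to `target`, keeping transfers on the shortest network path.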
- data can be stored in the nearest Hadoop cluster, which reduces network transmission consumption and saves network resources.
- an apparatus for data analysis processing including an entry module 510, an access module 520, a read module 530, a generate script module 540, and a call module 550.
- the entry module 510 is configured to enter a pre-established data analysis processing new project.
- the data analysis processing new project refers to the new project established by incorporating scientific computing functions into the ETL tool.
- the ETL tool is responsible for extracting data from distributed, heterogeneous data sources, such as relational data and flat data files, into a temporary intermediate layer for cleaning, conversion, and integration, and finally loading it into a data warehouse or data mart, where it becomes the basis for online analytical processing and data mining.
- ETL tools can include Datastage, kettle, OWB, etc.
- Datastage is a data integration software platform with the functionality, flexibility, and scalability required to meet the most demanding data integration requirements.
- Kettle is an open-source ETL tool written in pure Java that runs on Windows, Linux, and Unix; it is mainly used for data extraction, with high efficiency and stability.
- OWB is a comprehensive tool from Oracle that provides ETL, fully integrated relational and dimensional modeling, data quality, data auditing, and management of the entire lifecycle of data and metadata.
- Python's scientific computing functions can be integrated into the ETL tool Kettle. Python is an object-oriented, interpreted programming language with a rich set of extension libraries that can perform scientific calculations on data and help complete advanced data analysis tasks.
- Scientific computing refers to the use of numerical computation on a computer to solve mathematical problems in science and engineering; it mainly includes three stages: establishing a mathematical model, devising a calculation method for solving it, and implementing it on a computer. Commonly used scientific computing languages and software include FORTRAN, ALGOL, and MATLAB. It can be understood that other languages with scientific computing capabilities can also be combined with ETL tools; the invention is not limited thereto.
- the access module 520 is configured to access the function node in the data analysis processing new project.
- Python's scientific computing functions are integrated into Kettle, and a function node is developed that provides various scientific computing capabilities, such as executing Python code and calling Python's scientific computing extension libraries for data analysis operations.
- Python's scientific computing extension libraries can include NumPy, SciPy, and matplotlib, which provide fast array processing, numerical operations, and drawing functions, respectively. After accessing the function node in the data analysis processing new project, the various scientific computing functions of the function node can be used.
- the reading module 530 is configured to read the target file and import the data.
- the target files can be stored on a server cluster of a local or distributed storage system. After accessing the function node, the desired target file can be selected from the server cluster, read, and the data that needs to be processed imported.
- the generating script module 540 is configured to generate a data calculation processing script according to the demand information.
- the demand information refers to the analysis and processing requirements for the data, such as calling a vector processing function in the NumPy extension library to process arrays in the data, or performing batch processing on the imported data, etc.
- A corresponding Python data calculation processing script can be generated according to different demand information, and the generated script can be stored so that it can be called directly the next time data processing is performed, without being regenerated.
- the calling module 550 is configured to invoke the data calculation processing script in the function node to perform analysis processing on the data.
- the Python data calculation processing script generated according to the demand information may be executed directly in the function node, and the data is analyzed and processed according to the script, for example by data extraction, data cleaning, data conversion, and numerical calculation.
- data cleaning refers to the process of re-examining and verifying data;
- its purpose is to delete duplicate information, correct existing errors, and ensure data consistency.
- Data conversion refers to the process of changing data from one representation to another.
- the functions in the scientific computing extension libraries can also be called by the Python data calculation processing script to perform scientific calculation on the data.
- a script file with a target suffix can be directly read in the function node, such as a script file with a .py suffix.
- the above device for data analysis processing accesses the function node in the data analysis processing new project; after the target file is read and the data imported, the data calculation processing script generated according to the demand information can be called to process the data. The script can analyze and process complex data, and the entire data processing procedure is completed inside the function node. The data does not need to flow between multiple nodes, so the processing procedure is simple and data processing efficiency is improved.
- the apparatus for data analysis processing includes an entry module 510, an access module 520, a read module 530, a generate script module 540, and a call module 550.
- the apparatus further includes a setup module 560 for establishing the data analysis processing new project.
- The setup module 560 includes an acquisition unit 702, an import unit 704, a creation unit 706, a call unit 708, an association unit 710, a modification unit 712, and a storage unit 714.
- the obtaining unit 702 is configured to obtain a data analysis source engineering code.
- the data analysis source engineering code refers to the project source code of the ETL tool, such as Kettle's project source code. After the data analysis source engineering code is obtained, it can be decompressed to obtain the corresponding project files.
- the import unit 704 is configured to create a new data analysis processing project, and import the data analysis source engineering code in the data analysis processing new project.
- the data analysis source engineering code can be imported in the form of a new project in a development environment such as Eclipse; that is, a new project is created in the development environment as the data analysis processing new project, and the decompressed project source code of the ETL tool Kettle is imported into the data analysis processing new project.
- the creating unit 706 is configured to create a function node in the data analysis processing new project.
- the function node can be created in the data analysis processing new project and developed against the multiple interfaces provided by the Kettle tool; for example, the interface of the function node is implemented using the TemplateStepDialog dialog class.
- Creating a function node in the data analysis processing new project is equivalent to customizing a new process node alongside the original process nodes of the Kettle tool; the function node can also be regarded as a newly developed Kettle plug-in, and this custom-developed function node is primarily used to process data that requires scientific computing or complex analysis.
- the calling unit 708 is configured to invoke the data calculation tool data packet, and integrate the data in the data calculation tool data packet into the data analysis processing new project according to the preset node development template.
- the data calculation tool data package may include Python code and the rich extension library packages that come with Python, such as the packages of scientific computing extension libraries like NumPy, SciPy, and matplotlib.
- the data calculation tool data package is integrated into the data analysis processing new project according to the original node development template in Kettle, and the function node is developed using four classes in the Kettle tool so that it can edit
- and execute Python data calculation processing scripts; the four classes are the TemplateStep step class, the TemplateStepData data class, the TemplateStepMeta metadata class, and the TemplateStepDialog dialog class.
- Different classes provide different interfaces, and through these interfaces the data in the data calculation tool package can be called, enabling the function node to edit, execute, and save Python data calculation processing scripts.
- the obtaining unit 702 is further configured to acquire a scientific computing extension library in the data calculation tool data package.
- the data calculation tool data package may include the data packages of scientific computing extension libraries such as NumPy, SciPy, and matplotlib: NumPy can be used to store and process large matrices, SciPy provides routines for scientific and numerical computation, and matplotlib can be used to generate charts, among other functions.
- Python's scientific computing capability has richer extension libraries than other scientific computing software or languages, and these extension libraries are open source. Python can provide various calling interfaces for analyzing and processing data, and the language is easy to read, easy to maintain, and can easily complete advanced data processing tasks.
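As a small illustration of the "store and process large matrices" role the text assigns to NumPy, the following sketch centers the columns of a matrix with one broadcast operation; the matrix contents are made up for the demo.

```python
# Illustration of NumPy matrix processing; the data is invented.
import numpy as np

matrix = np.arange(12, dtype=float).reshape(3, 4)  # 3x4 demo matrix
col_means = matrix.mean(axis=0)   # per-column means
normalized = matrix - col_means   # broadcast subtraction centers each column
print(col_means)                  # [4. 5. 6. 7.]
print(normalized.sum())           # 0.0 after centering
```

The same vectorized style scales to much larger matrices, which is what makes these libraries attractive inside an ETL step.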
- the associating unit 710 is configured to establish, in the function node, an association relationship between the scientific computing extension library and the data analysis processing new project.
- the association between scientific computing extension libraries such as NumPy, SciPy, and matplotlib and the data analysis processing new project is established in the function node; by executing the Python data calculation processing script in the function node, the corresponding interfaces provided by Python are invoked, and the scientific computing functions in the scientific computing extension library can be used to analyze and process the data.
- the modifying unit 712 is configured to modify a basic configuration of the data analysis processing new project, and package the function node.
- the basic configuration of the data analysis processing new project may be modified in a configuration file such as plugin.xml, for example by adding a name and a description corresponding to the function node, but is not limited thereto.
- the function nodes can be packaged and stored in the plugin folder of Kettle.
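Purely as an illustration of the kind of plugin.xml entry described above — the exact schema depends on the Kettle version, and the id, class name, and attribute names here are hypothetical — the function node's name and description might be registered like this:

```xml
<plugin id="PythonCalcStep"
        description="Function node for editing and executing Python data calculation processing scripts"
        classname="org.example.kettle.steps.PythonCalcStepMeta">
  <localized_description>
    <description locale="en_US">Python data calculation function node</description>
  </localized_description>
</plugin>
```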
- the storage unit 714 is configured to store the data analysis processing new project.
- the data analysis processing new project can be stored locally or in a server cluster of a distributed storage system.
- the data analysis processing new project can be used concurrently to process multiple batches of data, improving data processing efficiency.
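The concurrent use described above can be sketched as several data batches handed to the same processing routine in parallel. The batches and the `clean` routine are invented for the demo; a thread pool stands in for whatever execution mechanism a real deployment would use.

```python
# Hedged sketch of concurrent batch processing; clean() is a stand-in
# for one data analysis processing run on a batch.
from concurrent.futures import ThreadPoolExecutor

def clean(batch):
    """Drop negative values and scale the rest (demo transformation)."""
    return [x * 2 for x in batch if x >= 0]

def process_batches(batches):
    # Submit every batch to the pool and collect results in order.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(clean, batches))

if __name__ == "__main__":
    batches = [[1, -2, 3], [4, 5], [-1, 6]]
    print(process_batches(batches))  # [[2, 6], [8, 10], [12]]
```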
- by creating and developing a function node in the data analysis processing new project, the function node provides functions such as editing, executing, and saving a data calculation processing script, and the scientific computing extension library can be called in the function node to analyze and process complex data. Integrating scientific computing functions into the ETL data analysis tool enables the ETL data analysis tool to perform more complex data processing in a simpler way and improves data processing efficiency.
- the apparatus for data analysis processing includes, in addition to the entry module 510, the access module 520, the reading module 530, the script generation module 540, the calling module 550, and the setup module 560, a receiving module 570 and a chart generation module 580.
- the receiving module 570 is configured to receive an operation request for generating a data chart.
- a button for generating a data chart may be set, and when the user clicks the button, an operation request for generating a data chart may be received.
- the generating chart module 580 is configured to call the related function of the graphic processing extended library in the scientific computing extended library according to the operation request to analyze the processed data, and generate a corresponding data chart file.
- the corresponding interface may be invoked in the Python data calculation processing script to call the related functions of a graphics processing extension library such as matplotlib in the scientific computing extension library, analyze the processed data, and generate corresponding graphics, tables, and the like, displayed in a visual form, allowing users to view the analysis results of the data more directly.
- the generated data chart files can be saved locally or in a server cluster of a distributed storage system; storing them in a server cluster of a distributed storage system can alleviate the pressure on the local server.
- the related functions of the graphics processing extension library in the scientific computing extension library can be called to analyze the processed data, so that the processed data is displayed in the form of graphs, tables, and the like, reflecting the result of the data analysis processing more intuitively.
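A minimal sketch of generating a data chart file from processed data with matplotlib, as described above; the data and output path are invented for the demo, and the Agg backend is used so that no display is required.

```python
# Generate a bar-chart file from processed data (demo values).
import matplotlib
matplotlib.use("Agg")  # render to file without a display
import matplotlib.pyplot as plt
import os
import tempfile

def generate_chart_file(labels, values, path):
    """Render a bar chart of the processed data and save it to path."""
    fig, ax = plt.subplots()
    ax.bar(labels, values)
    ax.set_title("Analysis result")
    fig.savefig(path)
    plt.close(fig)
    return path

if __name__ == "__main__":
    out = os.path.join(tempfile.gettempdir(), "analysis_result.png")
    generate_chart_file(["a", "b", "c"], [3, 1, 2], out)
    print(os.path.exists(out))  # True
```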
- the apparatus for data analysis processing further includes a storage module.
- the storage module is configured to acquire the nearest Hadoop cluster and store the processed data in the nearest Hadoop cluster.
- Hadoop is a distributed file storage system that is highly fault tolerant and provides high-throughput access to application data, which is suitable for applications with very large data sets.
- the Hadoop cluster server closest to the computer device that currently analyzes and processes the data can be obtained, and the processed data and the generated chart file can be stored in that nearest Hadoop cluster server, which reduces network transmission consumption and saves network resources.
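One way the "nearest Hadoop cluster" above might be chosen is to probe the round-trip latency to each cluster's entry point and pick the minimum. The cluster names and pre-measured latencies below are assumptions for the sketch; a real implementation would measure them, for example by timing a connection to each cluster.

```python
# Hedged sketch: pick the cluster with the smallest measured latency.
# Cluster names and latency values are invented for the demo.
def nearest_cluster(latencies_ms):
    """Return the name of the cluster with the smallest latency."""
    return min(latencies_ms, key=latencies_ms.get)

if __name__ == "__main__":
    measured = {"cluster-bj": 42.0, "cluster-sh": 18.5, "cluster-sz": 23.1}
    print(nearest_cluster(measured))  # cluster-sh
```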
- the calling module 550 can invoke the data calculation processing script in the function node through the processor of the computer device to analyze and process the data, where the processor can be a central processing unit (CPU) or a microprocessor.
- the storage module can send the processed data and the generated chart file to the nearest Hadoop cluster server through the network interface of the computer device, and store the processed data and the generated chart file in the closest Hadoop cluster server.
- the network interface may be an Ethernet card, a wireless network card, or the like.
- Each of the above modules may be embedded in or independent of the processor of the computer device in hardware, or may be stored in the memory of the computer device in software, so that the processor can invoke the operations corresponding to the above modules.
- the storage medium may be a nonvolatile storage medium such as a magnetic disk, an optical disk, or a read-only memory (ROM), or a random access memory (RAM).
Claims (20)
- A method for data analysis processing, comprising: entering a pre-established data analysis processing new project; accessing a function node in the data analysis processing new project; reading a target file and importing data; generating a data calculation processing script according to requirement information; and invoking the data calculation processing script in the function node to analyze and process the data.
- The method according to claim 1, wherein before entering the pre-established data analysis processing new project, the method further comprises: establishing the data analysis processing new project; wherein establishing the data analysis processing new project comprises: acquiring data analysis source project code; creating the data analysis processing new project and importing the data analysis source project code into the data analysis processing new project; creating a function node in the data analysis processing new project; invoking a data calculation tool data package and integrating data in the data calculation tool data package into the data analysis processing new project according to a preset node development template; and storing the data analysis processing new project.
- The method according to claim 2, wherein before storing the data analysis processing new project, the method further comprises: acquiring a scientific computing extension library in the data calculation tool data package; establishing, in the function node, an association relationship between the scientific computing extension library and the data analysis processing new project; and modifying a basic configuration of the data analysis processing new project and packaging the function node.
- The method according to claim 3, wherein after invoking the data calculation processing script in the function node to analyze and process the data, the method further comprises: receiving an operation request for generating a data chart; and calling, according to the operation request, a related function of a graphics processing extension library in the scientific computing extension library to analyze the processed data and generate a corresponding data chart file.
- The method according to claim 1, further comprising: acquiring the nearest Hadoop cluster and storing the processed data in the nearest Hadoop cluster.
- An apparatus for data analysis processing, comprising: an entry module configured to enter a pre-established data analysis processing new project; an access module configured to access a function node in the data analysis processing new project; a reading module configured to read a target file and import data; a script generation module configured to generate a data calculation processing script according to requirement information; and a calling module configured to invoke the data calculation processing script in the function node to analyze and process the data.
- The apparatus according to claim 6, further comprising: an establishing module configured to establish the data analysis processing new project; wherein the establishing module comprises: an obtaining unit configured to acquire data analysis source project code; an importing unit configured to create the data analysis processing new project and import the data analysis source project code into the data analysis processing new project; a creating unit configured to create a function node in the data analysis processing new project; a calling unit configured to invoke a data calculation tool data package and integrate data in the data calculation tool data package into the data analysis processing new project according to a preset node development template; and a storage unit configured to store the data analysis processing new project.
- The apparatus according to claim 7, wherein the obtaining unit is further configured to acquire a scientific computing extension library in the data calculation tool data package; and the establishing module further comprises: an associating unit configured to establish, in the function node, an association relationship between the scientific computing extension library and the data analysis processing new project; and a modifying unit configured to modify a basic configuration of the data analysis processing new project and package the function node.
- The apparatus according to claim 8, further comprising: a receiving module configured to receive an operation request for generating a data chart; and a chart generation module configured to call, according to the operation request, a related function of a graphics processing extension library in the scientific computing extension library to analyze the processed data and generate a corresponding data chart file.
- The apparatus according to claim 6, further comprising: a storage module configured to acquire the nearest Hadoop cluster and store the processed data in the nearest Hadoop cluster.
- A computer device, comprising a memory and a processor, the memory storing computer-executable instructions that, when executed by the processor, cause the processor to perform the following steps: entering a pre-established data analysis processing new project; accessing a function node in the data analysis processing new project; reading a target file and importing data; generating a data calculation processing script according to requirement information; and invoking the data calculation processing script in the function node to analyze and process the data.
- The computer device according to claim 11, wherein the computer-executable instructions, when executed by the processor, further cause the processor to perform, before the step of entering the pre-established data analysis processing new project, a step of establishing the data analysis processing new project, comprising: acquiring data analysis source project code; creating the data analysis processing new project and importing the data analysis source project code into the data analysis processing new project; creating a function node in the data analysis processing new project; invoking a data calculation tool data package and integrating data in the data calculation tool data package into the data analysis processing new project according to a preset node development template; and storing the data analysis processing new project.
- The computer device according to claim 12, wherein the computer-executable instructions, when executed by the processor, further cause the processor to perform, before the step of storing the data analysis processing new project, the steps of: acquiring a scientific computing extension library in the data calculation tool data package; establishing, in the function node, an association relationship between the scientific computing extension library and the data analysis processing new project; and modifying a basic configuration of the data analysis processing new project and packaging the function node.
- The computer device according to claim 13, wherein the computer-executable instructions, when executed by the processor, further cause the processor to perform, after the step of invoking the data calculation processing script in the function node to analyze and process the data, the steps of: receiving an operation request for generating a data chart; and calling, according to the operation request, a related function of a graphics processing extension library in the scientific computing extension library to analyze the processed data and generate a corresponding data chart file.
- The computer device according to claim 11, wherein the computer-executable instructions, when executed by the processor, further cause the processor to perform the step of: acquiring the nearest Hadoop cluster and storing the processed data in the nearest Hadoop cluster.
- One or more nonvolatile readable storage media storing computer-executable instructions that, when executed by one or more processors, cause the one or more processors to perform the following steps: entering a pre-established data analysis processing new project; accessing a function node in the data analysis processing new project; reading a target file and importing data; generating a data calculation processing script according to requirement information; and invoking the data calculation processing script in the function node to analyze and process the data.
- The nonvolatile readable storage media according to claim 16, wherein the computer-executable instructions, when executed by one or more processors, further cause the one or more processors to perform, before the step of entering the pre-established data analysis processing new project, a step of establishing the data analysis processing new project, comprising: acquiring data analysis source project code; creating the data analysis processing new project and importing the data analysis source project code into the data analysis processing new project; creating a function node in the data analysis processing new project; invoking a data calculation tool data package and integrating data in the data calculation tool data package into the data analysis processing new project according to a preset node development template; and storing the data analysis processing new project.
- The nonvolatile readable storage media according to claim 17, wherein the computer-executable instructions, when executed by one or more processors, further cause the one or more processors to perform, before the step of storing the data analysis processing new project, the steps of: acquiring a scientific computing extension library in the data calculation tool data package; establishing, in the function node, an association relationship between the scientific computing extension library and the data analysis processing new project; and modifying a basic configuration of the data analysis processing new project and packaging the function node.
- The nonvolatile readable storage media according to claim 18, wherein the computer-executable instructions, when executed by one or more processors, further cause the one or more processors to perform, after the step of invoking the data calculation processing script in the function node to analyze and process the data, the steps of: receiving an operation request for generating a data chart; and calling, according to the operation request, a related function of a graphics processing extension library in the scientific computing extension library to analyze the processed data and generate a corresponding data chart file.
- The nonvolatile readable storage media according to claim 16, wherein the computer-executable instructions, when executed by one or more processors, further cause the one or more processors to perform the step of: acquiring the nearest Hadoop cluster and storing the processed data in the nearest Hadoop cluster.
Priority Applications (6)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AU2017254506A AU2017254506B2 (en) | 2016-04-19 | 2017-03-10 | Method, apparatus, computing device and storage medium for data analyzing and processing |
SG11201708941TA SG11201708941TA (en) | 2016-04-19 | 2017-03-10 | Method, apparatus, computing device and storage medium for data analyzing and processing |
KR1020187015128A KR102133906B1 (ko) | 2016-04-19 | 2017-03-10 | 데이터 분석 및 처리 방법, 장치, 컴퓨터 장치 및 저장 매체 |
US15/578,690 US20180150530A1 (en) | 2016-04-19 | 2017-03-10 | Method, Apparatus, Computing Device and Storage Medium for Analyzing and Processing Data |
EP17785272.0A EP3279816A4 (en) | 2016-04-19 | 2017-03-10 | Data analysis processing method, apparatus, computer device, and storage medium |
JP2017561743A JP6397587B2 (ja) | 2016-04-19 | 2017-03-10 | データ分析処理のプログラム、コンピュータデバイス及び記憶媒体 |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610243600.X | 2016-04-19 | ||
CN201610243600.XA CN105824974B (zh) | 2016-04-19 | 2016-04-19 | 数据分析处理的方法和系统 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2017181786A1 true WO2017181786A1 (zh) | 2017-10-26 |
Family
ID=56527124
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2017/076293 WO2017181786A1 (zh) | 2016-04-19 | 2017-03-10 | 数据分析处理的方法、装置、计算机设备及存储介质 |
Country Status (8)
Country | Link |
---|---|
US (1) | US20180150530A1 (zh) |
EP (1) | EP3279816A4 (zh) |
JP (1) | JP6397587B2 (zh) |
KR (1) | KR102133906B1 (zh) |
CN (1) | CN105824974B (zh) |
AU (1) | AU2017254506B2 (zh) |
SG (1) | SG11201708941TA (zh) |
WO (1) | WO2017181786A1 (zh) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110716968A (zh) * | 2019-09-22 | 2020-01-21 | 南京信易达计算技术有限公司 | 一种大气科学计算容器包系统及方法 |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105824974B (zh) * | 2016-04-19 | 2019-03-26 | 平安科技(深圳)有限公司 | 数据分析处理的方法和系统 |
CN106547865A (zh) * | 2016-11-01 | 2017-03-29 | 广西电网有限责任公司电力科学研究院 | 一种大数据便捷分布式计算支持系统 |
CN106651560A (zh) * | 2016-12-01 | 2017-05-10 | 四川弘智远大科技有限公司 | 一种政府补贴数据监管系统 |
CN110020018B (zh) * | 2017-12-20 | 2023-08-29 | 阿里巴巴集团控股有限公司 | 数据可视化展示方法及装置 |
CN111506543A (zh) * | 2020-04-22 | 2020-08-07 | 北京奕为汽车科技有限公司 | 一种m文件生成方法及装置 |
CN112179346B (zh) * | 2020-09-15 | 2024-02-27 | 国营芜湖机械厂 | 一种无人小车的室内导航系统及其使用方法 |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101908191A (zh) * | 2010-08-03 | 2010-12-08 | 深圳市她秀时尚电子商务有限公司 | 应用于电子商务的数据分析方法及系统 |
CN103425762A (zh) * | 2013-08-05 | 2013-12-04 | 南京邮电大学 | 基于Hadoop平台的电信运营商海量数据处理方法 |
US20140025625A1 (en) * | 2012-01-04 | 2014-01-23 | International Business Machines Corporation | Automated data analysis and transformation |
CN105824974A (zh) * | 2016-04-19 | 2016-08-03 | 平安科技(深圳)有限公司 | 数据分析处理的方法和系统 |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7877421B2 (en) * | 2001-05-25 | 2011-01-25 | International Business Machines Corporation | Method and system for mapping enterprise data assets to a semantic information model |
US7739267B2 (en) * | 2006-03-10 | 2010-06-15 | International Business Machines Corporation | Classification and sequencing of mixed data flows |
US8560568B2 (en) * | 2008-08-26 | 2013-10-15 | Zeewise, Inc. | Remote data collection systems and methods using read only data extraction and dynamic data handling |
US8736613B2 (en) * | 2011-11-02 | 2014-05-27 | International Business Machines Corporation | Simplified graphical analysis of multiple data series |
US9031902B2 (en) * | 2011-11-10 | 2015-05-12 | International Business Machines Corporation | Slowly changing dimension attributes in extract, transform, load processes |
US9489379B1 (en) * | 2012-12-20 | 2016-11-08 | Emc Corporation | Predicting data unavailability and data loss events in large database systems |
- 2016-04-19 CN CN201610243600.XA patent/CN105824974B/zh active Active
- 2017-03-10 EP EP17785272.0A patent/EP3279816A4/en not_active Withdrawn
- 2017-03-10 JP JP2017561743A patent/JP6397587B2/ja active Active
- 2017-03-10 AU AU2017254506A patent/AU2017254506B2/en active Active
- 2017-03-10 WO PCT/CN2017/076293 patent/WO2017181786A1/zh active Application Filing
- 2017-03-10 SG SG11201708941TA patent/SG11201708941TA/en unknown
- 2017-03-10 US US15/578,690 patent/US20180150530A1/en not_active Abandoned
- 2017-03-10 KR KR1020187015128A patent/KR102133906B1/ko active IP Right Grant
Also Published As
Publication number | Publication date |
---|---|
AU2017254506A1 (en) | 2017-11-23 |
AU2017254506B2 (en) | 2019-08-15 |
EP3279816A1 (en) | 2018-02-07 |
US20180150530A1 (en) | 2018-05-31 |
JP2018523203A (ja) | 2018-08-16 |
CN105824974B (zh) | 2019-03-26 |
CN105824974A (zh) | 2016-08-03 |
EP3279816A4 (en) | 2018-11-14 |
KR102133906B1 (ko) | 2020-07-22 |
JP6397587B2 (ja) | 2018-09-26 |
KR20180133375A (ko) | 2018-12-14 |
SG11201708941TA (en) | 2017-11-29 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| WWE | Wipo information: entry into national phase | Ref document number: 11201708941T; Country of ref document: SG |
| ENP | Entry into the national phase | Ref document number: 2017254506; Country of ref document: AU; Date of ref document: 20170310; Kind code of ref document: A. Ref document number: 2017561743; Country of ref document: JP; Kind code of ref document: A |
| WWE | Wipo information: entry into national phase | Ref document number: 15578690; Country of ref document: US |
| ENP | Entry into the national phase | Ref document number: 20187015128; Country of ref document: KR; Kind code of ref document: A |
| NENP | Non-entry into the national phase | Ref country code: DE |